PHOENIX

AI Observability
and Evaluation

Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models in a notebook

Start Now

Shoutouts and accolades

Jerry Liu, CEO and Co-Founder, LlamaIndex

As LLM-powered applications increase in sophistication and new use cases emerge, deeper capabilities around LLM observability are needed to help debug and troubleshoot. We’re pleased to see this open-source solution from Arize, along with a one-click integration to LlamaIndex, and recommend any AI engineers or developers building with LlamaIndex check it out.

Harrison Chase, Co-Founder of LangChain

A huge barrier to getting LLMs and Generative Agents deployed into production is the lack of observability into these systems. With Phoenix, Arize is offering an open source way to visualize complex LLM decision-making.

Christopher Brown, CEO and Co-Founder of Decision Patterns and former UC Berkeley Computer Science lecturer

Phoenix is a much-appreciated advancement in model observability and production. The integration of observability utilities directly into the development process not only saves time but encourages model development and production teams to actively think about model use and ongoing improvements before releasing to production. This is a big win for management of the model lifecycle.

Pietro Bolcato, Lead ML Engineer, Kling Klang Klong

This is a library for LLMs and RNs that provides visual clustering analysis and model interpretability. It is super useful for understanding how a model works and demystifying the black-box phenomenon!

Yuki Waka, Application Developer, Klick

Phoenix integrated into our team’s existing data science workflows and enabled the exploration of unstructured text data to identify root causes of unexpected user inputs, problematic LLM responses, and gaps in our knowledge base.

Lior Sinclair, AI Researcher

Just came across Arize-phoenix, a new library for LLMs and RNs that provides visual clustering and model interpretability. Super useful.

Tom Matthews, Machine Learning Engineer at Unitary.ai

This is something that I was wanting to build at some point in the future, so I’m really happy to not have to build it. This is amazing.

Erick Siavichay, Project Mentor, Inspirit AI

We are in an exciting time for AI technology, including LLMs, and we will need better tools to understand and monitor an LLM's decision-making. With Phoenix, Arize is offering an open source way to do exactly that in a nifty library.

Shubham Sharma, VentureBeat

Large language models...remain susceptible to hallucination — in other words, producing false or misleading results. Phoenix, announced today at Arize AI’s Observe 2023 summit, targets this exact problem by visualizing complex LLM decision-making and flagging when and where models fail, go wrong, give poor responses or incorrectly generalize.

Yujian Tang, published in Plain Simple Software

23 Open Source AI Libraries for 2023. AI may be the top field to get into in 2023. Here are 23 open source libraries to get you started.


With Phoenix, AI Engineers and Data Scientists can:

• Evaluate performance of LLM tasks with the Evals library: easily evaluate tasks such as hallucination, summarization, and retrieval relevance, or create your own custom template (see the sketch after this list). See docs.
• Get visibility into where your complex or agentic workflow broke, or find performance bottlenecks across different span types, with LLM Tracing. See docs.
• Identify missing context in your knowledge base, and catch when irrelevant context is retrieved, by visualizing query embeddings alongside knowledge base embeddings with RAG Analysis. See docs.
• Compare and evaluate performance across model versions prior to deploying to production. See docs.
• Connect teams and workflows with continued analysis of production data from Arize in a notebook environment for fine-tuning workflows. See docs.
• Find clusters of problems using performance metrics or drift, and export clusters for retraining workflows. See docs.
• Use the Embeddings Analyzer to surface data drift for computer vision, NLP, and tabular models. See docs.
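Below is a minimal sketch of the Evals workflow referenced above: launching Phoenix from a notebook and running the built-in hallucination eval over a small dataframe. It assumes a recent arize-phoenix install, the pandas and openai dependencies, and an OPENAI_API_KEY in the environment; module paths and argument names have shifted across Phoenix versions (older releases expose these helpers under phoenix.experimental.evals), so treat it as a sketch and consult the docs for the current API.

```python
# Minimal sketch, assuming a recent arize-phoenix install with the evals
# extras and an OPENAI_API_KEY in the environment. Module paths and argument
# names vary across Phoenix versions (older releases expose these under
# phoenix.experimental.evals); check the docs for the exact current API.
import pandas as pd
import phoenix as px
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Launch the Phoenix app locally; in a notebook this prints a link to the UI.
session = px.launch_app()

# A toy set of (input, reference, output) rows to check for hallucination.
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?"],
        "reference": ["Hamlet is a tragedy written by William Shakespeare."],
        "output": ["Hamlet was written by Charles Dickens."],
    }
)

# Run the built-in hallucination eval; labels are constrained to the
# template's rails (e.g. "factual" / "hallucinated").
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())
evals_df = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),  # any supported judge model; older versions use model_name=
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,
)
print(evals_df[["label", "explanation"]])
```

The same pattern extends to the other built-in templates, such as summarization and retrieval relevance, or to a custom template of your own.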

When to use Phoenix vs Arize

Lifecycle stages: Early iteration → Pre-prod evaluation → Production

Phoenix
RECOMMENDED FOR
  • Designed for fast, iterative development of models during pre-production and development
  • Notebook and local usage
  • EDA (exploratory data analysis)
  • LLM evaluation and iteration
  • Visibility into LLM traces and spans
  • Available in a notebook
  • Supports Tabular, Image, NLP, and Generative models
  • Rich visualizations for exploratory data analysis
  • Single model support
  • Lightweight monitoring & checks
  • Workflows to export findings
  • Supports drift metrics
  • Runs locally on your data
Arize
RECOMMENDED FOR
  • Platform for observability of production models
  • Cloud or on-prem
  • ML teams looking for visibility across all their ML and LLM use cases
  • LLM prompt iteration and eval tracking
  • Advanced RCA (root cause analysis)
  • Always-on data collection and monitoring
  • Timeseries and dashboard analysis
  • Scale and security
  • Robust integrations
  • Shareable URLs with your team
  • Explainability and fairness
  • Available on cloud or on-prem
  • Supports Tabular, Image, NLP, and Generative models
  • Rich visualizations for exploratory data analysis
  • Opinionated root cause analysis (tracing workflows)
  • High scale and performant (works on billions of predictions)
  • Multi-model support
  • Configurable monitoring and alerting integrations
  • Shareable insights and dashboards for your team
  • Workflows to export findings
  • Customizable performance, drift, and data quality metrics
  • RBAC controls
  • Security and compliance

  • Embeddings and latent structure are the backbone of modern models

  • LLM and model complexity is off the charts

  • Model improvement, analysis, and control severely lack easy-to-use tools

  • Phoenix meets the data scientist (you) in the notebook to help solve complex ML problems

Stay up to date with Phoenix