Open-source LLM
tracing and evaluation

Accelerate your AI development with powerful insights.
Seamlessly evaluate, experiment, and optimize AI applications in real time.

Get started
> pip install arize-phoenix
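
Once installed, a minimal quick start is a few lines of Python. A sketch, assuming a default local setup:

import phoenix as px

# Launch the Phoenix UI locally (by default at http://localhost:6006)
session = px.launch_app()
print(session.url)
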
2.2M+ Downloads
4k+ Github Stars
1M+ Monthly OTEL Instrumentation Downloads
6k+ Community Members

AI Developers Use Phoenix

Flexible, Transparent, & Free from Lock-In

Phoenix uses OpenTelemetry (OTEL) for seamless setup, full transparency, and no vendor lock-in—start, scale, or move on without restriction.
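
Setup is a single register call. A sketch, assuming a Phoenix instance at the default local endpoint (the project name below is illustrative):

from phoenix.otel import register

# Configure an OTEL tracer provider that exports spans to Phoenix
tracer_provider = register(
    project_name="my-llm-app",  # hypothetical project name
    endpoint="http://localhost:6006/v1/traces",
)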

Application Tracing for Total Visibility

Easily collect LLM app data with automatic instrumentation — or instrument manually for full control.

Go to Docs
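
Automatic instrumentation comes from the OpenInference instrumentor packages. A sketch for OpenAI, assuming openinference-instrumentation-openai is installed and the tracer_provider from the setup step above:

from openinference.instrumentation.openai import OpenAIInstrumentor

# After this call, every OpenAI client request is traced into Phoenix
# with no further changes to application code.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)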

Interactive Prompt Playground

Experiment with prompts, models, and parameters side by side, and replay traced LLM calls to see the impact of each change instantly.

Go to Docs

Streamlined Evaluations and Annotations

Leverage an eval library with unparalleled speed and ergonomics. Start with pre-built templates, customize them to any task, or incorporate human feedback.

Go to Docs
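
Evals run over ordinary DataFrames. A sketch using a pre-built relevance template, assuming pandas data with "input" and "reference" columns and an OpenAI API key in the environment (the example rows and model name are illustrative):

import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    llm_classify,
)

df = pd.DataFrame({
    "input": ["How do I trace my LLM app?"],
    "reference": ["Phoenix collects traces via OpenTelemetry instrumentation."],
})

# Rails constrain the LLM judge's output to the template's labels
rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())
evals = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=rails,
)
print(evals["label"])
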
Dataset Clustering & Visualization

Uncover semantically similar questions, document chunks, and responses using embeddings to isolate poor performance.
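
Point Phoenix at a DataFrame that carries an embedding column and launch the UI. A sketch; the column names are illustrative, and on older Phoenix versions px.Inferences was named px.Dataset:

import phoenix as px

# Describe which columns hold the raw text and its embedding vector
schema = px.Schema(
    prompt_column_names=px.EmbeddingColumnNames(
        vector_column_name="embedding",
        raw_data_column_name="text",
    ),
)

# df is your pandas DataFrame of prompts and their embeddings
session = px.launch_app(primary=px.Inferences(dataframe=df, schema=schema, name="production"))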

Proudly Open Source

Built on top of OpenTelemetry, Phoenix is agnostic of vendor, framework, and language – granting you the flexibility you need in today’s generative landscape.

Phoenix works with all of your LLM tools

Run model tests, leverage pre-built templates, and seamlessly incorporate human feedback. Customize to fit any project while fine-tuning faster than ever.

Don’t just take our word for it

"Just came across Arize-phoenix, a new library for LLMs and RNs that provides visual clustering and model interpretability. Super useful."
Lior Sinclair AI Researcher
"As LLM-powered applications increase in sophistication and new use cases emerge, deeper capabilities around LLM observability are needed to help debug and troubleshoot. We’re pleased to see this open-source solution from Arize, along with a one-click integration to LlamaIndex, and recommend any AI engineers or developers building with LlamaIndex check it out."
Jerry Liu CEO and Co-Founder, LlamaIndex
"I always get asked what my favorite **Open Source** eval tool is. The most impressive one thus far to me is Phoenix."
Hamel Husain Founder, Parlance Labs
"This is something that I was wanting to build at some point in the future, so I’m really happy to not have to build it. This is amazing."
Tom Matthews Machine Learning Engineer at Unitary.ai
"Arize's Phoenix tool uses one LLM to evaluate another for relevance, toxicity, and quality of responses. The tool uses "Traces" to record the paths taken by LLM requests (made by an application or end user) as they propagate through multiple steps. An accompanying OpenInference specification uses telemetry data to understand the execution of LLMs and the surrounding application context. In short, it's possible to figure out where an LLM workflow broke or troubleshoot problems related to retrieval and tool execution."
Lucas Mearian Senior Reporter, ComputerWorld
"Large language models...remain susceptible to hallucination — in other words, producing false or misleading results. Phoenix, announced today at Arize AI’s Observe 2023 summit, targets this exact problem by visualizing complex LLM decision-making and flagging when and where models fail, go wrong, give poor responses or incorrectly generalize."
Shubham Sharma, VentureBeat
"Phoenix is a much-appreciated advancement in model observability and production. The integration of observability utilities directly into the development process not only saves time but encourages model development and production teams to actively think about model use and ongoing improvements before releasing to production. This is a big win for management of the model lifecycle."
Christopher Brown CEO and Co-Founder of Decision Patterns and a former UC Berkeley Computer Science lecturer
"Phoenix integrated into our team’s existing data science workflows and enabled the exploration of unstructured text data to identify root causes of unexpected user inputs, problematic LLM responses, and gaps in our knowledge base."
Yuki Waka Application Developer, Klick
"We are in an exciting time for AI technology including LLMs. We will need better tools to understand and monitor an LLM’s decision making. With Phoenix, Arize is offering an open source way to do exactly just that in a nifty library."
Erick Siavichay Project Mentor, Inspirit AI

Start your AI Observability Journey