Category: Phoenix

Evaluating an Image Classifier

Category: Generative AI, Large Language Models, Phoenix
Posted by John Gilhuly | September 3, 2024

Phoenix supports multi-modal evaluation and tracing. In this tutorial, we’ll take advantage of that to walk through the process of setting up an image classification experiment using Phoenix. This involves uploading a dataset, creating an experiment to classify the images, and evaluating the model’s accuracy. We’ll be using OpenAI’s GPT-4o-mini model for the classification task. …

How To Set Up CrewAI Observability

Category: Phoenix
Posted by Dat Ngo | August 26, 2024

Why Observability is Important with CrewAI: In the world of autonomous AI agents, the ability to monitor and evaluate their performance is the key to unlocking their full potential. Observability allows you to see exactly what your agents are doing, how they are performing, and where they can be improved. Just as visibility into KPIs…

Trace Your Haystack Application

Category: Generative AI, Phoenix
Posted by Steve Herzog | August 19, 2024

Haystack is an open-source framework for building LLM applications, retrieval-augmented generative pipelines and search systems that work intelligently over large document collections. Haystack makes it very easy to get up and running quickly with search and RAG LLM apps. Combining Haystack with Phoenix gives you the tools to not only build these apps quickly, but…

How To Use Annotations To Collect Human Feedback On Your LLM Application

Category: Generative AI, Phoenix
Posted by Steve Herzog | August 15, 2024

Liking Phoenix? Please consider giving us a ⭐ on GitHub! Cast your mind back to the early days of mainstream AI development – a whopping seven years ago. NVIDIA stock was just over $1 per share. “Transformers” meant Optimus Prime, not a world-changing technical leap forward. And the primary way of evaluating AI applications was…

LlamaIndex Workflows: Navigating a New Way To Build Cyclical Agents

Category: Large Language Models, Phoenix
Posted by Steve Herzog | August 8, 2024

Last week, LlamaIndex released Workflows, a new approach to easily create agents. Workflows use an event-based architecture instead of the directed acyclic graph approach used by traditional pipelines or chains. This new approach brings with it new considerations for developers looking to create agentic systems, as well as new questions on how to evaluate and…

Text To SQL: Evaluating SQL Generation with LLM as a Judge

Category: LLMOps, Phoenix
Posted by Aparna Dhinakaran | August 1, 2024

Special shoutout to Manas Singh for collaborating with us on this research! One application of LLMs that has garnered headlines and significant investment is generating SQL queries. The ability to query large databases with natural language unlocks several compelling use cases, from greater data transparency to increasing accessibility for non-technical…

How To: Host Phoenix + Persistence

Category: Large Language Models, Phoenix
Posted by Trevor LaViale | July 31, 2024

With Arize Phoenix, getting started is relatively straightforward because you can run it locally and start iterating quickly during the development and experimentation phases of building an LLM application. However, once you’re ready for production, or if you want to collaborate with your teammates, it’s time to deploy Phoenix. In addition to a…

Different Ways to Instrument Your LLM Application

Category: LLMOps, Phoenix
Posted by Evan Jolley | July 25, 2024

Thanks to John Gilhuly for his contributions to this piece. LLM instrumentation is the process of monitoring and collecting data in an LLM application, and it plays an important role in achieving the level of performance and reliability necessary in these systems. This blog explores the different ways you can instrument your LLM application, comparing…

LLM Function Calling: Evaluating Tool Calls In LLM Pipelines

Category: Generative AI, Large Language Models, LLMOps, Phoenix
Posted by Steve Herzog | July 16, 2024

Function calling is an essential part of any AI engineer’s toolkit, enabling builders to enhance a model’s utility at specific tasks. As more LLM applications leveraging tool calls get deployed into production, the task of effectively evaluating their performance in LLM pipelines becomes more critical. What Is Function Calling In AI? First launched by OpenAI…

LlamaIndex’s Newly-Released Instrumentation Module + Phoenix Integration

Category: Generative AI, Large Language Models, LLMOps, Phoenix, Use-Case
Posted by Evan Jolley | July 1, 2024

Due to the black-box nature of LLMs and the importance of the tasks they’re being trusted to handle, intelligent monitoring and optimization tools are essential to ensure they operate efficiently and effectively. The integration of Arize Phoenix with LlamaIndex’s newly released instrumentation module offers developers unprecedented power to fine-tune performance, diagnose issues, and enhance the…