
Category: LLMOps

Special shoutout to Manas Singh for collaborating with us on this research! One application of LLMs that has garnered headlines and significant investment is their ability to generate SQL queries. The ability to query large databases with natural language unlocks several compelling use cases, from greater data transparency to increased accessibility for non-technical…

Thanks to John Gilhuly for his contributions to this piece. LLM instrumentation is the process of monitoring and collecting data in an LLM application, and it plays an important role in achieving the level of performance and reliability necessary in these systems. This blog explores the different ways you can instrument your LLM application, comparing…

Function calling is an essential part of any AI engineer’s toolkit, enabling builders to enhance a model’s utility at specific tasks. As more LLM applications leveraging tool calls get deployed into production, the task of effectively evaluating their performance in LLM pipelines becomes more critical. What Is Function Calling In AI? First launched by OpenAI…

Due to the black box nature of LLMs and the importance of tasks they’re being trusted to handle, intelligent monitoring and optimization tools are essential to ensure they operate efficiently and effectively. The integration of Arize Phoenix with LlamaIndex’s newly released instrumentation module offers developers unprecedented power to fine-tune performance, diagnose issues, and enhance the…

Kansas City Chiefs kicker Harrison Butker recently sparked debate after delivering a commencement address to the 2024 graduating class at Benedictine College that touched on topics like gender roles, Pride Month, and President Joe Biden. With Butker’s words igniting intense debate and many offering their views across social media, we decided to investigate how AI…

This article is co-authored by Dustin Ngo. Large language model (LLM) applications are being deployed by an increasing number of companies to power everything from code generation to improved summarization of customer service calls. One area where LLMs with in-context learning show promise is text-to-SQL, or generating SQL queries from natural language. Achieving results is often…

Recently, I attended a workshop organized by Arize AI titled “RAG Time! Evaluate RAG with LLM Evals and Benchmarking.” Hosted by Amber Roberts, ML Growth Lead at Arize AI, and Mikyo King, Head of Open Source at Arize AI, the talks provided valuable insights into an important field of study. Miss the event?…

This article is co-authored by Mikyo King, Founding Engineer and Head of Open Source at Arize AI, and Xander Song, AI Engineer at Arize AI. Building a baseline for a RAG pipeline is not usually difficult, but enhancing it to make it suitable for production and ensuring the quality of your responses is almost always…

This piece is co-authored by Roger Yang, Software Engineer at Arize AI. Observability in third-party large language models (LLMs) is largely approached with benchmarking and evaluations, since models like Anthropic’s Claude, OpenAI’s GPT models, and Google’s PaLM 2 are proprietary. In this blog post, we benchmark OpenAI’s GPT models with function calling and explanations against…

What is LLM App Tracing? The rise of large language model (LLM) application development has enabled developers to move quickly in building applications powered by LLMs. The abstractions created by these frameworks can accelerate development, but also make it hard to debug an LLM app. This is where Arize Phoenix, a popular open-source library for…