
Category: Use-Case

Due to the black-box nature of LLMs and the importance of the tasks they’re being trusted to handle, intelligent monitoring and optimization tools are essential to ensure they operate efficiently and effectively. The integration of Arize Phoenix with LlamaIndex’s newly released instrumentation module offers developers unprecedented power to fine-tune performance, diagnose issues, and enhance the…
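To make the integration concrete, here is a minimal sketch of tracing a LlamaIndex application with Phoenix. It assumes the documented `arize_phoenix` global handler and an illustrative `./data` directory of documents:

```python
import phoenix as px
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, set_global_handler

# Launch the local Phoenix collector and UI for inspecting traces.
session = px.launch_app()

# Route LlamaIndex's instrumentation events into Phoenix.
set_global_handler("arize_phoenix")

# From here on, any LlamaIndex workload is traced end to end.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What does this corpus say about evaluation?"))
```

Each retrieval, synthesis, and LLM call from the query shows up as a span in the Phoenix UI, which is what enables the kind of performance tuning and issue diagnosis the article describes.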

Recently, I attended a workshop hosted by Arize AI’s Jason Lopatecki and Dat Ngo on large language model summarization, covering common challenges with the use case and how to evaluate generated summaries. Drawing from this session and additional research, this article dives into the concept of LLM summarization: why it is important, primary summarization…

This article is co-authored by Dustin Ngo. Large language model (LLM) applications are being deployed by an increasing number of companies to power everything from code generation to improved summarization of customer service calls. One area where LLMs with in-context learning show promise is text-to-SQL, or generating SQL queries from natural language. Achieving results is often…
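As a rough illustration of the technique, the sketch below prompts an OpenAI chat model with a table schema in context and asks it to translate a natural-language question into SQL. The `calls` schema, model choice, and question are all hypothetical placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Supplying the schema in-context lets the model ground its query
# in real table and column names.
SCHEMA = """CREATE TABLE calls (
    id INTEGER PRIMARY KEY,
    agent TEXT,
    duration_seconds INTEGER,
    satisfaction_score REAL
);"""

def text_to_sql(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"Translate the user's question into a single SQLite query.\nSchema:\n{SCHEMA}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(text_to_sql("Which agent has the longest average call duration?"))
```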

Recently, I attended a workshop organized by Arize AI titled “RAG Time! Evaluate RAG with LLM Evals and Benchmarking.” Hosted by Amber Roberts, ML Growth Lead at Arize AI, and Mikyo King, Head of Open Source at Arize AI, the talks provided valuable insights into an important field of study. Miss the event?…

This piece is co-authored by Roger Yang, Software Engineer at Arize AI. Observability in third-party large language models (LLMs) is largely approached with benchmarking and evaluations, since models like Anthropic’s Claude, OpenAI’s GPT models, and Google’s PaLM 2 are proprietary. In this blog post, we benchmark OpenAI’s GPT models with function calling and explanations against…
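For a flavor of what function-calling-based evaluation looks like, here is a minimal sketch of an LLM judge that labels document relevance. The function schema forces the model to return a structured label plus an explanation rather than free-form text; the `record_relevance` schema and prompts are illustrative, not the benchmark’s exact setup:

```python
import json
from openai import OpenAI

client = OpenAI()

# A function schema constrains the judge to a fixed output format:
# a binary label and a natural-language explanation.
EVAL_TOOL = {
    "type": "function",
    "function": {
        "name": "record_relevance",
        "description": "Record whether the document is relevant to the query.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "enum": ["relevant", "irrelevant"]},
                "explanation": {"type": "string"},
            },
            "required": ["label", "explanation"],
        },
    },
}

def judge(query: str, document: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Query: {query}\nDocument: {document}\n"
                       "Is the document relevant to answering the query?",
        }],
        tools=[EVAL_TOOL],
        tool_choice={"type": "function", "function": {"name": "record_relevance"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)

print(judge("How do I reset my password?", "Step-by-step instructions for changing an account password."))
```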