Category: Phoenix

LLM Summarization: Getting To Production

Category: Phoenix, Use-Case
Posted by Shittu Olumide | May 30, 2024

Recently, I attended a workshop hosted by Arize AI’s Jason Lopatecki and Dat Ngo on large language model summarization, covering common challenges with the use case and how to evaluate generated summaries. Drawing from this session and additional research, this article dives into the concept of LLM summarization – why it is important, primary summarization…

Using Generative AI to Evaluate Bias in Speeches

Category: LLMOps, Phoenix, Responsible AI
Posted by Amber Roberts | May 17, 2024

Kansas City Chiefs kicker Harrison Butker recently sparked debate after delivering a commencement address to the 2024 graduating class at Benedictine College that touched on topics like gender roles, Pride Month, and President Joe Biden. With Butker’s words igniting intense debate and many offering their views across social media, we decided to investigate how AI…

How To Set Up a SQL Router Query Engine for Effective Text-To-SQL

Category: LLMOps, Phoenix, Use-Case
Posted by Amber Roberts | March 18, 2024

This article is co-authored by Dustin Ngo. Large language model (LLM) applications are being deployed by an increasing number of companies to power everything from code generation to improved summarization of customer service calls. One area where LLMs with in-context learning show promise is text-to-SQL, or generating SQL queries from natural language. Achieving results is often…

Evaluate RAG with LLM Evals and Benchmarks

Category: Generative AI, Large Language Models, LLMOps, Phoenix, Use-Case
Posted by Shittu Olumide | March 6, 2024 | Tags: llamaindex, llm evaluation, phoenix

Recently, I attended a workshop organized by Arize AI titled “RAG Time! Evaluate RAG with LLM Evals and Benchmarking.” Hosted by Amber Roberts – ML Growth Lead at Arize AI, and Mikyo King – Head of Open Source at Arize AI, the talks provided valuable insights into an important field of study. Miss the event?…

Evaluating and Analyzing Your RAG Pipeline with Ragas

Category: Generative AI, Large Language Models, LLMOps, Phoenix, Product
Posted by Shahul ES | February 20, 2024

This article is co-authored by Mikyo King, Founding Engineer and Head of Open Source at Arize AI, and Xander Song, AI Engineer at Arize AI. Building a baseline for a RAG pipeline is not usually difficult, but enhancing it to make it suitable for production and ensuring the quality of your responses is almost always…

Calling All Functions: Benchmarking OpenAI Function Calling and Explanations

Category: Generative AI, Large Language Models, LLMOps, Phoenix, Use-Case
Posted by Amber Roberts | December 7, 2023

This piece is co-authored by Roger Yang, Software Engineer at Arize AI. Observability in third-party large language models (LLMs) is largely approached with benchmarking and evaluations since models like Anthropic’s Claude, OpenAI’s GPT models, and Google’s PaLM 2 are proprietary. In this blog post, we benchmark OpenAI’s GPT models with function calling and explanations against…

LLM Tracing and Observability

Category: Generative AI, Large Language Models, LLMOps, Phoenix, Product
Posted by Amber Roberts | October 2, 2023 | Tags: LLM observability

What is LLM App Tracing? The rise of large language model (LLM) application development has enabled developers to move quickly in building applications powered by LLMs. The abstractions created by these frameworks can accelerate development, but also make it hard to debug an LLM app. This is where Arize Phoenix, a popular open-source library for…