LlamaIndex’s Newly-Released Instrumentation Module + Phoenix Integration

Due to the black-box nature of LLMs and the importance of the tasks they're being trusted to handle, intelligent monitoring and optimization tools are essential to ensure these applications operate efficiently and effectively. The integration of Arize Phoenix with LlamaIndex's newly released instrumentation module gives developers powerful new ways to fine-tune performance, diagnose issues, and enhance the overall functionality of their LLM applications.

LlamaIndex has introduced a significant update in its latest version (v0.10.20): the new instrumentation module, which is set to replace the legacy callbacks module. This update is an important shift in how developers instrument their LLM applications, providing a more structured and flexible approach to event and span management. The new module is a complete overhaul that aims to streamline integration and make it easier to track and manage application flows. Both the old callbacks module and the new instrumentation module will be supported during the deprecation period, but once existing integrations have been migrated, callbacks will no longer be supported.

Key Features of the New Instrumentation Module

The new instrumentation module introduces several core components, each designed to optimize the monitoring and management of LLM applications.

Event: At the heart of the instrumentation module is the Event class. An event represents a discrete moment in time during the application’s execution when something notable occurs. This could range from the start of a data retrieval operation to the completion of a processing task. Events are pivotal for pinpointing when specific actions take place within the application, providing a granular view of its operation.

EventHandler: The EventHandler class listens for these events. Once an event is detected, the EventHandler executes predefined logic. This setup allows developers to specify custom responses to different events, such as logging information, triggering alerts, or even initiating corrective actions. This responsiveness is crucial for maintaining LLM applications.

Span: The Span class extends the concept of events by representing a sequence of operations within the application. It tracks the flow from start to finish, capturing all related events. This is particularly useful for understanding the performance and behavior of complex sequences within the application, offering insights into bottlenecks or failures.

SpanHandler: The SpanHandler manages spans, handling actions like entering, exiting, and dropping them, particularly in scenarios of unexpected termination or errors. It ensures that every part of the application's operational flow is accounted for and managed effectively.

Dispatcher: Serving as the central hub for event and span management, the Dispatcher emits signals and directs the flow of events and spans to the appropriate handlers. It is the orchestrator ensuring that all parts of the instrumentation framework function in harmony.
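
To make these components concrete, below is a minimal sketch of a custom EventHandler registered on the root dispatcher, following the pattern documented for the instrumentation module. The LoggingEventHandler name and the print-based logging are illustrative only; the Phoenix integration used later in this post builds on this same mechanism.

# Example: register a custom EventHandler with the root dispatcher
import llama_index.core.instrumentation as instrument
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events.base import BaseEvent

class LoggingEventHandler(BaseEventHandler):
    """Illustrative handler that prints every event it receives."""

    @classmethod
    def class_name(cls) -> str:
        return "LoggingEventHandler"

    def handle(self, event: BaseEvent, **kwargs) -> None:
        # Each event knows its own class name and the time it was emitted
        print(f"{event.class_name()} at {event.timestamp}")

# The root dispatcher sees events emitted by all LlamaIndex components
root_dispatcher = instrument.get_dispatcher()
root_dispatcher.add_event_handler(LoggingEventHandler())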

Practical Example: Tracing a Multimodal Query Application Using LlamaIndex and Phoenix

In this example, we’ll look at how to set up a multimodal query application using LlamaIndex, integrated with tracing capabilities provided by Phoenix. The application will use textual and image data to perform complex queries that combine both modalities. Use this notebook to access and experiment with this code.

Setup and Tracing Configuration

First, import all the required Python libraries. This includes libraries for file and HTTP handling, LlamaIndex components for setting up the multimodal system, OpenAI for language model integration, and OpenTelemetry for tracing with Phoenix.

# Import all necessary libraries for the application and tracing
import tempfile
import requests
import os
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.agent import AgentRunner
from llama_index.core.agent.react_multimodal.step import MultimodalReActAgentWorker
from llama_index.core.base.agent.types import Task
from llama_index.core.schema import ImageDocument
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.readers.web import SimpleWebPageReader
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

Configure Tracing and Environment Variables

Set up the tracing infrastructure so that spans are exported to a locally running Phoenix instance, and configure the environment variable for OpenAI's API key to authenticate API requests.

# Configure OpenAI API key and tracing endpoint
OPENAI_API_KEY = "your_openai_api_key"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

endpoint = "http://127.0.0.1:6006/v1/traces"
tracer_provider = trace_sdk.TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))

LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
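
The exporter above assumes a Phoenix collector is already listening at http://127.0.0.1:6006. If one is not running yet, a local instance can be started in-process (assuming the arize-phoenix package is installed); this is one way to do it and is not part of the traced application itself.

# Optionally launch a local Phoenix instance (assumes arize-phoenix is installed)
import phoenix as px
px.launch_app()  # serves the Phoenix UI and trace collector at http://localhost:6006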

Define Utility Functions for Task Execution

Create helper functions to manage the execution of tasks within the multimodal agent. These functions run the agent one step at a time and finalize the response once the last step completes.

# Define utility functions to execute steps and handle task completion
def execute_step(agent: AgentRunner, task: Task):
    step_output = agent.run_step(task.task_id)
    if step_output.is_last:
        response = agent.finalize_response(task.task_id)
        print(f"> Agent finished: {str(response)}")
        return response
    else:
        return None

def execute_steps(agent: AgentRunner, task: Task):
    response = execute_step(agent, task)
    while response is None:
        response = execute_step(agent, task)
    return response

Load Documents and Configure the Query Agent

Load the necessary documents from a specified URL and set up the query tools and agents. This step involves building a VectorStoreIndex from the documents and setting up a multimodal agent to handle queries that incorporate both text and image data.

# Load documents and configure the multimodal query agent
url = "https://openai.com/blog/new-models-and-developer-products-announced-at-devday"
reader = SimpleWebPageReader(html_to_text=True)
documents = reader.load_data(urls=[url])

Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo", api_key=OPENAI_API_KEY)
query_tool = QueryEngineTool(
    query_engine=VectorStoreIndex.from_documents(documents).as_query_engine(),
    metadata=ToolMetadata(
        name="vector_tool",
        description="Useful to lookup new features announced by OpenAI",
    ),
)

mm_llm = OpenAIMultiModal(model="gpt-4o", api_key=OPENAI_API_KEY, max_new_tokens=1000)
react_step_engine = MultimodalReActAgentWorker.from_tools(
    [query_tool],
    multi_modal_llm=mm_llm,
    verbose=True,
)
agent = react_step_engine.as_agent()

Execute a Multimodal Query

Finally, execute a query that utilizes both the textual content of a web page and an associated image. This demonstrates how multimodal data can be used to enrich query responses, providing more comprehensive answers.

# Execute a multimodal query using both text and image data
query_str = (
    "The photo shows some new features released by OpenAI. "
    "Can you pinpoint the features in the photo and give more details using relevant tools?"
)
jpg_url = "https://images.openai.com/blob/a2e49de2-ba5b-4869-9c2d-db3b4b5dcc19/new-models-and-developer-products-announced-at-devday.jpg"

with tempfile.NamedTemporaryFile(suffix=".jpg") as tf:
    with open(tf.name, "wb") as f:
        f.write(requests.get(jpg_url).content)
    image_document = ImageDocument(image_path=tf.name)
    task = agent.create_task(query_str, extra_state={"image_docs": [image_document]})
    response = execute_steps(agent, task)
    print(str(response))

After the query completes, the resulting traces can be viewed and further evaluated in the Phoenix app at http://localhost:6006.
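
Beyond browsing traces in the UI, the collected spans can also be pulled into a pandas DataFrame for programmatic inspection or downstream evaluation. A minimal sketch, assuming the arize-phoenix client is available in the same environment:

# Pull the collected spans into a DataFrame for programmatic inspection
import phoenix as px
spans_df = px.Client().get_spans_dataframe()
print(spans_df.head())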

This example walks through setting up a multimodal query application with tracing from end to end, demonstrating the Phoenix and LlamaIndex integration in a practical scenario.

Conclusion

By leveraging the capabilities of both Phoenix and LlamaIndex, developers can achieve deeper insights and more effective management of their LLM systems. This step-by-step guide illustrates how the tracing features of Phoenix can be implemented to monitor and optimize the performance of a multimodal query application made possible by LlamaIndex. The integration of these technologies provides developers with the tools they need to build more robust, efficient, and scalable LLM applications.
