Easily collect LLM app data with automatic instrumentation.
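For example, tracing an OpenAI-backed app can take just a few lines: launch Phoenix locally, register it as the OpenTelemetry endpoint, and attach an OpenInference instrumentor. This is a minimal sketch assuming the arize-phoenix, arize-phoenix-otel, and openinference-instrumentation-openai packages are installed; helper names and defaults can differ across versions.

import phoenix as px
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

px.launch_app()                # start the local Phoenix UI and trace collector
tracer_provider = register()   # point OpenTelemetry at the local Phoenix endpoint
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# From here on, every OpenAI call made by the application is traced automatically.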
Laser-fast, pre-tested eval templates that are easy to customize to any task; a usage sketch follows the example scores below.
Example eval results:
Faithfulness | 0.66 | 48%
Correctness  | 0.67 | 33%
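As a rough sketch of how such an eval runs, the built-in hallucination template from Phoenix's evals library can be applied to a dataframe of question/context/answer rows. The data below is made up, and the model choice is an assumption.

import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Each row pairs a question, the retrieved context, and the model's answer.
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?"],
        "reference": ["Hamlet is a tragedy written by William Shakespeare."],
        "output": ["Hamlet was written by Charles Dickens."],
    }
)

evals_df = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o"),
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
print(evals_df[["label", "explanation"]])  # "hallucinated" or "factual" per row, with reasoning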
Save, curate and build test sets for prompt templates, prompt iteration and fine-tuning.
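One way to build such a test set, sketched below, is to export collected LLM spans from a running Phoenix instance into a dataframe and persist the prompt/response pairs. The filter expression and the flattened attribute column names are assumptions that may vary by version.

import phoenix as px

# Pull the LLM spans collected so far into a dataframe, keep the prompt/response
# pairs, and persist them as a reusable test set.
client = px.Client()
spans_df = client.get_spans_dataframe("span_kind == 'LLM'")
test_set = spans_df[["attributes.input.value", "attributes.output.value"]].dropna()
test_set.to_parquet("test_set.parquet")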
Easily test new prompt changes against your data for greater confidence before deployment.
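A hedged sketch of that workflow: generate answers for a saved test set with a candidate prompt, then grade them with the Q&A correctness template before shipping the change. The prompt text, model names, and the input/reference column layout of test_set.parquet are all assumptions.

import pandas as pd
from openai import OpenAI
from phoenix.evals import QA_PROMPT_RAILS_MAP, QA_PROMPT_TEMPLATE, OpenAIModel, llm_classify

# Test set with one question per row plus the reference context to answer from.
test_set = pd.read_parquet("test_set.parquet")  # columns: "input", "reference"
candidate_prompt = "Answer strictly from the context below.\n\nContext: {context}\n\nQuestion: {question}"

openai_client = OpenAI()

def answer_with_candidate_prompt(row: pd.Series) -> str:
    message = candidate_prompt.format(context=row["reference"], question=row["input"])
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": message}]
    )
    return completion.choices[0].message.content

test_set["output"] = test_set.apply(answer_with_candidate_prompt, axis=1)

# Grade the candidate prompt's answers for correctness before deploying it.
results = llm_classify(
    dataframe=test_set,
    model=OpenAIModel(model="gpt-4o"),
    template=QA_PROMPT_TEMPLATE,
    rails=list(QA_PROMPT_RAILS_MAP.values()),
)
print(results["label"].value_counts(normalize=True))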
Use embeddings to uncover semantically similar questions, chunks, or responses and pinpoint clusters of poor performance.
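A minimal sketch of the embeddings view, with a toy dataframe and random vectors standing in for real question embeddings; note that px.Inferences was named px.Dataset in older Phoenix releases.

import numpy as np
import pandas as pd
import phoenix as px

# Toy dataframe: each row holds a question, its embedding vector, and the answer.
df = pd.DataFrame(
    {
        "question": ["How do I reset my password?", "How can I change my password?"],
        "question_embedding": [np.random.rand(768) for _ in range(2)],
        "answer": ["Use the reset link.", "Go to account settings."],
    }
)

schema = px.Schema(
    prompt_column_names=px.EmbeddingColumnNames(
        vector_column_name="question_embedding",
        raw_data_column_name="question",
    ),
    response_column_names="answer",
)

# The embeddings view in the UI clusters similar questions so weak spots stand out.
px.launch_app(primary=px.Inferences(dataframe=df, schema=schema))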
Built on top of OpenTelemetry, Phoenix is agnostic of vendor, framework, and language – granting you the flexibility you need in today’s generative landscape.
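Concretely, any OpenTelemetry SDK can export spans to Phoenix over OTLP. The sketch below uses the Python SDK and assumes a Phoenix collector listening on its default local endpoint (http://localhost:6006/v1/traces); the tracer it creates is the one used by the streaming chat handler that follows.

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Send every span to the Phoenix collector over OTLP/HTTP; no Phoenix-specific
# client library is required on the application side.
provider = TracerProvider()
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)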
from fastapi.responses import StreamingResponse
from openinference.semconv.trace import SpanAttributes

# Start a CHAIN span by hand so it can outlive the request handler and be
# ended only after the full streamed response has been collected.
span = tracer.start_span("chat", attributes={SpanAttributes.OPENINFERENCE_SPAN_KIND: "CHAIN"})
with trace.use_span(span, end_on_exit=False):
    last_message_content, messages = await parse_chat_data(data)
    span.set_attribute(SpanAttributes.INPUT_VALUE, last_message_content)
    response = await chat_engine.astream_chat(last_message_content, messages)

    async def event_generator():
        full_response = ""
        async for token in response.async_response_gen():
            # Stop streaming if the client has gone away.
            if await request.is_disconnected():
                break
            full_response += token
            yield token
        # Record the complete output and close the span once streaming ends.
        span.set_attribute(SpanAttributes.OUTPUT_VALUE, full_response)
        span.end()

    return StreamingResponse(event_generator(), media_type="text/plain")