
Use built-in dashboards

Note

AI Observability Dashboards is currently in public preview. Grafana Labs offers limited support, and breaking changes might occur prior to the feature being made generally available.

AI Observability includes pre-built analytics dashboards that visualize agent activity, performance, cost, and quality. The dashboards use Prometheus metrics and internal plugin data to surface actionable insights.

Access dashboards

Navigate to Analytics in the AI Observability plugin. The dashboards are organized into these areas:

  • Activity: generation counts, conversation counts, and active agents over time.
  • Performance: latency distributions, time to first token, and error rates.
  • Tokens and cost: token usage by model and provider, cost breakdown, and cache efficiency.
  • Tools: tool call frequency, tool execution duration, tool error rates, and usage percentage per tool.
  • Quality: evaluation scores, score distributions, and quality trends.

Identify performance issues

Use the performance dashboard to spot problems:

  • High latency: filter by agent or model to find slow generations. Drill into traces for specific conversations to identify bottlenecks.
  • Error spikes: the error rate panel shows failures over time. Click through to conversations with errors to inspect the call_error payload.
  • Slow time to first token: for streaming agents, the TTFT panel reveals which models or prompts have poor streaming performance.
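
When you want to chart the same latency signal outside the built-in panels, a p95 query over the duration histogram is a common starting point. The following is a minimal sketch that assembles such a PromQL expression in Python. It assumes the histogram is exposed as classic Prometheus buckets with a `_bucket` suffix. The `gen_ai_request_model` label is also an assumption, based on OpenTelemetry naming. Your stack might add a unit suffix such as `_seconds` or use different labels, so check the actual series names in your Prometheus data source first.

```python
def quantile_query(
    metric: str,
    quantile: float,
    group_by: tuple[str, ...] = (),
    window: str = "5m",
) -> str:
    """Build a PromQL histogram_quantile expression for a classic histogram."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    # "le" has to stay in the grouping so histogram_quantile can read the buckets.
    labels = ", ".join(("le", *group_by))
    return (
        f"histogram_quantile({quantile}, "
        f"sum by ({labels}) (rate({metric}_bucket[{window}])))"
    )


# Assumed label name; replace it with the label your metrics actually carry.
P95_LATENCY_BY_MODEL = quantile_query(
    "gen_ai_client_operation_duration",
    0.95,
    group_by=("gen_ai_request_model",),
)
```

The same helper can be pointed at `gen_ai_client_time_to_first_token` to compare streaming performance across models.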

Optimize costs

The tokens and cost dashboard helps you find optimization opportunities:

  • Cost by model: compare cost across models and providers. Consider switching expensive calls to cheaper models where quality is acceptable.
  • Cache efficiency: the cache read ratio shows how effectively prompt caching reduces token usage. A low ratio may indicate prompts that change too frequently to be cached.
  • Token usage trends: spot unexpected increases in token usage that may indicate prompt regression or unnecessary verbosity.
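
To make the cache efficiency idea concrete, here is a small sketch of how a cache read ratio could be computed from token counts. The exact formula the dashboard uses is not documented here. This version assumes the ratio is cached input tokens divided by total input tokens, with cached tokens included in the total. The function and parameter names are hypothetical.

```python
def cache_read_ratio(cache_read_tokens: int, input_tokens: int) -> float:
    """Fraction of input tokens that were served from the prompt cache.

    Assumes input_tokens already includes the cached tokens.
    Returns 0.0 when there were no input tokens at all.
    """
    if cache_read_tokens < 0 or input_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens == 0:
        return 0.0
    return cache_read_tokens / input_tokens


# Hypothetical counts: 250 of 1,000 input tokens came from the cache.
ratio = cache_read_ratio(250, 1000)
```

A ratio that stays near zero for an agent with a long, stable system prompt is a hint that something in the prompt prefix changes on every call.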

Track quality

The quality dashboard visualizes evaluation scores alongside operational metrics:

  • Score trends: monitor whether quality improves or degrades after agent version changes.
  • Score distributions: see whether responses cluster around high or low scores.
  • Correlation: compare quality scores with latency and cost to find the right balance.
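
If you export scores and latencies for offline analysis, a correlation coefficient is one way to quantify the relationship the dashboard shows visually. The sketch below computes a Pearson correlation by hand. The sample values are made up for illustration and do not come from any real workload.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-conversation samples: evaluation score and latency in seconds.
scores = [0.9, 0.8, 0.6, 0.4]
latency_s = [1.2, 1.5, 2.8, 3.9]
r = pearson(scores, latency_s)  # negative here: slower responses scored lower
```

A strongly negative value suggests that slow generations also tend to score poorly, which is worth checking before trading latency for quality.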

Use Prometheus metrics directly

If you need custom dashboards, query the AI Observability OpenTelemetry metrics in Prometheus:

Metric                                  Description
gen_ai_client_operation_duration        LLM call duration histogram.
gen_ai_client_token_usage               Token consumption histogram.
gen_ai_client_time_to_first_token       Streaming TTFT histogram.
gen_ai_client_tool_calls_per_operation  Tool calls per generation.
sigil_build_info                        Build version info with revision and branch labels.

If you enable evaluation metrics push (SIGIL_EVAL_METRICS_PUSH_ENDPOINT), per-tenant evaluation metrics are also available in Prometheus for custom dashboards and alerting.
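
Outside of Grafana, these metrics can be read through the standard Prometheus HTTP API, which serves instant queries at `/api/v1/query`. The sketch below only builds the request URL. The base URL is a placeholder, and authentication is left out because it depends on how your stack exposes Prometheus.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    # urlencode escapes brackets, braces, and spaces in the PromQL expression.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical endpoint; use the Prometheus query URL for your own stack.
url = instant_query_url("https://prometheus.example.com", "sigil_build_info")
```

Fetching that URL with valid credentials returns a JSON document whose `data.result` field holds the matching series.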

Open conversations from exemplars

Dashboard exemplars connect metric spikes to example conversations. When exemplars are enabled, histogram panels can show exemplar markers with links that open the matching conversation.

Use exemplar links to move from a latency or error spike to the conversation that produced it. This feature requires exemplars to be enabled for your stack.

Set up alerts

Create Grafana alerts on AI Observability metrics to proactively catch issues:

  • Alert on error rate exceeding a threshold.
  • Alert on p95 latency exceeding SLO targets.
  • Alert on cost per day exceeding budget.
  • Alert on evaluation scores dropping below a quality threshold.

Configure alerts in Grafana using the standard alerting workflow with the Prometheus data source.
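
To see what a p95 latency condition evaluates, it can help to work through the quantile estimate by hand. The sketch below estimates a quantile from cumulative histogram buckets using linear interpolation, which is the general approach PromQL's `histogram_quantile` takes. It is a simplified illustration rather than Prometheus's exact implementation, and the bucket counts and the 1.5 second SLO target are invented for the example.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        return math.nan
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts: (upper bound in seconds, observations).
sample = [(0.5, 60), (1.0, 90), (2.0, 98), (math.inf, 100)]
p95 = estimate_quantile(0.95, sample)

SLO_SECONDS = 1.5  # assumed target, for illustration only
breached = p95 > SLO_SECONDS
```

Because the estimate is interpolated within a bucket, its precision depends on the bucket boundaries, so leave some margin when you pick an alert threshold close to a boundary.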

Next steps