GenAI evaluations setup
GenAI Evaluations monitors AI model quality and safety through hallucination detection, toxicity analysis, bias assessment, and evaluation scoring, using OpenLIT’s built-in evaluation capabilities.
Prerequisites
Before setting up GenAI Evaluations, ensure you have completed the GenAI Observability setup.
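The `openlit.init()` calls in the examples below export telemetry using the OTLP settings configured during that setup. For reference, a typical environment configuration looks like the following; the endpoint zone and credentials are placeholders, not values from this guide — use the endpoint and token generated during your own GenAI Observability setup:

```shell
# Placeholders only -- substitute the OTLP endpoint and token from your
# GenAI Observability setup in Grafana Cloud.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-prod-us-central-0.grafana.net/otlp"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64-encoded instanceID:token>"
```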
Initialize evaluations
OpenLIT provides built-in evaluation capabilities for hallucination, bias, and toxicity detection. Set up your API key for the evaluation provider:
Bash
# For OpenAI-based evaluations
export OPENAI_API_KEY="your-openai-api-key"
# Or for Anthropic-based evaluations
export ANTHROPIC_API_KEY="your-anthropic-api-key"
Basic evaluation setup
Use the “All” evaluator to check for hallucination, bias, and toxicity in one go:
Python
import openlit
openlit.init()
# Initialize the All evaluator (checks for Hallucination, Bias, and Toxicity)
evals = openlit.evals.All(provider="openai")
# Example evaluation
contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
prompt = "When and why did Einstein win the Nobel Prize?"
text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"
result = evals.measure(prompt=prompt, contexts=contexts, text=text)
print(result)
Specific evaluation metrics
For targeted evaluations, use specific evaluation metrics:
Hallucination detection
Python
import openlit
openlit.init()
# Initialize hallucination detector
hallucination_detector = openlit.evals.Hallucination(provider="openai")
result = hallucination_detector.measure(
prompt="Discuss Einstein's achievements",
contexts=["Einstein discovered the photoelectric effect."],
text="Einstein won the Nobel Prize in 1969 for the theory of relativity."
)
Bias detection
Python
import openlit
openlit.init()
# Initialize bias detector
bias_detector = openlit.evals.Bias(provider="openai")
result = bias_detector.measure(
prompt="Describe a software engineer",
text="Software engineers are typically young men who work long hours"
)
Toxicity detection
Python
import openlit
openlit.init()
# Initialize toxicity detector
toxicity_detector = openlit.evals.Toxicity(provider="openai")
result = toxicity_detector.measure(
prompt="Please provide feedback",
text="Your response contains concerning language patterns"
)
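Each `measure()` call returns a structured evaluation result that you can act on programmatically. The sketch below shows one way to gate on it; the sample dictionary stands in for a real `measure()` return value, and its field names follow OpenLIT’s documented output shape — treat them as assumptions and verify against `print(result)` from your installed version:

```python
# Sample result in the shape OpenLIT's evaluators document (an assumption --
# compare with the output of print(result) in your environment).
sample_result = {
    "verdict": "yes",                 # "yes" means the issue was detected
    "evaluation": "Hallucination",    # which evaluation produced this result
    "score": 0.9,                     # confidence in the verdict
    "classification": "factual_inaccuracy",
    "explanation": "The text states 1969, but the context says 1921.",
}

def flag_failure(result: dict, min_score: float = 0.5) -> bool:
    """Return True when an evaluation result should alert or block a response."""
    return result["verdict"] == "yes" and result["score"] >= min_score

if flag_failure(sample_result):
    print(f"{sample_result['evaluation']} detected: {sample_result['explanation']}")
```

A pattern like this lets you route flagged responses to logging, alerting, or a fallback answer instead of returning them to users unchecked.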