Sift investigations

Sift investigations can be started from various locations in Grafana. In all cases, Sift requires some inputs so that it can look in the right places for issues. These inputs are labels, such as cluster, namespace or container, and a time range.

Note

In most cases, Sift doesn’t require any specific labels to run an investigation, but investigations that include labels such as cluster and namespace tend to produce the best results.

Investigations can be started from:

  • Grafana Explore: use the + Add button in the toolbar and choose Run investigation. Sift will extract labels from the query and use the current Explore time range.

  • Grafana dashboards: use the dropdown on a panel and choose Run investigation. Sift will extract labels from the query and use the current dashboard time range.

  • Grafana Incident: see the Sift in Grafana Incident section below.

Make sure you have enabled Grafana Machine Learning before running an investigation. See Enable Grafana Machine Learning for more information.

Note

Currently, Sift only extracts labels from PromQL queries in Explore and dashboard panels. Support for more data sources will be added in the future. In the meantime, you can manually add labels to the investigation using the form.

Label management

Sift uses the provided labels to identify the scope of investigation and discover issues.

Auto-discovering datasources

Although a default datasource can be configured for each Sift check, Sift can also auto-discover datasources based on the provided labels.

Sift queries all Prometheus, Loki and Tempo datasources configured in Grafana for the provided labelset and identifies the right datasources based on the number of matching series/streams. If the provided labelset matches too many series/streams, Sift will not run the investigation, because a scope that large tends to produce noisy, less useful results.
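The auto-discovery behaviour described above can be pictured with a small sketch. This is not Sift's implementation: the `Candidate` type, the `pick_datasources` function, the `MAX_MATCHES` threshold and the "most matches per datasource type wins" rule are all assumptions made for illustration. The documentation only states that selection depends on the number of matching series/streams and that an overly broad labelset stops the investigation.

```python
from dataclasses import dataclass

# Hypothetical upper bound; the real limit used by Sift is not documented here.
MAX_MATCHES = 1000


@dataclass(frozen=True)
class Candidate:
    """Match count for one datasource (illustrative type, not a Sift API)."""
    name: str
    kind: str      # "prometheus", "loki" or "tempo"
    matches: int   # series/streams matching the provided labelset


class ScopeTooLargeError(ValueError):
    """Raised when the labelset matches too many series/streams."""


def pick_datasources(candidates, max_matches=MAX_MATCHES):
    """Pick one datasource per kind, assuming the one with most matches wins."""
    best = {}
    for candidate in candidates:
        if candidate.matches == 0:
            continue  # the labelset does not exist in this datasource
        if candidate.matches > max_matches:
            raise ScopeTooLargeError(
                f"{candidate.name}: {candidate.matches} matches exceeds {max_matches}"
            )
        current = best.get(candidate.kind)
        if current is None or candidate.matches > current.matches:
            best[candidate.kind] = candidate
    return {kind: candidate.name for kind, candidate in best.items()}


if __name__ == "__main__":
    found = pick_datasources([
        Candidate("prom-prod", "prometheus", 42),
        Candidate("prom-dev", "prometheus", 0),
        Candidate("loki-main", "loki", 7),
    ])
    print(found)
```

In this sketch, narrowing the labelset (for example by adding namespace to cluster) is what brings the match count back under the threshold.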

Label usage by Sift checks

Sift checks use different combinations of the provided labelset depending on their scope of operation. Checks like ‘Error Pattern Logs’ use the complete labelset and analyse the resulting Loki streams, while checks like ‘Kube Crashes’ use only the ‘cluster’ and ‘namespace’ (or ‘k8s.cluster.name’ and ‘k8s.namespace.name’) labels from the supplied labelset to query Prometheus for crashed pods.
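As a rough illustration of how the two checks above scope their queries, the following sketch narrows a labelset for a ‘Kube Crashes’ style check and renders a label selector. The function names `kube_crashes_labels` and `to_selector` are hypothetical, and the way Sift actually builds its queries is not documented here. Only the label names come from the text.

```python
# Label names a 'Kube Crashes' style check uses, per the text above:
# each tuple lists the accepted alternatives for one label.
KUBE_CRASHES_LABEL_NAMES = (
    ("cluster", "k8s.cluster.name"),
    ("namespace", "k8s.namespace.name"),
)


def kube_crashes_labels(labels):
    """Keep only the cluster and namespace labels (either naming style)."""
    scoped = {}
    for alternatives in KUBE_CRASHES_LABEL_NAMES:
        for name in alternatives:
            if name in labels:
                scoped[name] = labels[name]
                break  # use the first alternative that is present
    return scoped


def to_selector(labels):
    """Render a label selector such as {cluster="prod"}.

    Assumes plain label names (letters, digits, underscores); values are
    escaped for backslashes and double quotes.
    """
    parts = []
    for name, value in labels.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__":
    provided = {"cluster": "prod", "namespace": "checkout", "container": "api"}
    # A check that uses the complete labelset:
    print(to_selector(provided))
    # A check that only needs cluster and namespace:
    print(to_selector(kube_crashes_labels(provided)))
```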

Label filtering

Since Sift uses the provided labels in Prometheus/Loki queries as described above, it is important to filter out labels that are not helpful. Sift automatically filters out the following labels: grafana_folder, account_id, ref_id, alertname, severity, datasource_uid, filename and mountpoint.

Any labels containing whitespace in the key or value field are also filtered out for the same reason.
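The two filtering rules above can be approximated with a short sketch, which may be handy for predicting which labels from an alert or query will survive. The `filter_labels` helper is hypothetical and not part of any Sift API. The excluded label names are taken from the list above.

```python
# Labels that the documentation says Sift filters out automatically.
EXCLUDED_LABELS = frozenset({
    "grafana_folder", "account_id", "ref_id", "alertname",
    "severity", "datasource_uid", "filename", "mountpoint",
})


def _has_whitespace(text):
    """Return True if any character in text is whitespace."""
    return any(ch.isspace() for ch in text)


def filter_labels(labels):
    """Hypothetical helper: drop excluded labels and labels with whitespace."""
    return {
        key: value
        for key, value in labels.items()
        if key not in EXCLUDED_LABELS
        and not _has_whitespace(key)
        and not _has_whitespace(value)
    }


if __name__ == "__main__":
    alert_labels = {
        "cluster": "prod",
        "namespace": "checkout",
        "alertname": "HighErrorRate",    # excluded by name
        "team": "payments platform",     # excluded: whitespace in value
    }
    print(filter_labels(alert_labels))
```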

Viewing investigation results

Sift investigations can be viewed on the Grafana Machine Learning page. In the Alerts & IRM category of the sidebar, click Machine learning, then View investigations. Your investigations are listed and can be filtered from the toolbar.

Click an investigation to view the results. The checks are shown in a column on the left, grouped by status:

  • Interesting results contains checks which found something potentially useful.
  • Completed checks contains checks which ran and determined that nothing unusual had happened during the investigation.
  • Failed checks contains checks which failed to run for any reason.

Click a check to view the results. Each check has a custom-built UI designed to convey the information surfaced by the check.

Sift in Grafana Incident

Note

The cluster and namespace labels are currently required to initiate a Sift investigation from Grafana Incident.

You can use Sift investigations in Grafana Incident to get valuable suggestions while working to resolve an active incident. Currently, there are two ways you can leverage Sift within Grafana Incident:

  • Run a Sift investigation within an incident: From the Suggestions section in the right sidebar of the incident timeline, click Start Sift investigation. Manually enter the cluster and namespace to start a Sift investigation specifically tailored to the incident.

  • Add some context to the Incident timeline: link to a dashboard, Explore query, alert rule or OnCall alert rule, and Sift will automatically extract cluster and namespace labels and start investigations.

Note

When a Sift investigation is triggered from within an incident, the time range is automatically set to span from the incident start time to the time the investigation is triggered.

View and manage Sift suggestions

When a Sift check identifies interesting results, clickable links appear in the right sidebar under Suggestions. Click these links to review detailed information about the specific Sift check.

You can add important Sift suggestions directly to the main Incident timeline. Alternatively, if a Sift check result is deemed irrelevant, you can dismiss it from the suggestions.