Prometheus, Grafana and Alertmanager are the default stack for me when deploying a monitoring system. The Prometheus and Grafana bits are well documented, and there are tons of open-source approaches on how to make the best use of them. Alertmanager, on the other hand, is not highlighted as much, and even though its use case can seem fairly simple, it can get complex: the templating language has lots of features and functionality. Alertmanager configuration, templates and rules make a huge difference, especially when the team has an approach of "not staring at dashboards all day". You can create detailed Slack alerts with tons of information such as dashboard links, runbook links and alert descriptions, which go well together with the rest of your ChatOps stack. This post will go through how to make efficient Slack alerts.

Basics: guidelines for alert names, labels, and annotations

The monitoring-mixin documentation goes through guidelines for alert names, labels and annotations: more or less standards or best practices that many monitoring mixins follow. Monitoring mixins are open-source collections of Prometheus alerts and rules and Grafana dashboards for a specific technology. Even though you might not be familiar with monitoring mixins, there's a good chance you've used them. For example, the kube-prometheus project (the backbone of the kube-prometheus-stack Helm chart) uses both the Kubernetes-monitoring and node-exporter mixins, amongst others.

The great thing about these guidelines is that you can have a single Alertmanager template that makes use of labels and annotations shared across all alerts. We can expect OSS Prometheus rules and alerts to follow a specific pattern, and you can apply the same pattern internally to your own alerts and rules. The original documentation describes the guidelines well, so I'll just summarize them:

- Use the mandatory annotation summary to summarize the alert, and the optional annotation description for any details regarding the alert.
- Use the label severity to indicate the severity of an alert, with the following values: info - not routed anywhere, but provides insights when debugging; warning - not urgent enough to wake someone up or require immediate action (in my case warnings go to Slack, while larger organizations might queue them into a bug-tracking system); critical - someone gets paged.
- Additionally, there are two optional but recommended annotations: dashboard_url - a URL to a dashboard related to the alert; runbook_url - a URL to a runbook for handling the alert.

From the above guidelines we can conclude that we'll route warnings to Slack. However, I also route critical alerts to Slack, since Slack messages provide easy access to dashboard links, runbook links and extensive information about the alert, which is more difficult to provide through a paging incident sent to your phone. It's also easier to interact with alerts using Slack on your computer than via text messages on your phone. We'll use summary as the Slack message headline and description for anything detailed regarding the alert, and we'll add additional buttons with links to your dashboards and runbooks.

Slack template

Alertmanager has great support for custom templates where you can make use of both labels and annotations. We'll create a template with inspiration from Monzo's template, but adjusted to the above guidelines and to my preference.
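To make the guidelines concrete, here is a sketch of a Prometheus alerting rule that follows them. The alert name, expression, threshold and URLs are hypothetical examples of mine, not taken from any particular mixin:

```yaml
groups:
  - name: example-rules
    rules:
      # Hypothetical alert illustrating the mixin guidelines:
      # a severity label plus summary, description, dashboard_url
      # and runbook_url annotations.
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: critical        # info | warning | critical
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "{{ $value | humanizePercentage }} of requests failed over the last 5 minutes."
          dashboard_url: "https://grafana.example.com/d/http-overview"   # hypothetical URL
          runbook_url: "https://runbooks.example.com/HighErrorRate"      # hypothetical URL
```

Because every alert carries the same label and annotation names, a single template can render all of them.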
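The routing decision described above can be sketched as a minimal Alertmanager route; the receiver names here are placeholders for your own setup:

```yaml
route:
  receiver: default
  routes:
    # Both warnings and criticals go to Slack; continue lets
    # critical alerts also match the paging route below.
    - matchers:
        - severity =~ "warning|critical"
      receiver: slack
      continue: true
    - matchers:
        - severity = "critical"
      receiver: pager
```

The `matchers` syntax requires Alertmanager 0.22 or later; older versions use `match`/`match_re` instead.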
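As a starting point, here is a minimal sketch of such a custom notification template, using summary as the headline and description for the details. The `slack.custom.*` define names are my own placeholders, not from Monzo's template:

```
{{/* slack.tmpl: headline from alertname/status, details from annotations */}}
{{ define "slack.custom.title" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
{{ end }}

{{ define "slack.custom.text" }}
{{ range .Alerts }}
*Summary:* {{ .Annotations.summary }}
*Description:* {{ .Annotations.description }}
{{ end }}
{{ end }}
```

The template file has to be loaded through the `templates` section of your Alertmanager configuration before the defines can be referenced.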
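The dashboard and runbook buttons can then be wired up in the Slack receiver via `actions`; a sketch, assuming the template defines above and a hypothetical channel name:

```yaml
receivers:
  - name: slack
    slack_configs:
      - channel: "#alerts"            # hypothetical channel
        send_resolved: true
        title: '{{ template "slack.custom.title" . }}'
        text: '{{ template "slack.custom.text" . }}'
        actions:
          # Buttons linking to the dashboard_url and runbook_url
          # annotations of the first alert in the group.
          - type: button
            text: "Dashboard"
            url: '{{ (index .Alerts 0).Annotations.dashboard_url }}'
          - type: button
            text: "Runbook"
            url: '{{ (index .Alerts 0).Annotations.runbook_url }}'
```

Taking the annotations from the first alert in the group is a simplification; a fuller template would handle groups whose alerts point at different dashboards.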