Self-healing with Checkmk and Event-Driven Ansible

How to resolve issues automatically

Ansible Anwendertreffen Austria 02/2026

About me

  • René Koch
  • Self-employed consultant for:
    • Red Hat Ansible (Automation Platform)
    • Red Hat Enterprise Linux
    • Red Hat Satellite
    • Red Hat Identity Management (IPA)
  • Experienced monitoring user (Nagios, Icinga, Checkmk)

Agenda

  • Monitoring: Short introduction
  • What is Event-Driven Ansible (EDA)?
  • Event-Driven Workflow
  • Live Demonstration
  • Use cases and best practices

Monitoring: Typical workflow

  • 🕰️ 2005: Received email alerts from Nagios 2 for issues with Solaris machines
  • 🧩 Manual workflow:
    • 📩 Read email
    • 🔐 Log in to the system
    • 🔎 Check if issue still exists
    • 🛠️ Fix the issue
    • 🤬 Repeat the same procedure over and over again
  • 🕰️ 2026: Still the same workflow?
Source: https://github.com/ansible/workshops/blob/devel/decks/ansible_rhel.pdf

What is Checkmk?

  • Monitoring platform for infrastructure, applications and services
  • Provides agent-based and agentless checks, dashboards, and alerting
  • Built for scale with distributed monitoring and automation support
  • Helps detect, analyze, and remediate issues faster

What is Event-Driven Ansible?

  • EDA is automation that reacts to events, not schedules
  • Events can come from monitoring, webhooks, message queues, logs or cloud services
  • Rules decide when to run Ansible actions
  • Goal: faster response and consistent remediation

What Is an "Event" (vs a Source Action)?

  • ⚡ Event
    • a state change or signal that matters (e.g., alert fired, service down)
    • often noisy and hard to filter
    • not every event triggers an action
  • 🔁 Source action
    • a routine trigger (e.g., "on every commit")
    • predefined target/action
  • 🧪 Examples:
    • 🚫 Update an AAP project after each commit (not EDA)
    • ✅ Send all monitoring alerts to a webhook; EDA decides what to do (EDA)
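
The last example can be sketched as a rulebook in which one webhook receives every alert and the rules decide what happens. This is only a sketch: it assumes the sender posts JSON with fields named type, servicestate and hostname, and the job template name is hypothetical.

```yaml
---
# Sketch only: field names and the job template name are assumptions.
- name: Decide what to do with incoming monitoring alerts
  hosts: all

  sources:
    - name: Receive every alert on one webhook
      ansible.eda.webhook:
        port: 5000

  rules:
    # Only a critical service problem leads to remediation.
    - name: Remediate critical service problems
      condition: >-
        event.payload.type == "PROBLEM" and
        event.payload.servicestate == "CRITICAL"
      action:
        run_job_template:
          name: "Hypothetical remediation template"
          organization: "Default Organization"
          job_args:
            limit: "{{ event.payload.hostname }}"

    # Recoveries are only printed; no automation runs.
    - name: Log recoveries without acting
      condition: event.payload.type == "RECOVERY"
      action:
        print_event:
          pretty: true
```

Events that match neither rule are simply dropped, which is the point of the comparison: the sender stays simple and the decision logic lives in the rulebook.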

Event-Driven Ansible vs. Ansible Playbook

  • ⚡ EDA keeps a listener running and reacts to events in near real time
  • 🧠 EDA evaluates event payloads and triggers automation only when rules match
  • 📜 Ansible Playbooks do not listen for events out of the box
  • 🔔 Without EDA, the monitoring source must trigger playbook runs directly
  • 🧱 This increases load and complexity on the monitoring system

Event-Driven Ansible vs. AAP Controller

  • ⚡ EDA keeps an event listener active and can trigger job templates immediately
  • 🌐 AAP Controller can start jobs via webhook or API, but each job is a full run lifecycle
  • 🐢 Job startup overhead (container start, project sync, inventory/collections) adds latency
  • 🚦 Job execution may queue behind other jobs, which delays reaction time
  • 🎯 Use EDA for fast event decisions; use Controller for governed execution of the actual remediation

Core Building Blocks

  • 📡 Event Sources: where events originate (ansible.eda plugins for webhooks, Kafka, Alertmanager, etc.)
  • 📘 Rulebook: Rulesets with conditions + actions
  • 🔎 Conditions: Determine if a rule fires
  • 🛠️ Actions: run playbooks, run job templates, run modules, etc.
  • 🧭 Controller (optional): central execution and governance

Event-Driven Workflow

  1. Event arrives from a source
  2. Rulebook evaluates conditions
  3. Matching rule triggers an action
  4. Action runs playbook or other automation
  5. Results can emit new events or update systems

Rulebook: Restart named

---

- name: Restart named on IPA server
  hosts: all
  gather_facts: false

  sources:
    - name: Listen on port 5000 for Checkmk events
      ansible.eda.webhook:
        port: 5000

  rules:
    - name: Restart named
      condition: >-
        event['payload']['servicename'] == "DNS example.com" and
        event['payload']['servicestate'] == "CRITICAL"
      action:
        run_job_template:
          name: "[LINUX] Restart named on IPA [@production] - Prompt"
          organization: "Default Organization"
          job_args:
            limit: "{{ event.payload.hostname }}"

Checkmk Notification Script

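#!/usr/bin/env bash

# Send Checkmk notification data as a JSON event to a webhook.
# Parameter 1 is the token, parameter 2 the webhook URL
# (both come from the notification rule).
HEADER="X-Checkmk-Token"
TOKEN="${NOTIFY_PARAMETER_1}"
URL="${NOTIFY_PARAMETER_2}"

# Build the JSON payload from the NOTIFY_* variables exported by Checkmk.
# Values are inserted unescaped, so output containing double quotes or
# backslashes would produce invalid JSON.
JSON=$(cat <<EOF
{
  "hostname": "${NOTIFY_HOSTNAME}",
  "hostoutput": "${NOTIFY_HOSTOUTPUT}",
  "hoststate": "${NOTIFY_HOSTSTATE}",
  "servicename": "${NOTIFY_SERVICEDESC}",
  "serviceoutput": "${NOTIFY_SERVICEOUTPUT}",
  "servicestate": "${NOTIFY_SERVICESTATE}",
  "date": "${NOTIFY_SHORTDATETIME}",
  "type": "${NOTIFY_NOTIFICATIONTYPE}",
  "what": "${NOTIFY_WHAT}"
}
EOF
)

# --fail makes curl exit non-zero on HTTP errors (4xx/5xx), so a rejected
# event is reported back as a failed notification.
curl --fail --silent --show-error -X POST \
  -H "Content-Type: application/json" \
  -H "${HEADER}: ${TOKEN}" \
  -d "${JSON}" \
  "${URL}"
exit $?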
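
For orientation, the event that a rule sees after this script has posted to the webhook might look roughly like the following, written here as YAML. All values are made up for illustration. The webhook source places the posted JSON body under payload, which is why rulebook conditions read event.payload.<field>.

```yaml
# Illustrative event only: every value below is invented.
payload:
  hostname: ipa01.example.com
  hostoutput: "example host output"
  hoststate: UP
  servicename: DNS example.com
  serviceoutput: "example service output"
  servicestate: CRITICAL
  date: "2026-02-18 10:15:00"
  type: PROBLEM
  what: SERVICE
```

The type and what fields are useful for filtering, for example to react only to service problems and to ignore recoveries or acknowledgements.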

Playbook: Restart named

---

- name: Restart named on IPA
  hosts: all
  become: true
  gather_facts: true

  tasks:
    - name: Restart named
      ansible.builtin.service:
        name: named
        state: restarted

Run a Rulebook (CLI)

$ ansible-rulebook -r rulebooks/restart_named.yml -i localhost

PLAY [Restart named on IPA] ****************************************************

TASK [Gathering Facts] *********************************************************
ok: [ipa01.example.com]

TASK [Restart named] ***********************************************************
changed: [ipa01.example.com]

PLAY RECAP *********************************************************************
ipa01.example.com : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 

❗ Use the run_playbook action (instead of run_job_template) to run the playbook with ansible-playbook outside Ansible Automation Platform.
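
A possible CLI variant of the rulebook with run_playbook could look like this. It is a sketch under assumptions: the playbook path playbooks/restart_named.yml is hypothetical, and it assumes the insert_hosts_to_meta event filter from the ansible.eda collection is available to limit the run to the affected host, which must also exist in the inventory passed with -i.

```yaml
---
# Sketch only: the playbook path and the use of the event filter are assumptions.
- name: Restart named on IPA server (CLI variant)
  hosts: all

  sources:
    - name: Listen on port 5000 for Checkmk events
      ansible.eda.webhook:
        port: 5000
      filters:
        # Copies the host name from the payload into event.meta.hosts,
        # which limits the playbook run to that host.
        - ansible.eda.insert_hosts_to_meta:
            host_path: payload.hostname

  rules:
    - name: Restart named
      condition: >-
        event.payload.servicename == "DNS example.com" and
        event.payload.servicestate == "CRITICAL"
      action:
        run_playbook:
          name: playbooks/restart_named.yml
```

Without such a filter, the playbook would target whatever the ruleset's hosts setting and the inventory select, since run_playbook has no job_args.limit equivalent to the job template prompt.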


Ansible Automation Platform Integration

  • 📦 Projects: Git repository configuration
  • 🧪 Decision Environments: Container images to run rulebooks
  • 🔐 Credentials: Secrets for Git, Controller, Hub, tokens, etc.
  • 📡 Event Streams: Entry points for events (mapped to source definition in rulebook)
  • 🚀 Rulebook Activations: Rulebook runs

Live Demo: Fix DNS issue


Self-Healing Best Practices

  • Start with low-risk automations
  • Use idempotent playbooks (if possible)
  • Add guardrails (approvals, maintenance windows, downtimes)
  • Emit metrics and logs for auditing
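
As one way to make the restart playbook safer to re-run, the sketch below checks the problem on the host before acting and verifies the service afterwards. It assumes dig is installed on the target, that named answers on 127.0.0.1, and that example.com is a record it should resolve. The variable dns_check_name is hypothetical.

```yaml
---
# Sketch only: the check command and the variable are assumptions.
- name: Restart named on IPA only if DNS is really broken
  hosts: all
  become: true
  gather_facts: false

  vars:
    dns_check_name: example.com  # hypothetical record used for the check

  tasks:
    - name: Check whether the local DNS server still answers
      ansible.builtin.command:
        cmd: "dig +short +time=2 +tries=1 @127.0.0.1 {{ dns_check_name }}"
      register: dns_check
      changed_when: false
      failed_when: false

    - name: Restart named only when the check failed or returned nothing
      ansible.builtin.service:
        name: named
        state: restarted
      when: dns_check.rc != 0 or dns_check.stdout | length == 0

    - name: Wait until named accepts connections again
      ansible.builtin.wait_for:
        port: 53
        timeout: 30
```

If the alert was stale or the issue has already cleared, the play reports no change, which keeps the audit trail meaningful.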

Challenges with Self-healing

  • 🔊 Triggering on noisy events (missing filtering)
  • 🧪 Insufficient monitoring coverage
  • 🧯 Healing the wrong host (issue caused by a backend dependency)
  • 📚 Lack of knowledge or rulebooks
  • 🕒 Triggering during maintenance windows because no downtime was scheduled
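
Noisy events can be dampened in the rulebook itself. The following sketch filters on the notification type and uses the throttle option of ansible-rulebook so that the action fires at most once per host within a time window. The 10-minute window is an arbitrary assumption, and the payload field names assume a sender that posts type, servicename, servicestate and hostname.

```yaml
---
# Sketch only: the window length and the payload field names are assumptions.
- name: Restart named with basic noise protection
  hosts: all

  sources:
    - name: Listen on port 5000 for Checkmk events
      ansible.eda.webhook:
        port: 5000

  rules:
    - name: Restart named, at most once per host every 10 minutes
      condition: >-
        event.payload.type == "PROBLEM" and
        event.payload.servicename == "DNS example.com" and
        event.payload.servicestate == "CRITICAL"
      throttle:
        once_within: 10 minutes
        group_by_attributes:
          - event.payload.hostname
      action:
        run_job_template:
          name: "[LINUX] Restart named on IPA [@production] - Prompt"
          organization: "Default Organization"
          job_args:
            limit: "{{ event.payload.hostname }}"
```

Throttling does not replace scheduled downtimes in the monitoring system. It only limits how often a repeated alert can trigger the same remediation.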

Event-Driven Ansible Use Cases

  • 🚨 Monitoring alerts: run remediation playbooks
  • 🏗️ Infrastructure events: auto-scale or restart services
  • 🔐 Security findings: isolate hosts or rotate credentials
  • 🎫 Ticketing: enrich and open incidents automatically
  • 📘 Documentation: update asset database or documentation system

Additional Information


Summary

  • 🛰️ Checkmk monitors your IT landscape and notifies on state change
  • ⚡ EDA turns these events into real-time automation
  • 🤝 It complements traditional Ansible by reacting instead of scheduling
  • 🎯 Start small, measure impact, and iterate

Thank you!

René Koch
Freelancer
Ansible Anwendertreffen Austria 18.02.2026

Slides: https://urlr.me/zCPcuK
