---
title: Self-hosted runners
description: Monitor the self-hosted CI runners behind your pipelines so you can spot capacity bottlenecks, degraded machines, and wasted spend.
---

Self-hosted runners are the machines that run your CI jobs. Mergify doesn't provision
or control them; it watches the jobs they run and turns that data into metrics about
capacity, performance, cost, and reliability.

Your runner fleet is infrastructure you own and pay for, but it usually runs as a
black box: you only hear about it once builds start queuing or failing. These metrics
let you base scaling and reliability decisions on what the fleet actually does.

:::note
  Self-hosted runners are part of CI Insights. Enable CI Insights on your repository
  first. See the [setup guides](/ci-insights) to get started.
:::

## Why monitor your runners

The metrics answer a few recurring questions:

- **Is the fleet the right size?** Idle runners mean you pay for capacity you don't
  use. Saturated runners mean jobs wait. Both are expensive, in different ways.

- **Is a runner degraded?** One slow or flaky machine drags down every job scheduled
  on it, often without an obvious failure.

- **Where is CI time going?** Time spent *waiting* for a runner and time spent
  *running* on it are different problems with different fixes.

- **What is the fleet costing?** Cost per runner and per job shows where spend
  concentrates.

## What Mergify tracks

Runner monitoring is organized into three areas: two for metrics and one for
configuration.

### Queue: is work waiting for a runner?

The queue tracks how long jobs wait for a runner and how many are waiting, grouped
by the labels they request. Read the two signals together: rising wait times while
demand holds steady point to a capacity problem, which more runners will fix; a spike
in queued jobs points to a demand surge instead. The goal is to add capacity before
queue time starts delaying pull requests.
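
The two-signal reading described in this section can be expressed as a small heuristic. This is a minimal sketch, not how Mergify computes anything: it assumes you have per-label samples of average wait time and queued-job count, treats the queued-job count as the demand signal as the text does, and the `QueueSample` shape, the window comparison, and the 1.5× threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueueSample:
    """One observation of the queue for a single runner label (hypothetical shape)."""
    wait_seconds: float  # average time jobs with this label waited for a runner
    queued_jobs: int     # number of jobs with this label waiting at that moment


def diagnose_queue(
    baseline: list[QueueSample],
    recent: list[QueueSample],
    growth: float = 1.5,  # illustrative threshold: "rising" means 50% above baseline
) -> str:
    """Compare a recent window against a baseline window (both must be non-empty)."""
    wait_rising = mean(s.wait_seconds for s in recent) > growth * mean(
        s.wait_seconds for s in baseline
    )
    demand_spiking = mean(s.queued_jobs for s in recent) > growth * mean(
        s.queued_jobs for s in baseline
    )
    # A demand spike explains longer waits on its own, so check it first.
    if demand_spiking:
        return "demand surge"
    # Waits grew while demand held steady: the fleet is too small.
    if wait_rising:
        return "capacity problem"
    return "stable"
```

Checking demand first mirrors the reading in the text: when the queue itself is surging, longer waits are expected and don't by themselves mean the fleet is undersized.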

### Fleet: how is each runner performing?

The fleet view shows your runners one at a time: throughput, speed, success rate, and
how each runner compares to its peers. Each runner carries a health status, so an
underperforming or unstable one is easy to single out. You can also follow each
runner's trends over time and see how heavily it is being used.
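
Comparing a runner to its peers can be sketched in a few lines. This is an illustrative approximation, not Mergify's health model: it assumes a list of job results from a single runner group, uses each runner's median job duration as its speed, and applies hypothetical thresholds (a 90% success rate, and 1.5× the group's median duration) to assign a status.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class JobResult:
    runner: str             # name of the runner that executed the job
    duration_seconds: float
    succeeded: bool


def runner_health(
    jobs: list[JobResult],
    min_success_rate: float = 0.9,  # illustrative threshold, not Mergify's
    max_slowdown: float = 1.5,      # illustrative: 50% slower than the group median
) -> dict[str, str]:
    """Label each runner in one group by comparing it against its peers."""
    by_runner: dict[str, list[JobResult]] = {}
    for job in jobs:
        by_runner.setdefault(job.runner, []).append(job)

    # Each runner's typical speed, then the group's typical speed across runners.
    runner_median = {
        name: median(j.duration_seconds for j in runner_jobs)
        for name, runner_jobs in by_runner.items()
    }
    group_median = median(runner_median.values())

    status: dict[str, str] = {}
    for name, runner_jobs in by_runner.items():
        success_rate = sum(j.succeeded for j in runner_jobs) / len(runner_jobs)
        slowdown = runner_median[name] / group_median
        if success_rate < min_success_rate:
            status[name] = "unstable"
        elif slowdown > max_slowdown:
            status[name] = "slow"
        else:
            status[name] = "healthy"
    return status
```

Using the median rather than the mean keeps a single unusually long job from marking a runner as slow, and comparing against the group rather than a fixed duration keeps the check meaningful for groups that run very different workloads.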

### Settings: what does Mergify monitor?

You choose which runner groups and labels Mergify watches. Metrics are aggregated by
runner group, so selecting groups here is how you scope monitoring to the runners
that matter to you.

## Key concepts

- **Runner groups and labels.** Runners are organized into groups, and labels (the
  `runs-on` values a job requests) identify which kind of runner handled the work.
  Mergify uses the group to scope what it monitors and to compare each runner against
  its peers.

- **Long-lived runners only.** Runner monitoring covers persistent machines.
  Ephemeral runners that exist for a single job aren't tracked here; use the
  [Jobs](/ci-insights/jobs) page for those.

- **Wait time vs. run time.** Wait time is how long a job sits before a runner picks
  it up; run time is how long it executes once started.

- **Health status.** Derived from a runner's success rate and its speed relative to
  its group, so machines that need attention surface on their own.
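
Wait time and run time both fall out of three timestamps per job. A minimal sketch, assuming each job record exposes when it was queued, when a runner started it, and when it completed, as ISO 8601 strings; the parameter names are hypothetical, not fields of a specific API.

```python
from datetime import datetime, timedelta


def wait_and_run_time(
    queued_at: str, started_at: str, completed_at: str
) -> tuple[timedelta, timedelta]:
    """Split one job's elapsed time into wait time and run time.

    Arguments are ISO 8601 timestamps; the parameter names are hypothetical.
    Python only accepts a trailing "Z" here from 3.11 on, so use "+00:00" on older versions.
    """
    queued = datetime.fromisoformat(queued_at)
    started = datetime.fromisoformat(started_at)
    completed = datetime.fromisoformat(completed_at)
    wait = started - queued    # sitting in the queue until a runner picked the job up
    run = completed - started  # executing on the runner
    return wait, run


wait, run = wait_and_run_time(
    "2024-05-01T10:00:00+00:00",
    "2024-05-01T10:04:30+00:00",
    "2024-05-01T10:12:30+00:00",
)
print(wait, run)  # 0:04:30 0:08:00
```

Keeping the two numbers separate matters because they have different fixes: more runners shorten the wait, while faster machines or leaner jobs shorten the run.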

:::note
  GitHub-hosted runners aren't monitored here. They use a new identity on every run,
  which makes per-runner metrics unreliable.
:::
