Self-hosted runners

Monitor the self-hosted CI runners behind your pipelines so you can spot capacity bottlenecks, degraded machines, and wasted spend.


Self-hosted runners are the machines that execute your CI. Mergify doesn’t provision or control them. It watches the jobs they run and turns that data into metrics about capacity, performance, cost, and reliability.

That fleet is infrastructure you own and pay for, but it usually runs as a black box: you only hear about it once builds start queuing or failing. These metrics let you base scaling and reliability decisions on what the fleet actually does.

The metrics answer a few recurring questions:

  • Is the fleet the right size? Idle runners mean you pay for capacity you don’t use. Saturated runners mean jobs wait. Both are expensive, in different ways.

  • Is a runner degraded? One slow or flaky machine drags down every job scheduled on it, often without an obvious failure.

  • Where is CI time going? Time spent waiting for a runner and time spent running on it are different problems with different fixes.

  • What is the fleet costing? Cost per runner and per job shows where spend concentrates.

Mergify groups these metrics into three areas.

Queue: is work waiting for a runner?

The queue tracks how long jobs wait for a runner and how many are waiting, grouped by the labels they request. Read the two signals together. Rising wait times with steady demand point to a capacity problem, so add runners. A spike in queued jobs points to a demand surge instead. The aim is to add capacity before queue time starts delaying pull requests.
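As a rough illustration of reading the two signals together, the sketch below compares the recent half of each series with the earlier half. The function name `diagnose_queue`, the input lists, and the `rise_ratio` threshold are all hypothetical; this is not how Mergify computes or exposes these metrics.

```python
from statistics import mean


def diagnose_queue(wait_seconds, queued_jobs, rise_ratio=1.5):
    """Classify a queue from two equally sampled series (hypothetical inputs).

    wait_seconds: average wait per interval, oldest first.
    queued_jobs:  number of waiting jobs per interval, oldest first.
    rise_ratio:   assumed threshold for calling a series "rising".
    """

    def trend(series):
        # Ratio of the recent half's mean to the earlier half's mean.
        half = len(series) // 2
        earlier, recent = mean(series[:half]), mean(series[half:])
        if earlier == 0:
            return float("inf") if recent > 0 else 1.0
        return recent / earlier

    wait_up = trend(wait_seconds) >= rise_ratio
    demand_up = trend(queued_jobs) >= rise_ratio

    if demand_up:
        return "demand surge"      # more jobs are arriving than before
    if wait_up:
        return "capacity problem"  # same demand, longer waits: add runners
    return "healthy"


print(diagnose_queue([30, 32, 90, 95], [4, 5, 4, 5]))
```

With steady demand and rising waits, the call above reports a capacity problem. Feeding it a series where queued jobs jump as well would report a demand surge instead.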

Fleet: how is each runner performing?

The fleet is your runners seen one at a time: throughput, speed, success rate, and how each runner compares to its peers. Each runner carries a health status, so an underperforming or unstable one is easy to single out. You can also follow each runner’s trends over time and see how heavily it is being used.
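To make the peer comparison concrete, here is a minimal sketch that flags a runner from its success rate and its speed relative to the group median. The `RunnerStats` type, the status names, and both thresholds are assumptions made for illustration; the actual rules behind Mergify's health status are not documented here.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunnerStats:
    name: str
    success_rate: float        # fraction of jobs that succeeded, 0.0 to 1.0
    median_run_seconds: float  # typical job duration on this runner


def health(runner, group, min_success=0.9, max_slowdown=1.3):
    """Return a hypothetical status for one runner compared to its group."""
    group_median = median(r.median_run_seconds for r in group)
    slowdown = runner.median_run_seconds / group_median
    if runner.success_rate < min_success:
        return "unstable"  # fails more often than the assumed threshold allows
    if slowdown > max_slowdown:
        return "slow"      # noticeably slower than its peers
    return "healthy"


group = [
    RunnerStats("runner-a", 0.98, 100),
    RunnerStats("runner-b", 0.97, 110),
    RunnerStats("runner-c", 0.99, 180),
    RunnerStats("runner-d", 0.70, 105),
]
for r in group:
    print(r.name, health(r, group))
```

Comparing against the group median rather than a fixed duration keeps the check meaningful when different groups run very different workloads.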

Settings: what does Mergify monitor?

You decide which runner groups and labels Mergify watches. Because metrics are aggregated by runner group, this is also how you scope monitoring to the runners that matter.

  • Runner groups and labels. Runners are organized into groups, and labels (the runs-on values a job requests) identify which kind of runner handled the work. Mergify uses the group to scope what it monitors and to compare each runner against its peers.

  • Long-lived runners only. This page tracks persistent runners. Ephemeral runners that exist for a single job aren’t tracked here; use the Jobs page for those.

  • Wait time vs. run time. Wait time is how long a job sits before a runner picks it up; run time is how long it executes once started.

  • Health status. Derived from a runner’s success rate and its speed relative to its group, so machines that need attention surface on their own.
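The wait time and run time distinction can be illustrated with three timestamps per job. The function and parameter names below are hypothetical and are not part of any Mergify API; the sketch only shows how the two durations split a job's total time.

```python
from datetime import datetime


def split_job_time(queued_at, started_at, finished_at):
    """Return (wait_seconds, run_seconds) for a single job."""
    wait = (started_at - queued_at).total_seconds()   # sitting in the queue
    run = (finished_at - started_at).total_seconds()  # executing on a runner
    return wait, run


wait, run = split_job_time(
    queued_at=datetime(2024, 1, 1, 12, 0, 0),
    started_at=datetime(2024, 1, 1, 12, 2, 30),
    finished_at=datetime(2024, 1, 1, 12, 10, 30),
)
print(wait, run)  # 150.0 480.0
```

A long wait with a short run suggests the fleet is short on capacity, while a short wait with a long run suggests looking at the job itself or the machine it ran on.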
