MTTD (Mean Time to Detect) is the average time between the moment an incident begins and the moment your team becomes aware of it. It measures the gap between failure onset and the first alert or human recognition of the problem. A short MTTD means your monitoring is doing its job. A long one means users were already hurting before anyone on your team noticed.
MTTD sits at the very start of the incident response chain, before acknowledgment, investigation, or repair can begin. You cannot respond to something you have not detected. That ordering matters because every minute of undetected failure translates directly into user impact, potential data exposure, and revenue loss.
What counts as a good MTTD depends on system criticality. For high-availability production systems, anything above 5 minutes is worth investigating. Teams with mature observability stacks typically measure MTTD in seconds to low minutes. MTTD measured in hours almost always points to a gap in alerting coverage or to alert fatigue: too many low-fidelity signals overwhelming the on-call queue.
The formula is straightforward:
MTTD = Total detection delay across all incidents / Number of incidents in the period
Detection delay per incident is the time elapsed from when the incident began (the first anomaly timestamp, the first failing health check, or the deployment that introduced the fault) to when your team received an actionable alert or confirmed awareness of the problem.
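As a quick check on the arithmetic, here is a minimal Python sketch of the formula. It assumes each incident is recorded as a (start, detection) timestamp pair; the function name `mttd` is illustrative and not part of any tool.

```python
from datetime import datetime, timedelta


def mttd(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean Time to Detect: total detection delay / number of incidents.

    Each incident is a (started_at, detected_at) pair.
    """
    if not incidents:
        raise ValueError("MTTD is undefined for zero incidents")
    # Sum per-incident delays, starting from a zero timedelta.
    total_delay = sum(
        (detected - started for started, detected in incidents),
        timedelta(0),
    )
    return total_delay / len(incidents)


# Three incidents detected after 2, 4, and 9 minutes: 15 minutes total / 3.
t0 = datetime(2024, 5, 1, 12, 0)
incidents = [
    (t0, t0 + timedelta(minutes=2)),
    (t0, t0 + timedelta(minutes=4)),
    (t0, t0 + timedelta(minutes=9)),
]
print(mttd(incidents))  # 0:05:00
```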
Three inputs shape this calculation: the incident start timestamp (from your deployment events, logs, or monitoring system), the detection timestamp (from your alerting tool, such as PagerDuty or Opsgenie), and the number of incidents in your measurement window. A rolling 30-day window is the most common reporting period for MTTD.
Teams sometimes debate where the detection clock stops. The clearest definition: detection is confirmed when an engineer acknowledges the alert or opens an investigation. If your alerting tool fires but nobody sees it for 40 minutes, those 40 minutes count.
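The two rules above (acknowledgment stops the clock, and results are reported over a rolling 30-day window) can be sketched as follows. The `Incident` record and its field names are hypothetical, and filtering incidents by their start time is an assumption; some teams bucket by detection time instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record shape; field names are illustrative.
    started_at: datetime       # first anomaly, failing health check, or faulty deploy
    alerted_at: datetime       # when the alerting tool fired (not the stop point)
    acknowledged_at: datetime  # when an engineer acknowledged or began investigating


def rolling_mttd(
    incidents: list[Incident], as_of: datetime, window_days: int = 30
) -> Optional[timedelta]:
    """MTTD over incidents that started in the trailing window ending at as_of.

    The detection clock stops at acknowledgment, so an alert that sits
    unseen still counts toward the delay. Returns None for an empty window.
    """
    window_start = as_of - timedelta(days=window_days)
    in_window = [i for i in incidents if window_start <= i.started_at <= as_of]
    if not in_window:
        return None
    total = sum((i.acknowledged_at - i.started_at for i in in_window), timedelta(0))
    return total / len(in_window)


as_of = datetime(2024, 6, 30)
start = datetime(2024, 6, 20, 9, 0)
# The alert fires after 1 minute, but nobody acknowledges it for 41 minutes.
recent = Incident(start, start + timedelta(minutes=1), start + timedelta(minutes=41))
old_start = datetime(2024, 5, 1, 9, 0)  # outside the 30-day window
old = Incident(old_start, old_start + timedelta(minutes=2), old_start + timedelta(minutes=3))
print(rolling_mttd([recent, old], as_of))  # 0:41:00
```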
Hivel does not surface MTTD as a standalone metric. Hivel is built around code-level incident indicators, specifically MTTR (the average cycle time of hotfix PRs from first commit to merge) and Change Failure Rate. These measure what happens after an incident is detected.
To approximate MTTD within Hivel, use the Incident API to pass a reported_time field (your detection timestamp) alongside each incident's resolution_time.
Trending reported_time relative to deployment timestamps in your Quality dashboard reveals your detection lag pattern over time. The Pull Request screen surfaces MTTR alongside CFR so engineering leaders can see how fast incidents are resolved once detected.
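The sketch below shows what one incident record might look like before it is sent. Only `reported_time` and `resolution_time` come from the description above. The `incident_id` field, the ISO 8601 UTC format, and the function name are assumptions, and the endpoint, authentication, and any other required fields are deliberately left out. Check Hivel's Incident API documentation for the actual schema.

```python
import json
from datetime import datetime, timezone


def build_incident_payload(
    incident_id: str, reported_time: datetime, resolution_time: datetime
) -> str:
    """Serialize one incident as JSON (hypothetical shape, not Hivel's schema)."""
    # Require timezone-aware timestamps so the UTC conversion is unambiguous.
    for ts in (reported_time, resolution_time):
        if ts.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
    if resolution_time < reported_time:
        raise ValueError("resolution_time cannot precede reported_time")
    payload = {
        "incident_id": incident_id,  # hypothetical identifier field
        # reported_time is the detection timestamp used to approximate MTTD.
        "reported_time": reported_time.astimezone(timezone.utc).isoformat(),
        "resolution_time": resolution_time.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(payload)


detected = datetime(2024, 6, 20, 9, 5, tzinfo=timezone.utc)
resolved = datetime(2024, 6, 20, 10, 20, tzinfo=timezone.utc)
print(build_incident_payload("INC-101", detected, resolved))
```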
How to approximate MTTD in Hivel
1. Navigate to the Pull Request screen, filter by team and time frame, then click the MTTR tile.
2. Use the Incident API to pass reported_time (detection timestamp) and resolution_time.
3. Compare the reported_time value against your deployment timestamps to calculate detection lag per incident.
4. Trend this over time alongside MTTR and CFR in the Quality dashboard.
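Steps 3 and 4 can also be approximated offline with a short script. This sketch assumes that the most recent deployment at or before each `reported_time` introduced the fault. That is a heuristic, not something Hivel does for you, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta


def detection_lags(
    reported_times: list[datetime], deploy_times: list[datetime]
) -> list[tuple[datetime, timedelta]]:
    """Pair each reported_time with the latest deployment at or before it.

    Heuristic (assumption): that deployment introduced the fault.
    Incidents with no earlier deployment are skipped.
    """
    deploys = sorted(deploy_times)
    lags = []
    for reported in reported_times:
        idx = bisect_right(deploys, reported)
        if idx == 0:
            continue  # no deployment precedes this incident
        lags.append((reported, reported - deploys[idx - 1]))
    return lags


def monthly_trend(lags: list[tuple[datetime, timedelta]]) -> dict[str, timedelta]:
    """Average detection lag per calendar month of reported_time."""
    buckets: dict[str, list[timedelta]] = defaultdict(list)
    for reported, lag in lags:
        buckets[reported.strftime("%Y-%m")].append(lag)
    return {
        month: sum(values, timedelta(0)) / len(values)
        for month, values in sorted(buckets.items())
    }


deploys = [datetime(2024, 5, 2, 10, 0), datetime(2024, 6, 3, 14, 0)]
reported = [
    datetime(2024, 4, 30, 9, 0),    # before any deploy: skipped
    datetime(2024, 5, 2, 10, 30),   # 30-minute lag
    datetime(2024, 5, 2, 10, 50),   # 50-minute lag
    datetime(2024, 6, 3, 14, 6),    # 6-minute lag
]
trend = monthly_trend(detection_lags(reported, deploys))
for month, avg in trend.items():
    print(month, avg)  # 2024-05 0:40:00, then 2024-06 0:06:00
```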
MTTD and MTTR are often reported together and just as often confused. The distinction is operationally significant. MTTD measures how fast your team spots the problem. MTTR measures how fast your team fixes it. Both metrics are components of total incident duration, but they point to completely different failure modes.
A team with low MTTD and high MTTR has strong detection but a slow resolution process. A team with high MTTD and low MTTR is fast at fixing problems once they know about them, but users bear the cost of that detection gap. Improving both is the goal, but they require different interventions.
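To see how the two metrics split total incident duration, here is a sketch that assumes each incident carries start, detection, and resolution timestamps, and that MTTR is measured from detection to resolution so the two phases add up to the full duration. Hivel's own MTTR is hotfix PR cycle time, so its numbers will differ. The tuple layout and function names are illustrative.

```python
from datetime import datetime, timedelta


def split_duration(
    incidents: list[tuple[datetime, datetime, datetime]],
) -> tuple[timedelta, timedelta]:
    """Return (MTTD, MTTR) for (started_at, detected_at, resolved_at) tuples.

    Assumes MTTR runs from detection to resolution, so MTTD + MTTR equals
    the mean total incident duration.
    """
    n = len(incidents)
    if n == 0:
        raise ValueError("no incidents")
    detect = sum((d - s for s, d, _ in incidents), timedelta(0)) / n
    resolve = sum((r - d for _, d, r in incidents), timedelta(0)) / n
    return detect, resolve


def bottleneck(mttd: timedelta, mttr: timedelta) -> str:
    """Name the phase that contributes more to total incident duration."""
    if mttd > mttr:
        return "detection"
    if mttr > mttd:
        return "resolution"
    return "even"


# A "high MTTD, low MTTR" team: slow to notice, quick to fix.
t0 = datetime(2024, 6, 1, 8, 0)
incidents = [
    (t0, t0 + timedelta(minutes=30), t0 + timedelta(minutes=40)),
    (t0, t0 + timedelta(minutes=50), t0 + timedelta(minutes=60)),
]
detect, resolve = split_duration(incidents)
print(detect, resolve, bottleneck(detect, resolve))  # 0:40:00 0:10:00 detection
```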
Every moment a production incident goes undetected is a moment your users are working around a problem you do not know about yet. The cost of that gap compounds. A database timeout that goes undetected for 20 minutes affects far more users than one caught in the first 90 seconds. Detection speed is not a vanity metric; it is a direct proxy for how much damage an incident actually does.
MTTD also drives the economics of on-call work. When detection is slow, on-call engineers spend their shifts reacting to user-reported issues rather than catching problems early. That reactive posture is exhausting and expensive. Teams that invest in observability coverage, alert tuning, and automated anomaly detection consistently see MTTD drop and engineer satisfaction improve alongside it.
Beyond operational reliability, MTTD has a security dimension. Undetected incidents give attackers dwell time. A detection window measured in hours leaves you exposed to risk that a detection window measured in minutes would have prevented. Platforms like Hivel surface incident patterns alongside cycle time and code quality metrics so engineering leaders can see whether their delivery speed is outpacing their detection coverage.