Name: Hivel - Software Engineering Productivity Tool
Brand: Hivel

TABLE OF CONTENTS

What Does DORA Mean?

Why DevOps Teams Use DORA Metrics?

Why DORA Metrics Exist: The Problem They Solve?

How DORA Research Redefined Software Performance

The Four Core DORA Metrics Explained

The DORA Performance Levels: How Teams are Categorized by Delivery Performance

The AI Era: Why DORA Metrics Matter More Than Ever

The New DORA AI Capabilities Model (2025)

How Engineering Leaders Use DORA Metrics in Practice

Tools That Help Track DORA Metrics

What Does DORA Mean?

DORA stands for DevOps Research and Assessment.

It began as a research program that studied how high-performing engineering teams build, test, and deliver software. The research later became part of Google Cloud, but its findings are used across the entire software industry.

The program analyzed thousands of engineering teams to understand what actually makes software delivery fast, reliable, and stable.

From that research came a set of four DevOps performance metrics that are now widely known as DORA metrics.

DORA metrics are four key measurements used to evaluate software delivery performance.

They help organizations understand how efficiently and reliably their engineering teams ship software.

The four DORA DevOps metrics are:

Deployment Frequency - how often code is released to production
Lead Time for Changes - how long it takes for code changes to reach production
Change Failure Rate - the percentage of deployments that cause failures or incidents
Mean Time to Recovery (MTTR) - how quickly teams restore service after a failure

Together, these DevOps metrics measure two critical aspects of engineering performance: delivery speed and system stability.


Why DevOps Teams Use DORA Metrics?

DevOps teams use DORA metrics because they offer a very clear and objective way to identify and measure software delivery performance.

Traditional engineering metrics often focus on activity, such as lines of code written or the number of commits. Those numbers do not show whether software is delivered quickly or safely. DORA metrics focus instead on real delivery outcomes, including:

How frequently teams ship updates
How quickly changes move through the pipeline
How stable releases are in production
How fast teams recover from failures

With these insights, engineering leaders can see whether changes to their development process are actually improving delivery performance.

How DORA Became the Standard for Measuring Software Delivery Performance

These DevOps performance metrics became the industry standard because they are based on large-scale research and real engineering data.

The research behind DORA was published in the widely cited Accelerate State of DevOps reports, which analyzed thousands of engineering teams across different industries.

The findings consistently showed that teams with strong DORA performance tend to:

Release software more frequently
Maintain more stable systems
Recover from incidents faster
Deliver better product outcomes

Because of this strong link between engineering practices and business performance, DORA metrics are now widely used by:

DevOps teams
Platform engineering teams
CTOs and engineering leaders
Startups and large enterprises

Today, many engineering platforms and DevOps tools include built-in dashboards for tracking DORA metrics.

Why DORA Metrics Exist: The Problem They Solve?

Many engineering leaders have long believed that moving fast and staying stable are a trade-off. Nathan Harvey, who leads the DORA program at Google Cloud, has spent over a decade helping teams confront this belief with data. He shared his conclusion in one of his YouTube talks: "Throughput and stability tend to move together. You're either fast and stable, or you're slow and unstable."

That's a finding backed by over 12 years of research, across tens of thousands of engineers, in companies of every size, in every industry imaginable. And it raises an uncomfortable follow-up question: if speed and stability actually go together, why were so many engineering teams experiencing exactly the opposite?

The answer lies in what they were measuring. And more importantly, what they weren't.

Engineering Leaders Didn’t Start with DORA Metrics

Before DORA became popular, most engineering organizations used productivity-style metrics to evaluate developers and teams.

Some of the most common ones were:

Lines of code written
Number of commits
Story points completed
Tickets closed
Hours spent on development

At first glance, these metrics seemed logical. More output should mean more progress.

But software engineering does not work that way.

A developer can write thousands of lines of code and still slow down a system.

A team can close dozens of tickets and still delay product delivery.

These metrics measured developer activity, not software delivery performance.

Why Traditional Vanity Engineering Metrics Failed

Traditional metrics created several problems for engineering teams.

1. They rewarded the wrong behavior

When teams are judged by lines of code or ticket counts, they often focus on doing more work instead of delivering better outcomes.

More code does not always mean better software. In many cases, the best solution is actually less code.

2. They ignored system stability

Older metrics rarely tracked what happened after software reached production.

A team could ship features quickly, but if those releases caused frequent outages that required hotfixes, the organization still suffered.

Delivery speed without stability creates chaos.

3. They failed to measure the full delivery pipeline

Software delivery is not just about writing code.

It includes:

Code reviews
Testing
Integration
Deployment
Monitoring
Incident response

Traditional metrics focused only on the development phase, ignoring the rest of the pipeline.

However, with so many metrics available across the engineering pipeline, it becomes crucial to know whether you are tracking a real metric or a vanity metric.

In this Reddit thread, a Redditor makes a thought-provoking point: “Vanity metrics are anything you can't tie to a specific decision or action. Lines of code? Vanity. Even deployment frequency doesn't matter if you can't connect it to 'we're slow because of X bottleneck.'”

The Rise of DevOps and Continuous Delivery

In the early 2010s, the software industry started moving toward DevOps and continuous delivery practices.

Teams began adopting:

Automated testing
Continuous integration
Continuous deployment
Infrastructure automation
Smaller and more frequent releases

Instead of large releases every few months, teams began shipping small changes frequently.

But this new model created a new challenge: How do you measure performance in a fast-moving delivery pipeline?

Traditional metrics were no longer useful.

Engineering leaders needed a way to measure both speed and reliability at the same time.

How DORA Research Redefined Software Performance

This is where the DORA research program changed the conversation.

Starting in 2014, Dr. Nicole Forsgren led a research effort that would eventually become the most comprehensive study of software delivery performance ever conducted.

The annual State of DevOps Report surveyed tens of thousands of professionals - developers, ops engineers, product managers, CTOs - and asked one core question:

What does it actually look like when software teams perform well?

The findings were clear. High-performing teams shared four measurable traits. They deployed frequently. Their lead time was short. Their failure rate was low. And when things broke, they recovered fast.

The research also made a direct connection between software delivery performance and business outcomes. Teams that scored high on DORA metrics were more likely to hit their revenue targets, achieve their reliability goals, and outperform competitors.

The Shift That Changed Everything: Activity → Outcomes

The biggest change DORA introduced was a new way of thinking about engineering performance.

Instead of asking:

“How busy are our developers?”

The focus shifted to:

How fast can we deliver improvements to users?
How stable are our production systems?
How quickly can we recover from failures?

This shift moved the industry away from activity-based metrics and toward outcome-based metrics. And that changed how engineering teams evaluate success.

Today, the best software teams optimize for delivering reliable software faster. DORA metrics simply provide the clearest way to measure that.

The Four Core DORA Metrics Explained

DORA metrics are built around four simple measurements. Together, these DevOps performance metrics show how well a team delivers software to users.

Each metric looks at a different part of the delivery process. Some measure speed, while others measure stability.

This balance is important. A team that moves fast but constantly breaks production is not high-performing. At the same time, a team that is extremely stable but releases updates once every few months is also not effective.

The four DORA metrics help teams understand both sides of software delivery.

1) DORA Deployment Frequency

Deployment Frequency measures how often a team releases code to production.

It answers a simple question: How frequently are we delivering changes to users?

High-performing teams release updates regularly. In many modern organizations, deployments can happen multiple times per day.

Frequent deployments usually indicate:

Smaller code changes
Automated testing and pipelines
Strong DevOps practices
Lower risk releases

When deployments happen rarely, it often signals bottlenecks in the delivery pipeline.

Large releases also tend to increase risk because many changes are shipped at once.

Deployment Frequency helps teams see whether their delivery process encourages small, safe, and frequent releases.

2) DORA Lead Time for Changes

Lead Time for Changes measures how long it takes for a code change to reach production.

The timer typically starts when code is committed to the repository and ends when that change is successfully deployed.

This metric shows how quickly ideas turn into real product improvements.

A short lead time usually means:

Efficient code reviews
Automated testing
Smooth CI/CD pipelines
Minimal handoffs between teams

A long lead time often points to delays somewhere in the system, such as:

Manual testing steps
Slow approval processes
Complex deployment procedures

Lead Time for Changes helps organizations understand how fast their delivery pipeline actually moves.

3) DORA Change Failure Rate

Change Failure Rate measures how often deployments cause problems in production.

These problems might include:

Service outages
Bugs that affect users
Degraded system performance
Emergency rollbacks or hotfixes

The metric is calculated as the percentage of deployments that lead to failures requiring remediation.

For example, if a team deploys 100 times and 10 releases cause incidents, the change failure rate is 10%.

This metric focuses on release stability. Even if a team deploys frequently, a high failure rate means the system is fragile and risky.

Strong engineering teams aim to keep this number low while still delivering updates quickly.

4) DORA Mean Time to Recovery (MTTR)

MTTR measures how quickly teams restore service after a production failure.

No software system is perfect. Failures happen even in the best engineering environments.

What separates high-performing teams is how fast they recover.

MTTR tracks the average time it takes to:

Detect an issue
Diagnose the cause
Fix the problem
Restore normal service

A shorter recovery time usually indicates:

Strong monitoring and observability
Clear incident response processes
Experienced engineering teams
Well-designed rollback strategies

Organizations with fast MTTR minimize the impact of failures on customers and business operations.

The real strength of DORA metrics is that they work together as a balanced system.

Two metrics measure speed:

Deployment Frequency
Lead Time for Changes

Two metrics measure stability:

Change Failure Rate
Mean Time to Recovery (MTTR)

When teams improve both speed and stability at the same time, they achieve what every engineering organization wants: fast, reliable software delivery.

Deployment Frequency: How Often You Deliver Value

One of the clearest signals of a healthy engineering team is how often they deliver updates to users.

Software only creates value when it reaches production. Until then, it is just code sitting in a repository.

Deployment Frequency measures how regularly teams push new changes into production. It shows whether a team can move from development to delivery smoothly.

When Deployment Frequency increases, teams can ship improvements faster, fix issues earlier, and respond to user feedback quickly.

What is Deployment Frequency?

Deployment Frequency measures how often a team successfully releases code to production.

A deployment can include:

New features
Bug fixes
Performance improvements
Security updates
Small configuration changes

The metric focuses on completed deployments, not builds or commits.

The question it answers is simple: How frequently are users receiving updates to the product?

A higher deployment frequency usually means the delivery pipeline is automated, reliable, and efficient.
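To make the definition concrete, here is a minimal sketch of how Deployment Frequency could be computed from a deployment log. It assumes a hypothetical `DeploymentRecord` with `environment`, `status`, and `finished_at` fields (real CI/CD tools expose different shapes) and, following the definition above, counts only successful production deployments, ignoring builds, failed runs, and staging deploys.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DeploymentRecord:
    # Hypothetical record shape; real CI/CD tools name these fields differently.
    environment: str       # e.g. "production" or "staging"
    status: str            # e.g. "success" or "failed"
    finished_at: datetime  # when the deployment completed


def deployment_frequency(records, window_start, window_end):
    """Successful production deployments per day in [window_start, window_end)."""
    days = (window_end - window_start) / timedelta(days=1)
    if days <= 0:
        raise ValueError("window_end must be after window_start")
    # Builds, failed runs, and non-production deploys do not count.
    count = sum(
        1
        for r in records
        if r.environment == "production"
        and r.status == "success"
        and window_start <= r.finished_at < window_end
    )
    return count / days


# Usage: two successful production deploys in a 7-day window.
start = datetime(2025, 1, 1)
log = [
    DeploymentRecord("production", "success", start + timedelta(days=1)),
    DeploymentRecord("production", "failed", start + timedelta(days=2)),
    DeploymentRecord("staging", "success", start + timedelta(days=3)),
    DeploymentRecord("production", "success", start + timedelta(days=5)),
]
per_day = deployment_frequency(log, start, start + timedelta(days=7))
print(f"{per_day:.2f} deploys/day")  # 0.29 deploys/day
```

Choosing the window matters: a single busy day can make a short window look elite, so a week or a month is a steadier basis for trends.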

How High-Performing Teams Treat Deployment Frequency

High-performing engineering teams treat deployments as a routine activity, not a stressful event.

Instead of releasing large batches of code every few weeks or months, they ship small changes continuously.

This approach offers several advantages:

Small changes are easier to review
Testing becomes faster
Rollbacks become safer
Risks are reduced because each deployment contains fewer changes

As a result, deployments become predictable, which is exactly what mature engineering teams aim for.

Many modern companies deploy:

Several times per day
Multiple times per week
Automatically through CI/CD pipelines

What Slows Deployment Frequency?

When deployment frequency is low, it usually means something in the delivery pipeline is slowing teams down.

Common causes include:

Large release batches - When teams bundle many changes into a single release, deployments become risky and difficult to manage.
Manual testing processes - If testing requires extensive manual effort, releases naturally become slower.
Complex approval processes - Too many review layers can delay deployments even when code is ready.
Fragile infrastructure - If deployments frequently break production systems, teams become cautious and deploy less often.
Lack of automation - Without automated builds, tests, and deployment pipelines, releasing software becomes time-consuming.

Improving deployment frequency often requires removing friction from the delivery pipeline.

Example Benchmarks of Deployment Frequency from DORA Research

DORA research categorized software teams into performance groups based on how frequently they deploy.

Performance Level - Deployment Frequency
Elite - On demand (multiple deploys per day)
High - Between once per day and once per week
Medium - Between once per week and once per month
Low - Between once per month and once every six months
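The benchmark table above can be expressed as a small lookup. This is a sketch under stated assumptions: a week is 7 days, a month is 30 days, exactly one deploy per day counts as High (Elite means multiple per day), and cadences rarer than once every six months, which the table does not cover, are also reported as Low.

```python
def deployment_frequency_band(deploys_per_day: float) -> str:
    """Map an average deployment rate to a DORA band from the table above.

    Assumptions: 1 week = 7 days, 1 month = 30 days. Exactly once per day
    and exactly once per week map to High; exactly once per month maps to
    Medium. Rates below the table's Low range are also reported as "Low".
    """
    if deploys_per_day < 0:
        raise ValueError("rate cannot be negative")
    if deploys_per_day > 1:
        return "Elite"   # multiple deploys per day
    if deploys_per_day >= 1 / 7:
        return "High"    # between once per day and once per week
    if deploys_per_day >= 1 / 30:
        return "Medium"  # between once per week and once per month
    return "Low"         # less than once per month


for rate in (3.0, 1.0, 0.2, 0.05, 0.01):
    print(rate, deployment_frequency_band(rate))
# 3.0 Elite / 1.0 High / 0.2 High / 0.05 Medium / 0.01 Low
```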

Lead Time for Changes: The Speed of Your Delivery Pipeline

Here's a question that seems simple but stops most engineering leaders cold.

Right now, if one of your engineers committed a one-line bug fix, how long would it take to reach production?

Lead Time for Changes is DORA's answer to that question.

What is Lead Time for Changes?

Lead Time for Changes measures the time between a code commit and that change running successfully in production.

The clock typically starts when a developer commits code to the repository and ends when that change is deployed.

This metric reflects the efficiency of the entire delivery pipeline, not just development speed.

A short Lead Time for Changes means code moves through the system smoothly. A long Lead Time for Changes suggests delays or bottlenecks somewhere between development and deployment.
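As a minimal sketch, Lead Time for Changes can be computed as the gap between each change's commit time and its production deploy time. The `(committed_at, deployed_at)` pair format is an illustrative assumption. Summarizing with the median rather than the mean is also our assumption here: a few long-lived changes can drag an average far away from what a typical change experiences.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_times(changes):
    """Commit-to-production durations for changes that have reached production.

    `changes` is a list of (committed_at, deployed_at) pairs, an illustrative
    format; deployed_at is None while a change is still in the pipeline.
    """
    return [deployed - committed for committed, deployed in changes if deployed is not None]


def median_lead_time(changes):
    """Median lead time in hours, or None if nothing has been deployed yet."""
    durations = lead_times(changes)
    if not durations:
        return None
    return median(d.total_seconds() / 3600 for d in durations)


t0 = datetime(2025, 3, 3, 9, 0)
changes = [
    (t0, t0 + timedelta(hours=2)),
    (t0, t0 + timedelta(hours=5)),
    (t0, t0 + timedelta(days=3)),  # one slow change: 72 hours
    (t0, None),                    # still in review, so excluded
]
print(median_lead_time(changes))  # 5.0
```

Here the mean would be about 26 hours because of the single 72-hour change, while the median of 5 hours better reflects the usual path to production.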

Stages That Impact Lead Time for Changes

A code change usually passes through multiple steps before it reaches production. Each step affects the overall Lead Time for Changes.

Common stages include:

Code commit - A developer pushes code to the repository
Code review - Team members review the change to ensure quality and maintainability
Automated builds - The system compiles the code and prepares it for testing
Testing - Unit tests, integration tests, and other checks validate the change
Approval or merge process - The change is approved and merged into the main branch
Deployment pipeline - The code is packaged and deployed to production

If any of these stages become slow or manual, Lead Time for Changes increases.

The goal is to make each stage fast, predictable, and automated.
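Breaking lead time down by stage shows where a change actually waits. The sketch below assumes a hypothetical record of when each stage listed above completed for a single change. It attributes to each stage the time elapsed since the previous stage finished, then reports the slowest one.

```python
from datetime import datetime, timedelta

# Stage order follows the list above; the key names are hypothetical.
STAGES = ["commit", "review", "build", "test", "merge", "deploy"]


def stage_durations(timestamps):
    """Time from the previous stage's completion to each stage's completion.

    `timestamps` maps every name in STAGES to the datetime that stage finished.
    """
    return {curr: timestamps[curr] - timestamps[prev] for prev, curr in zip(STAGES, STAGES[1:])}


def slowest_stage(timestamps):
    """Name of the stage that added the most time to this change's lead time."""
    durations = stage_durations(timestamps)
    return max(durations, key=durations.get)


t0 = datetime(2025, 3, 3, 9, 0)
ts = {
    "commit": t0,
    "review": t0 + timedelta(hours=20),  # the PR waited most of a day for review
    "build": t0 + timedelta(hours=20, minutes=10),
    "test": t0 + timedelta(hours=20, minutes=40),
    "merge": t0 + timedelta(hours=22),
    "deploy": t0 + timedelta(hours=23),
}
print(slowest_stage(ts))                                # review
print(sum(stage_durations(ts).values(), timedelta()))  # 23:00:00
```

The stage durations add up to the full lead time, so a fast build and test pipeline barely matters when review alone accounts for 20 of the 23 hours.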

Why Code Reviews and Approvals Become Bottlenecks and Affect Lead Time for Changes

Code reviews are important for maintaining quality, but when they take too long, they delay delivery.

Some common issues include:

Reviewers overloaded with too many PRs
Large PRs (more than 400 lines of code) that are hard to review quickly
Unclear ownership of code changes
Waiting for senior engineers to approve merges

Similarly, approval processes can introduce delays when multiple layers of sign-off are required before deployment.

These bottlenecks increase Lead Time for Changes even when the code itself is ready.

High-performing teams address this by keeping pull requests small and making reviews fast and collaborative.
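An automated size check is one simple way to keep pull requests small. This minimal sketch flags PRs above the 400-line mark mentioned above. It assumes each PR is a hypothetical `(pr_id, additions, deletions)` tuple and that "lines of code" means additions plus deletions; teams may count differently or choose another limit.

```python
# Threshold from the guideline above; adjust to your team's own norm.
MAX_REVIEWABLE_LINES = 400


def oversized_prs(prs, limit=MAX_REVIEWABLE_LINES):
    """IDs of PRs whose changed lines (additions + deletions) exceed `limit`.

    `prs` is a list of (pr_id, additions, deletions) tuples, a hypothetical shape.
    """
    return [pr_id for pr_id, added, deleted in prs if added + deleted > limit]


prs = [("PR-101", 120, 30), ("PR-102", 380, 90), ("PR-103", 400, 0)]
print(oversized_prs(prs))  # ['PR-102']
```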

How Elite Teams Reduce Lead Time for Changes

Elite engineering teams treat Lead Time for Changes as a system design problem, rather than a development issue.

They reduce it by improving the entire delivery pipeline.

Common practices include:

Small and frequent code changes - Smaller changes move through reviews and testing much faster
Strong automation - Automated testing and CI pipelines remove manual steps
Continuous integration - Code is merged and validated frequently, reducing integration delays
Clear code ownership - Teams know exactly who reviews and approves changes
Efficient deployment pipelines - Infrastructure and deployment tools are optimized for speed and reliability
PR hygiene - Consistent PR hygiene practices make pull requests faster to review
AI agents for code review - AI agents handle the first pass of code review, saving senior developers significant time

When these practices are in place, Lead Time for Changes can drop dramatically. In elite teams, changes often reach production within hours instead of days or weeks.

Change Failure Rate: The Stability of Your Releases

Shipping software quickly is valuable. But if releases constantly break production, speed becomes a problem.

That is why DORA does not measure delivery speed alone. It also measures how stable those releases are.

What is Change Failure Rate?

Change Failure Rate measures the percentage of deployments that cause a failure in production.

A failure can include situations such as:

System outages
Bugs affecting users
Performance degradation
Security issues
Deployments that require rollback or hotfixes

The metric is calculated by dividing the number of failed deployments by the total number of deployments.

For example:

If a team deploys 100 times and 10 deployments cause incidents, the change failure rate is 10%.

Lower percentages indicate more stable releases.
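The calculation above can be sketched in a few lines. The `Deployment` type and its three flags are hypothetical stand-ins for data that would normally come from incident and CI/CD tooling. The design choice worth noting: a deployment that triggered both an incident and a rollback still counts as one failed deployment, so failures are never double-counted.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical flags; in practice these come from incident and CI/CD tools.
    caused_incident: bool = False
    rolled_back: bool = False
    needed_hotfix: bool = False

    @property
    def failed(self) -> bool:
        # Counted once, even if it triggered more than one kind of remediation.
        return self.caused_incident or self.rolled_back or self.needed_hotfix


def change_failure_rate(deployments) -> float:
    """Failed deployments divided by total deployments, as a percentage."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failed = sum(1 for d in deployments if d.failed)
    return 100 * failed / len(deployments)


# The example from the text: 100 deployments, 10 of which caused problems.
history = [Deployment() for _ in range(90)]
history += [Deployment(caused_incident=True, rolled_back=True) for _ in range(6)]
history += [Deployment(needed_hotfix=True) for _ in range(4)]
print(change_failure_rate(history))  # 10.0
```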

Why Faster Teams Sometimes Break More Things

Many teams assume that deploying faster will automatically increase failure rates. And sometimes that does happen.

When organizations start accelerating delivery, they may initially see more incidents. This often occurs because:

Testing processes are weak
Monitoring systems are limited
Deployment pipelines are immature
Teams lack strong rollback mechanisms
No AI code governance framework is in place, so defects in AI-generated code slip past review

When releases become more frequent without improving these systems, failures become more visible.

However, this does not mean fast delivery is the problem. It simply reveals weaknesses in the delivery system that were already present.

Why Elite Teams Actually Reduce Failure While Increasing Speed

One of the most surprising findings from DORA research is this: The fastest teams also tend to have the most stable releases.

At first glance, this seems counterintuitive. But the reason becomes clear when you look at how elite teams operate.

They focus on small, incremental changes instead of large releases.

Smaller deployments offer multiple advantages:

Problems are easier to identify
Rollbacks are faster
Testing becomes simpler
The blast radius of failures is smaller

Elite teams also invest heavily in:

Automated testing and AI agent-powered code reviews
Strong monitoring and observability
Feature flags and safe release techniques
Automated rollback mechanisms
AI code governance framework

These practices reduce the risk associated with each deployment. Over time, the system becomes both faster and more stable.

This is why high-performing engineering teams do not choose between speed and quality. They design their systems to achieve both at the same time.

Mean Time to Recovery (MTTR): How Fast Teams Fix Production Issues

No software system is perfect. Even the best engineering teams experience production incidents.

A deployment may introduce a bug. A dependency may fail. Infrastructure may behave unexpectedly.

What separates strong engineering teams from struggling ones is how quickly they recover.

What is Mean Time to Recovery?

Mean Time to Recovery (MTTR) measures the average time it takes to restore service after a production incident.

The timer usually starts when the failure begins or is detected, and ends when the system is fully functioning again.

Recovery may involve actions such as:

Rolling back a faulty deployment
Fixing a production bug
Restarting or scaling infrastructure
Applying a hotfix or patch
Resolving configuration issues

For example:

If one system outage lasts 40 minutes and another incident lasts 20 minutes, the average recovery time is 30 minutes.

Lower MTTR indicates that teams can detect, diagnose, and resolve incidents quickly.
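The averaging in the example above can be sketched directly. Incidents are assumed to be hypothetical `(started_at, restored_at)` pairs. Whether the clock starts at failure onset or at detection is a team decision, and this sketch simply uses whichever start timestamp it receives.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average of (restored_at - started_at) across resolved incidents.

    `incidents` is a list of (started_at, restored_at) pairs. The start can be
    failure onset or detection time, depending on the team's convention.
    """
    durations = [restored - started for started, restored in incidents]
    if not durations:
        raise ValueError("no incidents to average")
    return sum(durations, timedelta()) / len(durations)


t0 = datetime(2025, 3, 3, 14, 0)
incidents = [
    (t0, t0 + timedelta(minutes=40)),                               # 40-minute outage
    (t0 + timedelta(days=2), t0 + timedelta(days=2, minutes=20)),  # 20-minute incident
]
print(mean_time_to_recovery(incidents))  # 0:30:00
```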

The Importance of Observability and Rollback in Reducing MTTR

Fast recovery depends heavily on two capabilities: observability and rollback mechanisms.

Teams must first detect problems before they can fix them.

Modern engineering teams rely on strong observability systems that include:

Monitoring dashboards
Alerting systems
Log analysis
Distributed tracing

These tools help engineers quickly identify where a problem exists and what caused it.

Without visibility into the system, diagnosing failures becomes slow and difficult.

Once the problem is identified, teams need a safe way to restore the system.

Many high-performing teams rely on strategies such as:

Automated rollback of faulty deployments
Feature flags to disable problematic features
Canary releases to limit the impact of new changes
Blue-green deployments to switch traffic safely

These mechanisms allow teams to restore service quickly without complex manual intervention.
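To show what "without complex manual intervention" can look like, here is a hedged sketch of an automated canary check: compare the canary's error rate against the baseline and roll back when it is measurably worse. The `TrafficStats` type, the comparison rule, and the default thresholds are illustrative assumptions, not values taken from any particular tool or from DORA.

```python
from dataclasses import dataclass


@dataclass
class TrafficStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: TrafficStats, canary: TrafficStats,
                    max_increase: float = 0.01, min_requests: int = 500) -> str:
    """Return "wait", "rollback", or "promote" for a canary release.

    max_increase (absolute error-rate increase) and min_requests are
    illustrative defaults, not recommended values.
    """
    if canary.requests < min_requests:
        return "wait"      # not enough canary traffic to judge yet
    if canary.error_rate - baseline.error_rate > max_increase:
        return "rollback"  # canary is measurably worse: shift traffic back
    return "promote"       # canary looks no worse than the baseline


baseline = TrafficStats(requests=10_000, errors=20)  # 0.2% errors
canary = TrafficStats(requests=1_000, errors=40)     # 4.0% errors
print(canary_decision(baseline, canary))             # rollback
```

Requiring a minimum amount of canary traffic before deciding avoids rolling back, or promoting, on the noise of a handful of requests.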

Why MTTR Reflects Operational Maturity

MTTR is not just a technical metric. It reflects the overall operational maturity of an engineering organization.

Teams with low MTTR usually have:

Strong monitoring systems
Well-defined incident response processes
Clear communication during outages
Experienced engineers who understand the system deeply
Automated recovery and rollback procedures

In contrast, organizations with high MTTR often struggle with:

Delayed incident detection
Unclear ownership during outages
Manual troubleshooting processes
Lack of recovery automation

In many ways, MTTR measures how prepared a team is when things go wrong.

Don't confuse Mean Time to Recovery with these related metrics:

MTBF (Mean Time Between Failures) - The average time a system operates without failure.
MTTR (Mean Time to Repair) - Sometimes used interchangeably with recovery, but it specifically refers to the time required to fix the underlying issue rather than just restoring service temporarily.
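The difference between these numbers is easiest to see when they are computed from the same incident timeline. This sketch assumes a single observation window with hypothetical, non-overlapping `(failed_at, restored_at)` outages. It uses one common definition of MTBF (total up time divided by the number of failures) and computes recovery time in the "restore service" sense used by DORA.

```python
from datetime import datetime, timedelta


def mtbf_and_mttr(window_start, window_end, outages):
    """Return (MTBF, mean time to recovery) for one observation window.

    `outages` is a list of non-overlapping (failed_at, restored_at) pairs
    inside the window. MTBF here = total up time / number of failures.
    """
    if not outages:
        raise ValueError("need at least one failure")
    downtime = sum((restored - failed for failed, restored in outages), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages), downtime / len(outages)


start = datetime(2025, 3, 1)
end = start + timedelta(days=30)
outages = [
    (start + timedelta(days=10), start + timedelta(days=10, hours=1)),  # 1-hour outage
    (start + timedelta(days=20), start + timedelta(days=20, hours=3)),  # 3-hour outage
]
mtbf, mttr = mtbf_and_mttr(start, end, outages)
print(mtbf, "|", mttr)  # 14 days, 22:00:00 | 2:00:00
```

The same two incidents yield an MTBF of almost 15 days and a recovery time of 2 hours: one number describes how often things break, the other how long they stay broken.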

MTTR in the Real World: How Engineering Teams at Netflix and Amazon Do It

Netflix built Chaos Monkey - a tool that randomly kills production servers during business hours. On purpose. The thinking: if your system can't survive a random server dying on a Tuesday afternoon, it has no business being in production.

By forcing failures constantly, engineers had no choice but to build systems that recovered automatically. The result? Netflix now maintains only a few minutes of downtime per year - at a scale of hundreds of millions of users. (Source)

Amazon loses an estimated $400,000 per minute during outages. At that price, waiting for a human to notice something broke is not an option.

Their solution: design recovery out of human hands entirely. Using AWS load balancers with 5-second health checks, their systems detect failure, make a decision, and reroute traffic - automatically - in under 10 seconds. (Source: AWS Whitepaper: Availability and Beyond)

The DORA Performance Levels: How Teams are Categorized by Delivery Performance

DORA research groups engineering teams into performance levels based on how they perform across the four core DevOps metrics. These levels help organizations understand where their delivery performance stands and what improvements are needed.

Why DORA Metrics Work (And Why They Became the Industry Standard)

In late 2025, Sudheer, CEO of Hivel, hosted Benjamin Good, co-author of the DORA report, for a webinar. In it, Ben made a strong point about DORA’s importance: “DORA research looks at the capabilities and practices that result in high-performing teams and organizations when it comes to delivering software.”

Many engineering metrics have come and gone. Teams experimented with measuring productivity through commits, lines of code, and story points. But most of those signals failed to capture what truly matters in software delivery.

DORA metrics stood the test of time because they measure how software systems actually perform in the real world.

Instead of tracking developer activity, these DevOps performance metrics focus on how quickly and safely teams deliver working software to users.

This shift made DORA metrics one of the most trusted frameworks for evaluating engineering performance.

System-Level Measurement

One reason these DevOps metrics work so well is that they measure the entire delivery system, not just individual developer activity.

Traditional metrics often focus on only one part of engineering. For example, commit counts only measure development activity.

DORA metrics look at the end-to-end system. They measure how efficiently code moves from development to production and how reliably that system operates.

This makes them much more useful for engineering leaders who want to understand how well the entire delivery pipeline works.

Balance Between Speed and Stability

For a long time, engineering teams believed they had to choose between moving fast and keeping systems stable.

If a team deployed frequently, leaders assumed incidents would increase. If a team focused heavily on stability, releases usually slowed down.

DORA research challenged this assumption.

The data showed that the best engineering organizations do not trade speed for reliability. Instead, they design systems that improve both together.

When teams only optimize for speed, systems become fragile. When teams optimize only for stability, innovation slows down. DORA metrics prevent both extremes.

Focus on Outcomes Instead of Output

Perhaps the biggest reason DORA metrics work is that they measure outcomes instead of output.

Older engineering metrics often tracked how busy developers were. But writing more code or completing more tasks does not necessarily improve a product.

DORA metrics focus on the outcomes that matter most:

How quickly users receive improvements
How stable the system remains after changes
How quickly teams recover from failures

These outcomes reflect the real impact of engineering work.

By focusing on delivery results instead of developer activity, DORA metrics help organizations align engineering performance with product value and customer experience.

Measuring developer productivity
Without DORA - Teams track commits, tickets closed, or lines of code. Activity looks high, but the real delivery speed is unclear.
With DORA - Teams measure deployments, lead time, failures, and recovery.

Releasing software
Without DORA - Releases happen in large batches and often feel risky. Teams deploy less frequently.
With DORA - Teams ship smaller changes more often because deployment performance is visible and measurable.

Handling production issues
Without DORA - Incidents are analyzed only after major failures. Recovery processes are often slow.
With DORA - Teams track failure rate and recovery time, making incident response faster and more structured.

Engineering performance discussions
Without DORA - Conversations are subjective and based on opinions.
With DORA - Conversations become data-driven, focused on improving the delivery system.

The AI Era: Why DORA Metrics Matter More Than Ever

Something interesting is happening in engineering right now.

Developers are writing code faster than ever before. AI coding assistants are everywhere - in IDEs, in code review tools, in terminals, and in browsers.

According to the 2025 Stack Overflow Developer Survey, 84% of respondents are using or planning to use AI tools. That number isn't a trend anymore. It's the baseline.

But the situation on the ground is different. Code is being generated more quickly. Pull requests are being opened more frequently. But lead times? Deployment frequency? Change failure rates? For many teams, those numbers haven't moved the way the AI productivity narrative suggests they should.

Thus, when someone asks engineering leaders whether their teams are delivering faster, the answer gets complicated.

Given how significant this shift is, the DORA research team built its most recent report around AI. DORA’s 2025 AI Capabilities Model Report reveals a critical truth: AI’s primary role in software development is to amplify. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.

The Major Shift Introduced in DORA 2025 Research

Instead of only studying DevOps practices, the research now looks at how AI affects engineering teams, delivery pipelines, and system stability.

The report introduces AI as an amplifier. The following image depicts capabilities that amplify the effect of AI adoption on specific outcomes. This is the DORA AI Capabilities Model.

What it further reveals is that AI is accelerating coding, but not delivery. Developers can generate boilerplate code, write functions, or explore solutions much faster than before. However, software delivery includes many stages beyond writing code.

If these parts of the system remain slow, the overall delivery speed does not change much.

In many teams, AI simply moves the bottleneck from coding to other stages, especially code reviews and validation.

The New Leadership Challenge

This creates a new challenge for engineering leaders.

For years, the focus was on improving developer productivity. AI is now solving part of that problem.

But the bigger question has changed.

If developers can write code faster than ever, why isn’t software reaching production faster?

The answer usually lies in the delivery system itself. And that is exactly what DORA metrics were designed to measure.

By tracking deployment frequency, lead time, failure rates, and recovery time, organizations can understand whether AI is actually improving delivery performance or just accelerating code generation.


DORA Metric | What It Shows in the AI Era
Deployment Frequency | Whether faster coding actually leads to more releases
Lead Time for Changes | Whether AI-written code moves faster through the delivery pipeline
Change Failure Rate | Whether AI-generated code increases or reduces release failures
Mean Time to Recovery | How quickly teams recover from failures that AI-generated code may cause

In short, DORA metrics help leaders see whether AI is improving software delivery or simply making code generation faster. That is why DORA remains relevant in AI-augmented development.

The Hidden Risk of AI-Assisted Development

DORA 2025 research highlights an important nuance.

Writing code faster does not automatically mean delivering better software.

In many organizations, AI accelerates the creation of code, but the rest of the delivery system - reviews, testing, validation, and deployment - does not speed up at the same rate.

This mismatch introduces a new kind of risk.

In fact, some large technology companies are already responding to these challenges. Following several coding incidents involving automated development tools, Amazon ordered a 90-day reset after code mishaps caused millions of lost orders.

The move reflects a growing industry realization: faster code generation must be balanced with stronger safeguards in the delivery pipeline.

More Code Does Not Mean Better Delivery

DORA's 2025 research found that higher AI adoption is associated with increased deployment instability. Teams generating more code with AI tools are, in many cases, experiencing more failures in production, not fewer.

More code means more surface area for bugs. More PRs mean more review load on the same number of senior engineers. More changes batching up against the same deployment windows means bigger, riskier releases. The system gets congested - and congested systems fail more often and recover more slowly.

AI didn't create any of these problems, but it turbocharged them. This leads to the Throughput Trap.

Systems that aren't optimized for AI can actually lose delivery throughput over time as AI adoption increases.

Think about what happens to a delivery pipeline when the volume of code going into it suddenly doubles. Every stage downstream from code generation now has twice the work to process.
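The doubling scenario can be modeled as a simple weekly queue. The sketch below assumes a fixed review capacity and a constant rate of new pull requests. Both numbers are hypothetical, and real teams see far more variability from week to week.

```python
def review_backlog(weeks: int, prs_opened_per_week: int,
                   reviews_per_week: int, start: int = 0) -> list[int]:
    """Return the number of PRs still waiting for review at the end of each week.

    Each week, new PRs arrive and reviewers clear at most `reviews_per_week` of them.
    """
    backlog = start
    history = []
    for _ in range(weeks):
        backlog += prs_opened_per_week              # new work arrives
        backlog -= min(backlog, reviews_per_week)   # reviewers clear what they can
        history.append(backlog)
    return history


# Hypothetical team whose reviewers can clear 40 PRs per week.
print(review_backlog(4, prs_opened_per_week=40, reviews_per_week=40))  # [0, 0, 0, 0]
print(review_backlog(4, prs_opened_per_week=80, reviews_per_week=40))  # [40, 80, 120, 160]
```

When arrivals exceed review capacity, the backlog grows every week. Each pull request then waits longer in the queue, and that wait shows up as longer lead time.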

AI Productivity Paradox for Developers

When an individual developer can write a function in five minutes that used to take an hour, that feels like a genuine, meaningful productivity gain.

But individual productivity and system-level delivery performance are different things. The gap just wasn't this visible before AI made individual productivity gains so dramatic.

However, several developers acknowledge that AI can create an artificial sense of productivity by making them feel productive in the moment. In one Reddit thread, a developer shares his frustration with the idea that using AI makes him productive.

In many cases, the time saved during coding is later spent reviewing, validating, or fixing AI-generated logic. Developers may write more code in less time, but they also need to spend additional effort ensuring the generated code actually works as intended.

This creates a subtle illusion of progress - developers feel faster, while the overall delivery pipeline may remain unchanged.

The New Measurement Problem for Engineering Leaders

In the past, it was easier to interpret productivity signals. More commits or faster coding usually meant work was progressing.

With AI, those signals can become misleading.

A developer may generate large amounts of code quickly, yet the system may take longer to review, test, and deploy that code safely.

This creates a new leadership question: Are teams actually delivering faster, or are they simply generating code faster?

This is precisely why DORA metrics matter more in the AI era. Deployment frequency, lead time for changes, change failure rate, and MTTR measure the system. They can't be fooled by how productive someone feels. They measure what the pipeline is actually producing.

If AI is genuinely improving your delivery performance, it will show in those numbers. If AI is generating more code in a congested pipeline while your change failure rate climbs and your lead time extends, that will show too.

The New DORA AI Capabilities Model (2025)

The DORA research program introduced a new framework to help organizations understand why AI improves performance in some teams but not in others.

This framework is called the DORA AI Capabilities Model.

Instead of focusing on AI tools themselves, the model highlights organizational and engineering capabilities.

The key insight from the research is simple: AI success depends far more on system design and engineering practices than on the tools themselves.

The Seven Capabilities That Amplify AI’s Impact

Capability | What It Means for Engineering Teams
Clear and communicated AI stance | Ambiguity creates risk. A clear policy provides the psychological safety needed for effective experimentation.
Healthy data ecosystems | The benefits of AI are significantly amplified by high-quality, accessible, and unified internal data.
AI-accessible internal data | Connecting AI to your internal documentation and codebases moves it from a generic assistant to a specialized expert.
Strong version control practices | As AI increases the velocity of change, version control becomes the critical safety net that enables confident experimentation.
Working in small batches | This discipline counteracts the risk of AI generating large, unstable changes, ensuring that speed translates to better product performance.
User-centric focus | A focus on user needs is essential to ensure that AI-accelerated teams are moving quickly in the right direction.
Quality internal platforms | A platform provides automated, secure pathways that allow AI’s benefits to scale across the organization.

Why These Capabilities Matter

The DORA research shows that AI acts as a multiplier.

When these capabilities exist, AI amplifies productivity, improves product outcomes, and increases organizational performance.

But when these foundations are weak, AI can amplify the opposite effects - introducing instability, increasing friction, and exposing weaknesses in the delivery system.

In other words, AI does not automatically make engineering teams high-performing.

The surrounding system determines whether AI becomes an accelerator or a source of new complexity.

Why AI Alone Won’t Improve DORA Metrics?

One of the most important insights from the latest DORA research is that AI tools alone cannot improve software delivery performance.

AI can make developers faster at writing code. It can suggest implementations, generate tests, and reduce the time spent on repetitive tasks. At the level of an individual engineer, these gains can feel dramatic.

But software delivery performance is determined by how effectively an entire engineering system moves changes from development to production.

This is exactly what DORA metrics measure.

AI is an Amplifier of Systems, Not a Solution

The research repeatedly highlights a critical principle:

AI amplifies the system it operates in.

If an organization already has strong delivery practices - small batch changes, reliable CI/CD pipelines, strong testing, and well-structured platforms - AI accelerates those systems. Teams can deliver improvements faster and with less friction.

But if the delivery system has weaknesses, AI tends to magnify them.

Faster code generation may produce:

Larger pull requests that slow down reviews
More code paths that increase testing complexity
Higher volumes of changes that strain deployment pipelines

In this case, AI increases activity in the system without improving the system itself.

Where Many Organizations Struggle When it Comes to Improving DORA

The organizations that fail to see improvements in DORA metrics after adopting AI usually share a common set of gaps.

Missing Foundation | What Happens When AI is Added
Clear development workflows | AI speeds up coding, but changes still move slowly through reviews and approvals.
Strong internal developer platforms | Engineers generate more code but still struggle with infrastructure, tooling, and environment setup.
Reliable delivery practices | Deployment pipelines become bottlenecks as the volume of changes increases.

In these situations, AI produces only localized productivity gains that do not translate into better delivery performance.

The System Still Determines the DORA Improvement

The DORA research ultimately reinforces a principle that has been consistent for more than a decade:

Engineering performance is determined by the system, not individual productivity.

AI improves how quickly developers can produce code. But the outcomes that matter - deployment frequency, lead time, stability, and recovery - depend on the health of the entire delivery pipeline.

When organizations strengthen that system, AI becomes a powerful accelerator.

When they do not, AI simply adds more velocity to the first step of the process while the rest of the system struggles to keep up.

And that is why, even in the age of AI-assisted development, DORA metrics remain essential for understanding real engineering performance.

Tip for Engineering Leaders

Don’t measure AI adoption in isolation.
Track metrics like AI-generated code share, review time, and rework rate, and correlate them with DORA metrics such as lead time and change failure rate.
This helps reveal whether AI is truly improving delivery or simply increasing code volume.

How Engineering Leaders Use DORA Metrics in Practice

Understanding DORA metrics is useful. But their real value appears when teams use them to continuously improve how software is delivered.

Many engineering organizations operationalize DORA metrics using a continuous improvement approach similar to the PDCA Cycle, a widely used framework in Lean and quality engineering.

PDCA Stage | How Engineering Teams Use DORA Metrics
Plan | Identify delivery problems such as slow releases or unstable deployments. Teams choose one metric to improve, such as lead time or change failure rate.
Do | Implement changes such as improving CI/CD pipelines, reducing pull request size, or strengthening automated testing.
Check | Measure whether deployment frequency, lead time, or failure rates actually improved after the change.
Act | Standardize successful practices and move to the next bottleneck in the delivery pipeline.

This approach helps teams improve delivery systems incrementally instead of attempting large transformations all at once.
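In practice, the Check stage is a before-and-after comparison. The sketch below assumes you already have lead times for changes shipped before and after a process change, and the values shown are hypothetical. It compares medians because lead-time data tends to be skewed by a few very slow changes.

```python
from statistics import median


def check_improvement(before: list[float], after: list[float]) -> float:
    """Relative change in median lead time (negative means changes ship faster)."""
    m_before, m_after = median(before), median(after)
    return (m_after - m_before) / m_before


# Hypothetical lead times (hours) for changes shipped before and after a PR-size limit.
before = [40, 52, 35, 61, 47, 44]
after = [30, 38, 29, 45, 33, 36]

change = check_improvement(before, after)
print(f"Median lead time changed by {change:+.0%}")  # prints "Median lead time changed by -24%"
```

If the improvement holds, the team moves to Act and standardizes the practice. If it does not, the team returns to Plan with a different hypothesis.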

The following are four major use cases of DORA metrics for engineering leaders.

1) Monitoring Delivery Pipelines

Engineering leaders use DORA metrics as an operational dashboard for their delivery systems.

Signals such as deployment frequency and lead time help teams understand whether the pipeline is flowing smoothly or slowing down.

Sudden changes in these metrics often indicate friction in the system long before major delivery problems appear.
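A simple baseline check can catch these sudden changes automatically. The sketch below flags the latest week when it falls more than a chosen number of standard deviations from the previous few weeks. The six-week window and the threshold of 2.0 are illustrative choices, not DORA guidance.

```python
from statistics import mean, stdev


def sudden_change(weekly_values: list[float], window: int = 6, threshold: float = 2.0) -> bool:
    """Flag the latest week if it deviates from the trailing baseline by more than
    `threshold` standard deviations. Window and threshold are illustrative defaults."""
    if len(weekly_values) < window + 1:
        return False  # not enough history to form a baseline
    baseline = weekly_values[-window - 1:-1]  # the `window` weeks before the latest one
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return weekly_values[-1] != mu  # any change from a perfectly flat baseline
    return abs(weekly_values[-1] - mu) / sigma > threshold


# Hypothetical weekly deployment counts; the last week drops sharply.
deploys = [21, 19, 22, 20, 23, 21, 12]
print(sudden_change(deploys))  # True
```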

2) Identifying Bottlenecks

DORA metrics are also extremely effective at locating hidden bottlenecks.

For example:

Signal | Possible Bottleneck
Lead time increases | Code reviews or testing pipelines are slowing down delivery
Deployment frequency drops | Infrastructure or CI/CD pipeline friction
Change failure rate rises | Quality gaps in testing or release validation, especially after AI adoption
MTTR rises | Weak monitoring or incident response processes
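The signal-to-bottleneck table can also be applied mechanically when comparing two reporting periods. In this sketch, the hints come from the table above. The 10% tolerance, the metric keys, and the sample values are hypothetical.

```python
# Possible bottleneck for each metric moving in the "bad" direction (from the table above).
SIGNALS = {
    ("lead_time", "up"): "Code reviews or testing pipelines are slowing down delivery",
    ("deployment_frequency", "down"): "Infrastructure or CI/CD pipeline friction",
    ("change_failure_rate", "up"): "Quality gaps in testing or release validation",
    ("mttr", "up"): "Weak monitoring or incident response processes",
}


def possible_bottlenecks(previous: dict[str, float], current: dict[str, float],
                         tolerance: float = 0.10) -> list[str]:
    """Return hints for metrics that moved the wrong way by more than `tolerance`."""
    hints = []
    for (metric, bad_direction), hint in SIGNALS.items():
        if previous[metric] == 0:
            continue  # relative change is undefined from a zero baseline
        change = (current[metric] - previous[metric]) / previous[metric]
        if (bad_direction == "up" and change > tolerance) or \
           (bad_direction == "down" and change < -tolerance):
            hints.append(hint)
    return hints


# Hypothetical monthly values: lead time and MTTR in hours, deployments per month.
last_month = {"lead_time": 30.0, "deployment_frequency": 20.0, "change_failure_rate": 0.10, "mttr": 2.0}
this_month = {"lead_time": 42.0, "deployment_frequency": 19.0, "change_failure_rate": 0.10, "mttr": 2.5}
for hint in possible_bottlenecks(last_month, this_month):
    print(hint)
```

A rule like this only points to where to look. The root cause still has to be confirmed by examining the reviews, pipelines, or incidents involved.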

3) Improving Review Cycles

Code review is one of the most common bottlenecks revealed by DORA metrics.

When lead time grows, teams often discover that pull requests are too large or reviewers are overloaded.

Engineering leaders improve review cycles by:

Encouraging smaller pull requests
Distributing review responsibilities
Automating repetitive checks
Strengthening test automation

These changes reduce review delays and improve delivery flow.
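Encouraging smaller pull requests is easier when oversized ones are flagged automatically. This is a minimal sketch. The `PullRequest` fields are loosely modeled on the additions and deletions counts that hosts such as GitHub report. The 400-line limit is a hypothetical team convention, not a DORA threshold.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    additions: int  # lines added
    deletions: int  # lines removed


def oversized(prs: list[PullRequest], max_lines: int = 400) -> list[int]:
    """Return the numbers of PRs whose total changed lines exceed `max_lines`.

    The 400-line default is an illustrative team convention, not a DORA threshold.
    """
    return [pr.number for pr in prs if pr.additions + pr.deletions > max_lines]


prs = [PullRequest(101, 120, 30), PullRequest(102, 900, 150), PullRequest(103, 380, 40)]
print(oversized(prs))  # [102, 103]
```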

4) Tracking Improvement Over Time

The most powerful use of DORA metrics is tracking improvement trends over time.

Instead of focusing on benchmarks alone, high-performing teams observe how metrics evolve as their systems improve.

Over time, they expect to see patterns such as:

Lead time decreasing
Deployment frequency increasing
Failure rates declining
Recovery times improving

This turns DORA metrics into a continuous improvement engine for software delivery.

Tools That Help Track DORA Metrics

DORA metrics are powerful, but only when teams can measure them consistently and reliably.

Common Ways Teams Track DORA Metrics

Approach | How It Works | Limitations
CI/CD Pipeline Analytics | Deployment tools and pipelines provide data on deployments, build times, and failures. | Limited visibility into code review cycles, incident recovery, or upstream workflow delays.
DevOps Platforms | Integrated platforms combine source control, CI/CD, and deployment insights in one environment. | Often focused on pipeline activity rather than the full engineering workflow.
Engineering Intelligence Tools or DORA metrics tools | Dedicated engineering analytics platforms like Hivel aggregate data from repositories, pipelines, issue trackers, and incident systems to generate DORA metrics. | Requires extra tooling investment but provides the most comprehensive, data-rich, AI-powered view of DORA metrics.

Most mature organizations eventually move toward engineering intelligence platforms or DORA metrics tools because they provide a system-level view of software delivery.
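At their core, these tools join deployment, commit, and incident data and reduce it to the four metrics. The sketch below shows that computation in Python over a hypothetical `Deployment` record whose field names are assumptions. It makes two simplifications. It assumes one change per deployment. It also measures recovery time from the deployment to restoration of service, whereas real platforms usually start the clock when an incident is opened.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record joining commit, pipeline, and incident data."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    caused_failure: bool                      # led to an incident, rollback, or hotfix
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta / timedelta(hours=1)


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four DORA metrics for deployments in one reporting period."""
    if not deploys:
        raise ValueError("no deployments in period")
    failures = [d for d in deploys if d.caused_failure]
    lead_times = [hours(d.deploy_time - d.commit_time) for d in deploys]
    # Simplification: recovery is timed from the deployment, not from incident detection.
    recoveries = [hours(d.restored_time - d.deploy_time)
                  for d in failures if d.restored_time is not None]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        # 0.0 when no failures were restored in the period
        "mttr_hours": sum(recoveries) / len(recoveries) if recoveries else 0.0,
    }


# Usage with four hypothetical deployments over a 7-day period.
t0 = datetime(2025, 1, 6, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=20), False),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=30), True,
               t0 + timedelta(days=1, hours=32)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=10), False),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=16), False),
]
print(dora_metrics(deploys, period_days=7))
```

Much of the real work in a platform goes into the joins this sketch takes for granted. The platform has to map commits to deployments, and it has to link incidents back to the deployment that caused them.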

What Engineering Leaders Should Look for in a DORA Metrics Platform

Not every tool that claims to track DORA metrics provides meaningful insights. The most effective platforms typically offer several key capabilities.

Capability | Why It Matters
Automated data collection | Metrics should be generated automatically from repositories, pipelines, and incident systems. Manual reporting quickly becomes inaccurate.
End-to-end pipeline visibility | Leaders need to see how code moves from commit to deployment and where delays occur.
Bottleneck identification | The platform should highlight slow review cycles, pipeline delays, or unstable releases.
Trend analysis over time | DORA metrics are most useful when tracked over weeks and months to observe improvement trends.
Integration across developer tools | The system should connect with version control, CI/CD, issue tracking, and incident management tools.
AI impact measurement and correlations | The platform should track AI-related signals and correlate them with DORA metrics to determine whether AI is actually improving delivery performance.

When implemented well, these tools turn DORA metrics into a real-time view of engineering delivery performance.

And as software systems grow more complex, especially in the AI era, this level of context-rich insight becomes increasingly essential.

The Future of Engineering Metrics

Engineering organizations have spent years trying to answer a simple but difficult question:

How do we measure the true performance of software teams?

The introduction of DORA metrics (also called DevOps metrics) shifted the industry toward measuring system-level delivery performance.

However, today’s teams operate in environments shaped by distributed architectures, platform engineering, and increasingly AI-assisted development. These changes have made it clear that understanding delivery performance alone is not enough.

Organizations now want to understand how engineers work within the system as well.

Expanding the Measurement Stack

Modern engineering leaders are beginning to combine DORA DevOps metrics with additional frameworks that capture a broader view of engineering performance.


Framework | What It Helps Measure
DORA Metrics | Delivery performance: how fast and reliably software reaches production.
SPACE Metrics | Developer productivity across satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow.
Value Stream Metrics | The flow of work from idea to customer value across the product lifecycle.
Developer Experience Metrics | Friction in tooling, environments, and engineering workflows.
AI Productivity Metrics | How AI tools influence coding speed, review cycles, rework, and delivery outcomes.

The Emerging Measurement Philosophy

The future of engineering metrics is therefore not about replacing DORA metrics.

Instead, it is about building a layered measurement model.

DORA continues to measure the health of the delivery system. Other frameworks add context about how developers experience that system and how work flows through it.

This combined approach allows organizations to optimize both sides of engineering performance:

Developer productivity
System delivery performance

When these two dimensions improve together, engineering teams achieve what every modern software organization ultimately aims for: faster innovation, more reliable systems, and sustainable developer productivity.

Final Thoughts: Measuring What Actually Drives Software Delivery

“Improving software delivery isn’t about pushing people harder. It’s about improving the system they work in.” - Jez Humble, co-author of the book Continuous Delivery and a core member of DORA’s early leadership group

This statement challenges one of the oldest instincts in engineering leadership: when delivery slows down, the first reaction is often to push teams to work faster.

History shows that this approach rarely works.

Software delivery performance is not limited by individual effort. It is shaped by the system that surrounds the engineers.

When that system is inefficient, even the most talented developers struggle to deliver consistently. When the system is well designed, teams can ship faster without increasing risk or burnout.

Real engineering progress is measured differently.

It shows up in faster delivery of meaningful changes, more stable systems, and quicker recovery when things fail. That is what DORA metrics reveal.

And for engineering leaders willing to look honestly at those signals, they offer something far more valuable than a dashboard: a clear view of where the delivery system needs to improve.


DORA Metrics: The Complete Guide to Measuring Software Delivery Performance
