June 22, 2023

From Bugs to Brilliance: Discover the Path to Software Engineering Excellence while Reducing Production Bugs

Sudheer Bandaru

Reducing production bugs is a top priority for CTOs and engineering teams in product tech companies, and here's why: Picture this — a bustling tech company with a cutting-edge product but plagued by frequent bugs. These bugs not only frustrate users but also tarnish the company's reputation and cost a fortune to fix. According to a study by IBM, the average cost of a production bug is $100,000. This cost can be even higher for critical bugs that impact customer service or revenue.

Slowing down might sound counterintuitive.

But here's my take: slowing down becomes meaningful when it provides more time for testing, encourages developers to be more careful, and creates a culture of quality. The same IBM study also found that fixing a bug after it has been released to production costs 100 times more than fixing it during the development phase.

You might be wondering what slowing down looks like in practice.

Slowing down can help reduce production bugs by:

  1. Providing more time for testing: Slowing down allows developers to spend more time testing their code before it is deployed to production. This helps catch bugs early, when they are easier and less expensive to fix.
  2. Encouraging developers to write unit tests: Slowing down gives developers time to cover their code with unit tests, which avoids much of the back-and-forth between dev and QA.
  3. Allowing more time for code review: Make sure every pull request is thoroughly reviewed, and give developers time to act on feedback from senior engineers.

Learnings derived from insights by Hivel.ai

An interesting pattern emerged from the data we studied across Hivel users: increased review time went hand in hand with fewer production bugs.

It further revealed some learnings that would help engineering teams reduce production bugs and improve feature quality.

  1. When release cycles are forced to run at very high speed, code reviewers, who are the custodians of quality, may cut back on the time they spend evaluating PRs. If your change failure rate is high, re-evaluate your team's PR review procedures.
  2. Features are shipping faster than ever, but product adoption is slowing because of increased glitches in the product. Poor code and feature quality defeat the purpose of shipping features at high speed. Don't overvalue speed and faster release cycles at the cost of quality.
  3. Identify PRs that were merged without a real review by tracking the time spent reviewing each PR. Larger PRs need more review time. If lengthy PRs are marked as reviewed in no time, flag them as flashy reviews (1-minute reviews or rubber-stamp reviews). Review time should be proportionate to PR size.
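The third learning can be turned into a simple check. The sketch below is only an illustration, not how Hivel.ai computes it: the `ReviewedPR` record, the `flag_flashy_reviews` helper, and the 200-line threshold are hypothetical, and only the 1-minute cutoff comes from the text above. It assumes you can export each merged PR's size and review time from your own dev tools.

```python
from dataclasses import dataclass


@dataclass
class ReviewedPR:
    """Hypothetical record for one merged pull request."""
    number: int
    lines_changed: int
    review_minutes: float  # 0 means the PR was merged with no review at all


def flag_flashy_reviews(prs, min_lines=200, max_minutes=1.0):
    """Return the numbers of large PRs that were reviewed in about a minute or less.

    min_lines is an assumed cutoff for a "large" PR; tune it to your codebase.
    """
    return [
        pr.number
        for pr in prs
        if pr.lines_changed >= min_lines and pr.review_minutes <= max_minutes
    ]


sample = [
    ReviewedPR(number=101, lines_changed=850, review_minutes=0.5),   # large, rubber-stamped
    ReviewedPR(number=102, lines_changed=40, review_minutes=0.5),    # small, quick review is fine
    ReviewedPR(number=103, lines_changed=900, review_minutes=45.0),  # large, properly reviewed
    ReviewedPR(number=104, lines_changed=300, review_minutes=0.0),   # large, merged unreviewed
]

print(flag_flashy_reviews(sample))  # prints [101, 104]
```

A fixed threshold is the simplest possible rule. Since review time should scale with PR size, a team could instead compare minutes per line changed against its own historical median.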

Our customers use data-driven insights from Hivel.ai to win leadership buy-in, balancing radical thinking with evidence-based decisions.

One of our customers in the e-commerce tech industry used insights that Hivel.ai gathered from their dev tools to persuade their leadership of the benefits of slowing down their feature release cycle. Even before they started measuring, they had three months of retrospective data available from day one of using Hivel.ai.

Download and read the full story of how they got Leadership Buy-in

Metrics to measure to reduce production bugs

If you are also trying to find out how to reduce production bugs, here are 4 metrics that you should start measuring and evaluating.

  • Deployment frequency: How many releases are you shipping in a given timeframe?
  • Number of unreviewed PRs: How many PRs are you unable to review due to time constraints?
  • Time taken to review PRs: Are you rushing the code review process to release faster?
  • Change failure rate: With the current review process, how many releases lead to production bugs?
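As a rough illustration of how these four numbers fit together, here is a minimal sketch. The `Deployment` and `MergedPR` records and the `summarize` function are hypothetical names, not part of any Hivel.ai API, and the change failure rate is computed in the common way, as the share of deployments that caused a production bug.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record for one production release."""
    caused_production_bug: bool


@dataclass
class MergedPR:
    """Hypothetical record for one merged pull request."""
    review_minutes: Optional[float]  # None means merged without any review


def summarize(deployments, prs, weeks):
    """Compute the four metrics for a period of `weeks` weeks."""
    reviewed = [pr.review_minutes for pr in prs if pr.review_minutes is not None]
    failures = sum(1 for d in deployments if d.caused_production_bug)
    return {
        "deployments_per_week": len(deployments) / weeks,
        "unreviewed_prs": len(prs) - len(reviewed),
        "avg_review_minutes": mean(reviewed) if reviewed else 0.0,
        "change_failure_rate": failures / len(deployments) if deployments else 0.0,
    }


deployments = [Deployment(False), Deployment(False), Deployment(True), Deployment(False)]
prs = [MergedPR(30.0), MergedPR(None), MergedPR(10.0)]

report = summarize(deployments, prs, weeks=2)
print(report)
```

Reading the numbers side by side is the point: a rising deployment frequency paired with falling review time and a rising change failure rate suggests that reviews are being rushed.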

John Doerr once said, "If you can't measure it, you can't improve it."

**Hivel.ai can help you run these metrics on dashboards designed for engineering leaders. You can measure them and define the reasons for recurring production bugs.**

Production bugs are not free! They cost you money.

Production bugs waste money in a number of ways, though the costs are not always apparent.

  • Customer support costs: Bugs can lead to customer support tickets, which can be expensive to handle.
  • Reputational damage: Bugs can damage an organization's reputation, which can lead to lost sales.
  • Revenue loss: Bugs can lead to revenue loss if they prevent customers from using an organization's products or services.
  • Increased development costs: Bugs can lead to increased development costs if they require developers to spend time fixing them.
  • Slipped deadlines: Every production bug derails your roadmap and puts deadlines at risk.
  • Engineer burnout: Engineers who planned to build a feature have to context-switch to the production bug, which slows them down. Repeated context-switching frustrates developers and causes burnout.

A study by Tricentis revealed that 80% of consumers would abandon a mobile app or website if they encountered a bug, emphasizing the need for robust engineering practices in maintaining user trust.

Learn from failures. Lead by example.

As a maxim widely attributed to Peter Drucker puts it, "Efficiency is doing things right; effectiveness is doing the right things." According to a report by Statista, software bugs and inefficiencies cost businesses approximately $1.7 trillion annually.

With fewer bugs, your product becomes far more powerful in the hands of end users and customers. Paradoxically, slowing down adds pace to production cycles: less rework means a faster time to market and better engineering efficiency. Etsy, the global marketplace for unique and handmade goods, led by example. By focusing on engineering efficiency, they empowered their sellers to onboard quickly, list products, and process orders efficiently, creating a seamless buying experience for customers.

Reducing production bugs not only improves operational efficiency but also builds trust and enhances a company's reputation. The online payment platform PayPal recognizes this importance. By prioritizing engineering excellence, they have earned the trust of millions of users worldwide. As Sri Shivananda, CTO of PayPal, asserts, "We are responsible for delivering a reliable, secure, and performant platform that empowers our customers and instills trust."

Slowing down can help reduce production bugs and the money they waste. By providing more time for testing, encouraging developers to be more careful, and building a culture of quality around code review, slowing down helps organizations produce high-quality software with far fewer bugs.

Download Case Study: Boosting Engineering Efficiency: How a tech unicorn slashed production bugs by 22%

Written by
Sudheer Bandaru
Founder, CEO

Sudheer started as a software developer in Silicon Valley and worked at startups and large corporations such as Merrill Lynch, AT&T, and Hewlett Packard. He moved into engineering leadership roles at startups that went public, led multiple M&As in the US, and managed remote global teams. Throughout his career, he saw many instances where the lack of a data-driven culture of continuous process improvement led to poor gut-based decisions and costly mistakes. This problem led him to start Hivel, which helps engineering teams continuously improve through access to critical metrics, interactive dashboards, and actionable insights.
