Vibe Coding Is Fun Until the Incident

Content

What Is Not Measured About Vibe Coding

Incidents Follow a Pattern

Why AI Code Fails in Ways Human Code Doesn't

Six Metrics Every Engineering Leader Needs Before Production

How to Build the Governance Layer?

FAQs

What No One Is Measuring About Vibe Coding

We track AI adoption rate. We rarely track AI code quality rate.

Across the engineering organizations we work with, the average AI production-merge rate sits around 18%. That means roughly 82% of AI-generated suggestions get rejected, modified, or never make it to production. The number sounds healthy. It might not be. The question nobody is asking is: what is the selection logic that determines which 18% ships?

In some teams, it is rigorous review catching the security gaps and logic errors that AI consistently produces. In others, it is time pressure, rubber-stamping, and the quiet assumption that 'the AI wouldn't generate something broken.'

That assumption has a documented track record. It is not a good one.

Track your AI production-merge rate as a leading quality signal, not a vanity metric. If it is climbing while your rework rate and post-merge bug escape rate are also climbing, you are rubber-stamping, not reviewing. Hivel's investment profile surfaces this breakdown across your engineering org.
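
The check described above can be sketched in a few lines. This is a minimal illustration, not Hivel's implementation: the `PeriodStats` fields and the `rubber_stamping_signal` name are hypothetical, and it assumes you already have per-period counts of AI suggestions, merges, reworked changes, and post-merge bugs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodStats:
    """Aggregate counts for one reporting period (hypothetical field names)."""
    ai_suggestions: int       # AI-generated suggestions produced
    ai_merged_to_prod: int    # of those, how many reached production
    ai_changes_reworked: int  # merged AI changes later rewritten or reverted
    ai_post_merge_bugs: int   # bugs traced back to merged AI changes


def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def rubber_stamping_signal(previous: PeriodStats, current: PeriodStats) -> bool:
    """True when merge rate, rework rate, and bug escape rate all rose together."""
    def rates(p: PeriodStats) -> tuple[float, float, float]:
        return (
            rate(p.ai_merged_to_prod, p.ai_suggestions),       # production-merge rate
            rate(p.ai_changes_reworked, p.ai_merged_to_prod),  # rework rate
            rate(p.ai_post_merge_bugs, p.ai_merged_to_prod),   # bug escape rate
        )

    return all(now > before for before, now in zip(rates(previous), rates(current)))
```

Comparing only two periods is deliberately crude. In practice you would likely want a longer trend window before treating the signal as meaningful.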

Visual 3: AI Production-Merge Rate

What Actually Happens to AI-Generated Code? The 18% production-merge rate, and what the selection logic behind it reveals.

Reaches production: 18%
Rejected, modified, or abandoned: 82%

What drives the 18% that ships?

Rigorous review (the 18% you want): caught by SAST 78%, caught by review 65%, ships clean 18%.
Rubber-stamping (the rising number to fear): ships unreviewed 45%, post-merge bugs 62%, rework rate rising fast.

Source: Hivel engineering org data · Industry baseline n=multiple orgs

The Incidents Are Documented. They Follow a Pattern.

In July 2025, Tea, a women's dating safety app, exposed 72,000 images and 1.1 million private messages via an open Firebase bucket. The database was not hacked. Nobody exploited a sophisticated vulnerability. The Firebase instance had no authorization policies configured. The code was functional. The security was just missing.

In May 2025, CVE-2025-48757 was filed against Lovable, a vibe coding platform, with 170+ production applications affected. The cause was missing Row Level Security on Supabase tables, which left full databases exposed: user data, authentication info, and business data were accessible to anyone with the public key.

In late 2025, Enrichlead shut down entirely after an AI-generated codebase put all security logic on the client side. Not a subtle bug. A structural failure that went undetected until it was too late to fix at reasonable cost.

The pattern across every documented incident is the same: functionally correct code with security layers that were never implemented, because the AI was never prompted to implement them and nobody in the review process noticed they were absent.

The code looks right. It just doesn't have the security controls a senior developer would include by default.

Veracode's 2025 GenAI Code Security Report tested over 100 LLMs across four languages and found that 45% of AI-generated code failed OWASP Top 10 security benchmarks. Apiiro tracked a 322% increase in privilege escalation paths in AI-heavy codebases across Fortune 50 enterprises. By June 2025, AI-generated code was introducing over 10,000 new security findings per month in the organizations Apiiro tracked. That's a 10x increase from December 2024.

The speed gains are real. The security debt is also real.

Why AI Code Fails in Ways Human Code Doesn't

AI models generate code that satisfies the stated requirement. They don't generate code that satisfies the unstated assumptions a senior developer carries into every line they write.

When an experienced engineer builds an authentication endpoint, they apply a dozen implicit rules they have learned from incidents, code reviews, and security training. The AI doesn't have that context. It has patterns from training data, and security-critical patterns are statistically rare in public codebases.

This creates a specific failure profile. CodeRabbit's December 2025 analysis of AI versus human pull requests found that AI-generated code was:

2.74x more likely to introduce XSS vulnerabilities
1.91x more likely to create insecure object references (IDOR)
1.88x more likely to mishandle passwords
1.82x more likely to implement insecure deserialization

Visual 1: AI vs Human Code, Vulnerability Risk Profile (AI-generated code compared with a human-written baseline)

Sources: CodeRabbit Dec 2025 · Apiiro 2025 · Tenzai Dec 2025 · Veracode 2025

The vulnerability classes that increased most sharply were not the obvious logic errors. Syntax errors in AI-generated code actually dropped 76%. The flaws that grew were the dangerous architectural ones that look correct at the function level but expose the system at the access control layer.

Nearly 80% of developers believe AI tools generate more secure code than humans write, according to Snyk's research. The empirical data says the opposite, consistently, across every systematic study reviewed in 2025-2026.

Confidence without calibration is how incidents happen.

Table 3 Preview — AI Code Vulnerability Classes

Vulnerability Class	AI vs Human Rate	Why AI Misses It	What to Gate
Cross-site scripting (XSS)	2.74x more likely	AI optimises for functional output, not output encoding	SAST scan + manual review of all user-facing endpoints
Hardcoded credentials / secrets	6.4% of AI-assisted repos vs 4.6% baseline	AI replicates patterns from training data including insecure examples	Secret scanning on every PR (GitHub Advanced Security, Trufflehog)
Improper password handling	1.88x more likely	Auth logic is statistically rare in training data; AI guesses patterns	Require human review on all auth-related PRs, no exceptions
Insecure object references (IDOR)	1.91x more likely	AI generates access control logic without understanding your authorization model	Cross-user authorization tests on every endpoint with access control
Privilege escalation paths	322% more common in AI-heavy codebases (Apiiro 2025)	Architectural design flaws that look correct at the function level	Threat modeling review before merging any auth or permission change
Missing security headers / CSRF	100% failure rate across tested vibe-coded apps (Tenzai, Dec 2025)	AI generates functional code; security headers are 'extra' configuration it skips	Automated header check in CI (SecurityHeaders.com API or OWASP ZAP)
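
To make the "cross-user authorization tests" gate in the table concrete, here is a minimal sketch of what such a test looks like. Everything in it is hypothetical: the in-memory `INVOICES` store stands in for a real database, and the two handlers stand in for an endpoint with and without an ownership check. The point is the shape of the test: log in as one user and ask for another user's record.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when a caller asks for a record they do not own."""


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: str
    amount_cents: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(1, "alice", 12_500),
    2: Invoice(2, "bob", 4_200),
}


def get_invoice_insecure(caller_id: str, invoice_id: int) -> Invoice:
    """Typical IDOR: looks the record up by ID and never checks who is asking."""
    return INVOICES[invoice_id]


def get_invoice(caller_id: str, invoice_id: int) -> Invoice:
    """Same lookup, but ownership is verified before the record is returned."""
    invoice = INVOICES[invoice_id]
    if invoice.owner_id != caller_id:
        raise Forbidden(f"{caller_id} may not read invoice {invoice_id}")
    return invoice


def leaks_across_users(handler) -> bool:
    """Cross-user test: can bob read alice's invoice through this handler?"""
    try:
        handler("bob", 1)
    except Forbidden:
        return False
    return True
```

Both handlers return the right invoice for its owner, which is why the insecure version passes functional tests. Only the cross-user call tells them apart.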

The Six Metrics Every Engineering Leader Needs Before AI Code Hits Production

Your existing quality metrics were not designed for AI-generated code. Cycle time looks fine. Velocity looks fine. PR count looks fine. None of these tell you whether the code that shipped was actually reviewed or whether the reviewer spent four minutes clicking approve because the diff looked clean and the tests passed.

You need a measurement layer that is specifically designed for the AI code era. Here is what to track.

Visual 2: 6 Metrics Before AI Code Hits Production

Table 4 Preview — AI Code Quality Metrics

Metric	What It Catches	Tool to Measure It	Target / Red Flag Threshold
AI-originated code % by PR	Concentration risk: which PRs are high-AI	Hivel investment profile, GitHub Copilot telemetry	No universal target; flag PRs >80% AI-generated for enhanced review
Security finding rate per AI PR vs human PR	Whether AI code introduces disproportionate security debt	Snyk, SonarQube, Semgrep with PR-level attribution	Red flag: AI PRs showing 2x+ finding rate vs human PRs in your repo
Rework rate on AI-sourced code	How much sprint time goes to fixing AI code after merge	Hivel rework rate dashboard	Red flag: rework rate climbing while velocity holds steady
Secret / credential exposure rate	Hardcoded keys, tokens, passwords in AI PRs	Trufflehog, GitHub Advanced Security secret scanning	Zero tolerance. One exposed secret in prod is the incident.
AI production-merge rate	What percentage of AI-generated suggestions actually reach production	Hivel AI impact measurement	Baseline: industry average is ~18%. Your number tells you if reviews are catching or rubber-stamping.
Post-merge bug escape rate by code origin	Whether AI code produces disproportionate prod bugs	Link Jira/Linear bugs to PR origin in Hivel	Red flag: AI-coded features accounting for >40% of post-release bugs at <40% of total code
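
Two of the red-flag thresholds in the table are plain arithmetic, so they are easy to encode. The sketch below assumes you can already count findings, PRs, bugs, and lines by origin. The function names and default thresholds are illustrative, taken from the table rather than from any tool's API.

```python
def per_pr_rate(findings: int, prs: int) -> float:
    """Security findings per PR, or 0.0 when there are no PRs."""
    return findings / prs if prs else 0.0


def finding_rate_red_flag(
    ai_findings: int,
    ai_prs: int,
    human_findings: int,
    human_prs: int,
    ratio_threshold: float = 2.0,
) -> bool:
    """Red flag: AI PRs show at least 2x the finding rate of human PRs."""
    ai_rate = per_pr_rate(ai_findings, ai_prs)
    human_rate = per_pr_rate(human_findings, human_prs)
    return ai_rate > 0 and ai_rate >= ratio_threshold * human_rate


def bug_escape_red_flag(
    ai_bugs: int,
    total_bugs: int,
    ai_lines: int,
    total_lines: int,
    share_threshold: float = 0.40,
) -> bool:
    """Red flag: AI code causes >40% of post-release bugs at <40% of total code."""
    if total_bugs == 0 or total_lines == 0:
        return False
    bug_share = ai_bugs / total_bugs
    code_share = ai_lines / total_lines
    return bug_share > share_threshold and code_share < share_threshold
```

For example, 30 findings across 100 AI PRs against 20 findings across 200 human PRs is a rate of 0.3 versus 0.1 per PR, which trips the first flag.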

A few notes on how to use this table in practice.

The AI production-merge rate is the most counterintuitive of these. A high number is not necessarily good. If your merge rate is climbing from 18% to 35%, the question is why. Is it because code quality improved and reviewers are correctly approving more? Or is it because review standards have quietly dropped as engineers normalized AI-generated output?

One 500-person engineering organization with a $2M annual AI tooling investment found that only 12% of AI-generated code was making it to production with meaningful quality signals attached. The other 88% was being generated, discarded, or silently skipped past review. After six weeks of measurement with Hivel, the team had visibility into exactly where the AI code was going and where the rework was accumulating.

Set up AI code attribution in your pipeline before you set targets. You cannot track AI code quality if you cannot tell which code is AI-generated. Hivel maps AI-originated contributions to individual PRs and sprint cycles. Without that attribution layer, all you have is aggregate quality metrics that can't tell you whether AI is helping or hurting.
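
If you have no attribution layer yet, one rough way to start is a commit-message convention. The sketch below assumes a hypothetical `AI-Assisted: true` trailer that your team agrees to add to AI-generated commits. It is not a git or vendor standard, and it is not how Hivel attributes code. It computes the AI-originated share of a PR by lines changed and applies the >80% enhanced-review flag from the metrics table.

```python
from dataclasses import dataclass

# Hypothetical team convention, not a git or vendor standard.
AI_TRAILER = "ai-assisted: true"


@dataclass(frozen=True)
class Commit:
    message: str
    lines_changed: int


def is_ai_assisted(commit: Commit) -> bool:
    """A commit counts as AI-originated if any message line matches the trailer."""
    return any(
        line.strip().lower() == AI_TRAILER
        for line in commit.message.splitlines()
    )


def ai_share(commits: list[Commit]) -> float:
    """Fraction of a PR's changed lines that came from AI-tagged commits."""
    total = sum(c.lines_changed for c in commits)
    if total == 0:
        return 0.0
    ai_lines = sum(c.lines_changed for c in commits if is_ai_assisted(c))
    return ai_lines / total


def needs_enhanced_review(commits: list[Commit], threshold: float = 0.80) -> bool:
    """Flag PRs whose AI-originated share exceeds the threshold."""
    return ai_share(commits) > threshold
```

A self-reported trailer will undercount, since it depends on engineers remembering to add it. Treat it as a floor, not a measurement.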

How to Build the Governance Layer Without Killing the Speed

The response to documented AI code security failures should not be to ban vibe coding. The speed gains are real. Teams that have adopted AI coding tools well are shipping 4x more features per sprint cycle than teams that haven't. You don't want to give that back.

The right response is to build a governance layer that is proportional to the risk level of what the AI is generating.

Low-risk AI code: CRUD, UI, data transforms

Standard code review, SAST on every PR, no additional gates. AI can be trusted here with normal oversight. This covers the majority of your codebase.

High-risk AI code: auth, access control, payment logic, secrets management

Mandatory human review before merge, regardless of AI tool confidence score. Threat modeling before any auth change lands in main. Automated checks for headers, CSRF tokens, and OWASP Top 10 basics running in CI. No exceptions for urgency.

The Tenzai analysis of 15 production vibe-coded applications found that every single one was missing CSRF protection and security headers, and every single tool introduced SSRF vulnerabilities. These are not edge cases. They are defaults.
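
A header check is one of the cheapest gates to automate. The sketch below is a minimal illustration rather than a replacement for a scanner such as OWASP ZAP. The `REQUIRED_HEADERS` list is an assumption about a sensible baseline, and the function takes an already-fetched headers mapping so it stays self-contained. In CI you would feed it the response headers from a staging deployment.

```python
# Assumed baseline; adjust to your own security requirements.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
)


def missing_security_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the required headers absent from a response.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    present = {name.lower() for name in response_headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]
```

A CI step can fail the build whenever the returned list is non-empty, which turns "remember the headers" into a gate.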

Enforcement, not policy

Policy without enforcement is wishful thinking. Every governance rule needs a corresponding CI gate. If auth code cannot merge without a designated security reviewer, that rule needs to be enforced by branch protection, not by memory.
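
As a rough sketch of what such a gate could look like in a CI script: classify the changed files by risk, then block the merge unless a designated security reviewer has approved. The path patterns, reviewer names, and `merge_gate` function are all hypothetical. In a real setup, branch protection and a code-owners file would normally carry this rule.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for high-risk areas; tune these to your repository layout.
HIGH_RISK_PATTERNS = ("*auth*", "*permission*", "*payment*", "*secret*")


def is_high_risk(path: str) -> bool:
    """True if the file path matches any high-risk pattern (case-insensitive)."""
    lowered = path.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in HIGH_RISK_PATTERNS)


def merge_gate(
    changed_files: list[str],
    approvers: set[str],
    security_reviewers: set[str],
) -> tuple[bool, str]:
    """Decide whether a PR may merge, and explain why."""
    risky = [path for path in changed_files if is_high_risk(path)]
    if not risky:
        return True, "low risk: standard review applies"
    if approvers & security_reviewers:
        return True, "high risk: approved by a security reviewer"
    return False, "blocked: security review required for " + ", ".join(risky)
```

Substring patterns like these over-match (a file named `author_bio.py` would count as high risk). For a gate, erring toward extra review is usually the safer failure mode.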

SonarQube for code complexity and OWASP patterns. Semgrep for custom rules your team defines around your own security requirements. Trufflehog or GitHub Advanced Security for secret scanning. These run on every PR, automatically, without requiring a human to remember to check.

The teams that are getting AI code right in 2026 are not reviewing AI code more slowly. They are reviewing it more specifically, with tooling that flags the patterns AI consistently gets wrong, so human reviewers can focus on the architectural decisions that tools can't catch.

Measure rework rate on AI-coded features separately from human-coded features for 30 days. If AI-coded features are generating disproportionate post-merge fixes, you have a review problem, not an AI problem. The fix is in the process, not the tool.
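
A minimal sketch of that 30-day comparison, assuming each merged change is already labeled with its origin and with whether a follow-up fix touched it. The `MergedChange` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MergedChange:
    origin: str      # "ai" or "human", taken from your attribution layer
    merged_on: date
    reworked: bool   # True if a follow-up fix touched this change after merge


def rework_rate_by_origin(
    changes: list[MergedChange],
    window_end: date,
    window_days: int = 30,
) -> dict[str, float]:
    """Share of merged changes that needed rework, per origin, within the window."""
    window_start = window_end - timedelta(days=window_days)
    totals: dict[str, int] = {}
    reworked: dict[str, int] = {}
    for change in changes:
        # Keep only changes merged inside the measurement window.
        if not (window_start < change.merged_on <= window_end):
            continue
        totals[change.origin] = totals.get(change.origin, 0) + 1
        reworked[change.origin] = reworked.get(change.origin, 0) + int(change.reworked)
    return {origin: reworked[origin] / totals[origin] for origin in totals}
```

Comparing the two rates side by side is the point. A high AI rework rate next to a low human one suggests the review step is letting AI changes through too easily.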

Frequently Asked Questions

What is vibe coding quality?

Vibe coding quality refers to the security, correctness, and maintainability of code produced through natural language AI prompts, where developers describe intent and accept AI-generated implementation without writing each line themselves. Vibe-coded applications consistently show higher rates of specific vulnerability classes, particularly missing access controls, hardcoded credentials, and insecure object references, because AI generates functionally correct code without the implicit security assumptions experienced developers apply by default.

What are the main AI code security risks?

The most consistently documented AI code security risks include: XSS vulnerabilities (2.74x more likely in AI PRs than human PRs), improper password handling (1.88x more likely), insecure object references (1.91x), missing security headers and CSRF protection (found in 100% of tested vibe-coded apps in one December 2025 study), and hardcoded credentials. The underlying cause is the same across all categories: AI models optimize for functional output and lack the security context that senior developers internalize from experience.

How do you measure AI code quality in an engineering organization?

Start with attribution: you need to know which code is AI-generated at the PR level before you can measure its quality. Then track six metrics: AI-originated code percentage by PR, security finding rate per AI PR vs human PR, rework rate on AI-sourced code, secret and credential exposure rate, AI production-merge rate, and post-merge bug escape rate by code origin. Rising rework rate combined with rising merge rate is the clearest early warning signal that review standards have dropped.

What percentage of AI-generated code contains security vulnerabilities?

The studies cited here report figures between 45% and 65%, depending on methodology and tool tested. Veracode's 2025 GenAI Code Security Report, covering 100+ LLMs across four languages, found a 45% failure rate against the OWASP Top 10. CSA and Endor Labs found that 62% of AI-generated code contained design flaws or known vulnerabilities. Escape.tech scanned 1,400 vibe-coded production applications and found that 65% had security issues, with 58% containing at least one critical vulnerability.

Can you use AI coding tools safely in enterprise environments?

Yes, with proportional governance. Low-risk AI code (CRUD operations, UI components, data transformations) can use standard review processes with automated SAST. High-risk AI code (authentication, authorization, payment logic, secrets management) requires mandatory human security review, automated header and CSRF checks in CI, and threat modeling before changes merge. The teams getting this right in 2026 are not banning AI tools. They are attributing AI code at the PR level, measuring it separately, and enforcing risk-proportional review gates automatically.

Sudheer Bandaru

Founder, CEO

Sudheer started as a software developer in Silicon Valley and worked at startups and large corporations such as Merrill Lynch, AT&T, and Hewlett Packard. He moved into engineering leadership roles at startups that went on to IPO, led multiple M&As in the US, and managed remote global teams. Throughout his career, he saw many cases where the lack of a data-driven culture for continuous process improvement led to poor gut-based decisions and costly mistakes. That problem led him to start Hivel, which helps engineering teams continuously improve by giving them access to critical metrics through interactive dashboards and actionable insights.