Why 91% of SAST Findings Are False Positives (And What To Do About It)

91%

of SAST findings are false positives, according to Ghost Security's 2025 research

Static Application Security Testing (SAST) tools are supposed to find vulnerabilities in source code before they reach production. In theory, they shift security left -- catching bugs early when they are cheapest to fix. In practice, they generate so much noise that developers stop paying attention.

The numbers are damning. Ghost Security's "Exorcising the SAST Demons" report, published in June 2025, analyzed findings from traditional SAST tools across thousands of applications and found that 91% of all SAST findings are false positives. For certain vulnerability categories, the numbers are even worse: Python/Flask command injection findings had a 99.5% false positive rate.

This is not a tool problem. It is a fundamental limitation of how traditional static analysis works, and it has real consequences for organizations that rely on these tools as their primary code security mechanism.

The False Positive Crisis by the Numbers

The Ghost Security research quantified what most security engineers already knew anecdotally. Here are the key findings:

Metric	Value
Overall false positive rate	91%
Python/Flask command injection FP rate	99.5%
Java SQL injection FP rate	~88%
Average findings per scan (medium+ severity)	200-500
Time to triage a single finding	15-45 minutes

Do the math. If a typical scan produces 300 findings and 91% are false positives, that means 273 of them are noise. At 30 minutes per triage, that is 136 hours of engineering time wasted chasing findings that are not real -- per scan. If you scan weekly, that is more than three full-time engineers doing nothing but reviewing false positives.

Why SAST Tools Produce So Many False Positives

Traditional SAST tools analyze source code without executing it. They trace data flows from user-controlled inputs (sources) to security-sensitive operations (sinks) and flag any path where tainted data might reach a dangerous function. The problem is that this analysis lacks several critical pieces of context:

1. No Runtime Context

SAST tools cannot know which code paths are actually reachable at runtime. A function that appears to accept unsanitized input might only be called internally with pre-validated data, but the SAST tool flags it anyway because it can see a theoretical path from source to sink.

2. Framework-Specific Sanitization

Modern web frameworks include built-in sanitization that SAST tools often fail to recognize. Django's ORM parameterizes SQL queries by default. React escapes JSX output by default. Flask's Jinja2 templates auto-escape HTML. When SAST tools do not model these framework behaviors, they flag every database query or template render as a potential injection point.

The 99.5% false positive rate for Python/Flask command injection is a direct result of this. Flask applications route user input through layers of framework-provided sanitization that the SAST tool does not understand.

3. Overly Broad Taint Tracking

SAST tools err on the side of caution -- which means flagging anything that could theoretically be a vulnerability, regardless of how unlikely exploitation actually is. A variable that was user-controlled three function calls ago but has since been validated, cast to an integer, and bounds-checked will still be flagged if the taint tracking does not account for those transformations.

4. Configuration Blindness

Many security configurations happen outside the code -- in web server configs, WAF rules, cloud security groups, or runtime environment variables. A SAST tool looking only at source code cannot know that a WAF blocks SQL injection at the edge, or that the application runs behind a reverse proxy that strips dangerous headers.

The Consequence: Alert Fatigue and Learned Helplessness

When 9 out of 10 alerts are false, developers learn to ignore them. This is not a character flaw -- it is a rational response to unreliable information. Research in cognitive psychology calls this "alarm fatigue," and it is well-documented in fields from aviation to healthcare.

The practical consequences for software security are severe:

Real vulnerabilities get buried. The 9% of findings that are genuine vulnerabilities are hidden in a haystack of noise. A critical SQL injection finding looks identical to the dozens of false positives that came before it.
Developers route around security tools. Teams start suppressing findings in bulk, marking categories as "won't fix," or adding blanket suppressions to CI pipelines to keep builds passing. We have seen codebases with thousands of suppressed findings -- some of which were real vulnerabilities.
Security loses credibility. When security teams file tickets for findings that turn out to be false positives, developers lose trust in the security function. Subsequent legitimate findings face skepticism, even when they are critical.
Compliance becomes checkbox security. Organizations running SAST tools for compliance (SOC 2, PCI-DSS) treat the scan results as a checkbox rather than actionable intelligence. The report says "we ran SAST," but the findings were never meaningfully triaged.

What Actually Works: Reducing SAST Noise

The false positive problem is not unsolvable. Here are the approaches that produce the most significant improvements, ordered by impact:

1. Reachability Analysis

Not all code is reachable from the application's entry points. Reachability analysis builds a call graph from actual entry points (HTTP routes, message handlers, CLI commands) and only flags vulnerabilities on reachable paths. This eliminates findings in dead code, test utilities, unused library functions, and other code that cannot be triggered by an attacker.

Modern tools like Semgrep's DeepSemgrep and Snyk Code incorporate reachability analysis. The reduction in false positives is substantial -- typically 40-60% of findings are eliminated as unreachable.

2. AI-Powered Triage

This is the approach we use at CELVEX Group. Instead of relying solely on static pattern matching, we feed SAST findings through an AI triage pipeline that evaluates each finding in context:

Code context analysis: The AI examines the surrounding code, not just the flagged line. It recognizes validation functions, type conversions, and framework-specific sanitization patterns that traditional SAST misses.
Historical pattern matching: Findings that match patterns we have previously confirmed as false positives are automatically downgraded. The system learns from every triage decision.
Exploitability assessment: For each finding, the AI evaluates whether a realistic attack path exists. A SQL injection finding where the input is cast to an integer before reaching the query is not exploitable, regardless of what the taint tracker says.

Our pipeline reduces the effective false positive rate from ~91% to under 15%, meaning the findings that reach a human analyst are overwhelmingly real vulnerabilities that need fixing.

3. Custom Rule Tuning

Out-of-the-box SAST rules are designed to be generic. They work across all codebases, which means they are tuned for maximum recall (catch everything) at the expense of precision (only report real issues). Custom rules tuned to your specific frameworks, patterns, and conventions dramatically reduce noise.

Practical steps:

Disable rules for vulnerability classes your framework handles by default (e.g., disable SQL injection rules if your ORM parameterizes all queries)
Create custom rules that understand your internal sanitization libraries
Set severity thresholds so only high-confidence findings break the build

4. Validate Before Assigning

Never assign a SAST finding to a developer without first confirming it is real. This requires a security engineer (or AI system) to review the finding, verify the vulnerability exists, and provide a clear explanation of the risk and remediation. This takes more time upfront but preserves developer trust and ensures that when a finding is assigned, it actually needs to be fixed.

5. Complement SAST with DAST and Manual Review

SAST is one layer of a defense-in-depth approach. Dynamic Application Security Testing (DAST) validates findings at runtime, confirming whether a potential vulnerability is actually exploitable. Manual code review by security researchers catches logic flaws and business logic vulnerabilities that no automated tool can find.

The most effective security programs use all three: SAST for broad coverage, DAST for validation, and manual review for the nuanced, high-impact findings that require human judgment.

The Path Forward

SAST is not dead. It is a valuable tool when used correctly. But using it correctly means acknowledging its limitations, investing in noise reduction, and never treating raw SAST output as actionable without triage.

The 91% false positive rate is not an indictment of static analysis as a concept. It is an indictment of how most organizations deploy it: install the tool, run it in CI, dump the results into Jira, and call it a security program. That approach does not find vulnerabilities. It finds false positives and buries real issues in the noise.

The organizations getting real value from SAST are the ones that treat it as a data source rather than an oracle -- one input into a triage process that includes AI analysis, reachability checks, and human validation before any finding reaches a developer.

Tired of chasing false positives?

CELVEX Group's AI-powered code audit reduces SAST noise by 85%+ and delivers only validated, exploitable findings.

Scan Your Domain Free

Sources

Help Net Security: Traditional SAST tools are failing developers -- Coverage of the Ghost Security report
Ghost Security -- "Exorcising the SAST Demons" research report
OWASP: Source Code Analysis Tools
Semgrep Documentation -- modern SAST with custom rule support