22,000 confirmed breaches. Three findings. One gap they all share.
Verizon published the 2026 Data Breach Investigations Report in May, and for anyone working in data security, the timing is useful. The DBIR is the closest thing the industry has to an unbiased empirical record of how breaches actually happen. It does not have a product to sell. It just reports what happened across more than 22,000 confirmed breaches and 31,000 total incidents in the twelve months ending October 2025.
Reading the 2026 DBIR through an AI data readiness lens, three findings stand out. They are not the three headline numbers most write-ups are leading with. But they are the three that matter most if you are trying to understand where enterprise data exposure is heading, and why the organizations that cannot answer basic questions about their sensitive data are the ones showing up in next year's report.
Finding one: Shadow AI is now the third most common insider DLP event
This is the finding that matters most for any organization that has deployed AI tools without a corresponding data governance program, which at this point describes most enterprises.
Shadow AI is now the third most common non-malicious insider action detected in DLP datasets, representing a fourfold increase from the previous year. 45% of employees are now regular users of AI on corporate devices, up from just 15% the year before, and 67% of those users are accessing AI services through non-corporate accounts on corporate devices.
The data types being submitted to unauthorized AI systems are worth noting specifically. The DBIR identified source code, internal documents, structured data, and technical documentation being uploaded into unauthorized AI platforms. These are not accidental exposures. Employees are submitting the data they are actively working with because they want to use AI to do their jobs faster. The problem is that the data they are working with was never classified, the access they have to it was never right-sized, and nobody built a governance model that accounts for employees sending sensitive business data to external AI systems through personal accounts.
Most enterprise DLP programs were built to catch data leaving a defined boundary through email, web uploads, or USB devices. Shadow AI breaks that model because the "exfiltration" looks like normal web traffic to a legitimate service. By the time DLP catches a policy violation, the data has already left. The sustainable answer is not a more aggressive DLP policy. It is knowing what data your employees have access to, classifying what is genuinely sensitive, and building the controls upstream so that the conditions leading to a Shadow AI DLP event are addressed before the employee ever opens a browser tab.
That is an AI data readiness problem, not a DLP configuration problem.
Finding two: Third-party breaches jumped 60% and now account for nearly half of all breaches
Based on analysis of more than 31,000 security incidents, the 2026 DBIR finds that third-party involvement in breaches jumped 60% year over year, with third-party breaches now accounting for 48% of total breaches.
The remediation picture behind that number makes it worse. Looking at cloud-based exposures in the third-party dataset, only 23% of third-party organizations fully remediated missing or improperly secured MFA on their cloud accounts. For weak passwords and permission misconfigurations, the time to resolve 50% of all findings exceeded eight months.
The standard response to third-party breach risk is vendor risk management: SOC 2 reports, security questionnaires, annual reviews. None of that addresses the actual exposure. A vendor with a clean SOC 2 report can still have access to your customer PII, your financial records, or your intellectual property through a connection that nobody has reviewed in two years. The SOC 2 tells you the vendor has controls. It says nothing about what sensitive data in your environment that vendor can reach.
The 60% jump in third-party breaches is not primarily a vendor security maturity problem. It is a data visibility problem. Organizations that cannot answer the question "what sensitive data can each of our third parties access" are exposed, regardless of how thorough their vendor questionnaire is. Effective third-party data risk management requires knowing what data each relationship exposes and whether that access is still appropriate, continuously, not just at the point of onboarding.
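As a rough illustration of what continuous review could look like, the sketch below flags third-party connections that reach sensitive data and have not been reviewed recently. The record shape (`VendorGrant`), the vendor and data store names, the class labels, and the 90-day threshold are all assumptions made up for this example; they do not come from the DBIR or from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VendorGrant:
    """One third-party connection to one data store (hypothetical record shape)."""
    vendor: str
    data_store: str
    data_classes: frozenset  # e.g. {"customer_pii"}; empty if nothing sensitive is reachable
    last_reviewed: date


def stale_sensitive_grants(grants, today, max_age_days=90):
    """Return grants that reach sensitive data and were last reviewed too long ago."""
    flagged = [
        g for g in grants
        if g.data_classes and (today - g.last_reviewed).days > max_age_days
    ]
    # Oldest review first, so the longest-neglected access is examined first.
    return sorted(flagged, key=lambda g: g.last_reviewed)


# Illustrative inventory; every value here is invented for the example.
grants = [
    VendorGrant("billing-vendor", "payments-db", frozenset({"financial_records"}), date(2024, 1, 15)),
    VendorGrant("analytics-vendor", "clickstream", frozenset(), date(2023, 6, 1)),
    VendorGrant("support-vendor", "crm", frozenset({"customer_pii"}), date(2025, 11, 20)),
]

for g in stale_sensitive_grants(grants, today=date(2026, 1, 10)):
    print(g.vendor, g.data_store, sorted(g.data_classes), g.last_reviewed.isoformat())
```

The point of the sketch is that the question being asked is about data reach and review age per connection, which a questionnaire about the vendor's own controls cannot answer.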
Finding three: Vulnerability exploitation is now the primary initial access vector, and data context determines what that actually means
The 2026 Verizon DBIR found that vulnerability exploitation is the top initial access vector, accounting for 31% of data breaches during the study period. Even more concerning is that the median time-to-patch has increased from 32 days to 43 days, a 34% increase. Only 26% of critical vulnerabilities listed in the CISA Known Exploited Vulnerabilities catalog were fully remediated during 2025, down from 38% the previous year.
Remediation is getting slower even as the volume of critical vulnerabilities grows, and the reason is not that security teams are less capable. It is that they cannot prioritize effectively without knowing what each vulnerable resource actually contains. If your vulnerability management platform flags a misconfigured cloud storage bucket without telling you whether it holds 40,000 customer financial records or an empty test environment, every finding carries the same theoretical severity. Teams are paralyzed by volume because they lack the data context that would let them rank by actual business impact.
Verizon mapped every MITRE ATT&CK technique attackers use to escalate privilege, and 83% of privilege escalation incidents in the dataset involved no CVE exploitation at all. The exposure surface that matters most to attackers is not primarily the unpatched CVE list. It is the accumulated misconfigurations and overpermissioned identities, together with the unclassified sensitive data that sits behind them. That surface cannot be addressed by infrastructure scanning alone. It requires knowing what the data is.
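To make the prioritization point concrete, here is a minimal sketch of ranking findings by data context rather than by scanner severity alone. The `Finding` fields, the 0-10 severity scale, the resource names, and the idea that a classification pass supplies a `sensitive_records` count are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource: str
    severity: float         # scanner-assigned score on an assumed 0-10 scale
    sensitive_records: int  # assumed to come from a separate data classification pass


def rank_by_data_context(findings):
    """Order findings by sensitive records exposed, breaking ties by scanner severity."""
    return sorted(findings, key=lambda f: (f.sensitive_records, f.severity), reverse=True)


# Two buckets with identical scanner severity, plus a higher-severity host with no sensitive data.
findings = [
    Finding("bucket-test-env", severity=7.5, sensitive_records=0),
    Finding("bucket-customer-exports", severity=7.5, sensitive_records=40_000),
    Finding("vm-build-agent", severity=9.1, sensitive_records=0),
]

ranked = rank_by_data_context(findings)
for f in ranked:
    print(f.resource, f.severity, f.sensitive_records)
```

Sorting on severity alone would put the build agent first and leave the two buckets tied. Adding the record count moves the bucket holding customer data to the top, which is the kind of ranking by business impact the paragraph describes. A real program would likely weight data type and exposure as well; this sketch uses a simple two-key sort.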
The thread connecting all three findings
Read the three findings together and they describe a single underlying gap from three different angles.
Shadow AI is employees sending sensitive data to external AI systems because that data was never governed tightly enough to prevent it. Third-party breaches are attackers reaching sensitive data through vendor connections that were never scoped to the data the vendor actually needed. Vulnerability exploitation is attackers using misconfigured or unpatched resources to reach sensitive data that security teams cannot prioritize protecting because they do not have data context on those resources.
In each case, the attack or exposure succeeds because the organization does not have a current, continuous, and accurate picture of what sensitive data exists, where it lives, and who or what can reach it. The DBIR does not use the phrase "AI data readiness." But the gap it describes across all three of these findings is exactly the gap that AI data readiness closes.
What the DBIR's Shadow AI finding specifically means for AI data readiness programs
The Shadow AI finding deserves more attention than it typically gets in DBIR coverage, because it represents a structural shift in how sensitive data moves out of organizations. It is no longer primarily an external attacker problem. It is an internal behavior problem driven by employees who want to use AI and are doing so with whatever data they have access to, regardless of whether that data should be leaving the organization at all.
The organizations showing up in the DBIR Shadow AI finding share a common characteristic: they deployed AI tools, or their employees found AI tools, before they had a clear picture of what sensitive data those employees could reach. The employees did not decide to leak intellectual property. They decided to paste a document into an AI assistant to get a summary faster. The fact that the document was sensitive was invisible to them, because nobody had classified it.
Classifying sensitive data before AI touches it is the only sustainable version of Shadow AI governance. DLP can slow the bleeding. Access controls on specific AI domains can block known services. But employees will find new services, new browser extensions, new ways to get AI assistance on the data they are already working with. The only durable answer is to know what data is sensitive, right-size who can access it, and build that classification into the controls before the employee ever opens an AI tool.
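One way to picture classification feeding a control is a deny-by-default check that runs before data is sent to any AI destination. The labels, destination categories, and policy table below are hypothetical and exist only to show the shape of the idea.

```python
# Hypothetical policy: which classification labels may be sent to which kind of AI destination.
ALLOWED = {
    "public": {"sanctioned_ai", "unsanctioned_ai"},
    "internal": {"sanctioned_ai"},
    "restricted": set(),
}


def may_send(label, destination):
    """Allow a transfer only if the label is known and the destination is permitted for it."""
    # Unclassified data (label is None) and unknown labels fall through to an empty set: denied.
    return destination in ALLOWED.get(label, set())


print(may_send("internal", "sanctioned_ai"))    # permitted by the policy table
print(may_send("internal", "unsanctioned_ai"))  # not permitted for this label
print(may_send(None, "sanctioned_ai"))          # unclassified data is denied by default
```

The design choice worth noting is the default: data with no label is treated as not sendable, so the control only works as intended once classification has actually been done.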
How Sentra approaches the gap the DBIR describes
Sentra continuously discovers and classifies sensitive data across cloud, SaaS, on-premises, and data warehouse environments, entirely within the customer's own environment so that data never leaves the organization's control to be analyzed. That means the data governance picture is always current, always covering the full environment, and never depends on a periodic scan that is already outdated by the time anyone reads it.
For the three DBIR findings discussed above, this translates directly into practical outcomes. Shadow AI exposure is reduced by ensuring the data employees work with is classified and access-governed before any AI tool can reach it. Third-party breach risk is addressed by maintaining continuous visibility into what sensitive data each third-party connection can access, not just whether the third party passed a questionnaire. Vulnerability remediation becomes more actionable when each finding is enriched with data context, so security teams can prioritize by actual business impact rather than theoretical infrastructure severity.
Classification drives enforcement in each case. When Sentra identifies sensitive data that is overexposed, that signal automatically triggers remediation: access is right-sized, stale copies are flagged for cleanup, and the relevant team receives a prioritized action. The answer to the gap the 2026 DBIR describes is not faster patching, more aggressive DLP, or better vendor questionnaires. It is a continuous, accurate understanding of where sensitive data lives, who can reach it, and whether that access is appropriate.
Key takeaways
- Shadow AI is now the third most common insider DLP finding, representing a fourfold increase year over year. 45% of employees regularly use AI on corporate devices. The root cause is not bad intent but unclassified, over-accessible data.
- Third-party breaches jumped 60% and now account for 48% of all breaches. Vendor risk management without data-level visibility into what each third party can access is addressing the wrong question.
- Vulnerability exploitation is the primary initial access vector at 31% of breaches, with median time to patch growing from 32 to 43 days. Data context is what makes remediation prioritization actionable rather than theoretical.
- All three findings point to the same underlying gap: organizations do not have a current, continuous understanding of what sensitive data exists, where it lives, or who and what can reach it.
- AI data readiness is the framework that closes that gap. The DBIR does not use the term, but it describes the problem in three separate findings from three separate angles.
The 2026 DBIR analyzed more than 22,000 confirmed breaches. The patterns it describes point consistently toward organizations that do not know enough about their own sensitive data to protect it. That is the problem AI data readiness is built to solve.
Download the AI Data Readiness white paper to see what a continuous data governance model looks like in practice, and what most organizations find when they first map what AI can already reach.
Read our post on what AI data readiness actually means for the foundational context behind the findings discussed here.
Request a demo to see what sensitive data is reachable in your environment today.
