Jun 30, 2026 | 8 Min Read | AI Data Readiness

Why AI Agents Are Exposing a Problem CISOs Have Had All Along

VP Product Marketing & Analyst Relations

It's not how many AI agents you have. It's whether you know what data they can reach.

It's easy to think AI agents created a new security problem, but they didn't. What AI agents have actually done is expose a problem most enterprises have lived with for years: they don't have a complete, current understanding of what sensitive data they have, where it lives, or who and what can access it.

The conversation around AI security has become increasingly focused on visibility. Organizations are trying to inventory their AI applications, discover shadow AI, and count the number of agents operating across the enterprise. Those are useful exercises, but they don't answer the question that actually matters. A single AI agent with unnecessary access to sensitive customer data creates far more risk than a hundred agents that only ever interact with public documentation.

The question every CISO should be asking is much simpler: what sensitive data exists across our environment today, and who or what can already access it? Most organizations can't answer that with confidence.

Somewhere in your environment right now, there is sensitive data that no one has classified. It could be a finance record, an HR file, or a customer document sitting in a folder that hasn't been reviewed in years. For a long time, that data represented relatively low risk simply because nothing fast or automated could reach it. That's no longer true. Something can reach it now, synthesize it, and act on it without anyone in the loop, and whether that something is an AI agent, an application, or a human with excessive access is almost beside the point. The data was never classified. That's the actual gap.

According to OWASP's 2026 research, 68% of organizations cannot reliably distinguish human activity from AI agent activity. That statistic is often treated as an identity problem, but in reality it's evidence of something much broader. Most enterprises still don't have a reliable, continuously updated understanding of what their sensitive data actually is, where it lives, or who and what can reach it. AI agents didn't create that gap. They made it impossible to ignore.

The exposure was always about the data

For two decades, security maturity has been measured by how well organizations track known things, such as assets, identities, vulnerabilities, and patch cycles. That approach quietly assumes the thing worth tracking is whatever acts, whether it's a user, an application, or now an AI agent. But the asset that's actually exposed, leaked, or misused has never been the identity. It's the data.

An identity with broad access to nothing sensitive is a non-event. The same identity with access to unclassified customer records, financial information, or intellectual property is the entire risk, and that risk existed long before the first AI agent was ever deployed. What's changed isn't the sensitivity of the data. It's the speed at which something can now discover it, combine it, and act on it.

A developer connects a new integration to a SharePoint library. A business unit automates reporting against a database that hasn't been reviewed in years. An IT team deploys Microsoft 365 Copilot against a file share. None of those actions typically trigger a security review, because the integration itself isn't seen as the event worth flagging. The real event is what sits behind that integration. Was the data ever classified? Is it actually sensitive? Should that AI application, service account, or user have access to it in the first place? The AI agent is simply the first thing fast enough to expose the answer.

Counting AI agents won't tell you where your risk is

You can't govern what you can't count, but counting the things that can reach your data tells you very little if you've never identified the sensitive data itself. The access path is the door. The data is what's behind it. Today, most organizations have a much better inventory of their doors than they do of what's inside the rooms.

That's why the current conversation around AI governance is incomplete. Knowing you have 500 AI agents says almost nothing about your actual exposure. Knowing that one of those agents can access payroll data, M&A documents, or regulated customer records tells you everything you need to know. AI visibility is useful. AI data readiness is what actually determines risk.
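To make the contrast concrete, here is a minimal sketch rather than a description of any product's implementation. The agent names, dataset names, and sensitivity labels are all hypothetical. The only point is that exposure is computed from what each agent can reach, and that data nobody has classified is treated as risky by default.

```python
# Hypothetical inventory: which datasets each agent can reach.
agent_access = {
    "docs-bot-1": {"public_docs"},
    "docs-bot-2": {"public_docs"},
    "reporting-agent": {"public_docs", "payroll"},
}

# Hypothetical classification results. A dataset missing from this
# mapping has never been classified.
classification = {
    "public_docs": "public",
    "payroll": "restricted",
}

def exposed_agents(agent_access, classification):
    """Return agents that can reach data that is sensitive or never classified."""
    exposed = {}
    for agent, datasets in agent_access.items():
        # Anything not explicitly labelled "public" counts as risky,
        # including datasets with no label at all.
        risky = sorted(d for d in datasets if classification.get(d) != "public")
        if risky:
            exposed[agent] = risky
    return exposed

agent_count = len(agent_access)                          # the headline number
exposure = exposed_agents(agent_access, classification)  # the number that matters
print(agent_count, exposure)
```

In this toy inventory the agent count is three, but only one agent carries real exposure, and that is visible only because the payroll dataset was classified first.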

Why this data gap is different from every visibility problem before it

Security teams have dealt with blind spots before. Shadow IT, unmanaged endpoints, forgotten cloud storage, and abandoned SaaS applications all followed a familiar pattern: the underlying assets changed slowly enough that periodic reviews eventually caught up. Enterprise data doesn't behave that way anymore, and neither do the access paths into it.

This is where identity statistics become useful, not as the headline, but as evidence of the underlying data problem. According to Entro Security's 2025 research, 97% of non-human identities carry excessive privileges, and the typical enterprise operates roughly 100 non-human identities for every human identity. Read those numbers as a statement about your data, not your identities. For every dataset a human has consciously reviewed, there are hundreds of potential access paths that have never been evaluated in the context of the data they expose. Much of that data has never been classified at all. Nobody determined it wasn't sensitive. Nobody ever looked.

That's the real shift. The data didn't become more sensitive. The number of fast, automated identities capable of reaching unclassified sensitive data exploded, while the classification process remained largely manual and point-in-time.

The question every CISO needs to answer, and currently can't

Here's the test. Ask your security leadership one simple question: what sensitive data exists across our environment today, and who or what can already access it?

If the honest answer includes phrases like 'we're still mapping that out' or 'it depends which systems you mean,' your organization isn't unusual, but it does mean you're making AI decisions without a complete understanding of the data those systems can already reach. The question itself isn't new. Security teams have always needed to know where sensitive data lives and who can access it. What's changed is the speed. New access paths are opening continuously through AI agents, applications, service accounts, copilots, and integrations, while many organizations still rely on spreadsheets, periodic access reviews, and point-in-time data discovery projects to understand what data exists. Those approaches simply can't keep pace.

This isn't an argument for slowing AI adoption. It's a recognition that knowing what your data is, where it lives, and who or what can access it must become a continuous process instead of a periodic audit. By the time an incident forces the question into the open, the question has already changed from 'what sensitive data can AI access' to 'what has AI already done with the sensitive data it could access.'

AI data readiness starts with the data

Most AI security discussions begin with the AI itself, and that's backwards. AI systems inherit the permissions, governance, and data quality that already exist across the enterprise. If sensitive data is unknown, unclassified, or overpermissioned, AI simply operates on that same flawed foundation.

That's why AI data readiness has to start with the data itself. Before organizations can govern AI effectively, they need continuous visibility into what sensitive data exists, where it lives, who or what can access it, and how that access changes over time. Only once those questions are answered can they make informed decisions about AI governance, least privilege, and automated enforcement.

How Sentra approaches AI data readiness

Sentra approaches the problem from the data outward. It continuously discovers data across cloud, SaaS, collaboration platforms, on-premises environments, and data warehouses, without moving customer data outside the customer's own environment.

Once discovered, the data is continuously classified using more than 250 classifiers across 130 file formats, covering structured data, unstructured documents, and images, with greater than 98% classification accuracy validated by customers operating at petabyte scale. Classification becomes the foundation for everything that follows.

Once sensitive data is understood, Sentra maps every identity that can access it, whether that identity is a human user, a service account, an application, or an AI agent. It continuously evaluates those access paths as data moves between systems, into knowledge bases, or into AI workflows. Most importantly, the process doesn't stop at visibility. Classification drives action. Access can be right-sized through data access governance, redundant or stale data can be identified for cleanup, and security teams receive prioritized remediation instead of another stream of unreviewed alerts.

That sequence is what matters: discover, classify, govern, and remediate. Not once, but continuously. This is what DSPM for AI looks like in practice.
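The discover, classify, govern, and remediate sequence can be illustrated with a small sketch. This is not Sentra's implementation or API. The `DataObject` type, the two regex classifiers, and the `approved` mapping are hypothetical stand-ins, and real classifiers are far richer than a pair of regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simplistic classifiers.
CLASSIFIERS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

@dataclass
class DataObject:
    path: str
    content: str
    readers: set  # identities that can read it, human or non-human
    labels: set = field(default_factory=set)

def classify(obj):
    """Attach a label for every classifier that matches the content."""
    obj.labels = {name for name, rx in CLASSIFIERS.items() if rx.search(obj.content)}
    return obj

def govern(obj, approved):
    """Return identities whose access to sensitive data is not approved."""
    if not obj.labels:
        return set()  # nothing sensitive found, so access is a non-event
    return obj.readers - approved.get(obj.path, set())

def run_cycle(discovered, approved):
    """One pass over discovered objects, returning remediation findings."""
    findings = []
    for obj in discovered:
        classify(obj)
        for identity in sorted(govern(obj, approved)):
            findings.append((obj.path, identity, sorted(obj.labels)))
    return findings

# Hypothetical discovered objects and approved access list.
objects = [
    DataObject("hr/onboarding.txt", "SSN 123-45-6789", {"hr-team", "copilot-agent"}),
    DataObject("docs/readme.txt", "Welcome to the project", {"everyone"}),
]
approved = {"hr/onboarding.txt": {"hr-team"}}
findings = run_cycle(objects, approved)
print(findings)
```

Running `run_cycle` on a schedule, or on every change event, is what turns this from a one-time audit into the continuous process described above.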

Key takeaways

The data is the asset at risk, not the identity reaching it. An access path to unclassified sensitive data is the actual exposure. An access path to nothing sensitive is a non-event. You can't tell the difference without classification.
97% of non-human identities carry excessive privileges, and the typical enterprise runs roughly 100 non-human identities for every human one. Read that as a statement about how much of your data estate has never been reviewed, not as an identity-count statistic.
Classification has to come before access governance, not alongside it. Without it, access reviews check permission levels without ever confirming whether the data behind them is actually sensitive.
Periodic audits are structurally too slow for how fast access paths now open. Classification and governance have to run continuously, at the same pace data and access actually change.
The fix starts with discovering and classifying your data, not counting what can reach it. Map what's sensitive first. Everything else follows from that.

Find out what your data actually looks like to AI

Your organization's AI agent count was never the real question. The real question is what those agents can already see, and whether anyone classified it as sensitive before they got there.

-> See what your sensitive data looks like to AI. Book a demo with Sentra.

FAQs

If agent count isn't the right metric, why does it keep coming up?

Because it's measurable and unsettling, a number you can put in a headline. But two organizations with an identical agent count can carry wildly different risk depending on what data those agents can reach and whether that data was ever classified. The agent count tells you how many doors there are. It says nothing about what's in the rooms behind them.

Why can't traditional data security tools answer this on their own?

Most data security tooling was built around periodic, point-in-time scans — a snapshot of where sensitive data lives, refreshed on a quarterly or annual cycle. New access paths into that data open continuously, often within days, which means a periodic snapshot is stale almost as soon as it's produced.
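A rough way to picture the staleness problem, assuming a hypothetical log of access grants with made-up identities, datasets, and dates: every grant created after the last scan is invisible to the snapshot until the next cycle runs.

```python
from datetime import date

def unseen_grants(last_scan, grants):
    """Return access grants created after the last point-in-time scan."""
    return [g for g in grants if g["granted_on"] > last_scan]

# Hypothetical scan date and grant log.
last_scan = date(2026, 1, 1)
grants = [
    {"identity": "svc-reporting", "dataset": "finance_db", "granted_on": date(2025, 12, 15)},
    {"identity": "copilot-agent", "dataset": "hr_share", "granted_on": date(2026, 1, 3)},
    {"identity": "etl-bot", "dataset": "customer_records", "granted_on": date(2026, 2, 20)},
]

blind_spots = unseen_grants(last_scan, grants)
print([g["identity"] for g in blind_spots])
```

With a quarterly cycle, both of the later grants in this example would stay unreviewed for weeks or months.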

Doesn't this apply just as much to humans as it does to AI agents?

Yes — and that's part of the point. Unclassified sensitive data has always been a risk regardless of who or what could reach it. What's changed is that AI agents reach and act on data far faster and more continuously than a human ever would, which means unclassified data that sat passively at risk for years is now being actively touched.

Is this primarily a cloud data problem, or does it affect on-premises data too?

Both. Sensitive data sits in cloud storage, SaaS platforms, on-premises file servers, and data warehouses alike. The exposure follows wherever the data lives, not any particular storage location or deployment model.

What's the first practical step toward closing this gap?

Start with discovery and classification: a continuous, automated inventory of what data actually exists across your environment and what's genuinely sensitive about it. Only once that exists does mapping access — human or AI — actually mean anything.