Sentra Launches Breakthrough AI Classification Capabilities!
All Resources
In this article:
minus iconplus icon
Share the Blog

Why Legacy Data Classification Tools Don't Work Well in the Cloud (But DSPM Does)

September 7, 2023
5
Min Read
Data Security

Data security teams are always trying to understand where their sensitive data is. Yet this goal has remained out of reach for a number of reasons.

The main difficulty is creating a continuously updated data catalog of all production and cloud data. Creating this catalog would involve:

  1.  Identifying everyone in the organization with knowledge of any data stores, with visibility into its contents
  1. Connecting a data classification tool to these data stores
  1. Ensure there’s network connectivity by configuring network and security policies
  1. Confirm that business-critical production systems using each data source won’t be negatively affected, causing damage to performance or availability

Having a process this complex requires a major investment of resources, long workflows, and will still probably not provide the full coverage organizations are looking for. Many so-called successful implementations of such solutions will prove unreliable and too difficult to maintain after a short period of time.

Another pain with a legacy data classification solution is accuracy. Data security professionals are all too aware of the problem of false positives (i.e. wrong classification and data findings) and false negatives (i.e. missing classification of sensitive data that remains unknown). This is mainly due to two reasons.

 

  • Legacy classification solutions rely solely on patterns, such as regular expressions, to identify sensitive data, which falls short in both unstructured data and structured data. 
  • These solutions don’t understand the business context around the data, such as how it is being used, by whom, for what purposes and more.

Without the business context, security teams can’t get any actionable items to remove or protect sensitive data against data risks and security breaches.

Lastly, there’s the reason behind high operational costs. Legacy data classification solutions were not built for the cloud, where each data read/write and network operation has a price tag.

The cloud also offers a much more cost efficient data storage solution and advanced data services that causes organizations to store much more data than they did before moving to the cloud. On the other hand, the public cloud providers also offer a variety of cloud-native APIs and mechanisms that can extremely benefit a data classification and security solution, such as automated backups, cross account federation, direct access to block storage, storage classes, compute instance types, and much more. However, legacy data classification tools, that were not built for the cloud, will completely ignore those benefits and differences, making them an extremely expensive solution for cloud-native organizations.

DSPM: Built to Solve Data Classification in the Cloud 

These challenges have led to the growth of a new approach to securing cloud data - Data Security Posture Management, or DSPM. Sentra’s DSPM  is able to provide full coverage and an up-to-date data catalog with classification of sensitive data, without any complex deployment or operational work involved. This is achieved thanks to a cloud-native agentless architecture, using cloud-native APIs and mechanisms.

A good example of this approach is how Sentra’s DSPM architecture leverages the public cloud mechanism of automated backups for compute instances, block storage, and more. This allows Sentra to securely run its robust discovery and classification technology from within the customer’s premises, in any VPC or subscription/account of the customer’s choice.

This offers a number of benefits:

  1. The organization does not need to change any existing infrastructure configuration, network policies, or security groups.
  1. There’s no need to provide individual credentials for each data source in order for Sentra to discover and scan it.
  1. There is never a performance impact on the actual workloads that are compute-based/bounded, such as virtual machines, that run in production environments. In fact, Sentra’s scanning will never connect via network or application layers to those data stores.

Another benefit of a DSPM built for the cloud is classification accuracy.  Sentra’s DSPM provides an unprecedented level of accuracy thanks to more modern and cloud-native capabilities.This starts with advanced statistical relevance for structured data, enabling our classification engine to understand with high confidence that sensitive data is found within a specific column or field, without scanning every row in a large table.

Sentra leverages even more advanced algorithms for key-value stores and document databases. For unstructured data, the use of AI and LLM -based algorithms unlock tremendous accuracy in understanding and detecting sensitive data types by understanding the context within the data itself. Lastly, the combination of data-centric and identity-centric security approaches provides greater context that allows Sentra’s users to know what actions they should take to remediate data risks when it comes to classification.

Here are two examples of how we apply this context:

1. Different Types of Databases

Personal Identifiable Information (PII) that is found in a database in which only users from the Analytics team have access to, is often a privacy violation and a data risk. On the other hand, PII that is found in a database that only three production microservices have access to is expected,  but requires the data to be isolated within a secure VPC. 

2. Different Access Histories

If 100 employees have access to a sensitive shadow data lake, but only 10 people have actually accessed it in the last year. In this case, the solution would be to reduce permissions and implement stricter access controls. We’d also want to ensure that the data has the right retention policy, to reduce both risks and storage costs. Sentra’s risk score prioritization engine takes multiple data layers into account, including data access permissions, activity, sensitivity, movement and misconfigurations, giving enterprises greater visibility and control over their data risk management processes.

With regards to costs, Sentra’s Data Security Posture Management (DSPM) solution utilizes innovative features that make its scanning and classification solution about two or three orders of magnitude more cost efficient than legacy solutions. The first is the use of smart sampling, where Sentra is able to cluster multiple data units that share the same characteristics, and using intelligent sampling with statistical relevance, understand what sensitive data exists within such data assets that are grouped automatically. This is extremely powerful especially when dealing with data lakes that are often the size of dozens of petabytes, without compromising the solution coverage and accuracy.

Second, Sentra’s modern architecture leverages the benefits of cloud ephemeral resources, such as snapshotting and ephemeral compute workloads with a cloud-native orchestration technology that leverages the elasticity and the scale of the cloud. Sentra balances its resource utilization with the needs of the customer's business, providing advanced scan settings that are built and designed for the cloud. This allows teams to optimize cost according to their business needs, such as determining the frequency and sampling of scans, among more advanced features.

To summarize:

  1. Given the current macroeconomic climate, CISOs should find DSPMs like Sentra as an opportunity to increase their security and minimize their costs
  2. DSPM solutions like Sentra bring an important context - awareness to security teams and tools, allowing them to do better risk management and prioritization by focusing on whats important
  3. Data is likely to continue to be the most important asset of every business, as more organizations embrace the power of the cloud. Therefore, a DSPM will be a pivotal tool in realizing the true value of the data while ensuring it is always secured
  4. Accuracy is key and AI is an enabler for a good data classification tool

<blogcta-big>

Yair brings a wealth of experience in cybersecurity and data product management. In his previous role, Yair led product management at Microsoft and Datadog. With a background as a member of the IDF's Unit 8200 for five years, he possesses over 18 years of expertise in enterprise software, security, data, and cloud computing. Yair has held senior product management positions at Datadog, Digital Asset, and Microsoft Azure Protection.

Subscribe

Latest Blog Posts

Gilad Golani
Gilad Golani
November 27, 2025
3
Min Read

Unstructured Data Is 80% of Your Risk: Why DSPM 1.0 Vendors, Like Varonis and Cyera, Fail to Protect It at Petabyte Scale

Unstructured Data Is 80% of Your Risk: Why DSPM 1.0 Vendors, Like Varonis and Cyera, Fail to Protect It at Petabyte Scale

Unstructured data is the fastest-growing, least-governed, and most dangerous class of enterprise data. Emails, Slack messages, PDFs, screenshots, presentations, code repositories, logs, and the endless stream of GenAI-generated content — this is where the real risk lives.

The Unstructured data dilemma is this: 80% of your organization’s data is essentially invisible to your current security tools, and the volume is climbing by up to 65% each year. This isn’t just a hypothetical - it’s the reality for enterprises as unstructured data spreads across cloud and SaaS platforms. Yet, most Data Security Posture Management (DSPM) solutions - often called DSPM 1.0 - were never built to handle this explosion at petabyte scale. Especially legacy vendors and first-generation players like Cyera — were never designed to handle unstructured data at scale. Their architectures, classification engines, and scanning models break under real enterprise load.

Looking ahead to 2026, unstructured data security risk stands out as the single largest blind spot in enterprise security. If overlooked, it won’t just cause compliance headaches and soaring breach costs - it could put your organization in the headlines for all the wrong reasons.

The 80% Problem: Unstructured Data Dominates Your Risk

The Scale You Can’t Ignore - Over 80% of enterprise data is unstructured

  • Unstructured data is growing 55-65% per year; by 2025, the world will store more than 180 zettabytes of it.
  • 95% of organizations say unstructured data management is a critical challenge but less than 40% of data security budgets address this high-risk area. Unstructured data is everywhere: cloud object stores, SaaS apps, collaboration tools, and legacy file shares. Unlike structured data in databases, it often lacks consistent metadata, access controls, or even basic visibility. This “dark data” is behind countless breaches, from accidental file exposures and overshared documents to sensitive AI training datasets left unmonitored.

The Business Impact - The average breach now costs $4-4.9M, with unstructured data often at the center.

  • Poor data quality, mostly from unstructured sources, costs the U.S. economy $3.1 trillion each year.
  • More than half of organizations report at least one non-compliance incident annually, with average costs topping $1M. The takeaway: Unstructured data isn’t just a storage problem.

Why DSPM 1.0 Fails: The Blind Spots of Legacy Approaches

Traditional Tools Fall Short in Cloud-First, Petabyte-Scale Environments

Legacy DSPM and DCAP solutions, such as Varonis or Netwrix - were built for an era when data lived on-premises, followed predictable structures, and grew at a manageable pace.

In today’s cloud-first reality, their limitations have become impossible to ignore:

  • Discovery Gaps: Agent-based scanning can’t keep up with sprawling, constantly changing cloud and SaaS environments. Shadow and dark data across platforms like Google Drive, Dropbox, Slack, and AWS S3 often go unseen.
  • Performance Limits: Once environments exceed 100 TB, and especially as they reach petabyte scale—these tools slow dramatically or miss data entirely.
  • Manual Classification: Most legacy tools rely on static pattern matching and keyword rules, causing them to miss sensitive information hidden in natural language, code, images, or unconventional file formats.
  • Limited Automation: They generate alerts but offer little or no automated remediation, leaving security teams overwhelmed and forcing manual cleanup.
  • Siloed Coverage: Solutions designed for on-premises or single-cloud deployments create dangerous blind spots as organizations shift to multi-cloud and hybrid architectures.

Example: Collaboration App Exposure

A global enterprise recently discovered thousands of highly sensitive files—contracts, intellectual property, and PII—were unintentionally shared with “anyone with the link” inside a cloud collaboration platform. Their legacy DSPM tool failed to identify the exposure because it couldn’t scan within the app or detect real-time sharing changes.

Further, even Emerging DSPM tools often rely on pattern matching or LLM-based scanning. These approaches also fail for three reasons:

  • Inaccuracy at scale: LLMs hallucinate, mislabel, and require enormous compute.
  • Cost blow-ups: Vendors pass massive cloud bills back to customers or incur inordinate compute cost.
  • Architectural limitations: Without clustering and elastic scaling, large datasets overwhelm the system.

This is exactly where Cyera and legacy tools struggle - and where Sentra’s SLM-powered classifier thrives with >99% accuracy at a fraction of the cost.

The New Mandate: Securing Unstructured Data in 2026 and Beyond

GenAI, and stricter privacy laws (GDPR, CCPA, HIPAA) have raised the stakes for unstructured data security. Gartner now recommends Data Access Governance (DAG) and AI-driven classification to reduce oversharing and prepare for AI-centric workloads.

What Modern Security Leaders Need - Agentless, Real-Time Discovery: No deployment hassles, continuous visibility, and coverage for unstructured data stores no matter where they live.

  • Petabyte-Scale Performance: Scan, classify, and risk-score all data, everywhere it lives.
  • AI-Driven Deep Classification: Use of natural language processing (NLP), Domain-specific  Small Language Models (SLMs), and context analysis for every unstructured format.
  • Automated Remediation: Playbooks that fix exposures, govern permissions, and ensure compliance without manual work.
  • Multi-Cloud & SaaS Coverage: Security that follows your data, wherever it goes.

Sentra: Turning the 80% Blind Spot into a Competitive Advantage

Sentra was built specifically to address the risks of unstructured data in 2026 and beyond. There are nuances involved in solving this.  Selecting an appropriate solution is key to a sustainable approach. Here’s what sets Sentra apart:
 

  • Agentless Discovery Across All Environments:Instantly scans and classifies unstructured data across AWS, Azure, Google, M365, Dropbox, legacy file shares, and more - no agents required, no blind spots left behind.
  • Petabyte-Tested Performance:Designed for Fortune 500 scale, Sentra keeps speed and accuracy high across petabytes, not just terabytes.
  • AI-Powered Deep Classification:Our platform uses advanced NLP, SLMs, and context-aware algorithms to classify, label, and risk-score every file - including code, images, and AI training data, not just structured fields.
  • Continuous, Context-Rich Visibility:Real-time risk scoring, identity and access mapping, and automated data lineage show not just where data lives, but who can access it and how it’s used.
  • Automated Remediation and Orchestration: Sentra goes beyond alerts. Built-in playbooks fix permissions, restrict sharing, and enforce policies within seconds.
  • Compliance-First, Audit-Ready: Quickly spot compliance gaps, generate audit trails, and reduce regulatory risk and reporting costs.     

During a recent deployment with a global financial services company, Sentra uncovered 40% more exposed sensitive files than their previous DSPM tool. Automated remediation covered over 10 million documents across three clouds, cutting manual investigation time by 80%.

Actionable Takeaways for Security Leaders 

1. Put Unstructured Data at the Center of Your 2026 Security Plan: Make sure your DSPM strategy covers all data, especially “dark” and shadow data in SaaS, object stores, and collaboration platforms.

2.  Choose Agentless, AI-Driven Discovery: Legacy, agent-based tools can’t keep up. And underperforming emerging tools may not adequately scale.  Look for continuous, automated scanning and classification that scales with your data.

3.  Automate Remediation Workflows: Visibility is just the start; your platform should fix exposures and enforce policies in real time.

4.  Adopt Multi-Cloud, SaaS-Agnostic Solutions: Your data is everywhere, and your security should be too. Ensure your solution supports all of your unstructured data repositories.

5.  Make Compliance Proactive: Use real-time risk scoring and automated reporting to stay ahead of auditors and regulators.

    

Conclusion: Ready for the 80% Challenge?

With petabyte-scale, cloud-first data, ignoring unstructured data risk is no longer an option. Traditional DSPM tools can’t keep up, leaving most of your data - and your business - vulnerable. Sentra’s agentless, AI-powered platform closes this gap, delivering the discovery, classification, and automated response you need to turn your biggest blind spot into your strongest defense. See how Sentra uncovers your hidden risk - book an instant demo today.

Don’t let unstructured data be your organization’s Achilles’ heel. With Sentra, enterprises finally have a way to secure the data that matters most.

<blogcta-big>

Read More
David Stuart
David Stuart
Nikki Ralston
Nikki Ralston
November 24, 2025
3
Min Read

Third-Party OAuth Apps Are the New Shadow Data Risk: Lessons from the Gainsight/Salesforce Incident

Third-Party OAuth Apps Are the New Shadow Data Risk: Lessons from the Gainsight/Salesforce Incident

The recent exposure of customer data through a compromised Gainsight integration within Salesforce environments is more than an isolated event - it’s a sign of a rapidly evolving class of SaaS supply-chain threats. Even trusted AppExchange partners can inadvertently create access pathways that attackers exploit, especially when OAuth tokens and machine-to-machine connections are involved. This post explores what happened, why today’s security tooling cannot fully address this scenario, and how data-centric visibility and identity governance can meaningfully reduce the blast radius of similar breaches.

A Recap of the Incident

In this case, attackers obtained sensitive credentials tied to a Gainsight integration used by multiple enterprises. Those credentials allowed adversaries to generate valid OAuth tokens and access customer Salesforce orgs, in some cases with extensive read capabilities. Neither Salesforce nor Gainsight intentionally misconfigured their systems. This was not a product flaw in either platform. Instead, the incident illustrates how deeply interconnected SaaS environments have become and how the security of one integration can impact many downstream customers.

Understanding the Kill Chain: From Stolen Secrets to Salesforce Lateral Movement

The attackers’ pathway followed a pattern increasingly common in SaaS-based attacks. It began with the theft of secrets; likely API keys, OAuth client secrets, or other credentials that often end up buried in repositories, CI/CD logs, or overlooked storage locations. Once in hand, these secrets enabled the attackers to generate long-lived OAuth tokens, which are designed for application-level access and operate outside MFA or user-based access controls.

What makes OAuth tokens particularly powerful is that they inherit whatever permissions the connected app holds. If an integration has broad read access, which many do for convenience or legacy reasons, an attacker who compromises its token suddenly gains the same level of visibility. Inside Salesforce, this enabled lateral movement across objects, records, and reporting surfaces far beyond the intended scope of the original integration. The entire kill chain was essentially a progression from a single weakly-protected secret to high-value data access across multiple Salesforce tenants.

Why Traditional SaaS Security Tools Missed This

Incident response teams quickly learned what many organizations are now realizing: traditional CASBs and CSPMs don’t provide the level of identity-to-data context necessary to detect or prevent OAuth-driven supply-chain attacks.

CASBs primarily analyze user behavior and endpoint connections, but OAuth apps are “non-human identities” - they don’t log in through browsers or trigger interactive events. CSPMs, in contrast, focus on cloud misconfigurations and posture, but they don’t understand the fine-grained data models of SaaS platforms like Salesforce. What was missing in this incident was visibility into how much sensitive data the Gainsight connector could access and whether the privileges it held were appropriate or excessive. Without that context, organizations had no meaningful way to spot the risk until the compromise became public.

Sentra Helps Prevent and Contain This Attack Pattern

Sentra’s approach is fundamentally different because it starts with data: what exists, where it resides, who or what can access it, and whether that access is appropriate. Rather than treating Salesforce or other SaaS platforms as black boxes, Sentra maps the data structures inside them, identifies sensitive records, and correlates that information with identity permissions including third-party apps, machine identities, and OAuth sessions.

One key pillar of Sentra’s value lies in its DSPM capabilities. The platform identifies sensitive data across all repositories, including cloud storage, SaaS environments, data warehouses, code repositories, collaboration platforms, and even on-prem file systems. Because Sentra also detects secrets such as API keys, OAuth credentials, private keys, and authentication tokens across these environments, it becomes possible to catch compromised or improperly stored secrets before an attacker ever uses them to access a SaaS platform.

OAuth 2.0 Access Token

Another area where this becomes critical is the detection of over-privileged connected apps. Sentra continuously evaluates the scopes and permissions granted to integrations like Gainsight, identifying when either an app or an identity holds more access than its business purpose requires. This type of analysis would have revealed that a compromised integrated app could see far more data than necessary, providing early signals of elevated risk long before an attacker exploited it.

Sentra further tracks the health and behavior of non-human identities. Service accounts and connectors often rely on long-lived credentials that are rarely rotated and may remain active long after the responsible team has changed. Sentra identifies these stale or overly permissive identities and highlights when their behavior deviates from historical norms. In the context of this incident type, that means detecting when a connector suddenly begins accessing objects it never touched before or when large volumes of data begin flowing to unexpected locations or IP ranges.

Finally, Sentra’s behavior analytics (part of DDR) help surface early signs of misuse. Even if an attacker obtains valid OAuth tokens, their data access patterns, query behavior, or geography often diverge from the legitimate integration. By correlating anomalous activity with the sensitivity of the data being accessed, Sentra can detect exfiltration patterns in real time—something traditional tools simply aren’t designed to do.

The 2026 Outlook: More Incidents Are Coming

The Gainsight/Salesforce incident is unlikely to be the last of its kind. The speed at which enterprises adopt SaaS integrations far exceeds the rate at which they assess the data exposure those integrations create. OAuth-based supply-chain attacks are growing quickly because they allow adversaries to compromise one provider and gain access to dozens or hundreds of downstream environments. Given the proliferation of partner ecosystems, machine identities, and unmonitored secrets, this attack vector will continue to scale.

Prediction:
Unless enterprises add data-centric SaaS visibility and identity-aware DSPM, we should expect three to five more incidents of similar magnitude before summer 2026.

Conclusion

The real lesson from the Gainsight/Salesforce breach is not to reduce reliance on third-party SaaS providers as modern business would grind to a halt without them. The lesson is that enterprises must know where their sensitive data lives, understand exactly which identities and integrations can access it, and ensure those privileges are continuously validated. Sentra provides that visibility and contextual intelligence, making it possible to identify the risks that made this breach possible and help to prevent the next one.

<blogcta-big>

Read More
David Stuart
David Stuart
November 24, 2025
3
Min Read

Securing Unstructured Data in Microsoft 365: The Case for Petabyte-Scale, AI-Driven Classification

Securing Unstructured Data in Microsoft 365: The Case for Petabyte-Scale, AI-Driven Classification

The modern enterprise runs on collaboration and nothing powers that more than Microsoft 365. From Exchange Online and OneDrive to SharePoint, Teams, and Copilot workflows, M365 hosts a massive and ever-growing volume of unstructured content: documents, presentations, spreadsheets, image files, chats, attachments, and more.

Yet unstructured = harder to govern. Unlike tidy database tables with defined schemas, unstructured repositories flood in with ambiguous content types, buried duplicates, or unused legacy files. It’s in these stacks that sensitive IP, model training data, or derivative work can quietly accumulate, and then leak.

Consider this: one recent study found that more than 81 % of IT professionals report data-loss events in M365 environments. And to make matters worse, according to the International Data Corporation (IDC), 60% of organizations do not have a strategy for protecting their critical business data that resides in Microsoft 365.

Why Traditional Tools Struggle

  • Built-in classification tools (e.g., M365’s native capabilities) often rely on pattern matching or simple keywords, and therefore struggle with accuracy, context, scale and derivative content.

  • Many solutions only surface that a file exists and carries a type label - but stop short of mapping who or what can access it, its purpose, and what its downstream exposure might be.

  • GenAI workflows now pump massive volumes of unstructured data into copilots, knowledge bases, training sets - creating new blast radii that legacy DLP or labeling tools weren’t designed to catch.

What a Modern Platform Must Deliver

  1. High-accuracy, petabyte-scale classification of unstructured data (so you know what you have, where it sits, and how sensitive it is). And it must keep pace with explosive data growth and do so cost efficiently.

  2. Unified Data Access Governance (DAG) - mapping identities (users, service principals, agents), permissions, implicit shares, federated/cloud-native paths across M365 and beyond.
  3. Data Detection & Response (DDR) - continuous monitoring of data movement, copies, derivative creation, AI agent interactions, and automated response/remediation.

How Sentra addresses this in M365

Assets contain plain text credit card numbers

At Sentra, we’ve built a cloud-native data-security platform specifically to address this triad of capabilities - and we extend that deeply into M365 (OneDrive, SharePoint, Teams, Exchange Online) and other SaaS platforms.

  • A newly announced AI Classifier for Unstructured Data accelerates and improves classification across M365’s unstructured repositories (see: Sentra launches breakthrough unstructured-data AI classification capabilities).

  • Petabyte-scale processing: our architecture supports classification and monitoring of massive file estates without astronomical cost or time-to-value.

  • Seamless support for M365 services: read/write access, ingestion, classification, access-graph correlation, detection of shadow/unmanaged copies across OneDrive and SharePoint—plus integration into our DAG and DDR layers (see our guide: How to Secure Regulated Data in Microsoft 365 + Copilot).

  • Cost-efficient deployment: designed for high scale without breaking the budget or massive manual effort.

The Bottom Line

In today’s cloud/AI era, saying “we discovered the PII in my M365 tenant” isn’t enough.

The real question is: Do I know who or what (user/agent/app) can access that content, what its business purpose is, and whether it’s already been copied or transformed into a risk vector?


If your solution can’t answer that, your unstructured data remains a silent, high-stakes liability and resolving concerns becomes a very costly, resource-draining burden. By embracing a platform that combines classification accuracy, petabyte-scale processing, unified DSPM + DAG + DDR, and deep M365 support, you move from “hope I’m secure” to “I know I’m secure.”

Want to see how it works in a real M365 setup? Check out our video or book a demo.

<blogcta-big>

Read More
decorative ball
Expert Data Security Insights Straight to Your Inbox
What Should I Do Now:
1

Get the latest GigaOm DSPM Radar report - see why Sentra was named a Leader and Fast Mover in data security. Download now and stay ahead on securing sensitive data.

2

Sign up for a demo and learn how Sentra’s data security platform can uncover hidden risks, simplify compliance, and safeguard your sensitive data.

3

Follow us on LinkedIn, X (Twitter), and YouTube for actionable expert insights on how to strengthen your data security, build a successful DSPM program, and more!

Before you go...

Get the Gartner Customers' Choice for DSPM Report

Read why 98% of users recommend Sentra.

Gartner Certificate for Sentra