Sentra Launches Breakthrough AI Classification Capabilities!
All Resources
In this article:
minus iconplus icon
Share the Blog

AI & Data Privacy: Challenges and Tips for Security Leaders

June 26, 2024
3
Min Read
Data Security

Balancing Trust and Unpredictability in AI

AI systems represent a transformative advancement in technology, promising innovative progress across various industries. Yet, their inherent unpredictability introduces significant concerns, particularly regarding data security and privacy. Developers face substantial challenges in ensuring the integrity and reliability of AI models amidst this unpredictability.

This uncertainty complicates matters for buyers, who rely on trust when investing in AI products. Establishing and maintaining trust in AI necessitates rigorous testing, continuous monitoring, and transparent communication regarding potential risks and limitations. Developers must implement robust safeguards, while buyers benefit from being informed about these measures to mitigate risks effectively.

AI and Data Privacy

Data privacy is a critical component of AI security. As AI systems often rely on vast amounts of personal data to function effectively, ensuring the privacy and security of this data is paramount. Breaches of data privacy can lead to severe consequences, including identity theft, financial loss, and erosion of trust in AI technologies. Developers must implement stringent data protection measures, such as encryption, anonymization, and secure data storage, to safeguard user information.

The Role of Data Privacy Regulations in AI Development

Data privacy regulations are playing an increasingly significant role in the development and deployment of AI technologies. As AI continues to advance globally, regulatory frameworks are being established to ensure the ethical and responsible use of these powerful tools.

  • Europe:

The European Parliament has approved the AI Act, a comprehensive regulatory framework designed to govern AI technologies. This Act is set to be completed by June and will become fully applicable 24 months after its entry into force, with some provisions becoming effective even sooner. The AI Act aims to balance innovation with stringent safeguards to protect privacy and prevent misuse of AI.

  • California:

In the United States, California is at the forefront of AI regulation. A bill concerning AI and its training processes has progressed through legislative stages, having been read for the second time and now ordered for a third reading. This bill represents a proactive approach to regulating AI within the state, reflecting California's leadership in technology and data privacy.

  • Self-Regulation:

In addition to government-led initiatives, there are self-regulation frameworks available for companies that wish to proactively manage their AI operations. The National Institute of Standards and Technology (NIST) AI Risk Management Framework (RMF) and the ISO/IEC 42001 standard provide guidelines for developing trustworthy AI systems. Companies that adopt these standards not only enhance their operational integrity but also position themselves to better align with future regulatory requirements.

  • NIST Model for a Trustworthy AI System:

The NIST model outlines key principles for developing AI systems that are ethical, accountable, and transparent. This framework emphasizes the importance of ensuring that AI technologies are reliable, secure, and unbiased. By adhering to these guidelines, organizations can build AI systems that earn public trust and comply with emerging regulatory standards.Understanding and adhering to these regulations and frameworks is crucial for any organization involved in AI development. Not only do they help in safeguarding privacy and promoting ethical practices, but they also prepare organizations to navigate the evolving landscape of AI governance effectively.

How to Build Secure AI Products

Ensuring the integrity of AI products is crucial for protecting users from potential harm caused by errors, biases, or unintended consequences of AI decisions. Safe AI products foster trust among users, which is essential for the widespread adoption and positive impact of AI technologies. These technologies have an increasing effect on various aspects of our lives, from healthcare and finance to transportation and personal devices, making it such a critical topic to focus on. 

How can developers build secure AI products?

  1. Remove sensitive data from training data (pre-training): Addressing this task is challenging, due to the vast amounts of data involved in AI-training, and the lack of automated methods to detect all types of  sensitive data.
  2. Test the model for privacy compliance (pre-production): Like any software, both manual tests and automated tests are done before production. But, how can users guarantee that sensitive data isn’t exposed during testing? Developers must explore innovative approaches to automate this process and ensure continuous monitoring of privacy compliance throughout the development lifecycle.
  3. Implement proactive monitoring in production: Even with thorough pre-production testing, no model can guarantee complete immunity from privacy violations in real-world scenarios. Continuous monitoring during production is essential to promptly detect and address any unexpected privacy breaches. Leveraging advanced anomaly detection techniques and real-time monitoring systems can help developers identify and mitigate potential risks promptly.

Secure LLMs Across the Entire Development Pipeline With Sentra

Gain Comprehensive Visibility and Secure Training Data (Sentra’s DSPM)

  • Automatically discover and classify sensitive information within your training datasets.
  • Protect against unauthorized access with robust security measures.
  • Continuously monitor your security posture to identify and remediate vulnerabilities.

Monitor Models in Real Time (Sentra’s DDR)

  • Detect potential leaks of sensitive data by continuously monitoring model activity logs.
  • Proactively identify threats such as data poisoning and model theft.
  • Seamlessly integrate with your existing CI/CD and production systems for effortless deployment.

Finally, Sentra helps you effortlessly comply with industry regulations like NIST AI RMF and ISO/IEC 42001, preparing you for future governance requirements. This comprehensive approach minimizes risks and empowers developers to confidently state:

"This model was thoroughly tested for privacy safety using Sentra," fostering trust in your AI initiatives.

As AI continues to redefine industries, prioritizing data privacy is essential for responsible AI development. Implementing stringent data protection measures, adhering to evolving regulatory frameworks, and maintaining proactive monitoring throughout the AI lifecycle are crucial.
 

By prioritizing strong privacy measures from the start, developers not only build trust in AI technologies but also maintain ethical standards essential for long-term use and societal approval.

<blogcta-big>

Discover Ron’s expertise, shaped by over 20 years of hands-on tech and leadership experience in cybersecurity, cloud, big data, and machine learning. As a serial entrepreneur and seed investor, Ron has contributed to the success of several startups, including Axonius, Firefly, Guardio, Talon Cyber Security, and Lightricks, after founding a company acquired by Oracle.

Subscribe

Latest Blog Posts

David Stuart
David Stuart
Nikki Ralston
Nikki Ralston
November 24, 2025
3
Min Read

Third-Party OAuth Apps Are the New Shadow Data Risk: Lessons from the Gainsight/Salesforce Incident

Third-Party OAuth Apps Are the New Shadow Data Risk: Lessons from the Gainsight/Salesforce Incident

The recent exposure of customer data through a compromised Gainsight integration within Salesforce environments is more than an isolated event - it’s a sign of a rapidly evolving class of SaaS supply-chain threats. Even trusted AppExchange partners can inadvertently create access pathways that attackers exploit, especially when OAuth tokens and machine-to-machine connections are involved. This post explores what happened, why today’s security tooling cannot fully address this scenario, and how data-centric visibility and identity governance can meaningfully reduce the blast radius of similar breaches.

A Recap of the Incident

In this case, attackers obtained sensitive credentials tied to a Gainsight integration used by multiple enterprises. Those credentials allowed adversaries to generate valid OAuth tokens and access customer Salesforce orgs, in some cases with extensive read capabilities. Neither Salesforce nor Gainsight intentionally misconfigured their systems. This was not a product flaw in either platform. Instead, the incident illustrates how deeply interconnected SaaS environments have become and how the security of one integration can impact many downstream customers.

Understanding the Kill Chain: From Stolen Secrets to Salesforce Lateral Movement

The attackers’ pathway followed a pattern increasingly common in SaaS-based attacks. It began with the theft of secrets; likely API keys, OAuth client secrets, or other credentials that often end up buried in repositories, CI/CD logs, or overlooked storage locations. Once in hand, these secrets enabled the attackers to generate long-lived OAuth tokens, which are designed for application-level access and operate outside MFA or user-based access controls.

What makes OAuth tokens particularly powerful is that they inherit whatever permissions the connected app holds. If an integration has broad read access, which many do for convenience or legacy reasons, an attacker who compromises its token suddenly gains the same level of visibility. Inside Salesforce, this enabled lateral movement across objects, records, and reporting surfaces far beyond the intended scope of the original integration. The entire kill chain was essentially a progression from a single weakly-protected secret to high-value data access across multiple Salesforce tenants.

Why Traditional SaaS Security Tools Missed This

Incident response teams quickly learned what many organizations are now realizing: traditional CASBs and CSPMs don’t provide the level of identity-to-data context necessary to detect or prevent OAuth-driven supply-chain attacks.

CASBs primarily analyze user behavior and endpoint connections, but OAuth apps are “non-human identities” - they don’t log in through browsers or trigger interactive events. CSPMs, in contrast, focus on cloud misconfigurations and posture, but they don’t understand the fine-grained data models of SaaS platforms like Salesforce. What was missing in this incident was visibility into how much sensitive data the Gainsight connector could access and whether the privileges it held were appropriate or excessive. Without that context, organizations had no meaningful way to spot the risk until the compromise became public.

Sentra Helps Prevent and Contain This Attack Pattern

Sentra’s approach is fundamentally different because it starts with data: what exists, where it resides, who or what can access it, and whether that access is appropriate. Rather than treating Salesforce or other SaaS platforms as black boxes, Sentra maps the data structures inside them, identifies sensitive records, and correlates that information with identity permissions including third-party apps, machine identities, and OAuth sessions.

One key pillar of Sentra’s value lies in its DSPM capabilities. The platform identifies sensitive data across all repositories, including cloud storage, SaaS environments, data warehouses, code repositories, collaboration platforms, and even on-prem file systems. Because Sentra also detects secrets such as API keys, OAuth credentials, private keys, and authentication tokens across these environments, it becomes possible to catch compromised or improperly stored secrets before an attacker ever uses them to access a SaaS platform.

OAuth 2.0 Access Token

Another area where this becomes critical is the detection of over-privileged connected apps. Sentra continuously evaluates the scopes and permissions granted to integrations like Gainsight, identifying when either an app or an identity holds more access than its business purpose requires. This type of analysis would have revealed that a compromised integrated app could see far more data than necessary, providing early signals of elevated risk long before an attacker exploited it.

Sentra further tracks the health and behavior of non-human identities. Service accounts and connectors often rely on long-lived credentials that are rarely rotated and may remain active long after the responsible team has changed. Sentra identifies these stale or overly permissive identities and highlights when their behavior deviates from historical norms. In the context of this incident type, that means detecting when a connector suddenly begins accessing objects it never touched before or when large volumes of data begin flowing to unexpected locations or IP ranges.

Finally, Sentra’s behavior analytics (part of DDR) help surface early signs of misuse. Even if an attacker obtains valid OAuth tokens, their data access patterns, query behavior, or geography often diverge from the legitimate integration. By correlating anomalous activity with the sensitivity of the data being accessed, Sentra can detect exfiltration patterns in real time—something traditional tools simply aren’t designed to do.

The 2026 Outlook: More Incidents Are Coming

The Gainsight/Salesforce incident is unlikely to be the last of its kind. The speed at which enterprises adopt SaaS integrations far exceeds the rate at which they assess the data exposure those integrations create. OAuth-based supply-chain attacks are growing quickly because they allow adversaries to compromise one provider and gain access to dozens or hundreds of downstream environments. Given the proliferation of partner ecosystems, machine identities, and unmonitored secrets, this attack vector will continue to scale.

Prediction:
Unless enterprises add data-centric SaaS visibility and identity-aware DSPM, we should expect three to five more incidents of similar magnitude before summer 2026.

Conclusion

The real lesson from the Gainsight/Salesforce breach is not to reduce reliance on third-party SaaS providers as modern business would grind to a halt without them. The lesson is that enterprises must know where their sensitive data lives, understand exactly which identities and integrations can access it, and ensure those privileges are continuously validated. Sentra provides that visibility and contextual intelligence, making it possible to identify the risks that made this breach possible and help to prevent the next one.

<blogcta-big>

Read More
David Stuart
David Stuart
November 24, 2025
3
Min Read

Securing Unstructured Data in Microsoft 365: The Case for Petabyte-Scale, AI-Driven Classification

Securing Unstructured Data in Microsoft 365: The Case for Petabyte-Scale, AI-Driven Classification

The modern enterprise runs on collaboration and nothing powers that more than Microsoft 365. From Exchange Online and OneDrive to SharePoint, Teams, and Copilot workflows, M365 hosts a massive and ever-growing volume of unstructured content: documents, presentations, spreadsheets, image files, chats, attachments, and more.

Yet unstructured = harder to govern. Unlike tidy database tables with defined schemas, unstructured repositories flood in with ambiguous content types, buried duplicates, or unused legacy files. It’s in these stacks that sensitive IP, model training data, or derivative work can quietly accumulate, and then leak.

Consider this: one recent study found that more than 81 % of IT professionals report data-loss events in M365 environments. And to make matters worse, according to the International Data Corporation (IDC), 60% of organizations do not have a strategy for protecting their critical business data that resides in Microsoft 365.

Why Traditional Tools Struggle

  • Built-in classification tools (e.g., M365’s native capabilities) often rely on pattern matching or simple keywords, and therefore struggle with accuracy, context, scale and derivative content.

  • Many solutions only surface that a file exists and carries a type label - but stop short of mapping who or what can access it, its purpose, and what its downstream exposure might be.

  • GenAI workflows now pump massive volumes of unstructured data into copilots, knowledge bases, training sets - creating new blast radii that legacy DLP or labeling tools weren’t designed to catch.

What a Modern Platform Must Deliver

  1. High-accuracy, petabyte-scale classification of unstructured data (so you know what you have, where it sits, and how sensitive it is). And it must keep pace with explosive data growth and do so cost efficiently.

  2. Unified Data Access Governance (DAG) - mapping identities (users, service principals, agents), permissions, implicit shares, federated/cloud-native paths across M365 and beyond.
  3. Data Detection & Response (DDR) - continuous monitoring of data movement, copies, derivative creation, AI agent interactions, and automated response/remediation.

How Sentra addresses this in M365

Assets contain plain text credit card numbers

At Sentra, we’ve built a cloud-native data-security platform specifically to address this triad of capabilities - and we extend that deeply into M365 (OneDrive, SharePoint, Teams, Exchange Online) and other SaaS platforms.

  • A newly announced AI Classifier for Unstructured Data accelerates and improves classification across M365’s unstructured repositories (see: Sentra launches breakthrough unstructured-data AI classification capabilities).

  • Petabyte-scale processing: our architecture supports classification and monitoring of massive file estates without astronomical cost or time-to-value.

  • Seamless support for M365 services: read/write access, ingestion, classification, access-graph correlation, detection of shadow/unmanaged copies across OneDrive and SharePoint—plus integration into our DAG and DDR layers (see our guide: How to Secure Regulated Data in Microsoft 365 + Copilot).

  • Cost-efficient deployment: designed for high scale without breaking the budget or massive manual effort.

The Bottom Line

In today’s cloud/AI era, saying “we discovered the PII in my M365 tenant” isn’t enough.

The real question is: Do I know who or what (user/agent/app) can access that content, what its business purpose is, and whether it’s already been copied or transformed into a risk vector?


If your solution can’t answer that, your unstructured data remains a silent, high-stakes liability and resolving concerns becomes a very costly, resource-draining burden. By embracing a platform that combines classification accuracy, petabyte-scale processing, unified DSPM + DAG + DDR, and deep M365 support, you move from “hope I’m secure” to “I know I’m secure.”

Want to see how it works in a real M365 setup? Check out our video or book a demo.

<blogcta-big>

Read More
Ofir Yehoshua
Ofir Yehoshua
November 17, 2025
4
Min Read

How to Gain Visibility and Control in Petabyte-Scale Data Scanning

How to Gain Visibility and Control in Petabyte-Scale Data Scanning

Every organization today is drowning in data - millions of assets spread across cloud platforms, on-premises systems, and an ever-expanding landscape of SaaS tools. Each asset carries value, but also risk. For security and compliance teams, the mandate is clear: sensitive data must be inventoried, managed and protected.

Scanning every asset for security and compliance is no longer optional, it’s the line between trust and exposure, between resilience and chaos.

Many data security tools promise to scan and classify sensitive information across environments. In practice, doing this effectively and at scale, demands more than raw ‘brute force’ scanning power. It requires robust visibility and management capabilities: a cockpit view that lets teams monitor coverage, prioritize intelligently, and strike the right balance between scan speed, cost, and accuracy.

Why Scan Tracking Is Crucial

Scanning is not instantaneous. Depending on the size and complexity of your environment, it can take days - sometimes even weeks to complete. Meanwhile, new data is constantly being created or modified, adding to the challenge.

Without clear visibility into the scanning process, organizations face several critical obstacles:

  • Unclear progress: It’s often difficult to know what has already been scanned, what is currently in progress, and what remains pending. This lack of clarity creates blind spots that undermine confidence in coverage.

  • Time estimation gaps: In large environments, it’s hard to know how long scans will take because so many factors come into play — the number of assets, their size, the type of data - structured, semi-structured, or unstructured, and how much scanner capacity is available. As a result, predicting when you’ll reach full coverage is tricky. This becomes especially stressful when scans need to be completed before a fixed deadline, like a compliance audit. 

    "With Sentra’s Scan Dashboard, we were able to quickly scale up our scanners to meet a tight audit deadline, finish on time, and then scale back down to save costs. The visibility and control it gave us made the whole process seamless”, said CISO of Large Retailer.
  • Poor prioritization: Not all environments or assets carry the same importance. Yet without visibility into scan status, teams struggle to balance historical scans of existing assets with the ongoing influx of newly created data, making it nearly impossible to prioritize effectively based on risk or business value.

Sentra’s End-to-End Scanning Workflow

Managing scans at petabyte scale is complex. Sentra streamlines the process with a workflow built for scale, clarity, and control that features:

1. Comprehensive Asset Discovery

Before scanning even begins, Sentra automatically discovers assets across cloud platforms, on-premises systems, and SaaS applications. This ensures teams have a complete, up-to-date inventory and visual map of their data landscape, so no environment or data store is overlooked.

Example: New S3 buckets, a freshly deployed BigQuery dataset, or a newly connected SharePoint site are automatically identified and added to the inventory.

Comprehensive Asset Discovery with Sentra

2. Configurable Scan Management

Administrators can fine-tune how scans are executed to meet their organization’s needs. With flexible configuration options, such as number of scanners, sampling rates, and prioritization rules - teams can strike the right balance between scan speed, coverage, and cost control.

For instance, compliance-critical assets can be scanned at full depth immediately, while less critical environments can run at reduced sampling to save on compute consumption and costs.

3. Real-Time Scan Dashboard

Sentra’s unified Scan Dashboard provides a cockpit view into scanning operations, so teams always know where they stand. Key features include:

  • Daily scan throughput correlated with the number of active scanners, helping teams understand efficiency and predict completion times.
  • Coverage tracking that visualizes overall progress and highlights which assets remain unscanned.
  • Decision-making tools that allow teams to dynamically adjust, whether by adding scanner capacity, changing sampling rates, or reordering priorities when new high-risk assets appear.
Real-Time Scan Dashboard with Sentra

Handling Data Changes

The challenge doesn’t end once the initial scans are complete. Data is dynamic, new files are added daily, existing records are updated, and sensitive information shifts locations. Sentra’s activity feeds give teams the visibility they need to understand how their data landscape is evolving and adapt their data security strategies in real time.


Conclusion

Tracking scan status at scale is complex but critical to any data security strategy. Sentra provides an end-to-end view and unmatched scan control, helping organizations move from uncertainty to confidence with clear prediction of scan timelines, faster troubleshooting, audit-ready compliance, and smarter, cost-efficient decisions for securing data.

<blogcta-big>

Read More
decorative ball
Expert Data Security Insights Straight to Your Inbox
What Should I Do Now:
1

Get the latest GigaOm DSPM Radar report - see why Sentra was named a Leader and Fast Mover in data security. Download now and stay ahead on securing sensitive data.

2

Sign up for a demo and learn how Sentra’s data security platform can uncover hidden risks, simplify compliance, and safeguard your sensitive data.

3

Follow us on LinkedIn, X (Twitter), and YouTube for actionable expert insights on how to strengthen your data security, build a successful DSPM program, and more!

Before you go...

Get the Gartner Customers' Choice for DSPM Report

Read why 98% of users recommend Sentra.

Gartner Certificate for Sentra