
Understanding Data Movement to Avert Proliferation Risks

April 10, 2024
4 Min Read
Data Sprawl

Understanding the perils your cloud data faces as it proliferates throughout your organization and ecosystems is a monumental task in today's highly dynamic business climate. Being able to see data as it is copied and moved, monitor its activity and access, and assess its posture allows teams to understand and better manage the full effect of data sprawl.

It ‘connects the dots’ for security analysts who must continually evaluate true risks and threats to data so they can prioritize their efforts. Data similarity and movement are important behavioral indicators in assessing and addressing those risks. This blog will explore this topic in depth.

What Is Data Movement?

Data movement is the process of transferring data from one location or system to another – from A to B. This transfer can be between storage locations, databases, servers, or network locations. Copying data from one location to another is simple; however, data movement can get complicated when managing volume, velocity, and variety:

  • Volume: Handling large amounts of data.
  • Velocity: Overseeing the pace of data generation and processing.
  • Variety: Managing a variety of data types.

How Data Moves in the Cloud

Data flows freely and can be shared anywhere, and the way organizations leverage it is an integral part of their success. Although there are many business benefits to moving and sharing data at a rapid pace, there are also many concerns, mainly around privacy, compliance, and security. Data needs to move quickly and securely, and it must maintain the proper security posture at all times.

These are the main ways that data moves in the cloud:

1. Data Distribution in Internal Services: Internal services and applications manage data, saving it across various locations and data stores.

2. ETLs: Extract, Transform, Load (ETL) processes combine data from multiple sources into a central repository known as a data warehouse. This centralized view supports applications in aggregating diverse data points for organizational use (a minimal sketch follows this list).

3. Developer and Data Scientist Data Usage: Developers and data scientists utilize data for testing and development purposes. They require both real and synthetic data to test applications and simulate real-life scenarios to drive business outcomes.

4. AI/ML/LLM and Customer Data Integration: The utilization of customer data in AI/ML learning processes is on the rise. Organizations leverage such data to train models and apply the results across various organizational units, catering to different use-cases.
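
To illustrate the ETL pattern in item 2, here is a minimal, self-contained sketch in Python. The source file, schema, and SQLite "warehouse" are hypothetical stand-ins for real pipelines and warehouse engines:

```python
# Minimal ETL sketch: combine an illustrative source into one
# "warehouse" table. File names and schema are hypothetical.
import csv
import sqlite3

def extract(csv_path):
    """Extract: read raw customer rows from a CSV export."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize fields so all sources agree on one schema."""
    return [
        {"email": r["email"].strip().lower(), "country": r["country"].upper()}
        for r in rows
        if r.get("email")  # drop rows missing the join key
    ]

def load(rows, conn):
    """Load: write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (email TEXT PRIMARY KEY, country TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customers VALUES (:email, :country)", rows
    )
    conn.commit()

if __name__ == "__main__":
    warehouse = sqlite3.connect("warehouse.db")  # stand-in for a real warehouse
    load(transform(extract("crm_export.csv")), warehouse)
```

Note that each stage is also a point where sensitive data can be copied to a new location, which is exactly the movement this article is concerned with.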

What Is Misplaced Data?

"Misplaced data" refers to data that has been moved from an approved environment to an unapproved environment. For example, a folder that is stored in the wrong location within a computer system or network. This can result from human error, technical glitches, or issues with data management processes.

 

When data is stored in an environment that was not designed for that type of data, it can lead to data leaks, security breaches, compliance violations, and other negative outcomes.

As companies adopt more cloud services and struggle to properly manage the resulting data sprawl, misplaced data is becoming more common, leading to security, privacy, and compliance issues.

The Challenge of Data Movement and Misplaced Data

Organizations strive to secure their sensitive data by keeping it within carefully defined and secure environments. The pervasive data sprawl faced by nearly every organization in the cloud makes it challenging to effectively protect data, given its rapid multiplication and movement.

Leveraging data for various purposes that enhance and grow the business is encouraged for productivity. However, with the advantages come disadvantages: multiple data owners and duplicate data introduce risk.

To address this challenge, organizations can analyze similar data patterns to gain a comprehensive understanding of how data flows within the organization. This gives security teams visibility into movement patterns, lets them identify whether a given movement is authorized, and enables them to protect data accordingly and block unauthorized movement.

This proactive approach allows them to position themselves strategically. It can involve ensuring robust security measures for data at each location, re-confining data by relocating it, or eliminating unnecessary duplicates. This analytical capability also proves valuable for regulatory and compliance requirements, such as ensuring GDPR-compliant data residency (see the sketch below).
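
To make the residency check concrete, here is a minimal sketch in Python; the asset inventory, tags, and allowed-region policy are illustrative assumptions rather than any particular product's data model:

```python
# Sketch: flag copies of GDPR-scoped data stored outside allowed regions.
# The asset records and region policy below are illustrative.
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}  # GDPR-compliant residency

assets = [
    {"name": "customers_prod", "region": "eu-west-1", "tags": {"gdpr"}},
    {"name": "customers_backup", "region": "us-east-1", "tags": {"gdpr"}},
]

violations = [
    a for a in assets
    if "gdpr" in a["tags"] and a["region"] not in ALLOWED_REGIONS
]
for a in violations:
    print(f"Residency violation: {a['name']} lives in {a['region']}")
```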

Identifying Redundant Data and Saving Cloud Storage Costs

The identification of similarities empowers Chief Information Security Officers (CISOs) to implement best practices, steering clear of actions that lead to the creation of redundant data.

Detecting redundant data helps reduce cloud storage costs and improves operational efficiency through targeted, prioritized remediation efforts that focus on the critical data risks that matter.

This not only enhances data security posture, but also contributes to a more streamlined and efficient data management strategy.

“Sentra has helped us to reduce our risk of data breaches and to save money on cloud storage costs.”

-Benny Bloch, CISO at Global-e

Security Concerns That Arise

  1. Data Security Posture Variations Across Locations: Addressing instances where similar data, initially secure, experiences a degradation in security posture during the copying process (e.g., transitioning from private to public, or from encrypted to unencrypted). A sketch of this check follows the list.
  2. Divergent Access Profiles for Similar Data: Exploring scenarios where data, previously accessible by a limited and regulated set of identities, now faces expanded access by a larger number of identities (users), resulting in a loss of control.
  3. Data Localization and Compliance Violations: Examining situations where data, mandated to be localized in specific regions, is found to be in violation of organizational policies or compliance rules (with GDPR as a prominent example). By identifying similar sensitive data, we can pinpoint these issues and help users mitigate them.
  4. Anonymization Challenges in ETL Processes: Identifying issues in ETL processes where data is not only moved but also anonymized. Pinpointing similar sensitive data allows users to detect and mitigate anonymization-related problems.
  5. Customer Data Migration Across Environments: Analyzing the movement of customer data from production to development environments. This can be used by engineers to test real-life use cases.
  6. Data Democratization and Movement Between Cloud and Personal Stores: Investigating instances where users export data from organizational cloud stores to personal drives (e.g., OneDrive) for development, testing, or further business analysis. Once this data moves to personal data stores, it is typically less secure, because personal drives are less monitored and protected, and are controlled by the individual employee rather than the security or dev teams. These drives may be susceptible to security issues arising from misconfiguration, user mistakes, or insufficient knowledge.
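
As an illustration of the first concern, the sketch below compares the posture of two copies of similar data and flags degradation. The asset fields are illustrative assumptions, not a real inventory schema:

```python
# Sketch: detect posture degradation between two copies of similar data.
# The asset fields (encrypted, public) are illustrative assumptions.
def posture_drift(original, copy):
    """Return findings where the copy's posture is weaker than the source's."""
    findings = []
    if original["encrypted"] and not copy["encrypted"]:
        findings.append("copy is unencrypted")
    if not original["public"] and copy["public"]:
        findings.append("copy is publicly accessible")
    return findings

prod_asset = {"encrypted": True, "public": False}
s3_copy = {"encrypted": False, "public": True}
for finding in posture_drift(prod_asset, s3_copy):
    print("ALERT:", finding)
```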

How Sentra’s DSPM Helps Navigate Data Movement Challenges

  1. Discover and accurately classify the most sensitive data and provide extensive context about it, for example:
  • Where it lives
  • Where it has been copied or moved to
  • Who has access to it
  2. Highlight misconfigurations by correlating similar data that has different security postures. This helps you pinpoint the issue and adjust it to the right posture.
  3. Quickly identify compliance violations, such as GDPR violations when European customer data moves outside the allowed region, or when financial data moves outside a PCI-compliant environment.
  4. Identify access changes, which helps you understand the correct access profile by correlating similar data pieces that have different access profiles.

For example, suppose the same data is well kept in a specific environment and can be accessed by only two specific users. When that data moves to a development environment, it can be accessed by the whole data engineering team, which increases risk.
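
A minimal sketch of that access-expansion check, with illustrative identity sets:

```python
# Sketch: flag access expansion when similar data lands in a new location.
# The identity sets below are illustrative.
prod_access = {"svc-billing", "dba-admin"}              # two specific users
dev_access = prod_access | {"eng-1", "eng-2", "eng-3"}  # whole eng team

expanded = dev_access - prod_access
if expanded:
    print(f"Access expanded by {len(expanded)} identities: {sorted(expanded)}")
```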

Leveraging Data Security Posture Management (DSPM) and Data Detection and Response (DDR) tools proves instrumental in addressing the complexities of data movement challenges. These tools play a crucial role in monitoring the flow of sensitive data, allowing for the swift remediation of exposure incidents and vulnerabilities in real-time. The intricacies of data movement, especially in hybrid and multi-cloud deployments, can be challenging, as public cloud providers often lack sufficient tooling to comprehend data flows across various services and unmanaged databases.

 

Our innovative cloud DLP tooling takes the lead in this scenario, offering a unified approach by integrating static and dynamic monitoring through DSPM and DDR. This integration provides a comprehensive view of sensitive data within your cloud account, offering an updated inventory and mapping of data flows. Our agentless solution automatically detects new sensitive records, classifies them, and identifies relevant policies. In case of a policy violation, it promptly alerts your security team in real time, safeguarding your crucial data assets.

In addition to our robust data identification methods, we prioritize access control measures, establishing Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC) policies so that the right users have the right permissions at the right times.
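
As a rough illustration of how RBAC and ABAC can combine, the sketch below grants access only when both the role and the request attributes allow it. The roles, attributes, and rules are hypothetical, not a description of Sentra's policy engine:

```python
# Sketch: a combined RBAC + ABAC check. Roles, attributes, and rules
# are hypothetical; real policies would live in a policy engine.
ROLE_PERMISSIONS = {  # RBAC: what each role may do
    "analyst": {"read"},
    "data_engineer": {"read", "write"},
}

def is_allowed(user, action, resource):
    # RBAC gate: the user's role must permit the action.
    if action not in ROLE_PERMISSIONS.get(user["role"], set()):
        return False
    # ABAC gates: attributes of user and resource must match.
    if resource["classification"] == "pii" and not user["pii_cleared"]:
        return False
    return user["region"] == resource["region"]  # residency-aware access

user = {"role": "analyst", "pii_cleared": False, "region": "eu-west-1"}
resource = {"classification": "pii", "region": "eu-west-1"}
print(is_allowed(user, "read", resource))  # False: not cleared for PII
```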

Identifying Data Movement With Sentra

Sentra has developed different methods to identify data movements and similarities based on the content of two assets. Our advanced capabilities allow us to pinpoint fully duplicated data, identify similar data, and even uncover instances of partially duplicated data that may have been copied or moved across different locations. 

Moreover, we recognize that changes in access often accompany the relocation of assets between different locations. 

As part of Sentra’s Data Security Posture Management (DSPM) solution, we proactively manage and adapt access controls to accommodate these transitions, maintaining the integrity and security of the data throughout its lifecycle.

These are the three methods we leverage:

  1. Hash similarity - Using each asset's unique identifier (hash) to locate it across the different data stores of the customer environment.
  2. Schema similarity - Locating exact or similar schemas that indicate similar data may reside in them, then leveraging additional metadata and statistical methods to simplify the data and find the necessary correlations.
  3. Entity matching similarity - Detecting when parts of files or tables are copied to another data asset, for example, an ETL that extracts only some columns from a table into a new table in a data warehouse.
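
To illustrate the spirit of these three methods on toy data, here is a minimal Python sketch: exact matching via content hashes, schema similarity via column-name overlap, and entity matching via value overlap between columns. The helpers and thresholds are illustrative assumptions, not Sentra's actual implementation:

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Hash similarity: identical bytes yield identical digests."""
    return hashlib.sha256(data).hexdigest()

def schema_similarity(cols_a: set, cols_b: set) -> float:
    """Schema similarity: Jaccard overlap of column names."""
    return len(cols_a & cols_b) / len(cols_a | cols_b)

def entity_match(col_a: set, col_b: set) -> float:
    """Entity matching: share of one column's values found in another."""
    return len(col_a & col_b) / max(len(col_a), 1)

# Toy example: a warehouse table built from a subset of a source table.
source_cols = {"user_id", "email", "ssn", "signup_date"}
warehouse_cols = {"user_id", "email"}
print(schema_similarity(source_cols, warehouse_cols))  # 0.5: partial copy?

source_emails = {"a@x.com", "b@x.com", "c@x.com"}
etl_emails = {"a@x.com", "b@x.com"}
print(entity_match(etl_emails, source_emails))  # 1.0: values likely copied
```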

As another example, if PII is found in a lower environment, Sentra can detect whether it is real or mock customer PII, based on whether the same PII also appears in the production environment.

[Image: PII found in a lower environment]

Conclusion

Understanding and managing data sprawl is a critical task in a dynamic business landscape. Monitoring data movement, access, and posture enables teams to comprehend the full impact of data sprawl, connecting the dots for security analysts as they assess true risks and threats.

Sentra addresses the challenge of data movement by utilizing advanced methods like hash, schema, and entity similarity to identify duplicate or similar data across different locations. Sentra's holistic Data Security Posture Management (DSPM) solution not only enhances data security but also contributes to a streamlined data management strategy. 

The identified challenges and Sentra's robust methods emphasize the importance of proactive data management and security in the dynamic digital landscape.

To learn more about how you can enhance your data security posture, schedule a demo with one of our experts.

Ran is a passionate product and customer success leader with over 12 years of experience in the cybersecurity sector. He combines extensive technical knowledge with a strong passion for product innovation, research and development (R&D), and customer success to deliver robust, user-centric security solutions. His leadership journey is marked by proven managerial skills, having spearheaded multidisciplinary teams towards achieving groundbreaking innovations and fostering a culture of excellence. He started at Sentra as a Senior Product Manager and is currently the Head of Technical Account Management, located in NYC.


Latest Blog Posts

David Stuart
May 12, 2026
3 Min Read
AI and ML

Daybreak Answers the Vulnerability Question. Here's the One It Doesn't.

A month after Anthropic announced Mythos and Project Glasswing, OpenAI launched Daybreak.

The competitive framing is hard to avoid. Two frontier AI labs, one month apart, both building systems designed to find and fix vulnerabilities before attackers can exploit them. The Hacker News called it OpenAI taking on Anthropic in the AI cybersecurity race. That framing is accurate but slightly misses the point for security teams evaluating what to do with either of them.

Both tools are solving a real and important problem: the window between a vulnerability being discoverable and being exploited has collapsed. As OpenAI's own announcement noted, AI can now compress hours of security analysis into minutes. The goal is to get defenders to vulnerabilities before attackers do. Daybreak builds editable threat models from actual codebases, validates findings in isolated environments, and proposes patches for human review. That is a genuinely useful capability.

But there are two separate questions in play here, and it's worth being precise about which one Daybreak answers.

THE QUESTION DAYBREAK ANSWERS

Daybreak answers, “What vulnerabilities exist in your code, and how do we fix them faster than an attacker can exploit them?”

That is the right question for a vulnerability management platform. It's the offense-versus-defense race that Mythos dramatized and Daybreak responds to. If you can identify and remediate a vulnerability before an attacker has a working exploit, you've won that exchange.

THE QUESTION DAYBREAK DOESN'T ANSWER

Daybreak doesn't answer, “If a vulnerability is exploited before it's patched, what does the attacker reach?”

This is the blast radius question. And it's the question that determines whether a successful exploit becomes a contained incident or a material breach.

The answer depends entirely on what sensitive data is accessible from the compromised position. What's in the codebase environment, what service accounts have access to, what data flows through the infrastructure Daybreak is analyzing. Vulnerability detection doesn't map sensitive data to identities. It doesn't tell you whether a compromised CI/CD pipeline has access to a production database containing customer PII. It doesn't tell you what an AI agent operating in that environment can reach and synthesize.

These are data governance questions. And they require a different kind of answer.

THE AI AGENT ACCESS PROBLEM

There's a second dimension here that I think is underappreciated in the Daybreak coverage.

Daybreak - like every AI security agent - needs access to your environment to do its job. Codebases, repositories, infrastructure configurations, build pipelines. That access is necessary for the tool to work. And it means that the data those environments contain becomes part of the access footprint of the AI agent operating in them.

Most organizations haven't fully inventoried what sensitive data lives in their development and security infrastructure. Production credentials in configuration files. Customer data in test environments that were never properly cleaned. PII that migrated into a repository through an integration nobody fully audited. This data exists in most large enterprise environments, not because of negligence, but because data accumulates faster than it gets classified.

Before you bring an AI agent into those environments - any AI agent, not just Daybreak - the governance question needs an answer. “What sensitive data is in here, who can reach it, and is that access picture appropriate for an AI system to operate within?”

WHAT THIS MEANS FOR SECURITY TEAMS DEPLOYING DAYBREAK

Three things worth doing before or alongside a Daybreak deployment:

First, classify what's in the environments Daybreak will access. Codebases and CI/CD pipelines accumulate sensitive data that isn't always visible in a standard data inventory. Running a classification pass before bringing an AI agent in tells you what's there and what the exposure looks like if that environment is compromised.

Second, map what Daybreak's service account can reach. The blast radius of any compromise - including a compromise of Daybreak itself or a prompt injection against it - is bounded by what its operating identity can access. Scoping that access to the minimum necessary before deployment is the right architecture.
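
To make that mapping concrete, here is a hedged sketch using AWS IAM via boto3 as one example backend. The role name is hypothetical, and a real review would also cover inline policies, resource policies, and trust relationships:

```python
# Sketch: enumerate what a service role is allowed to do, flagging
# wildcard grants. Role name is hypothetical; assumes AWS plus boto3.
import boto3

iam = boto3.client("iam")

def attached_policy_statements(role_name):
    """Yield statements from the role's attached managed policies."""
    for pol in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
        meta = iam.get_policy(PolicyArn=pol["PolicyArn"])["Policy"]
        doc = iam.get_policy_version(
            PolicyArn=pol["PolicyArn"], VersionId=meta["DefaultVersionId"]
        )["PolicyVersion"]["Document"]
        statements = doc["Statement"]
        yield from (statements if isinstance(statements, list) else [statements])

for stmt in attached_policy_statements("ai-agent-role"):  # hypothetical name
    actions = stmt.get("Action", [])
    actions = actions if isinstance(actions, list) else [actions]
    if stmt.get("Effect") == "Allow" and any("*" in a for a in actions):
        print("Broad grant:", actions, "on", stmt.get("Resource"))
```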

Third, know what each patch is protecting. Daybreak's value is highest when you know which vulnerabilities, if exploited, would expose the most sensitive data. That prioritization requires a continuous, current picture of where sensitive data lives in your environment - so that a critical vulnerability in a system with no sensitive data downstream gets triaged differently from one with a direct path to regulated records.

THE PACE OF THIS IS ACCELERATING

Mythos in April. Daybreak in May. The AI security capability race is compressing timelines for everyone.

Organizations that haven't yet built a continuous, current picture of their sensitive data estate are running out of runway to do it before AI security agents are operating inside their environments. The governance work - classification, identity-to-data mapping, access rationalization - is the foundation that makes all of these tools safer to deploy and more effective when they find something.

Vulnerability tools tell you where the door is. Data security tells you what's in the room. Both questions matter. The pace of the AI security race means you need to be working on both at the same time.

---

FREQUENTLY ASKED QUESTIONS

What is OpenAI Daybreak?

OpenAI Daybreak is a cybersecurity initiative launched May 11, 2026 that combines GPT-5.5 and Codex Security to help organizations identify, validate, and remediate software vulnerabilities. It builds editable threat models from enterprise codebases, validates likely vulnerabilities in isolated environments, and proposes patches for human review. Access is currently limited — organizations must request a vulnerability scan or contact OpenAI sales.

How is Daybreak different from Anthropic Mythos?

Both platforms use frontier AI to find and exploit vulnerabilities — Mythos focuses on autonomous zero-day discovery, while Daybreak is positioned more as a developer-integrated defense platform with a broader partner ecosystem. Anthropic has emphasized restricted access and high-risk vulnerability discovery; OpenAI is taking a broader platform approach tied to enterprise development workflows. Both address the vulnerability discovery question; neither addresses the blast radius question of what data is accessible if a vulnerability is exploited.

What does Daybreak mean for enterprise data security?

Daybreak requires feeding AI agents access to your codebase and infrastructure environments. Before deploying any AI security agent, organizations should classify what sensitive data lives in those environments, map what the agent's operating identity can access, and ensure that access reflects least privilege. The same access that makes these tools effective makes them part of your data attack surface.

What is the blast radius question in cybersecurity?

Blast radius refers to the scope of damage from a successful exploit — specifically, what sensitive data becomes accessible to an attacker who gains a foothold through a vulnerability. Vulnerability tools like Daybreak address how to find and fix vulnerabilities faster. Data Security Posture Management (DSPM) addresses what an attacker reaches if a vulnerability is exploited before it's patched — which is determined by how sensitive data is distributed, classified, and access-controlled across the environment.

How does DSPM complement AI vulnerability tools like Daybreak?

DSPM continuously discovers and classifies sensitive data across cloud, SaaS, and on-premises environments, maps which identities can access sensitive stores, and identifies overpermissioned access. In a Daybreak deployment, DSPM answers three questions: what sensitive data lives in the environments Daybreak will access, what can Daybreak's operating identity reach, and which vulnerabilities are highest priority because they have a direct path to regulated or sensitive data. DSPM and vulnerability management address sequential parts of the same problem — not competing solutions.

Daybreak and Mythos are compressing the vulnerability window on both sides. The organizations best positioned to respond aren't the ones scrambling to understand their data exposure after an exploit — they're the ones who already have a continuous, current picture of what sensitive data lives in their environments, what every identity can reach, and where access needs to be tightened before an AI agent touches it.

See how Sentra maps sensitive data across your cloud, SaaS, and development environments — and what your blast radius actually looks like today. Schedule a Demo →

Ron Reiter
May 8, 2026
3 Min Read
Data Security

Mythos Is Already Here. The Question Is What Attackers Will Find.

I've spent a lot of time thinking about what Mythos actually changes — and what it doesn't.

The vulnerabilities Mythos found are not new in nature. They're variations of known vulnerability classes — buffer overflows, race conditions, memory corruption. These aren't novel attack categories. What's new is the speed and scale at which Mythos surfaces them. In pre-release testing, Mythos Preview autonomously developed working exploits for Mozilla Firefox vulnerabilities 181 times, compared to the prior model's two successful attempts out of several hundred. That isn't an incremental improvement. It's a different class of capability.

Modeled scenarios show attackers discovering the majority of new vulnerabilities within a few years, meaning defenders increasingly respond to issues adversaries may already know about. The core challenge shifts from finding vulnerabilities faster to fixing them faster.

That's the real strategic shift. And for data security specifically, it has a specific implication that I think is underappreciated in the current conversation.

PATCH SPEED IS NECESSARY. IT'S NOT SUFFICIENT.

When a Mythos-class tool helps an attacker gain initial access — through a zero-day in a browser, an OS, an unpatched server — the next thing that determines outcome is what they find. What data is accessible from the compromised position. What identities and service accounts can be traversed. What sensitive records sit in environments with overly broad permissions.

Most security conversations right now are about accelerating patch cycles, which is the right instinct. A 2025 report found that over 45% of discovered security vulnerabilities in large organizations remain unpatched after 12 months. Closing that gap matters enormously. But patching controls the entry point. It doesn't control the blast radius once someone is in.

The blast radius question is a data question.

WHAT ATTACKERS FIND WHEN THEY GET IN

The uncomfortable truth is that most organizations don't have a comprehensive, current answer to: what sensitive data is accessible from any given position in my environment?

Data accumulates in ways that security teams don't fully track. Salesforce orgs fill up with PII from integrations that nobody audited. Lakehouses absorb years of production data pipelines. Cloud storage buckets get misconfigured and forgotten. Service accounts accumulate permissions that outlive the workflows they were created for. And increasingly, AI agents and copilots run under those service accounts — meaning whatever a service account can reach, the AI can retrieve and synthesize.

This isn't a hypothetical. It's the operational reality in most enterprises I talk to.

When Mythos-class capabilities become more widely available — and Anthropic's own estimate is that similar capabilities will proliferate from other AI labs within six to eighteen months — the attack surface question becomes not just "what vulnerabilities can be exploited" but "what data becomes accessible when they are." Those are different problems with different solutions.

WHAT ACTUALLY CHANGES YOUR RISK PROFILE

Assume breach. Not as a thought experiment, but as the operating reality it now is.

Given that, the most meaningful thing you can do in the next 90 days isn't buy another scanner. It's get a clear, continuous answer to where your sensitive data actually lives — across cloud, SaaS, data warehouses, and the AI systems layered on top of them — and make sure the access picture reflects least privilege, not accumulated permissions from three years of workflow changes.

That means:

Knowing what's in your environment, continuously. Not a quarterly scan. Not a point-in-time audit. When Mythos-class tools can find and exploit a vulnerability overnight, a quarterly data inventory is operationally useless. You need to know what sensitive data exists and where it lives as a continuous fact, not a periodic report.

Understanding what each identity can reach. The blast radius of any successful exploit is bounded by what the compromised identity — human or service account or AI agent — can access. If that access picture isn't mapped to sensitive data at the record level, you can't assess exposure or contain it quickly after a breach.

Eliminating data that shouldn't be where it is. The most effective way to reduce Mythos-era blast radius is to not have sensitive data sitting in places it doesn't need to be. Redundant copies of regulated records, production data that migrated to dev environments, PII sitting in SaaS tools it arrived in through integration workflows — this is the data that causes the notifications, the regulatory exposure, and the headlines. Getting rid of it before an attacker finds it is categorically better than discovering it during incident response.

THE PART OF THIS CONVERSATION THAT ISN'T GETTING ENOUGH ATTENTION

Most of the Mythos coverage has focused, reasonably, on the vulnerability discovery side. That's where the dramatic capability jump is visible. But the quieter implication is about what happens after discovery and exploitation — which is where data security actually determines outcome.

"The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI," according to one Project Glasswing partner. If that compression applies equally to time-to-exploit, it applies equally to time-to-data. The faster an attacker can reach a compromised system, the faster they reach whatever's accessible from it.

This is the Mythos implication for data security teams: the window for containment is shrinking, and continuous data visibility is how you make that window matter.

---

FREQUENTLY ASKED QUESTIONS

What is Claude Mythos Preview?

Claude Mythos Preview is an AI model announced by Anthropic in April 2026 capable of autonomously discovering and exploiting zero-day vulnerabilities across every major operating system and browser, at a speed and scale that significantly exceeds human security researchers.

Is Mythos publicly available?

Anthropic has withheld general release, citing offensive risk. Access has been granted to approximately 40 organizations through Project Glasswing, a defensive security consortium. Anthropic estimates comparable capabilities will emerge from other AI labs within 6 to 18 months.

What does "assume breach" mean in a Mythos context?

Assume breach means designing your security posture around the expectation that attackers will get in — focusing less on prevention at the perimeter and more on limiting what they find inside. In a Mythos context, where exploit development can happen overnight, assume breach shifts from a framework to an operating reality.

How does data visibility reduce breach blast radius?

Blast radius — the scope of damage from a successful breach — is determined by what sensitive data is accessible from a compromised position, not by the exploit itself. Organizations with continuous, comprehensive data classification and least-privilege access governance can identify what was exposed quickly and contain the damage. Organizations without it typically discover their exposure during incident response, when it's too late.

What is DSPM and how does it help with Mythos preparedness?

Data Security Posture Management (DSPM) is a continuous monitoring discipline that discovers and classifies sensitive data across cloud, SaaS, and on-premises environments, maps access to that data, and identifies where sensitive records are exposed to over-permissioned identities or misconfigured controls. In a Mythos-era threat model, DSPM provides the continuous data inventory that makes blast radius assessment and containment possible.

Yair Cohen
May 7, 2026
3 Min Read
Data Security

The Instructure Breach Was Salesforce. Again. Here's the Governance Problem Nobody Is Talking About.

ShinyHunters breached Instructure - the company behind Canvas LMS - and claimed 275 million student and teacher records, 3.65 terabytes of data, and a ransom deadline of May 6, 2026. That alone is a significant breach. But the detail buried in the coverage is the more important story for every security team reading this.

This is the second time ShinyHunters has breached Instructure's Salesforce environment. In September 2025, the same group used social engineering to access Instructure's Salesforce instance. Instructure disclosed it, rotated credentials, and continued operating. Eight months later, the same attack surface was breached again.

That is not a story about ShinyHunters' sophistication. It is a story about incomplete remediation and about what happens when a breach response focuses on the credential and the vulnerability without addressing the underlying data exposure.

WHAT SALESFORCE ACTUALLY CONTAINS AND WHY SECURITY TEAMS MISS IT

Most organizations think of Salesforce as a CRM. Their security teams govern it like one - access controls at the application layer, SSO, maybe some DLP on outbound data. What they often don't account for is what accumulates inside Salesforce over years of integrations, workflow automations, and cross-platform data flows.

In the Instructure case, ShinyHunters claims the Salesforce instance contained student and teacher PII, private messages, and institutional records across nearly 9,000 schools. Some of that data flowed into Salesforce deliberately - CRM records, institutional contacts, support tickets. Some of it flowed in through integrations with Canvas that nobody fully audited. All of it was sitting in an environment that, based on the breach timeline, had its access controls reset after September 2025 but was not fundamentally rearchitected.

According to Security Magazine, ShinyHunters has used Salesforce misconfiguration as a repeating attack vector across multiple recent victims - the same playbook behind breaches at McGraw-Hill, Infinite Campus, Amtrak, and ADT. The vector is documented. The pattern is public. And yet organizations continue to treat Salesforce breach response as a credential rotation exercise rather than a data governance exercise.

WHAT "REMEDIATING" A SALESFORCE BREACH ACTUALLY REQUIRES

When a Salesforce environment is breached, the immediate response - revoke credentials, rotate API keys, patch the vulnerability - is necessary. It is not sufficient.

The harder question is: what data was in that Salesforce instance, who could access it, and should it have been there at all? Answering those questions requires classification. Without knowing what sensitive data exists in Salesforce, at the field and record level, there is no way to assess true exposure, implement meaningful least-privilege access, or identify which data flows need to be redesigned.

In Instructure's case, the breach response after September 2025 apparently did not include that step. The data - student PII, private messages, institutional records - remained in the environment, remained broadly accessible, and remained available to ShinyHunters when they returned.

This is the governance gap that keeps breached-and-remediated organizations on the repeat victim list.

THE IDENTITY AND ACCESS DIMENSION

ShinyHunters has also been linked to recent breaches at the University of Pennsylvania, Princeton, and Harvard - all of which share a pattern: large Salesforce deployments, institutional data accumulated over years, access controls managed at the application layer without deep visibility into what sensitive data each identity can actually reach at the data layer.

Sentra's approach to Salesforce governance maps exactly this. Classification runs continuously inside the Salesforce environment - identifying student PII, FERPA-regulated records, private communications, and institutional data that has accumulated through integrations. Access mapping connects each user, service account, and API integration to the sensitive data it can reach - not just the objects it has permissions to access, but the classified sensitive records within those objects. When an integration adds new data flows or permissions drift, the inventory updates in real time.

The output is a continuous answer to the question Instructure's security team could not have answered quickly enough in September 2025: what sensitive data is in Salesforce, what can each identity reach, and what needs to be removed or restricted before the next attempt.

WHAT TO CHECK IN YOUR OWN SALESFORCE ENVIRONMENT THIS WEEK

Three questions worth answering now, regardless of your industry:

First, what sensitive data has accumulated in your Salesforce org through integrations, workflow automations, and cross-platform data flows - beyond what was deliberately put there? Student records, healthcare data, financial records, and private communications all end up in Salesforce through integration patterns that were never evaluated for data sensitivity.

Second, what can each service account and API integration actually reach at the record level? Application-layer access controls in Salesforce do not prevent exfiltration by an attacker who has compromised a sufficiently permissioned service account. Least privilege at the data layer requires knowing what sensitive data each identity can access.
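
As one hedged way to start answering that question, the sketch below uses the simple_salesforce Python client and Salesforce's standard ObjectPermissions object to list which permission sets grant read access on objects assumed to hold sensitive data. The credentials and object list are illustrative:

```python
# Sketch: list which permission sets grant read on sensitive objects.
# Credentials and the object list are illustrative assumptions; a real
# review would also resolve profile and user assignments.
from simple_salesforce import Salesforce

sf = Salesforce(username="admin@example.com", password="...",
                security_token="...")  # hypothetical credentials

sensitive = ["Contact", "Case"]  # objects assumed to hold PII
in_list = ", ".join(f"'{o}'" for o in sensitive)
soql = (
    "SELECT Parent.Name, SobjectType, PermissionsRead, PermissionsViewAllRecords "
    f"FROM ObjectPermissions WHERE SobjectType IN ({in_list})"
)
for rec in sf.query_all(soql)["records"]:
    if rec["PermissionsRead"]:
        print(f"{rec['Parent']['Name']} can read {rec['SobjectType']} "
              f"(view all records: {rec['PermissionsViewAllRecords']})")
```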

Third, if your Salesforce environment were breached today and you had to disclose within 72 hours, could you accurately characterize what data was exposed? FERPA, HIPAA, GDPR, and state-level privacy laws all require specific disclosure of data types. Without continuous classification, the answer in most environments is: not quickly, and not accurately.

FREQUENTLY ASKED QUESTIONS: SALESFORCE DATA SECURITY

What is the ShinyHunters Salesforce attack pattern?

ShinyHunters has repeatedly used Salesforce as an attack vector - typically gaining initial access through social engineering or credential theft, then exfiltrating data from the Salesforce org and using it for extortion. The pattern has appeared in breaches at Instructure (twice), McGraw-Hill, Infinite Campus, Amtrak, and others. The common thread is that Salesforce environments contain far more sensitive data than most security teams have classified or actively governed.

What data typically accumulates in enterprise Salesforce environments beyond CRM records?

In production Salesforce environments, continuous classification commonly surfaces PII from support ticket integrations, regulated financial or health data from cross-platform workflows, private communications stored in custom objects, API credentials and tokens in log fields, and institutional data from education or healthcare integrations. Most of this data arrives through legitimate integration patterns rather than misconfiguration.

How does DSPM apply to Salesforce environments?

Data Security Posture Management applied to Salesforce continuously classifies sensitive data at the field and record level within the Salesforce org: identifying regulated data types, mapping which identities can access them, and flagging access that exceeds least-privilege requirements. This runs inside the customer's environment without data leaving the Salesforce perimeter.

What is the difference between Salesforce's native security tools and DSPM?

Salesforce's native tools - Shield, field-level security, permission sets - control access at the object and field level. They do not classify data by sensitivity, identify regulated records that should not be in a given field, or map the sensitive data reachable by each integration or service account. DSPM fills that gap: it understands what the data is, not just who has permission to access it.

What does FERPA require in the event of an educational data breach?

FERPA requires institutions to protect the privacy of student education records. In a breach involving student PII, private messages, and institutional records - as in the Instructure case - affected institutions face notification obligations, potential loss of federal funding eligibility, and civil liability. Accurate and timely disclosure requires knowing exactly what records were exposed, which requires prior classification.

The Instructure breach happened twice because the data was never classified after the first incident. Credential rotation without data governance leaves the same exposure in place for the next attempt. Sentra continuously classifies sensitive data inside your Salesforce environment at the field and record level, maps what every identity and integration can reach, and flags access that exceeds least-privilege — so your breach response closes the actual gap, not just the credential.

See how Sentra governs Salesforce data → Schedule a Demo
