How Sensitive Cloud Data Gets Exposed
When organizations began migrating to the cloud, they did so with the promise that they’d be able to build and adapt their infrastructure at speeds that would give them a competitive advantage. It also meant that they’d be able to use large amounts of data to gather insights about their users and customers and better understand their needs.
While this is all true, it also means that security teams are responsible for protecting more data than ever before. As data gets replicated, shared, and moved throughout the public cloud, sensitive data exposure becomes more common. These are the most common ways that sensitive cloud data is exposed and leaked - and what’s needed to mitigate the risks.
Causes of Cloud Data Exposure
Negligence: Accidentally leaving a data asset exposed to the internet shouldn’t happen. Cloud providers know it happens anyway - the first sentence of AWS’s best practices article for S3 storage says “Ensure that your Amazon S3 buckets use the correct policies and are not publicly accessible.” Five years ago, AWS added warnings to its dashboards for buckets that are publicly exposed. Of course, S3 is just one of many data stores that contain sensitive data and are prone to accidental exposure. Despite the warnings, exposed data assets continue to be a cause of data breaches. Fortunately, these vulnerabilities are easily corrected - assuming you have full visibility into your cloud environment.
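As a rough illustration of the kind of check that catches an exposed bucket, the sketch below scans a bucket policy document for statements that allow any principal without a condition. It is a simplification: it looks at the policy JSON only and ignores ACLs, account-level public access settings, and the meaning of any conditions. The function names, bucket name, and account ID are hypothetical and not part of any AWS SDK.

```python
import json


def is_public_statement(statement: dict) -> bool:
    """Return True if the statement allows any principal with no condition."""
    if statement.get("Effect") != "Allow":
        return False
    if statement.get("Condition"):
        # A condition may narrow access; this sketch does not evaluate it.
        return False
    principal = statement.get("Principal")
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS", [])
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_statements(policy_json: str) -> list:
    """Return the statements in a bucket policy that look publicly accessible."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    return [s for s in statements if is_public_statement(s)]


# Hypothetical policy: one public statement and one scoped to a single role.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamRead", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/analytics"},
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

for statement in public_statements(EXAMPLE_POLICY):
    print("Publicly accessible statement:", statement["Sid"])
```

A check like this only helps for buckets you already know about, which is why visibility into the whole environment matters.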
Data Movement: Even when sensitive data is properly secured, there’s always a risk that it could be moved or copied into an unsecured environment. A common example of this is taking sensitive data from a secured production environment and moving it to a developer environment with a lower security posture. In this case, the data’s owner did everything right - it was the second user, the one who moved the data, who accidentally put it at risk. Another example is an organization that keeps all of its customers’ payment information in a PCI environment and needs to prevent this extremely sensitive data from reaching data stores in less secure parts of its cloud environment.
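One way to picture detecting this kind of movement is to fingerprint sensitive values and look for the same fingerprints in environments with a weaker posture. The sketch below assumes a hypothetical inventory of data stores and a hypothetical ranking of environments. A plain SHA-256 hash is used only to keep the example short; a real system would more likely use keyed hashing so that fingerprints can’t be reversed by guessing values.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical ranking: a higher number means a stricter security posture.
POSTURE_RANK = {"pci": 3, "prod": 2, "dev": 1}


def fingerprint(value: str) -> str:
    """Hash a normalized value so copies can be matched without storing it."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DataStore:
    name: str
    environment: str
    fingerprints: frozenset  # fingerprints of sensitive values found in the store


def find_downgraded_copies(stores):
    """Return (source, copy, shared_count) where data also sits in a weaker environment."""
    findings = []
    for source in stores:
        for copy in stores:
            if POSTURE_RANK[copy.environment] < POSTURE_RANK[source.environment]:
                shared = source.fingerprints & copy.fingerprints
                if shared:
                    findings.append((source.name, copy.name, len(shared)))
    return findings


# Made-up inventory using well-known test card numbers.
cards = ["4111 1111 1111 1111", "5500 0000 0000 0004"]
stores = [
    DataStore("payments-db", "pci", frozenset(fingerprint(c) for c in cards)),
    DataStore("dev-sandbox", "dev", frozenset({fingerprint(cards[0])})),
    DataStore("orders-db", "prod", frozenset()),
]
print(find_downgraded_copies(stores))
```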
Improper Access Management: Access to sensitive data should not be granted to users who don’t need it (see the example above). Improper IAM configurations and access control management increase the risk of accidental or malicious data leakage. More access also means more potential for shadow data to be created and abandoned. For example, a user might copy sensitive data and then leave the company, leaving behind data that no one is aware of. Limiting access to sensitive data to the users who actually need it can help prevent a needless expansion of your organization’s ‘data attack surface’.
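A simple way to find access that is probably not needed is to compare who has been granted access with who has actually used it recently. The sketch below assumes you can export both lists; the user names, dates, and the 90-day threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def stale_grants(granted_users, last_access, now, max_idle_days=90):
    """Return users who hold access but have not used it within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for user in sorted(granted_users):
        last_seen = last_access.get(user)  # None means the grant was never used
        if last_seen is None or last_seen < cutoff:
            stale.append(user)
    return stale


# Hypothetical data: three users can read a sensitive store, one uses it regularly.
now = datetime(2024, 6, 1)
granted = {"alice", "bob", "carol"}
last_access = {"alice": datetime(2024, 5, 20), "bob": datetime(2023, 11, 2)}
print(stale_grants(granted, last_access, now))
```

Grants flagged this way are candidates for review, not automatic removal, since some access is legitimately used only occasionally.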
Third Parties: It’s extremely easy to accidentally share sensitive data with a third party over email. Accidentally forwarding sensitive data or credentials is one of the simplest ways to leak sensitive data from your organization. In the public cloud, the equivalent of the accidental email is granting a third party, such as a CI/CD tool or a SaaS application for data analytics, access to a data asset in your public cloud infrastructure. It’s similar to improper access management, only now the overprivileged access is granted outside of your organization entirely, where you’re less able to mitigate the risks.
Another common way data is leaked to third parties is when someone inside an organization shares something that isn’t supposed to contain sensitive data, but does. A good example of this is sharing log files with a third party. Log files shouldn’t contain sensitive data, but they often include data such as user emails, IP addresses, and API credentials.
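To make the log file example concrete, here is a minimal sketch of scanning and redacting log lines before they leave the organization. The three patterns are illustrative assumptions and nowhere near complete: the credential pattern only covers the commonly seen shape of an AWS access key ID, and the sample line is made up.

```python
import re

# Illustrative patterns only; real classifiers need far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Assumed shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact_line(line: str):
    """Return the redacted line and the kinds of sensitive data found in it."""
    found = []
    for kind, pattern in PATTERNS.items():
        line, count = pattern.subn(f"[REDACTED:{kind}]", line)
        if count:
            found.append(kind)
    return line, found


sample = "login ok user=alice@example.com ip=10.0.0.5 key=AKIAIOSFODNN7EXAMPLE"
clean, kinds = redact_line(sample)
print(clean)
print(kinds)
```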
ETL Errors: When extracting data that contains PII from a production database to a data lake or an analytics data warehouse, such as Redshift or Snowflake, the wrong warehouse might be specified. This is an easy mistake to miss, as data-agnostic tools might not understand the sensitive nature of the data.
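A guard like the one sketched below could sit in front of a load step and refuse to write PII to a destination that hasn’t been approved for it. It assumes you maintain a list of PII columns and approved destinations; the column and warehouse names here are hypothetical.

```python
# Hypothetical inventory: columns known to hold PII and warehouses approved to store it.
PII_COLUMNS = {"email", "full_name", "ssn"}
APPROVED_PII_DESTINATIONS = {"warehouse_restricted"}


class UnapprovedDestinationError(Exception):
    """Raised when PII is about to be loaded into an unapproved destination."""


def check_load(columns, destination):
    """Return the PII columns in the load, or raise if the destination is not approved."""
    pii = sorted(PII_COLUMNS.intersection(columns))
    if pii and destination not in APPROVED_PII_DESTINATIONS:
        raise UnapprovedDestinationError(
            f"{destination} is not approved for PII columns: {', '.join(pii)}"
        )
    return pii


print(check_load(["order_id", "email"], "warehouse_restricted"))
try:
    check_load(["order_id", "email"], "warehouse_marketing")
except UnapprovedDestinationError as error:
    print("Load blocked:", error)
```

The guard is only as good as the column inventory behind it, which is where automated classification of the data itself becomes useful.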
Why Can’t Cloud Data Security Solutions Stop Sensitive Data Exposure?
Simply put - they’re not looking at the data. They’re looking at the network, infrastructure, and perimeter. That’s how data leaks used to be prevented in the on-prem days - you’d just make sure the perimeter was secure, and because all your sensitive data was on-prem, you could secure it by securing everything.
For cloud-first companies, data isn’t staying behind the corporate perimeter. And while cloud platforms can identify infrastructure vulnerabilities, they’re missing the context around which data is sensitive. Finding and remediating data vulnerabilities - sensitive data with an improper security posture - remains a challenge.
Discovering and Classifying Cloud Data - The Data Security Posture Management (DSPM) Approach
Instead of trying to adapt on-prem strategies to cloud environments, DSPM (a new ‘on the rise’ category in Gartner’s latest Hype Cycle) takes a data-first approach. By understanding the data’s proper context, DSPM secures sensitive cloud data by:
- Discovering all cloud data, including shadow data and abandoned data stores
- Classifying the different data types using standard and custom parameters
- Automatically detecting when the security posture of sensitive data changes - whether via data movement or duplication
- Detecting who can access and who has accessed sensitive data
- Understanding how data travels throughout the cloud environment
- Orchestrating remediation workflows between engineering and security teams
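To give a feel for what detecting a posture change can mean in practice, the sketch below compares two snapshots of a data store inventory and reports sensitive stores that are new or whose posture got worse. This is an illustration under simple assumptions, not how any particular DSPM product works; the `Posture` fields and store names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    sensitive: bool   # result of a classification step
    public: bool      # reachable from the internet
    encrypted: bool   # encrypted at rest


def posture_findings(previous, current):
    """Compare two inventory snapshots (store name -> Posture) and list regressions."""
    findings = []
    for name, now in sorted(current.items()):
        if not now.sensitive:
            continue
        before = previous.get(name)
        if before is None:
            # A sensitive store missing from the last snapshot may be shadow data.
            findings.append(f"{name}: new store holding sensitive data")
            continue
        if now.public and not before.public:
            findings.append(f"{name}: became publicly accessible")
        if before.encrypted and not now.encrypted:
            findings.append(f"{name}: encryption was disabled")
    return findings


yesterday = {"customers-db": Posture(True, False, True)}
today = {
    "customers-db": Posture(True, True, True),
    "customers-copy": Posture(True, False, False),
    "static-assets": Posture(False, True, False),
}
for finding in posture_findings(yesterday, today):
    print(finding)
```

Findings like these would then feed whatever remediation workflow engineering and security teams share.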
Data Security Posture Management addresses many of the most common causes of sensitive cloud data leaks. By focusing on securing and following the data across the cloud, DSPM helps cloud security teams finally secure what we’re all supposed to be protecting - sensitive data.
To learn more about Data Security Posture Management, check out our full introduction to DSPM, or see it for yourself.