Roy Levine
Roy Levine is the VP of R&D at Sentra. He brings nearly 20 years of experience in engineering, data, and AI, along with a strong background in senior management across startups and enterprises.
How Contextual Data Classification Complements Your Existing DLP
Using data loss prevention (DLP) technology is a given for many organizations. Because these solutions have historically been the best way to prevent data exposure, most organizations already have DLP deeply entrenched within their infrastructure and security systems to assist with data discovery and classification.
However, as we discussed in a previous blog post, traditional DLP often struggles to keep up with disparate cloud environments and the sheer volume of data that comes with them. As a result, many teams experience false alarms, alert fatigue, and extensive manual tuning as they try to get their DLP solutions to work with their cloud-based or hybrid data ecosystems. Yet simply ripping out and replacing these solutions isn’t an option for most organizations, because they are costly and play such a significant role in security programs.
Many organizations need a complement to their DLP rather than a replacement: something that improves the effectiveness and accuracy of their existing data discovery and “border control” security technologies. Contextual data classification can play this role with cloud-aware functionality that discovers all data, identifies what data is at risk, and analyzes the actions cloud users take so that routine activities can be distinguished from anomalies that indicate genuine threats. This context can then be used to harden the policies and controls governing data movement.
Why Cloud Data Security Requires More than DLP
While traditional data loss prevention (DLP) technology plays an integral role in many businesses’ data security approaches, it can start to falter when used within a cloud environment. Why? DLP relies on pre-defined patterns to detect suspicious activity, and those patterns often don’t map cleanly onto routine cloud activity. Here are the two main ways that DLP conflicts with the cloud:
Perimeter-Based Security Controls
DLP was originally created for on-premises environments with a clearly defensible perimeter. A DLP solution can only see general patterns, such as a file being sent, shared, or copied, and cannot capture nuanced information beyond this. So, a DLP solution often flags routine activities (e.g., sharing data with third-party applications) as suspicious in the data discovery process. When the DLP blocks these everyday actions, it impedes business velocity and alerts the security team needlessly.
In modern cloud-first organizations, data needs to move freely to and from the cloud to meet dynamic business demands. DLP is often too restrictive (or, conversely, too permissive) because it lacks a fundamental understanding of data sensitivity and only sees data when it moves. As a result, it misses the opportunity to protect data at rest. If too restrictive, it can disrupt business; if too permissive, it can miss numerous insider, supply chain, or other threats that look like authorized activity to the DLP.
Limited Classification Engines
The classification engines built into traditional DLPs are limited to common data types, such as Social Security or credit card numbers. As a result, they can miss nuanced, sensitive data, which is more common in a cloud ecosystem. For example, passport numbers stored alongside the passport holders’ names could pose a risk if exposed, while either the names or the numbers on their own are not. DLP solutions can also miss intellectual property or trade secrets, a form of data that wasn’t even stored online twenty years ago but is now prevalent in cloud environments. Data unique to an industry or specific business may also be missed if proper classifiers don’t detect it. The ability to tailor classifiers for these proprietary data types is very important (but often absent in commercial DLP offerings).
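To make the passport example concrete, here is a minimal, purely illustrative Python sketch of the difference between pattern-only matching and a contextual rule that flags an identifier only when a likely holder name appears nearby. The regular expressions and the 80-character window are assumptions made for illustration, not how any particular classification engine works.

```python
import re

# Illustrative patterns only -- production classifiers are far more robust.
PASSPORT_RE = re.compile(r"\b[A-Z]{1,2}\d{7}\b")       # rough passport-number shape
NAME_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")   # naive "First Last" heuristic

def pattern_only_hits(text: str) -> list[str]:
    """Traditional, pattern-only classification: every match is a hit."""
    return PASSPORT_RE.findall(text)

def contextual_hits(text: str, window: int = 80) -> list[str]:
    """Flag a passport number only when a person's name appears nearby.

    The co-occurrence of identifier and holder name is what makes the
    record sensitive; either field alone carries far less risk.
    """
    hits = []
    for match in PASSPORT_RE.finditer(text):
        start, end = match.span()
        neighborhood = text[max(0, start - window):end + window]
        if NAME_RE.search(neighborhood):
            hits.append(match.group())
    return hits

no_context = "Batch ref C1234567 processed at 14:02."
print(pattern_only_hits(no_context))   # ['C1234567'] -- a noisy false positive
print(contextual_hits(no_context))     # [] -- no holder name nearby, so not flagged

with_context = "Traveler: Jane Doe, passport C1234567."
print(contextual_hits(with_context))   # ['C1234567'] -- identifier plus name is the risk
```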
Because of these limitations, many businesses see a gap between traditional DLP solutions' discovery and classification patterns and the realities of a multi-cloud and/or hybrid data estate. Existing DLP solutions ultimately can’t comprehend what’s going on within a cloud environment because they don’t understand the following pieces of information:
- Where sensitive data exists, whether within structured or unstructured data.
- Who uses it and how they use it in an everyday business context.
- Which data is likely sensitive because of its origins, neighboring data, or other unique characteristics.
Without this information, the DLP technology will likely flag non-risky actions as suspicious (e.g., blocking services in IaaS/PaaS environments) and overlook legitimate threats (e.g., exfiltration of unstructured sensitive data).
Improve Data Security with Sentra’s Contextual Data Classification
Adding contextual data classification to your DLP can provide this much-needed context. Sentra’s data security posture management (DSPM) solution offers data classification functionality that can work alongside or feed your existing DLP technology. We leverage LLM-based algorithms to accurately understand the context of where and how data is used, then detect when any sensitive data is misplaced or misused based on this information. Applicable sensitivity tags can be sent via API directly to the DLP solution for actioning.
When you integrate Sentra with your existing DLP solution, our classification engine tags and labels files, then adds this rich, contextual information as metadata.
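As a rough illustration of that pattern, the sketch below writes classification results as S3 object tags that a downstream DLP or access policy could key on. The bucket name, object key, and tag keys are hypothetical, and the actual Sentra-to-DLP integration will differ; this only shows the general idea of classifying once, tagging the object, and letting downstream controls consume the tags.

```python
import boto3

s3 = boto3.client("s3")

def apply_sensitivity_tags(bucket: str, key: str, labels: dict[str, str]) -> None:
    """Write classification results as S3 object tags.

    A DLP policy (or any downstream control) can then key on these tags
    instead of re-scanning the object's contents.
    """
    tag_set = [{"Key": k, "Value": v} for k, v in labels.items()]
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": tag_set})

# Hypothetical output of a contextual classifier for one file.
apply_sensitivity_tags(
    bucket="analytics-exports",          # illustrative bucket name
    key="reports/q3-customer-list.csv",  # illustrative object key
    labels={
        "sensitivity": "restricted",
        "data-class": "customer-pii",
        "context": "copied-from-production",
    },
)
```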
Here are some examples of how our technology complements and extends the abilities of DLP solutions:
- Sentra can discover nuanced proprietary, sensitive data and detect new identifiers such as “transaction ID” or “intellectual property.”
- Sentra can use exact data matching to detect whether data was partially copied from production and flag it as sensitive (a simplified sketch of this technique follows this list).
- Sentra can detect when a given file likely contains business context because of its owner, location, etc. For example, a file taken from the CEO’s Google Drive or from a customer’s data lake can be assumed to be sensitive.
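Exact data matching is typically built on fingerprinting: values from the sensitive source are hashed, and candidate files are checked for overlapping hashes, so the raw values themselves never need to be compared directly. The following Python sketch shows that general idea under assumed file names, column names, and an arbitrary 20% threshold; it is not Sentra's implementation.

```python
import csv
import hashlib

def fingerprint(value: str) -> str:
    """Hash a normalized value so raw data never has to be shared."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

def build_index(production_csv: str, column: str) -> set[str]:
    """Fingerprint every value in a sensitive production column."""
    with open(production_csv, newline="") as f:
        return {fingerprint(row[column]) for row in csv.DictReader(f)}

def match_ratio(candidate_csv: str, column: str, index: set[str]) -> float:
    """Fraction of a candidate file's values that also exist in production."""
    with open(candidate_csv, newline="") as f:
        values = [fingerprint(row[column]) for row in csv.DictReader(f)]
    if not values:
        return 0.0
    return sum(v in index for v in values) / len(values)

# Illustrative usage: flag a file as sensitive if more than 20% of its
# email values overlap with the production customer table.
index = build_index("prod_customers.csv", column="email")
if match_ratio("shared/export.csv", column="email", index=index) > 0.2:
    print("Likely partial copy of production data -- tag as sensitive")
```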
In addition, we offer a simple, agentless deployment and prioritize the security of your data by keeping it all within your environment during scanning.
Watch a one-minute video to learn more about how Sentra discovers and classifies nuanced, sensitive data in a cloud environment.