
How Sentra Accurately Classifies Sensitive Data at Scale

July 30, 2024
5 Min Read
Data Security

Background on Classifying Different Types of Data

It’s first helpful to review the two primary types of data we need to classify - structured and unstructured data - and some of the historical challenges associated with analyzing and accurately classifying each.

What Is Structured Data?

Structured data has a standardized format that makes it easily accessible for both software and humans. Typically organized in tables with rows and/or columns, structured data allows for efficient data processing and insights. For instance, a customer data table with columns for name, address, customer-ID and phone number can quickly reveal the total number of customers and their most common localities.

Moreover, it is easier to conclude that the number under the phone number column is a phone number, while the number under the ID is a customer-ID. This contrasts with unstructured data, in which the context of each word is not straightforward. 
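The header-plus-values reasoning above can be sketched in a few lines. This is a toy illustration, not Sentra’s actual logic; the function and type names are hypothetical:

```python
def infer_column_type(name: str, values: list[str]) -> str:
    """Toy sketch: combine the column header's context with value patterns."""
    name = name.lower()
    # A digit-only column named "Phone Number" is very likely phone numbers.
    if "phone" in name and all(v.replace("-", "").isdigit() for v in values):
        return "phone_number"
    # The same digits under an "ID" header read as identifiers instead.
    if "id" in name and all(v.isdigit() for v in values):
        return "customer_id"
    return "unknown"
```

Here the column name alone disambiguates two columns of digits - exactly the context that unstructured text lacks.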

What Is Unstructured Data?

Unstructured data, on the other hand, refers to information that is not organized according to a preset model or schema, making it unsuitable for traditional relational databases (RDBMS). This type of data constitutes over 80% of all enterprise data, and 95% of businesses prioritize its management. The volume of unstructured data is growing rapidly, outpacing the growth rate of structured databases.

Examples of unstructured data include:

  • Various business documents
  • Text and multimedia files
  • Email messages
  • Videos and photos
  • Webpages
  • Audio files

While unstructured data stores contain valuable information that is often essential to the business and can guide business decisions, unstructured data classification has historically been challenging. However, AI and machine learning have led to better methods for understanding data content and uncovering the sensitive data embedded within it.

The division between structured and unstructured data is not always clear-cut. For example, an unstructured object like a .docx document can contain a table, while a structured data table can contain cells holding long passages of text that are, on their own, unstructured. Moreover, there are cases of semi-structured data. All of these considerations are part of the Sentra classification system but beyond the scope of this blog.

Data Classification Methods & Models 

Applying the right data classification method is crucial for achieving optimal performance and meeting specific business needs. Sentra employs a versatile decision framework that automatically leverages different classification models depending on the nature of the data and the requirements of the task. 

We utilize two primary approaches: 

  1. Rule-Based Systems
  2. Large Language Models (LLMs)

Rule-Based Systems 

Rule-based systems are employed when the data contains entities that follow specific, predictable patterns, such as email addresses or checksum-validated numbers. This method is advantageous due to its fast computation, deterministic outcomes, and simplicity, often providing the most accurate results for well-defined scenarios.

Due to their simplicity, efficiency, and deterministic nature, Sentra uses rule-based models whenever possible for data classification. These models are particularly effective in structured data environments, which possess invaluable characteristics such as inherent structure and repetitiveness.

For instance, a table named "Transactions" with a column labeled "Credit Card Number" allows for straightforward logic to achieve high accuracy in determining that the document contains credit card numbers. Similarly, the uniformity in column values can help classify a column named "Abbreviations" as 'Country Name Abbreviations' if all values correspond to country codes.

Sentra also uses rule-based labeling for document and entity detection in simple cases, where document properties provide enough information. Customer-specific rules and simple patterns with strong correlations to certain labels are also handled efficiently by rule-based models.
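As a concrete sketch of this kind of rule, checksum-validated numbers like credit card numbers can be recognized with the Luhn algorithm, and email addresses with a simple pattern. This is illustrative only, not Sentra’s implementation:

```python
import re

# Simplified email pattern for illustration; production rules are stricter.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def luhn_valid(number: str) -> bool:
    """Checksum used by credit card numbers: deterministic and cheap."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 13:          # too short to be a card number
        return False
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:            # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0
```

Combined with a column name like "Credit Card Number", a rule can require both the header match and a high fraction of Luhn-valid values before classifying the column - which is what makes structured data so amenable to this approach.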

Large Language Models (LLMs)

Large Language Models (LLMs) such as BERT, GPT, and Llama represent significant advancements in natural language processing, each with distinct strengths and applications. BERT (Bidirectional Encoder Representations from Transformers) is designed for fine-grained understanding of text by processing it bidirectionally, making it highly effective for tasks like Named Entity Recognition (NER) when trained on large, labeled datasets. In contrast, autoregressive models like GPT (Generative Pre-trained Transformer) and Llama (Large Language Model Meta AI) excel at generating and understanding text with minimal additional training. These models leverage extensive pre-training on diverse data to perform new tasks in a few-shot or zero-shot manner.

Their rich contextual understanding, ability to follow instructions, and generalization capabilities allow them to handle tasks with less dependency on large labeled datasets, making them versatile and powerful tools in the field of NLP. However, their great value comes at a cost in computational power, so they should be used with care and only when necessary.

Applications of LLMs at Sentra

Sentra uses LLMs for both Named Entity Recognition (NER) and document labeling tasks. The input to the models is similar, with minor adjustments, and the output varies depending on the task:

  • Named Entity Recognition (NER): The model labels each word or sentence in the text with its correct entity (which Sentra refers to as a data class).
  • Document Labels: The model labels the entire text with the appropriate label (which Sentra refers to as a data context).
  • Continuous Automatic Analysis: Sentra uses its LLMs to continuously analyze customer data, help our analysts find potential mistakes, and suggest new entities and document labels to be added to our classification system.

Here you can see an example of how Sentra classifies personal information.

Note: "Entity" refers to data classes on our dashboard; "Document labels" refers to data context on our dashboard.

Sentra’s Generative LLM Inference Approaches

An inference approach in the context of machine learning involves using a trained model to make predictions or decisions based on new data. This is crucial for practical applications where we need to classify or analyze data that wasn't part of the original training set. 

When working with complex or unstructured data, it's essential to have effective methods for interpreting and classifying the information. Sentra employs generative LLMs for this purpose. Sentra’s main approaches to generative LLM inference are as follows:

Supervised Trained Models (e.g., BERT)

In-house trained models are used when there is a need for high precision in recognizing domain-specific entities and sufficient relevant data is available for training. These models offer customization to capture the subtle nuances of specific datasets, enhancing accuracy for specialized entity types. These models are transformer-based deep neural networks with a “classic” fixed-size input and a well-defined output size, in contrast to generative models. Sentra uses the BERT architecture, modified and trained on our in-house labeled data, to create a model well-suited for classifying specific data types. 

This approach is advantageous because:

  • In multi-category classification, where a model needs to classify an object into one of many possible categories, the model outputs a vector the size of the number of categories, n. For example, when classifying a text document into categories like ["Financial," "Sports," "Politics," "Science," "None of the above"], the output vector will be of size n=5. Each coordinate of the output vector represents one of the categories, and the model's output can be interpreted as the likelihood of the input falling into one of these categories.
  • The BERT model is well-designed for fine-tuning specific classification tasks. Changing or adding computation layers is straightforward and effective.
  • The model size is relatively small, with around 110 million parameters requiring less than 500MB of memory, making it possible both to fine-tune the model’s weights for a wide range of tasks and, more importantly, to run it in production at low computational cost.
  • It has proven state-of-the-art performance on various NLP tasks like GLUE (General Language Understanding Evaluation), and Sentra’s experience with this model shows excellent results.
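The multi-category output described in the first bullet can be sketched as a softmax over the classification head’s logits. The logit values below are hypothetical stand-ins for a real encoder’s output:

```python
import math

CATEGORIES = ["Financial", "Sports", "Politics", "Science", "None of the above"]

def softmax(logits):
    """Turn n raw scores into a probability distribution over n categories."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A classification head maps the encoder's pooled output to n=5 logits;
# these values are hypothetical.
logits = [2.1, -0.3, 0.4, -1.0, 0.05]
probs = softmax(logits)                        # one probability per category
best = CATEGORIES[probs.index(max(probs))]     # → "Financial" for these logits
```

Each coordinate of `probs` is the model’s estimated likelihood that the input belongs to the corresponding category, which is exactly the output-vector interpretation described above.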

Zero-Shot Classification

One of the key techniques that Sentra has recently started to utilize is zero-shot classification, which excels at interpreting and classifying data without needing task-specific training. This approach allows Sentra to efficiently and precisely understand the contents of various documents, ensuring high accuracy in identifying sensitive information. The models' comprehensive understanding of English (and almost any other language) enables us to classify objects customized to a customer's needs without creating a labeled data set. This not only saves time by eliminating the need for repetitive training but also proves crucial in situations where defining specific cases for detection is challenging. When handling sensitive or rare data, this zero-shot and few-shot capability is a significant advantage.

Our use of zero-shot classification within LLMs significantly enhances our data analysis capabilities. By leveraging this method, we achieve high accuracy, with a false positive rate as low as three to five percent, eliminating the need for extensive task-specific training.
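A common way to implement zero-shot classification is to score a hypothesis like "This text is about {label}." against the document for each candidate label, using a pre-trained entailment (NLI) model as the scorer. The sketch below shows that interface with a stand-in word-overlap scorer so it runs without a model; in practice the scorer would be an NLI or generative LLM, and none of these names reflect Sentra’s internals:

```python
def zero_shot_classify(text, candidate_labels, entailment_scorer,
                       template="This text is about {}."):
    """Rank candidate labels by how strongly the scorer says the
    hypothesis is entailed by the text; no labeled training data needed."""
    scores = {label: entailment_scorer(text, template.format(label))
              for label in candidate_labels}
    return max(scores, key=scores.get), scores

def toy_scorer(premise, hypothesis):
    """Stand-in for an NLI model: crude word overlap, for demo only."""
    premise_words = set(premise.lower().split())
    return sum(w in premise_words
               for w in hypothesis.lower().rstrip(".").split())

label, _ = zero_shot_classify(
    "Quarterly revenue and invoice records for Q3",
    ["invoice", "resume", "source code"],
    toy_scorer,
)
```

The key property is that the candidate labels are supplied at inference time, so a customer-specific category can be added without collecting or labeling any training data.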

Sentra’s Data Sensitivity Estimation Methodologies

Accurate classification is only one (very crucial) step in determining whether a document is sensitive. At the end of the day, a customer must also be able to discern whether a document contains the addresses, phone numbers, or emails of the company's offices, or of the company's clients.

Accumulated Knowledge

Sentra has developed domain expertise to predict which objects are generally considered more sensitive. For example, documents with login information are more sensitive compared to documents containing random names. 

This expertise is grounded in the AI analysis Sentra has collected over time.

How does Sentra accumulate this knowledge?

Sentra accumulates knowledge by combining insights from our experience with current customers and their needs with machine learning models that continuously improve as they are trained on more data over time.

Customer-Specific Needs

Sentra tailors sensitivity models to each customer’s specific needs, allowing feedback and examples to refine our models for optimal results. This customization ensures that sensitivity estimation models are precisely tuned to each customer’s requirements.

What is an example of a customer-specific need?

For instance, one of our customers required a particular combination of PII (personally identifiable information) and NPPI (nonpublic personal information). We tailored our solution by creating a composite classifier to meet their needs by designating documents containing these combinations as having a higher sensitivity level.

Sentra’s sensitivity assessment (which drives classification definitions) can be based on detected data classes, document labels, and detection volumes, triggering extra analysis from our system when needed.

Conclusion

In summary, Sentra’s comprehensive approach to data classification and sensitivity estimation ensures precise and adaptable handling of sensitive data, supporting robust data security at scale. With accurate, granular data classification, security teams can confidently proceed to remediation steps without the need for further validation - saving time and streamlining processes. Further, accurate tags enable automation by sharing contextual sensitivity data with upstream controls (e.g., DLP systems) and remediation workflow tools (e.g., ITSM or SOAR).

Additionally, our research and development teams stay abreast of the rapid advancements in Generative AI, particularly focusing on Large Language Models (LLMs). This proactive approach to data classification ensures our models not only meet but often exceed industry standards, delivering state-of-the-art performance while minimizing costs. Given the fast-evolving nature of LLMs, it is highly likely that the models we use today—BERT, GPT, Mistral, and Llama—will soon be replaced by even more advanced, yet-to-be-published technologies.

After earning a BSc in Mathematics and a BSc in Industrial Engineering, followed by an MSc in Computer Science with a thesis in Machine Learning theory, Hanan has spent the last five years training models for feature-based and computer vision problems. Driven by the motivation to deliver real-world value through his expertise, he leverages his strong theoretical background and hands-on experience to explore and implement new methodologies and technologies in machine learning. At Sentra, one of his main focuses is leveraging large language models (LLMs) for advanced classification and analysis tasks.


Latest Blog Posts

Gilad Golani
December 16, 2024
4 Min Read
Data Security

Best Practices: Automatically Tag and Label Sensitive Data

The Importance of Data Labeling and Tagging

In today's fast-paced business environment, data rarely stays in one place. It moves across devices, applications, and services as individuals collaborate with internal teams and external partners. This mobility is essential for productivity but poses a challenge: how can you ensure your data remains secure and compliant with business and regulatory requirements when it's constantly on the move?

Why Labeling and Tagging Data Matters

Data labeling and tagging provide a critical solution to this challenge. By assigning sensitivity labels to your data, you can define its importance and security level within your organization. These labels act as identifiers that abstract the content itself, enabling you to manage and track the data type without directly exposing sensitive information. With the right labeling, organizations can also control access in real-time.

For example, labeling a document containing social security numbers or credit card information as Highly Confidential allows your organization to acknowledge the data's sensitivity and enforce appropriate protections, all without needing to access or expose the actual contents.

Why Sentra’s AI-Based Classification Is a Game-Changer

Sentra’s AI-based classification technology enhances data security by ensuring that the sensitivity labels are applied with exceptional accuracy. Leveraging advanced LLM models, Sentra enhances data classification with context-aware capabilities, such as:

  • Detecting the geographic residency of data subjects.
  • Differentiating between Customer Data and Employee Data.
  • Identifying and treating Synthetic or Mock Data differently from real sensitive data.

This context-based approach eliminates the inefficiencies of manual processes and seamlessly scales to meet the demands of modern, complex data environments. By integrating AI into the classification process, Sentra empowers teams to confidently and consistently protect their data—ensuring sensitive information remains secure, no matter where it resides or how it is accessed.

Benefits of Labeling and Tagging in Sentra

Sentra enhances your ability to classify and secure data by automatically applying sensitivity labels to data assets. By automating this process, Sentra removes the manual effort required from each team member—achieving accuracy that’s only possible through a deep understanding of what data is sensitive and its broader context.

Here are some key benefits of labeling and tagging in Sentra:

  1. Enhanced Security and Loss Prevention: Sentra’s integration with Data Loss Prevention (DLP) solutions prevents the loss of sensitive and critical data by applying the right sensitivity labels. Sentra’s granular, contextual tags help to provide the detail necessary to action remediation automatically so that operations can scale.
  2. Easily Build Your Tagging Rules: Sentra’s intuitive Rule Builder allows you to automatically apply sensitivity labels to assets based on your pre-existing tagging rules, or to define new ones via the builder UI (see screen below). Sentra imports discovered Microsoft Purview Information Protection (MPIP) labels to speed this process.
  3. Labels Move with the Data: Sensitivity labels created in Sentra can be mapped to Microsoft Purview Information Protection (MPIP) labels and applied to various applications like SharePoint, OneDrive, Teams, Amazon S3, and Azure Blob Containers. Once applied, labels are stored as metadata and travel with the file or data wherever it goes, ensuring consistent protection across platforms and services.
  4. Automatic Labeling: Sentra allows for the automatic application of sensitivity labels based on the data's content. Auto-tagging rules, configured for each sensitivity label, determine which label should be applied during scans for sensitive information.
  5. Support for Structured and Unstructured Data: Sentra enables labeling for files stored in cloud environments such as Amazon S3 or EBS volumes and for database columns in structured data environments like Amazon RDS.

By implementing these labeling practices, your organization can track, manage, and protect data with ease while maintaining compliance and safeguarding sensitive information. Whether collaborating across services or storing data in diverse cloud environments, Sentra ensures your labels and protection follow the data wherever it goes.
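The auto-tagging idea above can be sketched as a small rule table: if any listed data class is detected in an asset, apply the corresponding sensitivity label. All class and label names here are hypothetical, not Sentra’s actual schema:

```python
# Rules checked in priority order: most sensitive first.
RULES = [
    ({"credit_card_number", "ssn"}, "Highly Confidential"),
    ({"email_address", "phone_number"}, "Confidential"),
]

def label_for(detected_classes, default="Public"):
    """Return the sensitivity label for the first matching rule."""
    detected = set(detected_classes)
    for classes, label in RULES:
        if classes & detected:        # any overlap triggers the rule
            return label
    return default
```

Because rules are ordered, an asset containing both an SSN and an email address correctly receives the stricter label.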

Applying Sensitivity Labels to Data Assets in Sentra

In today’s rapidly evolving data security landscape, ensuring that your data is properly classified and protected is crucial. One effective way to achieve this is by applying sensitivity labels to your data assets. Sensitivity labels help ensure that data is handled according to its level of sensitivity, reducing the risk of accidental exposure and enabling compliance with data protection regulations.

Below, we’ll walk you through the necessary steps to automatically apply sensitivity labels to your data assets in Sentra. By following these steps, you can enhance your data governance, improve data security, and maintain clear visibility over your organization's sensitive information.

The process involves three key actions:

  1. Create Sensitivity Labels: The first step in applying sensitivity labels is creating them within Sentra. These labels allow you to categorize data assets according to various rules and classifications. Once set up, these labels will automatically apply to data assets based on predefined criteria, such as the types of classifications detected within the data. Sensitivity labels help ensure that sensitive information is properly identified and protected.
  2. Connect Accounts with Data Assets: The next step is to connect your accounts with the relevant data assets. This integration allows Sentra to automatically discover and continuously scan all your data assets, ensuring that no data goes unnoticed. As new data is created or modified, Sentra will promptly detect and categorize it, keeping your data classification up to date and reducing manual efforts.
  3. Apply Classification Tags: Whenever a data asset is scanned, Sentra will automatically apply classification tags to it, such as data classes, data contexts, and sensitivity labels. These tags are visible in Sentra’s data catalog, giving you a comprehensive overview of your data’s classification status. By applying these tags consistently across all your data assets, you’ll have a clear, automated way to manage sensitive data, ensuring compliance and security.

By following these steps, you can streamline your data classification process, making it easier to protect your sensitive information, improve your data governance practices, and reduce the risk of data breaches.

Applying MPIP Labels

To apply Microsoft Purview Information Protection (MPIP) labels based on Sentra sensitivity labels, follow a few additional steps:

  1. Set up the Microsoft Purview integration - which will allow Sentra to import and sync MPIP sensitivity labels.
  2. Create tagging rules - which will allow you to map Sentra sensitivity labels to MPIP sensitivity labels (for example “Very Confidential” in Sentra would be mapped to “ACME - Highly Confidential” in MPIP), and choose to which services this rule would apply (for example, Microsoft 365 and Amazon S3).
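A tagging rule of this shape can be sketched as a simple lookup: map a Sentra label to an MPIP label, scoped to the services the rule applies to. The label names follow the example above; the structure is illustrative, not Sentra’s API:

```python
# Illustrative rule store; label and service names follow the example above.
TAGGING_RULES = [
    {"sentra_label": "Very Confidential",
     "mpip_label": "ACME - Highly Confidential",
     "services": {"Microsoft 365", "Amazon S3"}},
]

def mpip_label(sentra_label, service):
    """Return the mapped MPIP label if a rule covers this service."""
    for rule in TAGGING_RULES:
        if rule["sentra_label"] == sentra_label and service in rule["services"]:
            return rule["mpip_label"]
    return None   # no rule: leave the asset's MPIP label unchanged
```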

Using Sensitivity Labels in Microsoft DLP

Microsoft Purview DLP (as well as all other industry-leading DLP solutions) supports MPIP labels in its policies, so admins can easily control and prevent loss of sensitive data across multiple services and applications. For instance, an MPIP 'Highly Confidential' label may instruct Microsoft Purview DLP to restrict transfer of sensitive data outside a certain geography. Likewise, another similar label could specify that confidential intellectual property (IP) may not be shared within Teams collaborative workspaces.

Labels can also be used to help control access to sensitive data. Organizations can set a rule granting read permission only for specific tags - for example, only production IAM roles can access production files. Further, for use cases where data is stored in a single store, organizations can estimate the storage cost for each specific tag.
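The tag-gated read permission described here can be sketched as an allowlist check per tag. The role and tag names are hypothetical:

```python
# Hypothetical policy: read access to assets carrying a tag is limited
# to that tag's allowlist of IAM roles; untagged/unlisted tags are open.
READ_ALLOWLIST = {
    "production": {"prod-app-role", "prod-etl-role"},
}

def can_read(role, asset_tags):
    """Allow a read only if the role passes every tag's allowlist."""
    return all(role in READ_ALLOWLIST.get(tag, {role})  # no policy -> allow
               for tag in asset_tags)
```

A real policy engine would also handle deny rules and inheritance; this only shows how a sensitivity tag becomes an access-control input.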

Build a Stronger Foundation with Accurate Data Classification

Effectively tagging sensitive data unlocks significant benefits for organizations, driving improvements across accuracy, efficiency, scalability, and risk management. With precise classification exceeding 95% accuracy and minimal false positives, organizations can confidently label both structured and unstructured data. Automated tagging rules reduce the reliance on manual effort, saving valuable time and resources. Granular, contextual tags enable confident and automated remediation, ensuring operations can scale seamlessly. Additionally, robust data tagging strengthens DLP and compliance strategies by fully leveraging Microsoft Purview’s capabilities. By streamlining these processes, organizations can consistently label and secure data across their entire estate, freeing resources to focus on strategic priorities and innovation.

Aviv Zisso
November 21, 2024
4 Min Read
Data Security

Achieving Exabyte Scale Enterprise Data Security

The Growing Challenge for Enterprise Data Security

Enterprises are facing a unique set of challenges when it comes to managing and protecting their data. From my experience with customers, I’ve seen these challenges intensify as data governance frameworks struggle to keep up with evolving environments. Data is not confined to a single location - it’s scattered across different environments, from cloud platforms to on-premises servers and various SaaS applications. This distributed, siloed model of data stores, while beneficial for flexibility and scalability, complicates data governance and introduces new security and privacy risks.

Many organizations now manage petabytes of constantly changing information, with new data being created, updated, or shared every second. As this volume expands into the hundreds or even thousands of petabytes (exabytes!), keeping track of it all becomes an overwhelming challenge.

The situation is further complicated by the rapid movement of data. Employees and applications copy, modify, or relocate sensitive information in seconds, often across diverse environments. This includes on-premises systems, multiple cloud platforms, and technologies like PaaS and IaaS. Such rapid data sprawl makes it increasingly difficult to maintain visibility and control over the data, and to keep the data protected with all the required controls, such as encryption and access controls.

The Complexities of Access Control

Alongside data sprawl, there’s also the challenge of managing access. Enterprise data ecosystems support thousands of identities (users, apps, machines) each with different levels of access and permissions. These identities may be spread across multiple departments and accounts, and their data needs are constantly evolving. Tracking and controlling which identity can access which data sets becomes a complex puzzle, one that can expose an organization to risks if not handled with precision.

For any enterprise, having an accurate, up-to-date view of who or what has access to what data (and why) is essential to maintaining security and ensuring compliance. Without this visibility and control, organizations run the risk of unauthorized access and potential data breaches.

The Need for Automated Data Risk Assessment 

In today’s data-driven world, security analysts often discover sensitive data in misconfigured environments—sometimes only after a breach—leading to a time-consuming process of validating data sensitivity, identifying business owners, and initiating remediation. In my work with enterprises, I’ve noticed this process is often further complicated by unclear ownership and inconsistent remediation practices.

With data constantly moving and accessed across diverse environments, organizations face critical questions: 

  • Where is our sensitive data?
  • Who has access? 
  • Are we compliant? 

Addressing these challenges requires a dynamic, always-on approach with trusted classification and automated remediation to monitor risks and enforce protection 24/7.

The Scale of the Problem

For enterprise organizations, scale amplifies every data management challenge. The larger the organization, the more complex it becomes to ensure data visibility, secure access, and maintain compliance. Traditional, human-dependent security approaches often struggle to keep up, leaving gaps that malicious actors exploit. Enterprises need robust, scalable solutions that can adapt to their expanding data needs and provide real-time insights into where sensitive data resides, how it’s used, and where the risks lie.

The Solution: Data Security Platform (DSP)

Sentra’s Cloud-native Data Security Platform (DSP) provides a solution designed to meet these challenges head-on. By continuously identifying sensitive data, its posture, and access points, DSP gives organizations complete control over their data landscape.

Sentra enables security teams to gain full visibility and control of their data while proactively protecting against sensitive data breaches across the public cloud. By locating all data, properly classifying its sensitivity, analyzing how it’s secured (its posture), and monitoring where it’s moving, Sentra helps reduce the “data attack surface” - the sum of all places where sensitive or critical data is stored.

Based on a cloud-native design, Sentra’s platform combines robust capabilities, including Data Discovery and Classification, Data Security Posture Management (DSPM), Data Access Governance (DAG), and Data Detection and Response (DDR). This comprehensive approach to data security ensures that Sentra’s customers can achieve enterprise-scale protection and gain crucial insights into their data. Sentra’s DSP offers a distinct layer of data protection that goes beyond traditional, infrastructure-dependent approaches, making it an essential addition to any organization’s security strategy.

By scaling data protection across multiple clouds and on-premises, Sentra enables organizations to meet the demands of enterprise growth and keep up with evolving business needs. And it does so efficiently, without creating unnecessary burdens on the security teams managing it.

[Image: timeline for determining the sensitivity of the data]

How a Robust DSP Can Handle Scale Efficiently

When selecting a DSP solution, it's essential to consider: How does this product ensure your sensitive data is kept secure no matter where it moves? And how can it scale effectively without driving up costs by constantly combing through every bit of data?

The key is in tailoring the DSP to your unique needs. Each organization, with its variety of environments and security requirements, needs a DSP that can adapt to specific demands. At Sentra, we’ve developed a flexible scanning engine that puts you in control, allowing you to customize what data is scanned, how it is tagged, and when. Our platform incorporates advanced optimization algorithms to keep scanning costs low without compromising on quality.

Priority Scanning

Do you really need to scan all the organization’s data? Do all data stores and assets hold the same priority? A smart DSP solution puts you in control, allowing you to adjust your scanning strategy based on the organization's specific priorities and on where sensitive data is located and used.

For example, some organizations may prioritize scanning employee-generated content, while others might focus on their production environment and perform more frequent scans there. Tailoring your scanning strategy ensures that the most important data is protected without overwhelming resources.

Smart Sampling

Is it necessary to scan every database record and every character in every file? The answer depends on your organization’s risk tolerance. For instance, in a PCI production environment, you might reduce the amount of sampling and scan every byte, while in a development environment you can group and sample data sets that share similar characteristics, allowing for more efficient scanning without compromising on security.
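The risk-tolerance trade-off above can be sketched as a per-environment sampling rate, with full scanning where tolerance is lowest. The environment names and rates are illustrative, not Sentra’s defaults:

```python
import random

# Illustrative rates: scan everything in PCI production, sample elsewhere.
SAMPLE_RATE = {"pci-production": 1.0, "production": 0.5, "development": 0.1}

def sample(records, env, seed=0):
    """Deterministically pick the fraction of records to scan for this env."""
    rate = SAMPLE_RATE.get(env, 0.25)      # fallback rate for unknown envs
    if rate >= 1.0:
        return list(records)               # no sampling: scan every record
    rng = random.Random(seed)              # fixed seed -> reproducible scans
    return [r for r in records if rng.random() < rate]
```

A production system would group records that share characteristics before sampling, as described above; the seed keeps repeat scans consistent.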

[Image: Edit Scan Configuration for a data warehouse bucket]

Delta scanning (tracking data changes) 

Delta scanning focuses on what matters most by selectively scanning data that poses a higher risk. Instead of re-scanning data that hasn’t changed, delta scanning prioritizes new or modified data, ensuring that resources are used efficiently. This approach helps to reduce scanning costs while keeping your data protection efforts focused on what has changed or been added.

A smart DSP will run efficiently and prioritize “new data” over “old data”, allowing you to optimize your scanning costs.
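Delta scanning can be sketched by tracking a per-object fingerprint (for example an ETag or modification time) and re-scanning only objects whose fingerprint changed. This is a minimal illustration, not Sentra’s engine:

```python
class DeltaScanner:
    """Remember each object's last-seen fingerprint (e.g., ETag or mtime)
    and return only objects that are new or changed since the last scan."""

    def __init__(self):
        self.seen = {}  # object key -> fingerprint at last scan

    def to_scan(self, inventory):
        """inventory: mapping of object key -> current fingerprint."""
        changed = [key for key, fp in inventory.items()
                   if self.seen.get(key) != fp]
        self.seen.update(inventory)   # record what we scanned this round
        return changed
```

On the first pass everything is scanned; afterwards, only modified or newly created objects are, which is where the cost savings come from.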

On-Demand Data Scans

As you build your scanning strategy, it is important to keep the ability to trigger an immediate scan request. This is handy when you’re fixing security risks and want a short feedback loop to verify your changes. 

This also enables you to prepare effectively for compliance audits by ensuring readiness with accurate, fresh classification.

[Image: a data warehouse bucket in Sentra's data security platform]

Balancing Scan Speed and Cost

Smart sampling enables a balance between scan speed and cost. By focusing scans on relevant data and optimizing the scanning process, you can keep costs down while maintaining high accuracy and efficiency across your data landscape.

Achieve Scalable Data Protection with Cloud-Native DSPs

As enterprise organizations continue to navigate the complexities of managing vast amounts of data across multiple environments, the need for effective data security strategies becomes increasingly critical. The challenges of access control, risk analysis, and scaling security efforts can overwhelm traditional approaches, making it clear that a more automated, comprehensive solution is essential. A cloud-native Data Security Platform (DSP) offers the agility and efficiency required to meet these demands. 

By incorporating advanced features like smart sampling, delta scanning, and on-demand scan requests, Sentra’s DSP ensures that organizations can continuously monitor, protect, and optimize their data security posture without unnecessary resource strain. Balancing scan frequency, sensitivity and cost efficiency further enhances the ability to scale effectively, providing organizations with the tools they need to manage data risks, remain compliant, and protect sensitive information in an ever-evolving digital landscape.

If you want to learn more, talk to our data security experts and request a demo today.

David Stuart
October 21, 2024
5 Min Read
Data Sprawl

How Sentra Built a Data Security Platform for the AI Era

In just three years, Sentra has witnessed the rapid evolution of the data security landscape. What began with traditional on-premise Data Loss Prevention (DLP) solutions has shifted to a cloud-native focus with Data Security Posture Management (DSPM). This marked a major leap in how organizations protect their data, but the evolution didn’t stop there.

The next wave introduced new capabilities like Data Detection and Response (DDR) and Data Access Governance (DAG), pushing the boundaries of what DSPM could offer. Now, we’re entering an era where SaaS Security Posture Management (SSPM) and Artificial Intelligence Security Posture Management (AI-SPM) are becoming increasingly important. 

These shifts are redefining what we’ve traditionally called Data Security Platform (DSP) solutions, marking a significant transformation in the industry. The speed of this evolution speaks to the growing complexity of data security needs and the innovation required to meet them.

The Evolution of Data Security

What Is Driving The Evolution of Data Security?

The evolution of the data security market is being driven by several key macro trends:

  • Digital Transformation and Data Democratization: Organizations are increasingly embracing digital transformation, making data more accessible to various teams and users.
  • Rapid Cloud Adoption: Businesses are moving to the cloud at an unprecedented pace to enhance agility and responsiveness.
  • Explosion of Siloed Data Stores: The growing number of siloed data stores, diverse data technologies, and an expanding user base is complicating data management.
  • Increased Innovation Pace: The rise of artificial intelligence (AI) is accelerating the pace of innovation, creating new opportunities and challenges in data security.
  • Resource Shortages: As organizations grow, the need for automation to keep up with increasing demands has never been more critical.
  • Stricter Data Privacy Regulations: Heightened data privacy laws and stricter breach disclosure requirements are adding to the urgency for robust data protection measures.

Similarly, the roles involved in the management, governance, and protection of data have evolved. These roles are increasingly intertwined and co-dependent, as described in our recent blog, “Data: The Unifying Force Behind Disparate GRC Functions”. Today, each function operates within its own domain yet shares ownership of data at its core. As this co-dependency on data increases, so does the need for a unifying platform approach to data security.

Sentra has adapted to these changes to align our messaging with industry expectations, buyer requirements, and product/technology advancements.

A Data Security Platform for the AI Era

Sentra is setting the standard with the leading Data Security Platform for the AI Era.

With its cloud-native design, Sentra seamlessly integrates powerful capabilities like Data Discovery and Classification, Data Security Posture Management (DSPM), Data Access Governance (DAG), and Data Detection and Response (DDR) into a comprehensive solution. This allows our customers to achieve enterprise-scale data protection while addressing critical questions about their data.

data security cycle - visibility, context, access, risks, threats

What sets Sentra apart is its connector-less, cloud-native architecture, which effortlessly scales to accommodate multi-petabyte, multi-cloud environments without the administrative burdens typical of connector-based legacy systems. These more labor-intensive approaches often struggle to keep pace and frequently overlook shadow data.

Moreover, Sentra harnesses the power of AI and machine learning to accurately interpret data context and classify data. This not only enhances data security but also ensures the privacy and integrity of data used in GenAI applications. We recognized the critical need for accurate and automated Data Discovery and Classification, along with Data Security Posture Management (DSPM), to address the risks associated with data proliferation in a multi-cloud landscape. Based on our customers' evolving needs, we expanded our capabilities to include DAG and DDR. These tools are essential for managing data access, detecting emerging threats, and improving risk mitigation and data loss prevention.

DAG maps the relationships between cloud identities, roles, permissions, data stores, and sensitive data classes. This provides a complete view of which identities and data stores in the cloud may be overprivileged. Meanwhile, DDR offers continuous threat monitoring for suspicious data access activity, providing early warnings of potential breaches.
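The overprivilege analysis described above can be sketched as a comparison of granted versus actually used permissions. This is a toy illustration with invented identity and store names; real DAG analysis works over cloud IAM policies and audit logs:

```python
# granted: identity -> data stores it is permitted to read (hypothetical names)
granted = {
    "svc-analytics": {"customers-db", "logs-bucket"},
    "intern-role":   {"customers-db", "finance-db", "logs-bucket"},
}
# used: identity -> data stores actually accessed, e.g. derived from audit logs
used = {
    "svc-analytics": {"customers-db", "logs-bucket"},
    "intern-role":   {"logs-bucket"},
}
sensitive_stores = {"customers-db", "finance-db"}

def overprivileged(granted, used, sensitive_stores):
    """Flag identities holding unused grants on sensitive data stores."""
    findings = {}
    for identity, stores in granted.items():
        unused_sensitive = (stores - used.get(identity, set())) & sensitive_stores
        if unused_sensitive:
            findings[identity] = unused_sensitive
    return findings

# intern-role holds unused grants on customers-db and finance-db,
# so it is a candidate for least-privilege remediation.
findings = overprivileged(granted, used, sensitive_stores)
```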

We grew to support SaaS data repositories, including Microsoft 365 (SharePoint, OneDrive, Teams, etc.) and G Suite (Google Drive), and leveraged AI/ML to accurately classify data hidden within unstructured data stores.

Sentra’s accurate data sensitivity tagging and granular contextual details allow organizations to enhance the effectiveness of their existing tools, streamline workflows, and automate remediation processes. Additionally, Sentra offers pre-built integrations with various analysis and response tools used across the enterprise, including data catalogs, incident response (IR) platforms, IT service management (ITSM) systems, DLPs, CSPMs, CNAPPs, IAM, and compliance management solutions.

How Sentra Redefines Enterprise Data Security Across Clouds

Sentra has architected a solution that can deliver enterprise-scale data security without the traditional constraints and administrative headaches. Sentra’s cloud-native design easily scales to petabyte data volumes across multi-cloud and on-premises environments. 

The Sentra platform incorporates a few major differentiators that distinguish it from other solutions including:

  • Novel Scanning Technology: Sentra uses inventory files and advanced automatic grouping to create a new entity called a “Data Asset”: a group of files that share the same structure, security posture, and business function. Sentra continuously reduces billions of files into thousands of data assets that represent the different types of data, so full coverage of petabytes of cloud data requires scanning only several hundred thousand files (5-6 orders of magnitude less scanning). Because no random sampling is involved in the process, all types of data are fully scanned, with differentials scanned on a daily basis. Sentra supports all leading IaaS, PaaS, SaaS, and on-premises stores.
  • AI-powered Autonomous Classification: Sentra’s AI-powered classification achieves approximately 97% accuracy on data within unstructured documents and structured data. Additionally, Sentra provides rich data context (distinct from data class or type) about multiple aspects of files, such as data subject residency, business impact, whether data is synthetic or real, and more. Further, Sentra’s classification uses LLMs (running inside the customer environment) to automatically learn and adapt based on the unique business context and user feedback on false positives, and lets users add AI-based classifiers using natural language (powered by LLMs). This autonomous learning means users don’t have to customize the system themselves, saving time and helping to keep pace with dynamic data.
  • Data Perimeters / Movement: Sentra DataTreks™ provides the ability to understand data perimeters automatically and detect when data is moving (e.g. copied partially or fully) to a different perimeter. For example, it can detect data similarity/movement from a well protected production environment to a less-protected development environment. This is important for highly dynamic cloud environments and promoting secure data democratization.
  • Data Detection and Response (DDR): Sentra’s DDR module highlights anomalies such as unauthorized data access or unusual data movements in near real-time, integrating alerts into existing tools like ServiceNow or JIRA for quick mitigation.
  • Easy Customization: In addition to ‘learning’ of a customer's unique data types, with Sentra it’s easy to create new classifiers, modify policies, and apply custom tagging labels.
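The “Data Asset” grouping described in the first bullet can be approximated with a simple structure signature. This sketch assumes CSV-like files and a made-up path-normalization rule; it is not Sentra's actual algorithm:

```python
import re

def asset_signature(path, columns):
    """Derive a grouping key: normalized directory pattern plus column schema."""
    # Replace numeric path segments so e.g. daily/monthly exports group together.
    pattern = re.sub(r"\d+", "N", path.rsplit("/", 1)[0])
    return (pattern, tuple(columns))

files = [
    ("exports/2024/01/users.csv", ["id", "email", "phone"]),
    ("exports/2024/02/users.csv", ["id", "email", "phone"]),
    ("exports/2024/03/orders.csv", ["order_id", "amount"]),
]

# Group files sharing a signature into one "data asset".
assets = {}
for path, cols in files:
    assets.setdefault(asset_signature(path, cols), []).append(path)

# The two monthly users exports collapse into one asset; only one
# representative per asset needs a full classification scan.
representatives = [paths[0] for paths in assets.values()]
```

Because grouping is deterministic rather than random sampling, every distinct data structure is still represented in the scan.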

As AI reshapes the digital landscape, it also creates new vulnerabilities, such as the risk of data exposure through AI training processes. The Sentra platform addresses these AI-specific challenges, while continuing to tackle the persistent security issues from the cloud era, providing an integrated solution that ensures data security remains resilient and adaptive.

Use Cases: Solving Complex Problems with Unique Solutions

Sentra’s unique capabilities allow it to serve a broad spectrum of challenging data security, governance and compliance use cases. Two frequently cited DSPM use cases are preventing data breaches and facilitating GenAI technology deployments. With the addition of data privacy compliance, these represent the top three.  

Let's dive deeper into how Sentra's platform addresses specific challenges:

Data Risk Visibility

Sentra’s Data Security Platform enables continuous analysis of your security posture and automates risk assessments across your entire data landscape. It identifies data vulnerabilities across cloud-native and unmanaged databases, data lakes, and metadata catalogs. By automating the discovery and classification of sensitive data, teams can prioritize actions based on the sensitivity and policy guidelines related to each asset. This automation not only saves time but also enhances accuracy, especially when leveraging large language models (LLMs) for detailed data classification.

Security and Compliance Audit

Sentra Data Security Platform can also automate the process of identifying regulatory violations and ensuring adherence to custom and pre-built policies (including policies that map to common compliance frameworks). 

The platform automates the identification of regulatory violations, ensuring compliance with both custom and established policies. It helps keep sensitive data in the right environments, preventing it from traveling to regions that violate retention policies or lack encryption. Unlike manual policy implementation, which is prone to errors, Sentra’s automated approach significantly reduces the risk of misconfiguration, ensuring that teams don’t miss critical activities.
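A residency policy check of this kind can be sketched as a lookup of each store's region against the regions permitted for its data classes. The policy, class names, and store names below are hypothetical, not Sentra's implementation:

```python
# Hypothetical policy: PII must stay in EU regions; telemetry may also live in the US.
allowed_regions = {
    "pii":       {"eu-west-1", "eu-central-1"},
    "telemetry": {"us-east-1", "eu-west-1"},
}

stores = [
    {"name": "crm-backup", "region": "us-east-1", "classes": {"pii"}},
    {"name": "metrics",    "region": "us-east-1", "classes": {"telemetry"}},
]

def residency_violations(stores, allowed_regions):
    """Return (store, data class) pairs stored outside their permitted regions."""
    return [
        (s["name"], c)
        for s in stores
        for c in s["classes"]
        if s["region"] not in allowed_regions.get(c, set())
    ]

# crm-backup holds PII in us-east-1, outside the permitted EU regions.
violations = residency_violations(stores, allowed_regions)
```

Automating this lookup across every discovered store is what removes the human error inherent in manual policy reviews.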

Data Access Governance

Sentra enhances data access governance (DAG) by enforcing appropriate permissions for all users and applications within an organization. By automating the monitoring of access permissions, Sentra mitigates risks such as excessive permissions and unauthorized access. This ensures that teams can maintain least privilege access control, which is essential in a growing data ecosystem.

Minimizing Data and Attack Surface

The platform’s capabilities also extend to detecting unmanaged sensitive data, such as shadow or duplicate assets. By automatically finding and classifying these unknown data points, Sentra minimizes the attack surface, controls data sprawl, and enhances overall data protection.

Secure and Responsible AI

As organizations build new Generative AI applications, Sentra extends its protection to LLM applications, treating them as part of the data attack surface. This proactive management, alongside monitoring of prompts and outputs, addresses data privacy and integrity concerns, ensuring that organizations are prepared for the future of AI technologies.

Insider Risk Management

Sentra effectively detects insider risks by monitoring user access to sensitive information across various platforms. Its Data Detection and Response (DDR) capabilities provide real-time threat detection, analyzing user activity and audit logs to identify unusual patterns.
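One simple way to illustrate this kind of detection is a z-score test over per-user daily access counts. This is only a stand-in for Sentra's actual analytics; the users, counts, and threshold are invented:

```python
from statistics import mean, stdev

def flag_unusual_access(history, today, threshold=3.0):
    """Flag users whose access count today deviates sharply from their baseline.

    history maps user -> list of daily sensitive-record access counts.
    """
    flagged = []
    for user, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma > 0 and (today.get(user, 0) - mu) / sigma > threshold:
            flagged.append(user)
    return flagged

history = {
    "alice": [10, 12, 11, 9, 10, 11, 12],
    "bob":   [5, 4, 6, 5, 5, 4, 6],
}
today = {"alice": 11, "bob": 250}  # bob suddenly reads 250 sensitive records

flagged = flag_unusual_access(history, today)
```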

Data Loss Prevention (DLP)

The platform integrates seamlessly with endpoint DLP solutions to monitor all access activities related to sensitive data. By detecting unauthorized access attempts from external networks, Sentra can prevent data breaches before they escalate, all while maintaining a positive user experience.

Sentra’s robust Data Security Platform offers solutions for these use cases and more, empowering organizations to navigate the complexities of data security with confidence. With a comprehensive approach that combines visibility, governance, and protection, Sentra helps businesses secure their data effectively in today’s dynamic digital environment.

From DSPM to a Comprehensive Data Security Platform

Sentra has evolved beyond being the leading Data Security Posture Management (DSPM) solution; we are now a Cloud-native Data Security Platform (DSP). Today, we offer holistic solutions that empower organizations to locate, secure, and monitor their data against emerging threats. Our mission is to help businesses move faster and thrive in today’s digital landscape.

What sets the Sentra DSP apart is its unique layer of protection, distinct from traditional infrastructure-dependent solutions. It enables organizations to scale their data protection across ever-expanding multi-cloud environments, meeting enterprise demands while adapting to ever-changing business needs—all without placing undue burdens on the teams managing it.

And we continue to progress. In a world rapidly evolving with advancements in AI, the Sentra Data Security Platform stands as the most comprehensive and effective solution to keep pace with the challenges of the AI age. We are committed to developing our platform to ensure that your data security remains robust and adaptive.

 Sentra's Cloud-Native Data Security Platform provides comprehensive data protection for the entire data estate.