Data Leakage Detection for AWS Bedrock

July 15, 2024
4 Min Read
Data Security

Amazon Bedrock is a fully managed service that streamlines access to top-tier foundation models (FMs) from premier AI startups and Amazon, all through a single API. This service empowers users to leverage cutting-edge generative AI technologies by offering a diverse selection of high-performance FMs from innovators like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon itself. Amazon Bedrock allows for seamless experimentation and customization of these models to fit specific needs, employing techniques such as fine-tuning and Retrieval Augmented Generation (RAG). 

Additionally, it supports the development of agents capable of performing tasks with enterprise systems and data sources. As a serverless offering, it removes the complexities of infrastructure management, ensuring secure and easy deployment of generative AI features within applications using familiar AWS services, all while maintaining robust security, privacy, and responsible AI standards.

Why Are Enterprises Using AWS Bedrock

Enterprises are increasingly using AWS Bedrock for several key reasons:

  • Diverse Model Selection: Offers access to a curated selection of high-performing foundation models (FMs) from both leading AI startups and Amazon itself, providing a comprehensive range of options to suit various use cases and preferences. This diversity allows enterprises to select the most suitable models for their specific needs, whether they require language generation, image processing, or other AI capabilities.
  • Streamlined Integration: Simplifies the process of adopting and integrating generative AI technologies into existing systems and applications. With its unified API and serverless architecture, enterprises can seamlessly incorporate these advanced AI capabilities without the need for extensive infrastructure management or specialized expertise. This streamlines the development and deployment process, enabling faster time-to-market for AI-powered solutions.
  • Customization Capabilities: Facilitates experimentation and customization, allowing enterprises to fine-tune and adapt the selected models to better align with their unique requirements and data environments. Techniques such as fine-tuning and Retrieval Augmented Generation (RAG) enable enterprises to refine the performance and accuracy of the models, ensuring optimal results for their specific use cases.
  • Security and Compliance Focus: Prioritizes security, privacy, and responsible AI practices, providing enterprises with the confidence that their data and AI deployments are protected and compliant with regulatory standards. By leveraging AWS's robust security infrastructure and compliance measures, enterprises can deploy generative AI applications with peace of mind.

AWS Bedrock Data Privacy & Security Concerns

The rise of AI technologies, while promising transformative benefits, also introduces significant security risks. As enterprises increasingly integrate AI into their operations, for example through AWS Bedrock, they face challenges related to data privacy, model integrity, and ethical use. AI systems, particularly those involving generative models, can be susceptible to adversarial attacks, unintended data extraction, and bias, which can lead to compromised data security and regulatory violations.

Training Data Concerns

Training data is the backbone of machine learning and artificial intelligence systems. The quality, diversity, and integrity of this data are critical for building robust models. However, there are significant risks associated with inadvertently using sensitive data in training datasets, as well as the unintended retrieval and leakage of such data. 

These risks can have severe consequences, including breaches of privacy, legal repercussions, and erosion of public trust.

Accidental Usage of Sensitive Data in Training Sets

Inadvertently including sensitive data in training datasets can occur for various reasons, such as insufficient data vetting, poor anonymization practices, or errors in data aggregation. Sensitive data may encompass personally identifiable information (PII), financial records, health information, intellectual property, and more. 

The consequences of training models on such data are multifaceted:

  • Data Privacy Violations: When models are trained on sensitive data, they might inadvertently learn and reproduce patterns that reveal private information. This can lead to direct privacy breaches if the model outputs or intermediate states expose this data.
  • Regulatory Non-Compliance: Many jurisdictions have stringent regulations regarding the handling and processing of sensitive data, such as GDPR in the EU, HIPAA in the US, and others. Accidental inclusion of sensitive data in training sets can result in non-compliance, leading to heavy fines and legal actions.
  • Bias and Ethical Concerns: Sensitive data, if not properly anonymized or aggregated, can introduce biases into the model. For instance, using demographic data can inadvertently lead to models that discriminate against certain groups.

These risks require strong security measures and responsible AI practices to protect sensitive information and comply with industry standards. AWS Bedrock provides a ready solution to power foundation models, and Sentra provides a complementary solution to ensure the compliance and integrity of the data these models use and output. Let’s explore how each component of this combination delivers its capability.

Prompt Response Monitoring With Sentra

Sentra can detect sensitive data leakage in near real-time by scanning and classifying all prompt responses generated by AWS Bedrock using its Data Detection and Response (DDR) security module.

Data exfiltration might occur if AWS Bedrock prompt responses are used to return data outside of an organization, for example through a chatbot interface connected directly to a user-facing application.

By analyzing the prompt responses, Sentra can ensure that both sensitive data acquired through fine-tuning models and data retrieved using Retrieval-Augmented Generation (RAG) methods are protected. This protection is effective within minutes of any data exfiltration attempt.

To activate the detection module, there are 3 prerequisites:

  1. The customer should enable AWS Bedrock Model Invocation Logging to an S3 destination (instructions here) in the customer environment.
  2. A new Sentra tenant for the customer should be created/set up.
  3. The customer should install the Sentra copy Lambda using Sentra’s CloudFormation template for its DDR module (documentation provided by Sentra).
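
As a rough illustration of the first prerequisite, model invocation logging can also be enabled programmatically. The sketch below assumes the boto3 SDK and uses a hypothetical bucket name and key prefix; it is not Sentra's onboarding procedure, and the bucket must already carry a policy that lets Amazon Bedrock write to it.

```python
def build_logging_config(bucket_name: str, key_prefix: str = "bedrock-logs") -> dict:
    """Build the loggingConfig payload that sends invocation logs to S3."""
    return {
        "s3Config": {"bucketName": bucket_name, "keyPrefix": key_prefix},
        # Prompt and response text is what a leakage scanner needs to see.
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }


def enable_invocation_logging(bucket_name: str, region: str = "us-east-1") -> None:
    """Turn on Bedrock model invocation logging for the given region."""
    import boto3  # imported lazily so the builder above works without AWS access

    bedrock = boto3.client("bedrock", region_name=region)
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(bucket_name)
    )


# Example (requires AWS credentials; the bucket name is a placeholder):
# enable_invocation_logging("example-bedrock-invocation-logs")
```

Logging is configured per region, so an environment that invokes models in several regions would need the call repeated for each one.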

Once the prerequisites are fulfilled, Sentra automatically analyzes the prompt responses and provides near real-time security threat alerts based on the policies configured for the customer at Sentra.

Here is the full flow which describes how Sentra scans the prompts in near real-time:

  1. Sentra’s setup uses AWS Lambda to handle new files uploaded to the Sentra S3 bucket configured in the customer’s cloud, which stores the logs of all responses from AWS Bedrock prompts. When a new file arrives, the Lambda function copies it into Sentra’s prompt response buckets.
  2. Next, another S3 trigger kicks off enrichment of each response with extra details needed for detecting sensitive information.
  3. Our real-time data classification engine then gets to work, sorting the data from the responses into categories like emails, phone numbers, names, addresses, and credit card info. It also identifies the context, such as intellectual property or customer data.
  4. Finally, Sentra uses this classified information to spot any sensitive data. We then generate an alert and notify our customers, also sending the alert to any relevant downstream systems.
Figure: Data flow between the customer's AWS cloud and Sentra
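
Step 1 of the flow above, an S3-triggered Lambda that copies each new log file to another bucket, could look roughly like the following. This is a minimal sketch and not Sentra's actual copy Lambda: the destination bucket name is hypothetical, and error handling and cross-account permissions are omitted.

```python
from urllib.parse import unquote_plus

DEST_BUCKET = "example-prompt-response-bucket"  # hypothetical destination


def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point: copy every newly created object to DEST_BUCKET."""
    import boto3  # available by default in the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    copied = 0
    for bucket, key in extract_objects(event):
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        copied += 1
    return {"copied": copied}
```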

Sentra can push these alerts downstream into 3rd party systems, such as SIEMs, SOARs, ticketing systems, and messaging systems (Slack, Teams, etc.).

Sentra’s data classification engine provides three methods of classification:

  • Regular expressions
  • List classifiers
  • AI models

Further, Sentra allows the customer to add its own classifiers for their own business-specific needs, apart from the 150+ data classifiers which Sentra provides out of the box.
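
To make the first two methods concrete, here is a minimal sketch of a regular-expression classifier and a list classifier. The patterns, the codename list, and the label names are illustrative assumptions, not Sentra's built-in classifiers; the Luhn checksum is a common way to cut false positives on card-like digit runs.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
PROJECT_CODENAMES = {"bluebird", "nightingale"}  # hypothetical business-specific list


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def classify(text: str) -> set[str]:
    """Return the set of sensitive-data labels found in a prompt response."""
    labels = set()
    if EMAIL_RE.search(text):
        labels.add("email")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        labels.add("credit_card")
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & PROJECT_CODENAMES:  # list classifier: exact term lookup
        labels.add("project_codename")
    return labels
```

A custom classifier in this style amounts to adding another pattern or term list, which is why regex and list classifiers are usually the easiest ones for a customer to extend.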

Sentra’s sensitive data detection also provides control for setting a threshold of the amount of sensitive data exfiltrated through Bedrock over time (similar to a rate limit) to reduce the rate of false positives for non-critical exfiltration events.
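
One way to picture such a threshold is a sliding time window over sensitive findings. The sketch below is an assumption about how a control like this might behave, not Sentra's implementation; the class name and parameters are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class ExfiltrationThreshold:
    """Alert only when findings inside a sliding time window exceed a limit."""

    def __init__(self, max_findings: int, window_seconds: float):
        self.max_findings = max_findings
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, findings) pairs, oldest first
        self._total = 0

    def record(self, timestamp: float, findings: int) -> bool:
        """Record new findings and return True if an alert should fire."""
        self._events.append((timestamp, findings))
        self._total += findings
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= cutoff:
            _, old = self._events.popleft()
            self._total -= old
        return self._total > self.max_findings
```

With `max_findings=5` and a 60-second window, a handful of isolated findings stays silent, while a burst that pushes the running total past five raises an alert.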

Figure: Example threat, sensitive customer data found in an Amazon Bedrock response

Conclusion

There is a pressing push for AI integration and automation to help businesses improve agility, meet growing cloud service and application demands, and improve user experiences, all while minimizing risk. Early warning of potential sensitive data leakage or breach is critical to achieving this goal.

Sentra's data security platform can be used across the entire development pipeline to classify, test, and verify that models do not leak sensitive information. This serves developers and also helps them increase confidence among their buyers. By adopting Sentra, organizations gain the ability to build out automation for business responsiveness and improved experiences, with the confidence of knowing that their most important asset, their data, will remain secure.

If you want to learn more, request a live demo with our data security experts.

Discover Ron’s expertise, shaped by over 20 years of hands-on tech and leadership experience in cybersecurity, cloud, big data, and machine learning. As a serial entrepreneur and seed investor, Ron has contributed to the success of several startups, including Axonius, Firefly, Guardio, Talon Cyber Security, and Lightricks, after founding a company acquired by Oracle.

Latest Blog Posts

Yair Cohen
January 28, 2025
5 Min Read
Data Security

Data Protection and Classification in Microsoft 365

Imagine the fallout of a single misstep—a phishing scam tricking an employee into sharing sensitive data. The breach doesn’t just compromise information; it shakes trust, tarnishes reputations, and invites compliance penalties. With data breaches on the rise, safeguarding your organization’s Microsoft 365 environment has never been more critical.

Data classification helps prevent such disasters. This article provides a clear roadmap for protecting and classifying Microsoft 365 data. It explores how data is saved and classified, discusses built-in tools for protection, and covers best practices for maintaining  Microsoft 365 data protection.

How Is Data Saved and Classified in Microsoft 365? 

Microsoft 365 stores data across tools and services. For example, emails are stored in Exchange Online, documents and data for collaboration are found in SharePoint and Teams, and documents or files for individual users are stored in OneDrive.

All of this data is largely stored in an unstructured format typically used for documents and images. This format not only allows organizations to store large volumes of data efficiently; it also enables seamless collaboration across teams and departments. However, because unstructured data cannot be neatly categorized into tables or columns, it is cumbersome to discern what data is sensitive and where it is stored.

To address this, Microsoft 365 offers a data classification dashboard that helps classify data of varying levels of sensitivity and data governed by different regulatory compliance frameworks. But how does Microsoft identify sensitive information with unstructured data? 

Microsoft employs advanced technologies such as RegEx scans, trainable classifiers, Bloom filters, and data classification graphs to identify and classify data as public, internal, or confidential. Once classified, data protection and governance policies are applied based on sensitivity and retention labels.
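
Of the techniques listed above, the Bloom filter is probably the least familiar. It is a compact structure that answers "might this value be in a known set?" without storing the values themselves, which suits lookups against large lists of sensitive identifiers. The sketch below is a generic illustration, not Microsoft's implementation; the bit-array size, hash count, and sample value are assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a tunable rate of false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


known_ids = BloomFilter()
known_ids.add("123-45-6789")  # made-up identifier for illustration
```

A `might_contain` result of `False` is definitive; `True` only means the value is probably present, so a match is normally confirmed by a second, exact check.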

Data classification is vital for understanding, protecting, and governing data. With your Microsoft 365 data classified appropriately, you can ensure seamless collaboration without risking data exposure.

Figure 1: Why data classification is important

Microsoft 365 Data Protection and Classification Tools

Microsoft 365 includes several key tools and frameworks for classifying and securing data. Here are a few. 

Microsoft Purview 

Microsoft Purview is a cornerstone of data classification and protection within Microsoft 365.

Key Features: 

  • More than 200 prebuilt classifiers and the ability to create custom classifiers tailored to specific business needs.
  • Purview auto-classifies data across Microsoft 365 and other supported apps, such as Adobe Photoshop and Adobe PDF, while users work on them.
  • Sensitivity labels that apply encryption, watermarks, and access restrictions to secure sensitive data.
  • Double Key Encryption, which protects highly sensitive content with two keys, one of which remains under the customer's control.
Figure 2: Sensitivity watermarks in Microsoft 365 (Source: Microsoft)
Figure 3: Sensitivity labels for information protection policies in Microsoft 365 (Source: Microsoft)

Purview autonomously applies sensitivity labels like "confidential" or "highly confidential" based on preconfigured policies, ensuring optimal access control. These labels persist even when files are shared or converted to other formats, such as from Word to PDF.

Additionally, Purview’s data loss prevention (DLP) policies prevent unauthorized sharing or deletion of sensitive data by flagging and reporting violations in real time. For example, if a sensitive file is shared externally, Purview can immediately block the transfer and alert your security team.

Figure 4: Preventing data loss by using sensitivity labels (Source: Microsoft)

Microsoft Defender 

Microsoft Defender for Cloud Apps strengthens security by providing a cloud app discovery window to identify applications accessing data. Once identified, it classifies files within these applications based on sensitivity, applying appropriate protections as per preconfigured policies.

Figure 5: Microsoft Defender data sensitivity classification (Source: Microsoft)

Key Features:

  • Data Sensitivity Classification: Defender identifies sensitive files and assigns protection based on sensitivity levels, ensuring compliance and reducing risk. For example, it labels files containing credit card numbers, personal identifiers, or confidential business information with sensitivity classifications like "Highly Confidential."
  • Threat Detection and Response: Defender detects known threats targeted at sensitive data in emails, collaboration tools (like SharePoint and Teams), URLs, file attachments, and OneDrive. If an admin account is compromised, Microsoft Defender immediately spots the threat, disables the account, and notifies your IT team to prevent significant damage.
  • Automation: Defender automates incident response, ensuring that malicious activities are flagged and remediated promptly.

Intune 

Microsoft Intune provides comprehensive device management and data protection, enabling organizations to enforce policies that safeguard sensitive information on both managed and unmanaged smartphones, computers, and other devices.

Key Features:

  • Customizable Compliance Policies: Intune allows organizations to enforce device compliance policies that align with internal and regulatory standards. For example, it can block non-compliant devices from accessing sensitive data until issues are resolved.
  • Data Access Control: Intune disallows employees from accessing corporate data on compromised devices or through insecure apps, such as those not using encryption for emails.
  • Endpoint Security Management: By integrating with Microsoft Defender, Intune provides endpoint protection and automated responses to detected threats, ensuring only secure devices can access your organization’s network.
Figure 6: Intune device management portal (Source: Microsoft)

Intune supports organizations by enabling the creation and enforcement of device compliance policies tailored to both internal and regulatory standards. These policies detect non-compliant devices, issue alerts, and restrict access to sensitive data until compliance is restored. Conditional access ensures that only secure and compliant devices connect to your network.

Intune also applies app protection policies to Microsoft 365-managed apps like Outlook, Word, and Excel. These policies define which apps can access specific data, such as emails, and regulate permissible actions, including copying, pasting, forwarding, and taking screenshots. This layered security approach safeguards critical information while maintaining seamless app functionality.

Does Microsoft have a DLP Solution?

Microsoft 365’s data loss prevention (DLP) policies represent the implementation of the zero-trust framework. These policies aim to prevent oversharing, accidental deletion, and data leaks across Microsoft 365 services, including Exchange Online, SharePoint, Teams, and OneDrive, as well as Windows and macOS devices.

Retention policies, deployed via retention labels, help organizations manage the data lifecycle effectively. These labels ensure that data is retained only as long as necessary to meet compliance requirements, reducing the risks associated with prolonged data storage.

Figure 7: How DLP policies work (Source: Microsoft)

What is the Microsoft 365 Compliance Center?

The Microsoft 365 compliance center offers tools to manage policies and monitor data access, ensuring adherence to regulations. For example, DLP policies allow organizations to define specific automated responses when certain regulatory requirements—like GDPR or HIPAA—are violated.

Microsoft Purview Compliance Portal: This portal ensures sensitive data is classified, stored, retained, and used in adherence to relevant compliance regulations. Meanwhile, Microsoft Purview Information Protection (MPIP) ensures that only authorized users can access sensitive information, whether collaborating on Teams or sharing files in SharePoint. Together, these tools enable secure collaboration while keeping regulatory compliance at the forefront.

12 Best Practices for Microsoft 365 Data Protection and Classification

To achieve effective Microsoft 365 data protection and classification, organizations should follow these steps:

  1. Create precise labels, tags, and classification policies; don’t rely solely on prebuilt labels and policies, as definitions of sensitive data may vary by context.
  2. Automate labeling to minimize errors and quickly capture new datasets.
  3. Establish and enforce data use policies and guardrails automatically to reduce the risk of data breaches, compliance failures, and insider threats.
  4. Regularly review and update data classification and usage policies to reflect evolving threats, new data storage, and changing compliance laws. Policies must stay up to date to remain effective.
  5. Define context-appropriate DLP policies based on your business needs, factoring in remote work, ease of collaboration, regional compliance standards, etc.
  6. Apply encryption to safeguard data inside and outside your organization.
  7. Enforce role-based access controls (RBAC) and least privilege principles to ensure users only have access to data and can perform actions within the scope of their roles. This limits the risk of accidental data exposure, deletion, and cyberattacks.
  8. Create audit trails of user activity around data and maintain version histories to prevent and track data loss.
  9. Follow the 3-2-1 backup rule: keep three copies of your data, store two on different media, and one offsite.
  10. Leverage the full suite of Microsoft 365 tools to monitor sensitive data, detect real-time threats, and secure information effectively.
  11. Promptly resolve detected risks to mitigate attacks early.
  12. Ensure data protection and classification policies do not impede collaboration to prevent teams from creating shadow data, which puts your organization at risk of data breaches.

For example, consider #3. If a disgruntled employee starts transferring sensitive intellectual property to external devices in preparation for a ransomware attack, having the right data use policies in place will allow your organization to stop the threat before it escalates. 
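
Some of these practices are simple enough to check automatically. As one illustration, practice #9 (the 3-2-1 backup rule) can be expressed as a small validation over a backup inventory. The `BackupCopy` record and its fields are hypothetical; a real inventory would come from your backup tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str      # e.g. "disk", "tape", "cloud"
    offsite: bool   # True if stored away from the primary site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check for three copies, at least two media types, and one copy offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


inventory = [
    BackupCopy("disk", offsite=False),
    BackupCopy("disk", offsite=False),
    BackupCopy("cloud", offsite=True),
]
```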

Microsoft 365 Data Protection and Classification Limitations

Despite Microsoft 365’s array of tools, there are some key gaps. AI/ML-powered data security posture management (DSPM) and data detection and response (DDR) solutions fill these easily.

The top limitations of Microsoft 365 data protection and classification are the following:

  • Limitations Handling Large Volumes of Unstructured Data: Purview struggles to automatically classify and apply sensitivity labels to diverse and vast datasets, particularly in Azure services or non-Microsoft clouds. 
  • Contextless Data Classification: Without considering context, Microsoft Purview’s MPIP can lead to false positives (over-labeling non-sensitive data) or false negatives (missing sensitive data). 
  • Inconsistent Labeling Across Providers: Microsoft tools are limited to its ecosystem, making it difficult for enterprises using multi-cloud environments to enforce consistent organization-wide labeling.
  • Minimal Threat Response Capabilities: Microsoft Defender relies heavily on IT teams for remediation and lacks robust autonomous responses.
  • Sporadic Interruption of User Activity: Inaccurate DLP classifications can disrupt legitimate data transfers in collaboration channels, frustrating employees and increasing the risk of shadow IT workarounds.

Sentra Fills the Gap: Protection Measures to Address Microsoft 365 Data Risks

Today’s businesses must get ahead of data risks by instituting Microsoft 365 data protection and classification best practices such as least privilege access and encryption. Otherwise, they risk data exposure, damaging cyberattacks, and hefty compliance fines. However, implementing these best practices depends on accurate and context-sensitive data classification in Microsoft 365. 

Sentra’s Cloud-native Data Security Platform enables secure collaboration and file sharing across all Microsoft 365 services including SharePoint, OneDrive, Teams, OneNote, Office, Word, Excel, and more. Sentra provides data access governance, shadow data detection, and privacy audit automation for M365 data. It also evaluates risks and alerts for policy or regulatory violations.

Specifically, Sentra complements Purview in the following ways:

  1. Sentra Data Detection & Response (DDR): Continuously monitors for threats such as data exfiltration, weakening of data security posture, and other suspicious activities in real time. While Purview Insider Risk Management focuses on M365 applications, Sentra DDR extends these capabilities to Azure and non-Microsoft applications.
  2. Data Perimeter Protection: Sentra automatically detects and identifies an organization’s data perimeters across M365, Azure, and non-Microsoft clouds. It alerts organizations when sensitive data leaves their boundaries, regardless of how it is copied or exported.
  3. Shadow Data Reduction: Using context-based analysis powered by Sentra’s DataTreks™, the platform identifies unnecessary shadow data, reducing the attack surface and improving data governance.
  4. Training Data Monitoring: Sentra monitors training datasets continuously, identifying privacy violations of sensitive PII or real-time threats like training data poisoning or suspicious access.
  5. Data Access Governance: Sentra adds to Purview’s data catalog by including metadata on users and applications with data access permissions, ensuring better governance.
  6. Automated Privacy Assessments: Sentra automates privacy evaluations aligned with frameworks like GDPR and CCPA, seamlessly integrating them into Purview’s data catalog.
  7. Rich Contextual Insights: Sentra delivers detailed data context to understand usage, sensitivity, movement, and unique data types. These insights enable precise risk evaluation, threat prioritization, and remediation, and they can be consumed via an API by DLP systems, SIEMs, and other tools.

By addressing these gaps, Sentra empowers organizations to enhance their Microsoft 365 data protection and classification strategies. Request a demo to experience Sentra’s innovative solutions firsthand.

Team Sentra
December 26, 2024
5 Min Read
Data Security

Create an Effective RFP for a Data Security Platform & DSPM

This RFP Guide is designed to help organizations create their own RFP for selecting Cloud-native Data Security Platform (DSP) and Data Security Posture Management (DSPM) solutions. The purpose is to identify the essential requirements that will enable effective discovery, classification, and protection of sensitive data across complex environments, including public cloud infrastructures and on-premises environments.

Instructions for Vendors

Each section provides essential and recommended requirements to achieve a best practice capability. These have been accumulated over dozens of customer implementations. Customers may also wish to include their own unique requirements specific to their industry or data environment.

1. Data Discovery & Classification

Requirement | Details
Shadow Data Detection | Can the solution discover and identify shadow data across any data environment (IaaS, PaaS, SaaS, OnPrem)?
Sensitive Data Classification | Can the solution accurately classify sensitive data, including PII, financial data, and healthcare data?
Efficient Scanning | Does the solution support smart sampling of large file shares and data lakes to reduce and optimize the cost of scanning, yet provide full scan coverage in less time and lower cloud compute costs?
AI-based Classification | Does the solution leverage AI/ML to classify data in unstructured documents and stores (Google Drive, OneDrive, SharePoint, etc) and achieve more than 95% accuracy?
Data Context | Can the solution discern and ‘learn’ the business purpose (employee data, customer data, identifiable data subjects, legal data, synthetic data, etc.) of data elements and tag them accordingly?
Data Store Compatibility | Which data stores (e.g., AWS S3, Google Cloud Storage, Azure SQL, Snowflake data warehouse, On Premises file shares, etc.) does the solution support for discovery?
Autonomous Discovery | Can the solution discover sensitive data automatically and continuously, ensuring up to date awareness of data presence?
Data Perimeters Monitoring | Can the solution track data movement between storage solutions and detect risky and non-compliant data transfers and data sprawl?

2. Data Access Governance

Requirement | Details
Access Controls | Does the solution map access of users and non-human identities to data based on sensitivity and sensitive information types?
Location Independent Control | Does the solution help organizations apply least privilege access regardless of data location or movement?
Identity Activity Monitoring | Does the solution identify over-provisioned, unused or abandoned identities (users, keys, secrets) that create unnecessary exposures?
Data Access Catalog | Does the solution provide an intuitive map of identities, their access entitlements (read/write permissions), and the sensitive data they can access?
Integration with IAM Providers | Does the solution integrate with existing Identity and Access Management (IAM) systems?

3. Posture, Risk Assessment & Threat Monitoring

Requirement | Details
Risk Assessment | Can the solution assess data security risks and assign risk scores based on data exposure and data sensitivity?
Compliance Frameworks | Does the solution support compliance with regulatory requirements such as GDPR, CCPA, and HIPAA?
Similar Data Detection | Does the solution identify data that has been copied, moved, transformed or otherwise modified that may disguise its sensitivity or lessen its security posture?
Automated Alerts | Does the solution provide automated alerts for policy violations and potential data breaches?
Data Loss Prevention (DLP) | Does the solution include DLP features to prevent unauthorized data exfiltration?
3rd Party Data Loss Prevention (DLP) | Does the solution integrate with 3rd party DLP solutions?
User Behavior Monitoring | Does the solution track and analyze user behaviors to identify potential insider threats or malicious activity?
Anomaly Detection | Does the solution establish a baseline and use machine learning or AI to detect anomalies in data access or movement?
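
When evaluating vendor answers to the Anomaly Detection requirement, it can help to have a reference point for what "establishing a baseline" means in its simplest form. The sketch below flags a value that deviates strongly from historical access volumes using a z-score. It is a deliberately basic statistical illustration with an assumed threshold of three standard deviations, not a description of any vendor's machine learning approach.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], observed: float, z_threshold: float = 3.0) -> bool:
    """Flag an observation that deviates strongly from the historical baseline."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu  # flat baseline: any change stands out
    return abs(observed - mu) / sigma > z_threshold


# Daily count of objects read by one identity (made-up numbers).
daily_reads = [100, 110, 90, 105, 95]
```

Here a day with 104 reads falls within the baseline, while a day with 500 reads would be flagged.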

4. Incident Response & Remediation

Requirement | Details
Incident Management | Can the solution provide detailed reports, alert details, and activity/change history logs for incident investigation?
Automated Response | Does the solution support automated incident response, such as blocking malicious users or stopping unauthorized data flows (via API integration to native cloud tools or other)?
Forensic Capabilities | Can the solution facilitate forensic investigation, such as data access trails and root cause analysis?
Integration with SIEM | Can the solution integrate with existing Security Information and Event Management (SIEM) or other analysis systems?

5. Infrastructure & Deployment

Requirement | Details
Deployment Models | Does the solution support flexible deployment models (on-premises, cloud, hybrid)? Is the solution agentless?
Cloud Native | Does the solution keep all data in the customer’s environment, performing classification via serverless functions (i.e., no data is ever removed from the customer environment, only metadata)?
Scalability | Can the solution scale to meet the demands of large enterprises with multi-petabyte data volumes?
Performance Impact | Does the solution work asynchronously without performance impact on the data production environment?
Multi-Cloud Support | Does the solution provide unified visibility and management across multiple cloud providers and hybrid environments?

6. Operations & Support

Requirement | Details
Onboarding | Does the solution vendor assist customers with onboarding? Does this include assistance with customization of policies, classifiers, or other settings?
24/7 Support | Does the vendor provide 24/7 support for addressing urgent security issues?
Training & Documentation | Does the vendor provide training and detailed documentation for implementation and operation?
Managed Services | Does the vendor (or its partners) offer managed services for organizations without dedicated security teams?
Integration with Security Tools | Can the solution integrate with existing security tools, such as firewalls, DLP systems, and endpoint protection systems?

7. Pricing & Licensing

Requirement | Details
Pricing Model | What is the pricing structure (e.g., per user, per GB, per endpoint)?
Licensing | What licensing options are available (e.g., subscription, perpetual)?
Additional Costs | Are there additional costs for support, maintenance, or feature upgrades?

Conclusion

This RFP template is designed to facilitate a structured and efficient evaluation of DSP and DSPM solutions. Vendors are encouraged to provide comprehensive and transparent responses to ensure an accurate assessment of their solution’s capabilities.

Sentra’s cloud-native design combines powerful Data Discovery and Classification, DSPM, DAG, and DDR capabilities into a complete Data Security Platform (DSP). With this, Sentra customers achieve enterprise-scale data protection efficiently, without creating undue burdens on the personnel who must manage it.

To learn more about Sentra’s DSP, request a demo and choose a time to meet with our data security experts. You can also download the RFP as a PDF.

Gilad Golani
December 16, 2024
4
Min Read
Data Security

Best Practices: Automatically Tag and Label Sensitive Data

The Importance of Data Labeling and Tagging

In today's fast-paced business environment, data rarely stays in one place. It moves across devices, applications, and services as individuals collaborate with internal teams and external partners. This mobility is essential for productivity but poses a challenge: how can you ensure your data remains secure and compliant with business and regulatory requirements when it's constantly on the move?

Why Labeling and Tagging Data Matters

Data labeling and tagging provide a critical solution to this challenge. By assigning sensitivity labels to your data, you can define its importance and security level within your organization. These labels act as identifiers that abstract the content itself, enabling you to manage and track the data type without directly exposing sensitive information. With the right labeling, organizations can also control access in real time.

For example, labeling a document containing social security numbers or credit card information as Highly Confidential allows your organization to acknowledge the data's sensitivity and enforce appropriate protections, all without needing to access or expose the actual contents.
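As a rough illustration of how a label can describe an asset without exposing its contents, here is a minimal Python sketch. It assumes simple regular-expression detectors as a stand-in for a real classifier; the detector names, the `build_label_record` helper, and the "General" fallback label are hypothetical and are not Sentra's implementation.

```python
import re

# Hypothetical detectors for illustration only; production classifiers
# rely on much more than regular expressions.
DETECTORS = {
    "social_security_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card_number": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def build_label_record(asset_name, text):
    """Return metadata describing an asset without copying any matched values."""
    counts = {
        name: len(pattern.findall(text))
        for name, pattern in DETECTORS.items()
    }
    found = sorted(name for name, count in counts.items() if count)
    return {
        "asset": asset_name,
        "data_classes": found,
        # "General" is an assumed fallback label name.
        "sensitivity_label": "Highly Confidential" if found else "General",
    }


record = build_label_record("payroll.txt", "Employee SSN: 123-45-6789")
print(record)
```

The record carries only the asset name, the detected data classes, and the label, so downstream tools can act on sensitivity without ever seeing the matched values.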

Why Sentra’s AI-Based Classification Is a Game-Changer

Sentra’s AI-based classification technology enhances data security by ensuring that sensitivity labels are applied with exceptional accuracy. Using large language models (LLMs), Sentra adds context-aware capabilities to data classification, such as:

  • Detecting the geographic residency of data subjects.
  • Differentiating between Customer Data and Employee Data.
  • Identifying and treating Synthetic or Mock Data differently from real sensitive data.

This context-based approach eliminates the inefficiencies of manual processes and seamlessly scales to meet the demands of modern, complex data environments. By integrating AI into the classification process, Sentra empowers teams to confidently and consistently protect their data, ensuring sensitive information remains secure no matter where it resides or how it is accessed.

Benefits of Labeling and Tagging in Sentra

Sentra enhances your ability to classify and secure data by automatically applying sensitivity labels to data assets. By automating this process, Sentra removes the manual effort required from each team member and achieves accuracy that is only possible through a deep understanding of what data is sensitive and its broader context.

Here are some key benefits of labeling and tagging in Sentra:

  1. Enhanced Security and Loss Prevention: Sentra’s integration with Data Loss Prevention (DLP) solutions prevents the loss of sensitive and critical data by applying the right sensitivity labels. Sentra’s granular, contextual tags provide the detail necessary to automate remediation so that operations can scale.
  2. Easily Build Your Tagging Rules: Sentra’s Intuitive Rule Builder allows you to automatically apply sensitivity labels to assets based on your pre-existing tagging rules and/or define new ones via the builder UI (see screen below). Sentra imports discovered Microsoft Purview Information Protection (MPIP) labels to speed this process.
  3. Labels Move with the Data: Sensitivity labels created in Sentra can be mapped to Microsoft Purview Information Protection (MPIP) labels and applied to various applications like SharePoint, OneDrive, Teams, Amazon S3, and Azure Blob Containers. Once applied, labels are stored as metadata and travel with the file or data wherever it goes, ensuring consistent protection across platforms and services.
  4. Automatic Labeling: Sentra allows for the automatic application of sensitivity labels based on the data's content. Auto-tagging rules, configured for each sensitivity label, determine which label should be applied during scans for sensitive information.
  5. Support for Structured and Unstructured Data: Sentra enables labeling for files stored in cloud environments such as Amazon S3 or EBS volumes and for database columns in structured data environments like Amazon RDS.

By implementing these labeling practices, your organization can track, manage, and protect data with ease while maintaining compliance and safeguarding sensitive information. Whether collaborating across services or storing data in diverse cloud environments, Sentra ensures your labels and protection follow the data wherever it goes.

Applying Sensitivity Labels to Data Assets in Sentra

In today’s rapidly evolving data security landscape, ensuring that your data is properly classified and protected is crucial. One effective way to achieve this is by applying sensitivity labels to your data assets. Sensitivity labels help ensure that data is handled according to its level of sensitivity, reducing the risk of accidental exposure and enabling compliance with data protection regulations.

Below, we’ll walk you through the necessary steps to automatically apply sensitivity labels to your data assets in Sentra. By following these steps, you can enhance your data governance, improve data security, and maintain clear visibility over your organization's sensitive information.

The process involves three key actions:

  1. Create Sensitivity Labels: The first step in applying sensitivity labels is creating them within Sentra. These labels allow you to categorize data assets according to various rules and classifications. Once set up, these labels will automatically apply to data assets based on predefined criteria, such as the types of classifications detected within the data. Sensitivity labels help ensure that sensitive information is properly identified and protected.
  2. Connect Accounts with Data Assets: The next step is to connect your accounts with the relevant data assets. This integration allows Sentra to automatically discover and continuously scan all your data assets, ensuring that no data goes unnoticed. As new data is created or modified, Sentra will promptly detect and categorize it, keeping your data classification up to date and reducing manual efforts.
  3. Apply Classification Tags: Whenever a data asset is scanned, Sentra will automatically apply classification tags to it, such as data classes, data contexts, and sensitivity labels. These tags are visible in Sentra’s data catalog, giving you a comprehensive overview of your data’s classification status. By applying these tags consistently across all your data assets, you’ll have a clear, automated way to manage sensitive data, ensuring compliance and security.

By following these steps, you can streamline your data classification process, making it easier to protect your sensitive information, improve your data governance practices, and reduce the risk of data breaches.
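The label-selection step described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Sentra's rule engine: the `LabelRule` structure, the numeric `rank` used to break ties, the example data class names, and the "Public" default are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelRule:
    label: str
    rank: int                # assumed ordering: higher rank = more sensitive
    data_classes: frozenset  # the rule matches if any of these were detected


# Hypothetical auto-tagging rules, one per sensitivity label.
LABEL_RULES = [
    LabelRule("Very Confidential", 3, frozenset({"ssn", "credit_card"})),
    LabelRule("Confidential", 2, frozenset({"email", "phone"})),
]
DEFAULT_LABEL = "Public"  # assumed fallback when no rule matches


def choose_label(detected_classes, rules=LABEL_RULES):
    """Pick the most sensitive label whose rule matches the scan results."""
    matches = [rule for rule in rules if rule.data_classes & detected_classes]
    if not matches:
        return DEFAULT_LABEL
    return max(matches, key=lambda rule: rule.rank).label


print(choose_label({"email", "ssn"}))
```

Choosing the highest-ranked match is one reasonable design choice: an asset containing both low- and high-sensitivity data is then treated according to its most sensitive content.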

Applying MPIP Labels

To apply Microsoft Purview Information Protection (MPIP) labels based on Sentra sensitivity labels, follow two additional steps:

  1. Set up the Microsoft Purview integration, which allows Sentra to import and sync MPIP sensitivity labels.
  2. Create tagging rules, which map Sentra sensitivity labels to MPIP sensitivity labels (for example, “Very Confidential” in Sentra would be mapped to “ACME - Highly Confidential” in MPIP) and specify the services to which each rule applies (for example, Microsoft 365 and Amazon S3).
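Conceptually, a tagging rule is a lookup from a Sentra label and a target service to an MPIP label. The Python sketch below models that idea using the example labels from the text; the `TaggingRule` structure and `mpip_label_for` function are hypothetical illustrations, not part of Sentra's or Microsoft's APIs.

```python
from typing import NamedTuple, Optional


class TaggingRule(NamedTuple):
    sentra_label: str
    mpip_label: str
    services: frozenset  # services where this mapping should be applied


# Hypothetical rule set built from the example in the text.
TAGGING_RULES = [
    TaggingRule(
        sentra_label="Very Confidential",
        mpip_label="ACME - Highly Confidential",
        services=frozenset({"Microsoft 365", "Amazon S3"}),
    ),
]


def mpip_label_for(sentra_label: str, service: str,
                   rules=TAGGING_RULES) -> Optional[str]:
    """Return the mapped MPIP label, or None if no rule covers this service."""
    for rule in rules:
        if rule.sentra_label == sentra_label and service in rule.services:
            return rule.mpip_label
    return None


print(mpip_label_for("Very Confidential", "Amazon S3"))
```

Returning `None` when no rule applies keeps unmapped services untouched rather than guessing a label.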

Using Sensitivity Labels in Microsoft DLP

Microsoft Purview DLP (as well as all other industry-leading DLP solutions) supports MPIP labels in its policies, so admins can easily control and prevent the loss of sensitive data across multiple services and applications. For instance, an MPIP ‘highly confidential’ label may instruct Microsoft Purview DLP to restrict transfer of sensitive data outside a certain geography. Likewise, another label could specify that confidential intellectual property (IP) may not be shared within Teams collaborative workspaces.

Labels can also help control access to sensitive data. Organizations can set a rule that grants read permission only for specific tags; for example, only production IAM roles can access production files. Further, for use cases where data is stored in a single store, organizations can estimate the storage cost for each specific tag.
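The tag-based access and cost ideas can be sketched in a few lines of Python. This is an illustrative model only: the asset records, the `environment` field, and the price passed in are hypothetical placeholders, and real enforcement would live in IAM or DLP policies rather than application code.

```python
from collections import defaultdict

# Hypothetical inventory of tagged assets in a single data store.
ASSETS = [
    {"name": "payroll.csv", "tag": "Highly Confidential", "environment": "production", "size_gb": 10.0},
    {"name": "customers.db", "tag": "Highly Confidential", "environment": "production", "size_gb": 30.0},
    {"name": "readme.txt", "tag": "General", "environment": "staging", "size_gb": 5.0},
]


def storage_gb_by_tag(assets):
    """Sum stored gigabytes for each sensitivity tag."""
    totals = defaultdict(float)
    for asset in assets:
        totals[asset["tag"]] += asset["size_gb"]
    return dict(totals)


def estimate_monthly_cost(assets, price_per_gb_month):
    """Estimate monthly storage cost per tag at a caller-supplied rate."""
    return {
        tag: gb * price_per_gb_month
        for tag, gb in storage_gb_by_tag(assets).items()
    }


def can_read(role_environment, asset):
    """Allow reads only when the role's environment matches the asset's tag."""
    return asset["environment"] == role_environment


# The rate here is a placeholder, not an actual cloud provider price.
print(estimate_monthly_cost(ASSETS, price_per_gb_month=0.5))
print(can_read("production", ASSETS[0]))
```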

Build a Stronger Foundation with Accurate Data Classification

Effectively tagging sensitive data unlocks significant benefits for organizations, driving improvements across accuracy, efficiency, scalability, and risk management. With precise classification exceeding 95% accuracy and minimal false positives, organizations can confidently label both structured and unstructured data. Automated tagging rules reduce the reliance on manual effort, saving valuable time and resources. Granular, contextual tags enable confident and automated remediation, ensuring operations can scale seamlessly. Additionally, robust data tagging strengthens DLP and compliance strategies by fully leveraging Microsoft Purview’s capabilities. By streamlining these processes, organizations can consistently label and secure data across their entire estate, freeing resources to focus on strategic priorities and innovation.
