How Sentra Accurately Classifies Sensitive Data at Scale

July 30, 2024
5 Min Read
Data Security

Background on Classifying Different Types of Data

It helps to first review the two primary types of data we need to classify, structured and unstructured, along with some of the historical challenges of analyzing and accurately classifying each.

What Is Structured Data?

Structured data has a standardized format that makes it easily accessible for both software and humans. Typically organized in tables with rows and/or columns, structured data allows for efficient data processing and insights. For instance, a customer data table with columns for name, address, customer-ID and phone number can quickly reveal the total number of customers and their most common localities.

Moreover, the structure itself supplies context: a number in the phone number column is almost certainly a phone number, while a number in the customer-ID column is a customer ID. This contrasts with unstructured data, in which the context of each word is not straightforward.
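As a small illustration of the customer table described above, the sketch below counts customers and finds the most common locality. The rows and field names (`name`, `city`, `customer_id`, `phone`) are invented sample data, with `city` standing in for the address column.

```python
from collections import Counter

# Hypothetical sample rows; every value here is invented for illustration.
customers = [
    {"name": "Ada", "city": "Austin", "customer_id": "C-001", "phone": "555-0100"},
    {"name": "Ben", "city": "Boston", "customer_id": "C-002", "phone": "555-0101"},
    {"name": "Cy", "city": "Austin", "customer_id": "C-003", "phone": "555-0102"},
]

def summarize(rows):
    """Return (number of customers, most common city) for a list of rows."""
    city_counts = Counter(row["city"] for row in rows)
    top_city, _ = city_counts.most_common(1)[0]
    return len(rows), top_city

total, top_city = summarize(customers)
print(total, top_city)
```

Because every row shares the same columns, both answers come from a single pass over the table, with no need to interpret free text.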

What Is Unstructured Data?

Unstructured data, on the other hand, refers to information that is not organized according to a preset model or schema, making it unsuitable for traditional relational databases (RDBMS). This type of data constitutes over 80% of all enterprise data, and 95% of businesses prioritize its management. The volume of unstructured data is growing rapidly, outpacing the growth rate of structured databases.

Examples of unstructured data include:

  • Various business documents
  • Text and multimedia files
  • Email messages
  • Videos and photos
  • Webpages
  • Audio files

While unstructured data stores contain valuable information that is often essential to the business and can guide business decisions, classifying unstructured data has historically been challenging. However, AI and machine learning have led to better methods for understanding the content and uncovering the sensitive data embedded within it.

The division between structured and unstructured data is not always clear-cut. For example, an unstructured object like a docx document can contain a table, while a structured data table can contain cells with a lot of text, which on its own is unstructured. Moreover, there are cases of semi-structured data. Sentra’s data classification tool accounts for all of these considerations, but the details are beyond the scope of this blog.

Data Classification Methods & Models 

Applying the right data classification method is crucial for achieving optimal performance and meeting specific business needs. Sentra employs a versatile decision framework that automatically leverages different classification models depending on the nature of the data and the requirements of the task. 

We utilize two primary approaches: 

  1. Rule-Based Systems
  2. Large Language Models (LLMs)

Rule-Based Systems 

Rule-based systems are employed when the data contains entities that follow specific, predictable patterns, such as email addresses or checksum-validated numbers. This method is advantageous due to its fast computation, deterministic outcomes, and simplicity, often providing the most accurate results for well-defined scenarios.

Due to their simplicity, efficiency, and deterministic nature, Sentra uses rule-based models whenever possible for data classification. These models are particularly effective in structured data environments, which possess invaluable characteristics such as inherent structure and repetitiveness.

For instance, a table named "Transactions" with a column labeled "Credit Card Number" allows for straightforward logic to achieve high accuracy in determining that the column contains credit card numbers. Similarly, the uniformity in column values can help classify a column named "Abbreviations" as 'Country Name Abbreviations' if all values correspond to country codes.

Sentra also uses rule-based labeling for document and entity detection in simple cases, where document properties provide enough information. Customer-specific rules and simple patterns with strong correlations to certain labels are also handled efficiently by rule-based models.
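The rule-based idea can be sketched in a few lines. This is a minimal illustration, not Sentra's implementation: the function names, the simplified email regex, and the choice to require both a matching column name and a valid checksum are all assumptions made for the example. The checksum is the standard Luhn algorithm used by payment card numbers.

```python
import re
from typing import List, Optional

# Deliberately simple email pattern; real-world rules are usually stricter.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def luhn_valid(number: str) -> bool:
    """Check a card-like string against the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify_column(column_name: str, values: List[str]) -> Optional[str]:
    """Return a data class if the column matches a rule, else None."""
    name = column_name.lower()
    # Rule 1: the name hints at cards AND every value passes the checksum.
    if values and ("card" in name or "credit" in name):
        if all(luhn_valid(v) for v in values):
            return "Credit Card Number"
    # Rule 2: every value looks like an email address.
    if values and all(EMAIL_RE.match(v) for v in values):
        return "Email Address"
    return None

print(classify_column("Credit Card Number", ["4111 1111 1111 1111"]))
print(classify_column("contact", ["a@example.com", "b@example.org"]))
```

Both rules are deterministic and cheap to evaluate, which is the property that makes this approach attractive for structured data.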

Large Language Models (LLMs)

Large Language Models (LLMs) such as BERT, GPT, and Llama represent significant advancements in natural language processing, each with distinct strengths and applications. BERT (Bidirectional Encoder Representations from Transformers) is designed for fine-grained understanding of text by processing it bidirectionally, making it highly effective for tasks like Named Entity Recognition (NER) when trained on large, labeled datasets.

In contrast, autoregressive models like the famous GPT (Generative Pre-trained Transformer) and Llama (Large Language Model Meta AI) excel in generating and understanding text with minimal additional training. These models leverage extensive pre-training on diverse data to perform new tasks in a few-shot or zero-shot manner. Their rich contextual understanding, ability to follow instructions, and generalization capabilities allow them to handle tasks with less dependency on large labeled datasets, making them versatile and powerful tools in the field of NLP. However, their great value comes with a cost of computational power, so they should be used with care and only when necessary.

Applications of LLMs at Sentra

Sentra uses LLMs for both Named Entity Recognition (NER) and document labeling tasks. The input to the models is similar, with minor adjustments, and the output varies depending on the task:

  • Named Entity Recognition (NER): The model labels each word or sentence in the text with its correct entity (which Sentra refers to as a data class).
  • Document Labels: The model labels the entire text with the appropriate label (which Sentra refers to as a data context).
  • Continuous Automatic Analysis: Sentra uses its LLMs to continuously analyze customer data, help our analysts find potential mistakes, and suggest new entities and document labels to add to our classification system.

Here you can see an example of how Sentra classifies personal information.
Note: "Entity" refers to data classes on our dashboard, and "document labels" refers to data context on our dashboard.

Sentra’s Generative LLM Inference Approaches

An inference approach in the context of machine learning involves using a trained model to make predictions or decisions based on new data. This is crucial for practical applications where we need to classify or analyze data that wasn't part of the original training set. 

When working with complex or unstructured data, it's crucial to have effective methods for interpreting and classifying the information. Sentra employs Generative LLMs for classifying complex or unstructured data. Sentra’s main approaches to generative LLM inference are as follows:

Supervised Trained Models (e.g., BERT)

In-house trained models are used when there is a need for high precision in recognizing domain-specific entities and sufficient relevant data is available for training. These models offer customization to capture the subtle nuances of specific datasets, enhancing accuracy for specialized entity types. These models are transformer-based deep neural networks with a “classic” fixed-size input and a well-defined output size, in contrast to generative models. Sentra uses the BERT architecture, modified and trained on our in-house labeled data, to create a model well-suited for classifying specific data types. 

This approach is advantageous because:

  • In multi-category classification, where a model needs to classify an object into one of many possible categories, the model outputs a vector the size of the number of categories, n. For example, when classifying a text document into categories like ["Financial," "Sports," "Politics," "Science," "None of the above"], the output vector will be of size n=5. Each coordinate of the output vector represents one of the categories, and the model's output can be interpreted as the likelihood of the input falling into one of these categories.
  • The BERT model is well-designed for fine-tuning specific classification tasks. Changing or adding computation layers is straightforward and effective.
  • The model is relatively small, with around 110 million parameters requiring less than 500MB of memory. This makes it feasible to fine-tune the model’s weights for a wide range of tasks and, more importantly, to run it in production at low computation cost.
  • It has proven state-of-the-art performance on various NLP tasks like GLUE (General Language Understanding Evaluation), and Sentra’s experience with this model shows excellent results.
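To make the output vector from the first bullet concrete, here is a minimal sketch that converts n=5 raw scores into likelihoods and picks the top category. Using softmax as the final step is an assumption (it is a common choice for classification heads), and the example scores are invented.

```python
import math

CATEGORIES = ["Financial", "Sports", "Politics", "Science", "None of the above"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (category, probability) for the highest-scoring category."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]

# Hypothetical raw scores from a classification head with n=5 outputs.
example_logits = [2.1, -0.3, 0.4, 0.0, -1.2]
label, prob = predict(example_logits)
print(label, round(prob, 3))
```

Each coordinate of the result lines up with one entry in `CATEGORIES`, which is what lets the vector be read as per-category likelihoods.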

Zero-Shot Classification

One of the key techniques that Sentra has recently started to utilize is zero-shot classification, which excels at interpreting and classifying data without needing a model trained specifically for the task. This approach allows Sentra to efficiently and precisely understand the contents of various documents, ensuring high accuracy in identifying sensitive information.

The comprehensive understanding of English (and almost any language) enables us to classify objects customized to a customer's needs without creating a labeled data set. This not only saves time by eliminating the need for repetitive training but also proves crucial in situations where defining specific cases for detection is challenging. When handling sensitive or rare data, this zero-shot and few-shot capability is a significant advantage.

Our use of zero-shot classification within LLMs significantly enhances our data analysis capabilities. By leveraging this method, we achieve false positive rates as low as three to five percent while eliminating the need for extensive task-specific training.
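One common way to do zero-shot classification with an instruction-following LLM is to list the candidate labels in the prompt and map the answer back onto them. The sketch below shows that pattern under stated assumptions: the prompt wording, the function names, and the `llm` callable are hypothetical, and `fake_llm` is a stand-in so the example runs offline. It is not a description of Sentra's pipeline.

```python
from typing import Callable, List

def build_zero_shot_prompt(text: str, labels: List[str]) -> str:
    """Build an instruction prompt that asks for exactly one label."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the document into exactly one of these labels:\n"
        f"{options}\n\n"
        f"Document:\n{text}\n\n"
        "Answer with the label only."
    )

def parse_label(response: str, labels: List[str], default: str = "Unknown") -> str:
    """Map the model's free-text answer back onto a known label."""
    cleaned = response.strip().lower()
    for label in labels:
        if label.lower() == cleaned:
            return label
    # Fall back to substring matching for answers like "Label: Invoice."
    for label in labels:
        if label.lower() in cleaned:
            return label
    return default

def zero_shot_classify(text: str, labels: List[str], llm: Callable[[str], str]) -> str:
    """`llm` is any callable that takes a prompt and returns the model's text."""
    return parse_label(llm(build_zero_shot_prompt(text, labels)), labels)

# Stand-in for a real model so the sketch runs offline; it always says "Invoice".
def fake_llm(prompt: str) -> str:
    return "Label: Invoice."

LABELS = ["Invoice", "Resume", "Contract"]
print(zero_shot_classify("Amount due: $120 by May 1", LABELS, fake_llm))
```

Because the labels live in the prompt rather than in trained weights, adding a customer-specific label means editing a list, not building a labeled dataset.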

Sentra’s Data Sensitivity Estimation Methodologies

Accurate classification is only one (albeit crucial) step in determining whether a document is sensitive. At the end of the day, a customer must also be able to discern whether a document contains the addresses, phone numbers, or emails of the company’s offices or those of the company’s clients.

Accumulated Knowledge

Sentra has developed domain expertise to predict which objects are generally considered more sensitive. For example, documents with login information are more sensitive compared to documents containing random names. 

Sentra has built this domain expertise from the AI analysis we have collected over time.

How does Sentra accumulate this knowledge?

Sentra accumulates knowledge by combining insights from our experience with current customers and their needs with machine learning models that continuously improve as they are trained on more data over time.

Customer-Specific Needs

Sentra tailors sensitivity models to each customer’s specific needs, allowing feedback and examples to refine our models for optimal results. This customization ensures that sensitivity estimation models are precisely tuned to each customer’s requirements.

What is an example of a customer-specific need?

For instance, one of our customers required a particular combination of PII (personally identifiable information) and NPPI (nonpublic personal information). We tailored our solution by creating a composite classifier to meet their needs by designating documents containing these combinations as having a higher sensitivity level.

Sentra’s sensitivity assessment (which drives the classification definition) can be based on detected data classes, document labels, and detection volumes, and it triggers extra analysis from our system when needed.
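A sensitivity rule of this kind can be sketched as a small scoring function that combines data classes, document labels, and detection volume. Everything specific below is hypothetical: the class and label names, the weights, the volume threshold of 100, and the three output levels are illustrative choices, not Sentra's actual model.

```python
from typing import Dict, Set

# Hypothetical weights; real sensitivity rules would be tuned per customer.
CLASS_WEIGHTS = {"login_credentials": 3, "pii": 2, "nppi": 2, "person_name": 1}
LABEL_WEIGHTS = {"financial_statement": 2, "public_brochure": 0}

def sensitivity_level(class_counts: Dict[str, int], labels: Set[str]) -> str:
    """Combine data classes, document labels, and volume into a level."""
    detected = {name for name, count in class_counts.items() if count > 0}
    # Start from the most sensitive single class or label that was found.
    score = max((CLASS_WEIGHTS.get(name, 1) for name in detected), default=0)
    score = max([score] + [LABEL_WEIGHTS.get(label, 0) for label in labels])
    # Composite rule: PII together with NPPI raises the level.
    if {"pii", "nppi"} <= detected:
        score += 1
    # Volume rule: a large number of detections raises the level.
    if any(count >= 100 for count in class_counts.values()):
        score += 1
    if score >= 3:
        return "high"
    if score == 2:
        return "medium"
    return "low"

print(sensitivity_level({"pii": 4, "nppi": 2}, set()))
print(sensitivity_level({"person_name": 5}, {"public_brochure"}))
```

The composite rule mirrors the customer example above: neither PII nor NPPI alone reaches the top level in this sketch, but the combination does.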

Conclusion

In summary, Sentra’s comprehensive approach to data classification and sensitivity estimation ensures precise and adaptable handling of sensitive data, supporting robust data security at scale. With accurate, granular data classification, security teams can confidently proceed to remediation steps without the need for further validation, saving time and streamlining processes. Further, accurate tags allow for automation by sharing contextual sensitivity data with upstream controls (e.g., DLP systems) and remediation workflow tools (e.g., ITSM or SOAR).

Additionally, our research and development teams stay abreast of the rapid advancements in Generative AI, particularly focusing on Large Language Models (LLMs). This proactive approach to data classification ensures our models not only meet but often exceed industry standards, delivering state-of-the-art performance while minimizing costs. Given the fast-evolving nature of LLMs, it is highly likely that the models we use today—BERT, GPT, Mistral, and Llama—will soon be replaced by even more advanced, yet-to-be-published technologies.

After earning a BSc in Mathematics and a BSc in Industrial Engineering, followed by an MSc in Computer Science with a thesis in Machine Learning theory, Hanan has spent the last five years training models for feature-based and computer vision problems. Driven by the motivation to deliver real-world value through his expertise, he leverages his strong theoretical background and hands-on experience to explore and implement new methodologies and technologies in machine learning. At Sentra, one of his main focuses is leveraging large language models (LLMs) for advanced classification and analysis tasks.

Romi is the senior marketing manager at Sentra, bringing years of experience in various marketing roles in the cybersecurity field.

Latest Blog Posts

Nikki Ralston
March 16, 2026
4 Min Read

S3 Bucket Security Best Practices

Amazon S3 is one of the most widely used cloud storage services in the world, and with that scale comes real security responsibility. Misconfigured buckets remain a leading cause of sensitive data exposure in cloud environments, from accidentally public objects to overly permissive policies that go unnoticed for months. Whether you're hosting static assets, storing application data, or archiving compliance records, getting S3 bucket security right is not optional. This guide covers foundational defaults, policy configurations, and practical checklists to give you an actionable reference as of early 2026.

How S3 Bucket Security Works by Default

A common misconception is that S3 buckets are inherently risky. In reality, all S3 buckets are private by default. When you create a new bucket, no public access is granted, and AWS automatically enables the Block Public Access settings on that bucket.

Access is governed by a layered permission model where an explicit Deny always overrides an Allow, regardless of where it's defined. Understanding this hierarchy is the foundation of any secure configuration:

  • IAM identity-based policies: control what actions a user or role can perform
  • Bucket resource-based policies: define who can access a specific bucket and under what conditions
  • Access Control Lists (ACLs): legacy object-level permissions (AWS now recommends disabling these entirely)
  • VPC endpoint policies: restrict which buckets and actions are reachable from within a VPC

AWS recommends setting S3 Object Ownership to "bucket owner enforced," which disables ACLs. This simplifies permission management significantly: instead of managing object-level ACLs across millions of objects, all access flows through bucket policies and IAM, which are far easier to audit.

AWS S3 Security Best Practices

A defense-in-depth approach means layering multiple controls rather than relying on any single setting. Here is the current AWS-recommended baseline:

Practice | Details
Block public access | Enable S3 Block Public Access at both bucket and account levels. Enforce via Service Control Policies (SCPs) in AWS Organizations.
Least-privilege IAM | Grant only specific actions each role needs. Avoid "Action": "s3:*" in production. Use presigned URLs for temporary access.
Encrypt at rest and in transit | Configure default SSE-S3 or SSE-KMS encryption. Enforce HTTPS by denying requests where aws:SecureTransport is false.
Enable versioning & Object Lock | Versioning preserves object history for recovery. Object Lock enforces WORM for compliance-critical data.
Unpredictable bucket names | Append a GUID or random identifier to reduce risk of bucket squatting.
VPC endpoints | Route internal workload traffic through VPC endpoints so it never traverses the public internet.

S3 Bucket Policy Examples for Common Security Scenarios

Bucket policies are JSON documents attached directly to a bucket that define who can access it and under what conditions. Below are the most practically useful examples.

Enforce HTTPS-Only Access

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "RestrictToTLSRequestsOnly",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::your-bucket-name",
      "arn:aws:s3:::your-bucket-name/*"
    ],
    "Condition": { "Bool": { "aws:SecureTransport": "false" } }
  }]
}

Deny Unencrypted Uploads (Enforce KMS)

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyObjectsThatAreNotSSEKMS",
    "Principal": "*",
    "Effect": "Deny",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::your-bucket-name/*",
    "Condition": {
      "Null": { "s3:x-amz-server-side-encryption-aws-kms-key-id": "true" }
    }
  }]
}

Other Common Patterns

  • Restrict to a specific VPC endpoint: Use the aws:sourceVpce condition key to ensure the bucket is only reachable from a designated private network.
  • Grant CloudFront OAI access: Allow only the Origin Access Identity principal, keeping objects private from direct URL access while serving them through the CDN. (For new distributions, AWS now recommends Origin Access Control, OAC, over the legacy OAI.)
  • IP-based restrictions: Use NotIpAddress with aws:SourceIp to deny requests from outside a trusted CIDR range.

Always use "Version": "2012-10-17" and validate policies through IAM Access Analyzer before deployment to catch unintended access grants.
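The VPC endpoint and IP-based patterns above can be generated programmatically. The following is a minimal sketch: the function names, Sid values, and the sample bucket, endpoint ID, and CIDR range are placeholders chosen for illustration, while the condition operators (`StringNotEquals`, `NotIpAddress`) and keys (`aws:sourceVpce`, `aws:SourceIp`) are the ones named in the list.

```python
import json

def _bucket_arns(bucket: str) -> list:
    """Cover both the bucket itself and every object inside it."""
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]

def deny_outside_vpce(bucket: str, vpce_id: str) -> dict:
    """Deny all S3 actions unless the request arrives via the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": _bucket_arns(bucket),
            "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
        }],
    }

def deny_outside_cidr(bucket: str, cidr: str) -> dict:
    """Deny all S3 actions for requests from outside the trusted CIDR range."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideTrustedRange",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": _bucket_arns(bucket),
            "Condition": {"NotIpAddress": {"aws:SourceIp": cidr}},
        }],
    }

# Placeholder values; substitute your own bucket, endpoint ID, and range.
vpce_policy = deny_outside_vpce("your-bucket-name", "vpce-1a2b3c4d")
cidr_policy = deny_outside_cidr("your-bucket-name", "203.0.113.0/24")
print(json.dumps(vpce_policy, indent=2))
```

Broad Deny statements like these also apply to administrators, so it is worth testing them on a non-production bucket first to avoid locking yourself out.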

Enforcing SSL with the s3-bucket-ssl-requests-only Policy

Forcing all S3 traffic over HTTPS is one of the most straightforward, high-impact controls available. The AWS Config managed rule s3-bucket-ssl-requests-only checks whether your bucket policy explicitly denies HTTP requests, flagging non-compliant buckets automatically.

The policy evaluates the aws:SecureTransport condition key. When a request arrives over plain HTTP, this key evaluates to false, and the Deny statement blocks it. This applies to all principals: AWS services, cross-account roles, and anonymous requests alike. Adding the HTTPS-only Deny statement shown in the policy examples section above satisfies both the AWS Config rule and common compliance requirements under PCI-DSS and HIPAA.
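For a quick local check before deployment, the same idea can be approximated in a few lines. This is a simplified sketch, not the logic AWS Config actually runs: it only looks for a Deny statement with a wildcard principal and a `Bool` condition on `aws:SecureTransport`, and the function names are hypothetical.

```python
def _as_list(value):
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]

def denies_insecure_transport(policy: dict) -> bool:
    """Return True if a statement denies requests where SecureTransport is false."""
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Deny":
            continue
        principal = stmt.get("Principal")
        if principal != "*" and principal != {"AWS": "*"}:
            continue
        bool_cond = stmt.get("Condition", {}).get("Bool", {})
        # Condition keys are case-insensitive, so normalize before comparing.
        for key, value in bool_cond.items():
            if key.lower() != "aws:securetransport":
                continue
            if any(str(v).lower() == "false" for v in _as_list(value)):
                return True
    return False

# The HTTPS-only policy from the examples section, as a Python dict.
https_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "RestrictToTLSRequestsOnly",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::your-bucket-name",
            "arn:aws:s3:::your-bucket-name/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}

print(denies_insecure_transport(https_only_policy))
```

A check like this is useful in CI as an early warning, with the managed Config rule remaining the authoritative compliance signal.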

Using an S3 Bucket Policy Generator Safely

The AWS Policy Generator is a useful starting point, but generated policies require careful review before going into production. Follow these steps:

  • Select "S3 Bucket Policy" as the policy type, then fill in the principal, actions, resource ARN, and conditions (e.g., aws:SecureTransport or aws:SourceIp).
  • Check for overly broad principals; avoid "Principal": "*" unless intentional.
  • Verify resource ARNs are scoped correctly (bucket-level vs. object-level).
  • Use IAM Access Analyzer's "Preview external access" feature to understand the real-world effect before saving.

The generator is a scaffold; security judgment still applies. Never paste generated JSON directly into production without review.
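Part of that review can be automated. The sketch below flags three of the patterns mentioned in the steps: wildcard principals on Allow statements, wildcard actions, and object-level actions paired with a bucket-level ARN. The function name, the warning strings, and the short list of object-level actions are assumptions for illustration; it complements IAM Access Analyzer rather than replacing it.

```python
# A small, non-exhaustive set of actions that apply to objects, not buckets.
OBJECT_ACTIONS = {"s3:GetObject", "s3:PutObject", "s3:DeleteObject"}

def _as_list(value):
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]

def review_policy(policy: dict) -> list:
    """Return human-readable warnings for patterns worth a second look."""
    warnings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        principal = stmt.get("Principal")
        if stmt.get("Effect") == "Allow":
            if principal == "*" or principal == {"AWS": "*"}:
                warnings.append(f"statement {index}: Allow with wildcard principal")
            if any(action in ("*", "s3:*") for action in actions):
                warnings.append(f"statement {index}: Allow with wildcard action")
        # Object-level actions only take effect on object ARNs (bucket/key).
        uses_object_action = any(a in OBJECT_ACTIONS for a in actions)
        has_object_arn = any("/" in r or r == "*" for r in resources)
        if uses_object_action and not has_object_arn:
            warnings.append(f"statement {index}: object action on bucket-level ARN")
    return warnings

# Hypothetical generated policy with two of the issues described above.
generated = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": "arn:aws:s3:::your-bucket-name/*",
    }],
}
for warning in review_policy(generated):
    print(warning)
```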

S3 Bucket Security Checklist

Use this consolidated checklist to audit any S3 bucket configuration:

Control | Status
Block Public Access | Enabled at account and bucket level
ACLs disabled | Object Ownership set to "bucket owner enforced"
Default encryption | SSE-S3 or SSE-KMS configured
HTTPS enforced | Bucket policy denies aws:SecureTransport: false
Least-privilege IAM | No wildcard actions in production policies
Versioning | Enabled; Object Lock for sensitive data
Bucket naming | Includes unpredictable identifiers
VPC endpoints | Configured for internal workloads
Logging & monitoring | Server access logging, CloudTrail, GuardDuty, and IAM Access Analyzer active
AWS Config rules | s3-bucket-ssl-requests-only and related rules enabled
Disaster recovery | Cross-region replication configured where required

How Sentra Strengthens S3 Bucket Security at Scale

Applying the right bucket policies and IAM controls is necessary, but at enterprise scale, knowing which buckets contain sensitive data, how that data moves, and who can access it becomes the harder problem. This is where cloud data exposure typically occurs: not from a single misconfigured bucket, but from data sprawl across hundreds of buckets that no one has a complete picture of.

Sentra discovers and classifies sensitive data at petabyte scale directly within your environment, so data never leaves your control. It maps data movement across S3, identifies shadow data and over-permissioned buckets, and enforces data-driven guardrails aligned with compliance requirements. For organizations adopting AI, Sentra provides the visibility needed to ensure sensitive training data or model outputs in S3 are properly governed. Eliminating redundant and orphaned data typically reduces cloud storage costs by around 20%.

S3 bucket security is not a one-time configuration task. It's an ongoing practice spanning access control, encryption, network boundaries, monitoring, and data visibility. The controls covered here, from enforcing SSL and disabling ACLs to using policy generators safely and maintaining a security checklist, give you a comprehensive framework. As your environment grows, pairing these technical controls with continuous data discovery ensures your security posture scales with your data, not behind it.

Nikki Ralston
March 15, 2026
4 Min Read

How to Evaluate DSPM and DLP for Copilot and Gemini: A Security Architect’s Buyer’s Guide

Most security architects didn’t sign up to be AI product managers. Yet that’s what Copilot and Gemini rollouts feel like: “We want this in every business unit, as soon as possible. Make sure it’s safe.”

If you’re being asked to recommend or validate a DSPM platform, or to justify why your existing DLP stack is or isn’t enough, you need a realistic, vendor‑agnostic set of criteria that maps to how Copilot and Gemini actually work.

This guide is written from that perspective: what matters when you evaluate DSPM and DLP for AI assistants, what’s table stakes vs. differentiating, and what you should ask every vendor before you bring them to your steering committee.

1. Start with the AI use cases you actually have

Before you look at tools, clarify your Copilot and/or Gemini scope:

  • Are you rolling out Microsoft 365 Copilot to a pilot group, or planning an org‑wide deployment?
  • Are you enabling Gemini in Workspace only, or also Gemini for dev teams (Vertex AI, custom LLM apps, RAG)?
  • Do you have existing AI initiatives (third‑party SaaS copilots, homegrown assistants) that will access M365 or Google data?

This matters because different tools have very different coverage:

  • Some are M365‑centric with shallow Google support.
  • Others focus on cloud infrastructure and data warehouses, and barely touch SaaS.
  • Very few provide deep, in‑environment visibility across both SaaS and cloud platforms, which is what you need if Copilot/Gemini are just the tip of your AI iceberg.

Define the boundary first; evaluate tools second.

2. Non‑negotiable DSPM capabilities for Copilot and Gemini

When Copilot and Gemini are in scope, “generic DSPM” is not enough. You need specific capabilities that touch how those assistants see and use data.

2.1 Native visibility into M365 and Workspace

At minimum, a viable DSPM platform must:

  • Discover and classify sensitive data across SharePoint, OneDrive, Exchange, Teams and Google Drive / shared drives.
  • Understand sharing constructs (public/org‑wide links, external guests, shared drives) and relate them to data sensitivity.
  • Support unstructured formats including Office docs, PDFs, images, and audio/video files.

Ask vendors:

  • “Show me, live, how you discover sensitive data in Teams chats and OneDrive/Drive folders that are Copilot/Gemini‑accessible.”
  • “Show me how you handle PDFs, audio, and meeting recordings - not just Word docs and spreadsheets.”

Sentra, for example, was explicitly built to discover sensitive data across IaaS, PaaS, SaaS, and on‑prem, and to handle formats like audio/video and complex PDFs as first‑class sources.

2.2 In‑place, agentless scanning

For many organizations, it’s now a hard requirement that data never leaves their cloud environment for scanning. Evaluate whether the vendor scans in‑place within your tenants, using cloud APIs and serverless functions, or requires copying data or metadata into its own infrastructure.

Sentra’s architecture is explicitly “data stays in the customer environment”, which is why large, regulated enterprises have standardized on it.

2.3 AI‑grade classification accuracy and context

Copilot and Gemini are only as safe as your labels and identity model. That requires:

  • High‑accuracy classification (>98%) across structured and unstructured content.
  • The ability to distinguish synthetic vs. real data and to attach rich context: department, geography, business function, sensitivity, owner.

Ask:

  • “How do you measure classification accuracy, and on what datasets?”
  • “Can you show me how your platform treats, for example, a Zoom recording vs. a scanned PDF vs. a CSV export?”

Sentra uses AI‑assisted models and granular context classes at both file and entity level, which is why customers report >98% accuracy and trust the labels enough to drive enforcement.

3. Evaluating DLP in an AI‑first world

Most enterprises already have DLP: endpoint, email, web, CASB. The question is whether it can handle AI assistants, and the honest answer is that DLP alone usually can’t, because:

  • It operates blind to real data context, relying on regex and static rules.
  • It usually doesn’t see unstructured SaaS stores or AI outputs reliably.
  • Policies quickly become so noisy that they get weakened or disabled.

The evaluation question is not “DLP or DSPM?” It’s:

“Which DSPM platform can make my DLP stack effective for Copilot and Gemini, without a rip‑and‑replace?”

Look for:

  • Tight integration with Microsoft Purview (for MPIP labels and Copilot DLP) and, where relevant, Google DLP.
  • The ability to auto‑apply and maintain labels that DLP actually enforces.
  • Support for feeding data context (sensitivity + business impact + access graphs) into enforcement decisions.

Sentra becomes the single source of truth for sensitivity and business impact that existing DLP tools rely on.

4. Scale, performance, and operating cost

AI rollouts increase data volumes and usage faster than most teams expect. A DSPM that looks fine on 50 TB may struggle at 5 PB.

Evaluation questions:

  • “What’s your largest production deployment by data volume? How many PB?”
  • “How long does an initial full scan take at that scale, and what’s the recurring scan pattern?”
  • “What does cloud compute spend look like at 10 PB, 50 PB, 100 PB?”

Sentra customer tests have demonstrated the ability to scan 9 PB in under 72 hours, at 10–1000x greater scan efficiency than legacy platforms, with scanning of 100 PB projected to cost roughly $40,000/year in cloud compute.
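To compare vendor answers on a like-for-like basis, it helps to normalize figures such as these into a throughput and a unit cost. The short calculation below does that for the numbers quoted above, assuming decimal units (1 PB = 1000 TB); the function names are illustrative.

```python
PB_IN_TB = 1000  # decimal units: 1 PB = 1000 TB

def scan_rate_tb_per_hour(petabytes: float, hours: float) -> float:
    """Average throughput needed to scan `petabytes` within `hours`."""
    return petabytes * PB_IN_TB / hours

def cost_per_pb_year(annual_cost_usd: float, petabytes: float) -> float:
    """Annual cloud compute cost divided by the data volume scanned."""
    return annual_cost_usd / petabytes

print(scan_rate_tb_per_hour(9, 72))   # 125.0
print(cost_per_pb_year(40_000, 100))  # 400.0
```

Since the 9 PB scan finished in under 72 hours, 125 TB per hour is a lower bound on the average rate, and the projected cost works out to about $400 per PB per year.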

If a vendor can’t answer those questions quantitatively, assume you’ll be rationing scans, which undercuts the whole point of DSPM for AI.

5. Governance, reporting, and “explainability” for architects

Your stakeholders (security leadership, compliance, boards) will ask three things:

  1. “Where, exactly, can Copilot and Gemini see regulated data?”
  2. “How do we know permissions and labels are correct?”
  3. “Can you prove we’re compliant right now, not just at audit time?”

A strong DSPM platform helps you answer those questions without building custom reporting in a SIEM:

  • AI‑specific risk views that show AI assistants, datasets, and identities in one place.
  • Compliance mappings to frameworks like GLBA, SOX, FFIEC, GDPR, HIPAA, PCI DSS, and state privacy laws.
  • Executive‑ready summaries of AI‑related data risk and progress over time (e.g., percentage of regulated data coverage, number of Copilot‑accessible high‑risk stores before vs. after remediation).

Sentra’s AI Data Readiness and continuous compliance materials give a good template for what “explainable DSPM” looks like in practice.

6. Putting it together: A concise RFP checklist

When you boil it down, your evaluation criteria for DSPM/DLP for Copilot and Gemini should include:

  • In‑place, multi‑cloud/SaaS discovery with strong M365 and Workspace coverage
  • Proven high‑accuracy classification and rich business context for unstructured data
  • Identity‑to‑data mapping with least‑privilege insights
  • Native integrations with MPIP/Purview and Google DLP, with label automation
  • Real‑world scale (PB‑level) and quantified cloud cost
  • AI‑aware risk views, compliance mappings, and reporting

Use those as your “table stakes” in RFPs and technical deep dives. You can add vendor‑specific questions on top, but if a tool can’t clear this bar, it will not make Copilot and Gemini genuinely safe - it will just give you more dashboards.

Nikki Ralston
February 22, 2026
4 Min Read

Cloud Data Protection Solutions

As enterprises scale cloud adoption and AI integration in 2026, protecting sensitive data across complex environments has never been more critical. Data sprawls across IaaS, PaaS, SaaS, and on-premises systems, creating blind spots that regulators and threat actors are eager to exploit. Cloud data protection solutions have evolved well beyond simple backup and recovery: today's leading platforms combine AI-powered discovery, real-time data movement tracking, access control analysis, and compliance support into unified architectures. Choosing the right solution determines how confidently your organization can operate in the cloud.

Best Cloud Data Protection Solutions

The market spans two distinct categories, each addressing different layers of cloud security.

Backup, Recovery, and Data Resilience

  • Druva Data Security Cloud: Rated 4.9 on Gartner with "Customer's Choice" recognition. Centralized backup, archival, disaster recovery, and compliance across endpoints, servers, databases, and SaaS in hybrid/multicloud environments.
  • Cohesity DataProtect: Rated 4.7. Automates backup and recovery across on-premises, cloud, and hybrid infrastructures with policy-based management and encryption.
  • Veeam Data Platform: Rated 4.6. Combines secure backup with intelligent data insights and built-in ransomware defenses.
  • Rubrik Security Cloud: Integrates backup, recovery, and automated policy-driven protection against ransomware and compliance gaps across mixed environments.
  • Dell Data Protection Suite: Rated 4.7. Addresses data loss, compliance, and ransomware through backup, recovery, encryption, and deduplication.

Cloud-Native Security and DSPM

  • Sentra: Discovers and governs sensitive data at petabyte scale inside your own environment, with agentless architecture, real-time data movement tracking, and AI-powered classification.
  • Wiz: Agentless scanning, real-time risk prioritization, and automated mapping to 100+ regulatory frameworks across multi-cloud environments.
  • BigID: Comprehensive data discovery and classification with automated remediation, including native Snowflake integration for dynamic data masking.
  • Palo Alto Networks Prisma Cloud: Scalable hybrid and multi-cloud protection with AI analytics, DLP, and compliance enforcement throughout the development lifecycle.
  • Microsoft Defender for Cloud: Integrated multi-cloud security with continuous vulnerability assessments and ML-based threat detection across Azure, AWS, and Google Cloud.

What Users Say About These Platforms

User feedback as of early 2026 reveals consistent themes across the leading platforms.

Sentra

Pros:

  • Data discovery accuracy and automation capabilities are standout strengths
  • Compliance and audit preparation becomes significantly smoother; one user described HITECH audits becoming "a breeze"
  • Classification engine reduces manual effort and improves overall efficiency

Cons:

  • Initial dashboard experience can feel overwhelming
  • Some limitations in on-premises coverage compared to cloud environments
  • Third-party sync delays flagged by a subset of users

Rubrik

Pros:

  • Strong visibility across fragmented environments with advanced encryption and data auditing
  • Frequently described as a top choice for cybersecurity professionals managing multi-cloud

Cons:

  • Scalability limitations noted by some reviewers
  • Integration challenges with mature SaaS solutions

Wiz

Pros:

  • Agentless deployment and multi-cloud visibility surface risk context quickly

Cons:

  • Alert overload and configuration complexity require careful tuning

BigID

Pros:

  • Comprehensive data discovery and privacy automation with responsive customer service

Cons:

  • Delays in technical support and slower DSAR report generation reported

As of February 2026, none of these platforms has enough Trustpilot reviews to generate a verified aggregate rating.

How Leading Platforms Compare on Core Capabilities

| Capability | Sentra | Rubrik | Wiz | BigID |
| --- | --- | --- | --- | --- |
| Unified view (IaaS, PaaS, SaaS, on-prem) | Yes, in-environment, no data movement | Yes, unified management | Yes, aggregated across environments | Yes, agentless, identity-aware |
| In-place scanning | Yes, purely in-place | Yes | Yes, raw data stays in your cloud | Yes |
| Agentless architecture | Purely agentless, zero production latency | Primarily agentless via native APIs | Agentless (optional eBPF sensor) | Primarily agentless, hybrid option |
| Data movement tracking | Yes, DataTreks™ maps full lineage | Limited, not explicitly confirmed | Yes, lineage mapping via security graph | Yes, continuous dynamic tracking |
| Toxic combination detection | Yes, correlates sensitivity with access controls | Yes, automated risk assignment | Yes, Security Graph with CIEM mapping | Yes, AI classifiers + permission analysis |
| Compliance framework mapping | Not confirmed | Not confirmed | Yes, 100+ frameworks (GDPR, HIPAA, EU AI Act) | Not confirmed |
| Automated remediation | Sensitivity labeling via Microsoft Purview | Label correction via MIP | Contextual workflows, no direct masking | Native masking in Snowflake; labeling via MIP |
| Petabyte-scale cost efficiency | Proven, 9PB in 72 hours, 100PB at ~$40K | Yes, scale-out architecture | Per-workload pricing, not proven at PB scale | Yes, cost by data sources, not volume |

Cloud Data Security Best Practices

Selecting the right platform is only part of the equation. How you configure and operate it determines your actual security posture.

  • Apply the shared responsibility model correctly. Cloud providers secure infrastructure; you are responsible for your data, identities, and application configurations.
  • Enforce least-privilege access. Use role-based or attribute-based access controls, require MFA, and regularly audit permissions.
  • Encrypt data at rest and in transit. Use TLS 1.2+ and manage keys through your provider's KMS with regular rotation.
  • Implement continuous monitoring and logging. Real-time visibility into access patterns and anomalous behavior is essential. CSPM and SIEM tools provide this layer.
  • Adopt zero-trust architecture. Continuously verify identities, segment workloads, and monitor all communications regardless of origin.
  • Eliminate shadow and ROT data. Redundant, obsolete, and trivial data increases your attack surface and storage costs. Automated identification and removal reduces risk and cloud spend.
  • Maintain and test an incident response plan. Documented playbooks with defined roles and regular simulations ensure rapid containment.
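To make the "TLS 1.2+" recommendation above concrete, here is a minimal sketch of how a Python client could refuse older protocol versions using the standard library's `ssl` module. The helper name `make_tls_client_context` is hypothetical and chosen for illustration; how you enforce this in practice depends on your provider, load balancers, and client libraries.

```python
import ssl


def make_tls_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects anything older than TLS 1.2."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse SSLv3, TLS 1.0, and TLS 1.1 handshakes.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls_client_context()
print(ctx.minimum_version.name)
```

A context built this way can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=ctx)`.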

Top Cloud Security Tools for Data Protection

Beyond the major platforms, several specialized tools are worth integrating into a layered defense strategy:

  • Check Point CloudGuard: ML-powered threat prevention for dynamic cloud environments, including ransomware and zero-day mitigation.
  • Trend Micro Cloud One: Intrusion detection, anti-malware, and firewall protections tailored for cloud workloads.
  • Aqua Security: Specializes in containerized and cloud-native environments, integrating runtime threat prevention into DevSecOps workflows for Kubernetes, Docker, and serverless.
  • CrowdStrike Falcon: Comprehensive CNAPP unifying vulnerability management, API security, and threat intelligence.
  • Sysdig: Secures container images, Kubernetes clusters, and CI/CD pipelines with runtime threat detection and forensic analysis.
  • Tenable Cloud Security: Continuous monitoring and AI-driven threat detection with customizable security policies.

Complementing these tools with CASB, DSPM, and IAM solutions creates a layered defense addressing discovery, access control, threat detection, and compliance simultaneously.

How Sentra Approaches Cloud Data Protection

For organizations that need to go beyond backup into true cloud data security, Sentra offers a fundamentally different architecture. Rather than routing data through an external vendor, Sentra scans in place, so your sensitive data never leaves your environment. This is particularly relevant for regulated industries where data residency and sovereignty are non-negotiable.

Key Capabilities

  • Purely agentless onboarding: No sidecars, no agents, zero impact on production latency
  • Unified view across IaaS, PaaS, SaaS, and on-premises file shares with continuous discovery and classification at petabyte scale
  • DataTreks™: Creates an interactive map of your data estate, tracking how sensitive data moves through ETL processes, migrations, backups, and AI pipelines
  • Toxic combination detection: Correlates data sensitivity with access controls, flagging high-sensitivity data behind overly permissive policies
  • AI governance guardrails: Prevents unauthorized AI access to sensitive data as enterprises integrate LLMs and other AI systems
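The idea behind toxic combination detection, correlating how sensitive a data store is with how broadly it is exposed, can be illustrated with a short sketch. This is not Sentra's implementation: the `DataStore` fields, the sensitivity ranking, and the thresholds below are all hypothetical, chosen only to show the shape of the correlation.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ranking, for illustration only.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


@dataclass(frozen=True)
class DataStore:
    name: str
    sensitivity: str             # one of the SENSITIVITY_RANK keys
    publicly_readable: bool      # e.g. an open bucket policy
    principals_with_access: int  # identities that can read the store


def find_toxic_combinations(stores, min_sensitivity="confidential", max_principals=50):
    """Return names of stores that are both sensitive and overly exposed."""
    threshold = SENSITIVITY_RANK[min_sensitivity]
    flagged = []
    for store in stores:
        is_sensitive = SENSITIVITY_RANK[store.sensitivity] >= threshold
        is_overexposed = (
            store.publicly_readable or store.principals_with_access > max_principals
        )
        # Neither signal alone is flagged; the combination is the risk.
        if is_sensitive and is_overexposed:
            flagged.append(store.name)
    return flagged


stores = [
    DataStore("marketing-assets", "public", True, 500),
    DataStore("patient-records", "restricted", False, 120),
    DataStore("payroll-exports", "confidential", True, 4),
    DataStore("hr-archive", "restricted", False, 6),
]
print(find_toxic_combinations(stores))
# → ['patient-records', 'payroll-exports']
```

Here the public marketing bucket is ignored despite being wide open, and the tightly scoped HR archive is ignored despite being highly sensitive. Only stores where both conditions hold are surfaced.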

In documented deployments, Sentra has processed 9 petabytes in under 72 hours and analyzed 100 petabytes at a cost of approximately $40,000. Its data security posture management approach also eliminates shadow and ROT data, typically reducing cloud storage costs by around 20%.
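To put those figures in more familiar units, the arithmetic can be worked through in a few lines. This sketch assumes decimal units (1 PB = 1,000 TB) and treats the quoted numbers as exact, so the results are rough orders of magnitude, not published pricing or benchmarks.

```python
TB_PER_PB = 1_000  # assumption: decimal units, 1 PB = 1,000 TB

# 9 PB scanned in (under) 72 hours implies at least this throughput.
scanned_pb, scan_hours = 9, 72
throughput_tb_per_hour = scanned_pb * TB_PER_PB / scan_hours

# 100 PB analyzed for roughly $40,000.
analyzed_pb, total_cost_usd = 100, 40_000
cost_per_tb_usd = total_cost_usd / (analyzed_pb * TB_PER_PB)

print(f"{throughput_tb_per_hour:.0f} TB/hour")  # → 125 TB/hour
print(f"${cost_per_tb_usd:.2f} per TB")         # → $0.40 per TB
```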

Choosing the Right Fit

The right solution depends on the problem you're solving. If your primary need is backup, recovery, and ransomware resilience, Druva, Veeam, Cohesity, and Rubrik are purpose-built for that. If your challenge is discovering where sensitive data lives and how it moves, particularly for AI adoption or regulatory audits, DSPM-focused platforms like Sentra and BigID are better aligned. For automated compliance mapping across GDPR, HIPAA, and the EU AI Act, Wiz's 100+ built-in framework assessments offer a clear advantage.

Most mature security programs layer multiple tools: a backup platform for resilience, a DSPM solution for data visibility and governance, and a CNAPP or CSPM tool for infrastructure-level threat detection. The key is ensuring these tools share context rather than creating additional silos. As data environments grow more complex and AI workloads introduce new vectors for exposure, investing in cloud data protection solutions that provide genuine visibility, not just coverage, will define which organizations operate with confidence.
