Case Study · Feb 12, 2026 · 3 Min Read

How a Consumer App Company Secured Over 130 Petabytes in Weeks

A global consumer app company manages vast, complex cloud environments spanning multiple continents and hundreds of petabytes of sensitive customer and operational data. Their legacy data classification tools, however, were not designed for the scale and speed of that cloud data, especially when it came to identifying sensitive information buried deep in complex file formats like JSON and Parquet.
 

Faced with multiple, complex compliance requirements and ballooning data security costs, the company turned to Sentra.
  

By adopting Sentra’s AI-powered Data Security Posture Management (DSPM) platform, they accelerated and scaled their data security strategy, achieving 98% classification accuracy and full visibility across cloud-scale infrastructure. The platform also enabled faster compliance, all while reducing operational overhead and cutting cloud costs.

The Challenge: Massive Data, Complex Formats, and Untenable Costs 

The data security team’s existing classification tools were never built for the scale and complexity of a data estate over 130 petabytes. As regulatory requirements increased, and data structures became more nested and dynamic, manual tagging and legacy solutions became expensive, inaccurate, and unsustainable.

 

The team also faced an immense data security challenge: how to accurately classify sensitive information across an enormous cloud environment while keeping operational costs in check. Their legacy tools lacked the precision and scalability to handle complex, nested file formats like JSON and Parquet, which are common in modern data engineering pipelines. Manual tagging was both time-consuming and inaccurate, resulting in low coverage and high compliance risk. With regulatory deadlines rapidly approaching, the security team needed a way to gain complete visibility into sensitive data, improve classification accuracy, and implement a scalable architecture that wouldn’t break the budget.

"Our previous solutions simply couldn't keep pace with the sheer volume and complexity of our cloud data. We needed a robust, cloud-native approach that was both effective and economically sound across our entire digital footprint." 

— Deputy CISO

After evaluating multiple vendors, the company selected Sentra for its unique combination of deep technical sophistication and practical efficiency. 

What stood out:
  

AI-Driven Classification at Scale: Sentra’s multi-model architecture, including GLiNER for Named Entity Recognition and embedding-based contextual detection, enabled granular, column-level classification, even inside deeply nested Parquet structures.

 

Cost-Efficient Ephemeral Scanning: Unlike always-on tools, Sentra’s ephemeral EC2 architecture scales to zero when not scanning. Combined with S3 inventory-based change detection and AI-driven smart sampling, it enables fast classification across hundreds of petabytes, at a fraction of the time and cost, and without impacting performance.

 

Seamless Terraform Deployment: Rapid deployment via infrastructure-as-code made it easy to scale Sentra across multiple environments while enforcing least-privilege access through dual-role AWS authentication.
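
To make the idea of inventory-based change detection with sampling more concrete, here is a minimal Python sketch. It is not Sentra’s implementation, which the case study does not describe. It assumes two inventory snapshots have already been loaded into dictionaries mapping object key to ETag (both fields are available in S3 Inventory reports), and it uses plain random sampling per top-level prefix as a stand-in for AI-driven smart sampling. The function names and sample data are hypothetical.

```python
import random
from collections import defaultdict


def changed_keys(previous, current):
    """Return keys that are new, or whose ETag differs, between two snapshots."""
    return sorted(key for key, etag in current.items() if previous.get(key) != etag)


def sample_per_prefix(keys, fraction, seed=0):
    """Pick roughly `fraction` of the keys (at least one) from each top-level prefix.

    Uniform random sampling is an illustrative stand-in; a real system
    could weight the choice by data sensitivity or other signals.
    """
    rng = random.Random(seed)
    by_prefix = defaultdict(list)
    for key in keys:
        by_prefix[key.split("/", 1)[0]].append(key)

    picked = []
    for prefix in sorted(by_prefix):
        group = by_prefix[prefix]
        count = min(len(group), max(1, round(len(group) * fraction)))
        picked.extend(sorted(rng.sample(group, count)))
    return picked


# Hypothetical snapshots: object key -> ETag, as read from two inventory reports.
yesterday = {"orders/a.parquet": "e1", "orders/b.parquet": "e2", "users/u.json": "e3"}
today = {
    "orders/a.parquet": "e1",
    "orders/b.parquet": "e9",  # modified since the last snapshot
    "users/u.json": "e3",
    "users/v.json": "e4",      # newly created
}

to_scan = changed_keys(yesterday, today)
print(to_scan)
print(sample_per_prefix(to_scan, fraction=0.5))
```

Because only changed objects are considered, and only a sample of those is read, the amount of data scanned grows with the rate of change rather than with the total size of the data estate.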

Why Sentra: Accuracy and Efficiency at Cloud-Native Scale 

"Sentra accurately uncovered mislabeled sensitive customer data, enabling rapid validation and remediation. It is now an indispensable element of our data protection strategy allowing us to stay compliant and keep our data protection promise to millions of customers around the world."

— Deputy CISO 

Sentra was deployed and delivering results in the customer’s environment in just 12 days. During the initial proof of concept (POC), the data security team chose where scanning should begin and easily configured the platform. The POC scanned 1 terabyte of high-risk data across complex file formats and achieved over 98% classification accuracy. Sentra’s smart sampling approach prioritized the most sensitive and high-impact datasets, optimizing performance without sacrificing precision. The platform was deployed using Terraform, integrating directly into the customer’s existing AWS architecture. A secure two-role access model, with one role for metadata access and another for scanning, ensured strict least-privilege control throughout the process.
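
As an illustration of what a two-role split could look like, the sketch below expresses two IAM policy documents as Python dictionaries. The case study does not publish the actual policies, so the specific actions assigned to each role, the bucket name `example-bucket`, and the helper functions are all assumptions. The action names themselves (`s3:ListBucket`, `s3:GetBucketLocation`, `s3:GetObject`) are standard S3 IAM actions.

```python
# Hypothetical policy for a role that may only list and locate buckets.
METADATA_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::example-bucket",
        }
    ],
}

# Hypothetical policy for a role that may only read object contents.
SCANNER_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}


def allowed_actions(policy):
    """Collect explicitly allowed actions (ignores wildcards and conditions)."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow":
            listed = statement["Action"]
            actions.update([listed] if isinstance(listed, str) else listed)
    return actions


def can_read_objects(policy):
    """True if the policy explicitly allows reading object contents."""
    return "s3:GetObject" in allowed_actions(policy)


print(can_read_objects(METADATA_ROLE_POLICY))
print(can_read_objects(SCANNER_ROLE_POLICY))
```

Keeping the two permission sets disjoint means the role used for discovery cannot read object contents, which is the essence of the least-privilege separation described above.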

 

Following the successful POC, the security team decided to scale Sentra’s coverage across their data estate to hundreds of petabytes. The team rolled out Sentra according to their data priorities and used automation to minimize manual effort and accelerate risk remediation.

