Data Leakage: Common Causes, Examples & Tips for Prevention

What Is Data Leakage?

Data leakage occurs when sensitive data gets unintentionally exposed to the public in transit, at rest, or in use. Here are common examples:

Data exposed in transit — Data transmitted via emails, API calls, chat rooms, and other communications.
Data exposed at rest — Can occur due to misconfigured cloud storage, insecure databases, or unattended or lost devices.
Data exposed in use — Such as data on printers, screenshots, clipboards, and USB drives.

A data leak and a data breach may sound similar, but the two are not synonymous. A data breach is often the result of an external intrusion attempt, while a data leak results from employee negligence. A data leak can result in a data breach.


Data Leak vs. Data Breach

Both data leaks and data breaches can have critical consequences, including financial losses. However, a data leak typically stems from negligence by insiders rather than from an external attack. The actions behind a leak are usually unintentional but can be as harmful as a data breach.

Data breaches typically occur when cybercriminals find and exploit a vulnerability in hardware, software, or security infrastructure, often one the organization did not know about or had not anticipated. Administrators mitigate this risk by continually updating outdated software and immediately installing security patches or updates.

A data leak can result in a data breach but does not require exploiting unknown vulnerabilities. A human error is usually the culprit behind a data leak. For example, a misconfigured Amazon Web Services (AWS) S3 bucket can cause a leak. S3 buckets provide cloud storage space for uploading files and data.

You can configure S3 buckets for public access or restrict access to only authorized users. However, administrators often misconfigure access, disclosing data to third parties. Misconfigured S3 buckets are so common that some websites scan for them and post the results for anyone to review.
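As a rough illustration of how such a misconfiguration can be spotted, the sketch below inspects a bucket policy document (the JSON format S3 uses for bucket policies) and flags statements that allow access to any principal. The function names and the example policy are hypothetical, and the check is a simplification: it ignores ACLs, account-level public access settings, and the details of any Condition element.

```python
def _is_wildcard_principal(principal) -> bool:
    """Return True if the Principal element grants access to everyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def find_public_statements(policy: dict) -> list:
    """Return the Allow statements that apply to any principal.

    Simplification (assumption): a statement with a Condition element is
    treated as restricted and is not flagged, without inspecting the condition.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    return [
        s for s in statements
        if s.get("Effect") == "Allow"
        and _is_wildcard_principal(s.get("Principal"))
        and "Condition" not in s
    ]


# Hypothetical policy: one statement is world-readable, one is account-scoped.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Sid": "TeamOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

for statement in find_public_statements(EXAMPLE_POLICY):
    print("Publicly accessible statement:", statement.get("Sid"))
```

A check like this is best run against every bucket policy on a schedule, since a bucket that is private today can be made public by a later change.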

Top 4 Causes of Data Leakage

Here are some of the issues that can result in data leakage.

1. Misconfiguration Issues

Configuring a networked data system is complex, especially when it includes application software, cloud services, and machine learning tools. Data configuration processes are essential for ensuring ML algorithms can access the data they need while avoiding unnecessary data exposure. The increasing complexity of the system often results in configuration errors.

To help reduce the risk of misconfiguration, you can leverage tools that automate many configuration processes (although you also need to ensure these tools have the right configurations). A single misconfigured router may result in a data leak, depending on the network.

2. Social Engineering Attacks

Malicious actors often use social engineering techniques to trick privileged users, such as employees, into providing sensitive information. Cybercriminals often use deception, for example by posing as a co-worker or a member of the IT department and fabricating a reason to request access credentials.

Social engineering attacks often try to steal login data, phone numbers, or the names of employees with privileged access. Users should avoid sharing sensitive information even with seemingly legitimate requesters unless the request has been verified, so that neither other employees nor malicious actors gain access to data they shouldn’t.

3. Zero-Day Vulnerabilities

Software often contains zero-day vulnerabilities, exposing your organization to risks without your knowledge. Zero-day vulnerabilities can result in persistent threats that leak data undetected for months or years before someone discovers them. Many organizations only become aware of these threats when a major breach is reported in the news.

4. Legacy Techniques and Tools

Despite the various new threats to your data, it’s important to address older attack methods that exploit legacy systems and tools. Modern organizations still use legacy technologies and physical devices such as desktops, USB drives, and printers, not just cloud-based tools and outside SaaS offerings.

While you might need these tools to perform legitimate actions (e.g., allowing employees to print out presentations at home), they also pose a major risk. For example, employees could misplace a USB drive or external storage device containing sensitive information. A malicious actor could steal the device to circumvent the organization’s security perimeter.

What Types of Information Can Be Exposed in a Data Leak?

Cybercriminals look for information that offers value. It is typically confidential and sensitive information that can be traded on the dark web. Here are the data types often found in data leaks:

Personally identifiable information (PII) — Information or records that enable identifying or locating a person. Common PII includes names, phone numbers, physical addresses, social security numbers, and email addresses. Cybercriminals exploit PII for identity theft, scams, and fraud. PII often appears in data leaks.
Financial data — Any data related to a person’s finances or banking, such as credit card numbers, tax information, bank statements and records, invoices, and receipts.
Account credentials — User account login information, including usernames, passwords, and emails. Compromised credentials are highly sought-after commodities because they enable cybercriminals to perform account takeovers (ATOs) and data breaches.
Medical information — Any private data that can disclose a patient’s physical or mental condition. Medical information is typically created and stored by healthcare providers.
Company, federal, or business information — Internal, non-public facing information created and stored by a corporation or federal entity. It typically includes critical business information such as internal communications, classified records, performance metrics, meeting notes, HR records, and company roadmaps.
Trade secrets and intellectual property (IP) — Highly confidential and guarded information whose exposure can put a company’s livelihood at stake, such as classified research, patents, plans, testing material, documentation for scrapped or unfinished products, designs for upcoming projects, source code for proprietary software and technology, and strategic company information.

5 Data Leakage Examples

1. Volkswagen Group of America

Volkswagen disclosed a data leak in June 2021 — malicious actors exploited an unsecured third-party vendor to obtain data about Canadian and US customers. Between 2014 and 2019, the company collected data mainly for marketing and sales purposes.

However, Volkswagen failed to protect this database, leaving it exposed from August 2019 to May 2021 and allowing the leak of information about around 3.3 million individuals. Driver’s licenses and car numbers were exposed during the leak, as well as the loan and social insurance numbers of a smaller set of customers.

2. Infinity Insurance Company

Infinity Insurance disclosed a data leak in March 2021—attackers temporarily achieved unauthorized access to files on the company’s servers for two days during December 2020. Here is the information leaked during this event:

Employee information — The exposed servers housed PII of existing and former Infinity Insurance employees, including names, driver’s license numbers, social security numbers, compensation claims, and medical leave information.
Customers’ information — Servers containing conventional customer data were accessible, exposing millions of driver’s license and social security numbers to the public.

3. Jefit

Jefit, an app that tracks workouts, discovered a bug in March 2021 — the security vulnerability impacted the customer accounts created before September 2020. Cybercriminals exploited the issue to gain unauthorized access to the data of over 9 million users, obtaining account usernames, encrypted passwords, email addresses, and IP addresses.

However, Jefit doesn’t store customer payment information, so there was no sensitive financial data on the breached servers.

4. ParkMobile

ParkMobile uncovered a cybersecurity incident in March 2021 in which an attacker had exploited a vulnerability in third-party software. The security team investigated immediately, but the cybercriminal had already accessed and possibly downloaded basic user information, such as license plate numbers, phone numbers, email addresses, and mailing addresses.

The security issue also allowed the intruders to access encrypted passwords, but they could not get the keys needed to read them, thanks to the company's key management practices.

5. Apple Data Leak

On January 14, 2022, researchers from FingerprintJS publicly disclosed a bug in the WebKit browser engine, used by Apple's Safari, that could leak user data such as browsing history and Google User IDs. The bug was in WebKit's implementation of IndexedDB, a JavaScript API for client-side data storage.

The vulnerability allowed a malicious website to see the names of IndexedDB databases created by other sites, which could reveal sites the user had recently visited and, in some cases, their Google User ID. Cybercriminals could use this to find personal user information. Apple patched this vulnerability, tracked as CVE-2022-22594, in Safari 15.3 and the accompanying iOS and macOS updates.

Data Leakage Prevention Tools

You can prevent data leakage using data loss prevention (DLP) tools, which continuously monitor and analyze your data to identify potential violations of security policies. In addition to identifying policy violations, a DLP tool also works to stop them.

There are various DLP tools, some focused on one part of the organization, like laptops or email services, and others specialized in data backup, archiving, and restoration. Here are the most essential features of enterprise DLP:

Automation — DLP tools automatically identify, inventory, and classify sensitive data and its metadata. Since data is constantly created and modified, a DLP tool must keep pace with possible data leaks. Automation enables tools to detect and respond to threats rapidly.
Analytics — DLP tools enable analyzing data in any state (in use, in transit, or at rest), location (user endpoints, networks, cloud services, and on-premises servers), and application (email, messaging platforms, web, file sharing, and social media).
Context — To accurately find issues, DLP tools derive context from multiple sources when analyzing a communication. Activity defined as normal in one context can become suspicious in another. Examples include flagging suspicious values (such as the word “confidential”), finding copies of known sensitive data, doing complex pattern-matching (to find credit card numbers, for example), studying user behavior, and performing statistical analysis of data activity.
Response — Once the DLP tool discovers a potential policy violation, it should initiate responses following predefined rules.
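To make the pattern-matching idea concrete, here is a minimal sketch of how a content scanner might look for credit card numbers and a suspicious keyword. It is not how any particular DLP product works. The function names are hypothetical, the regular expression is a deliberately loose candidate filter, and the Luhn checksum is used to discard digit runs that cannot be valid card numbers.

```python
import re

# Loose candidate filter: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in the text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_message(text: str) -> list:
    """Combine a keyword check with card detection; only the last four digits are reported."""
    findings = []
    if "confidential" in text.lower():
        findings.append("keyword:confidential")
    findings.extend(f"card:{number[-4:]}" for number in find_card_numbers(text))
    return findings


# 4111 1111 1111 1111 is a widely used test number that passes the Luhn check.
print(scan_message("CONFIDENTIAL: card 4111 1111 1111 1111"))
```

The checksum step matters because long digit runs (order numbers, tracking IDs) are common in ordinary traffic, and a regex alone would generate many false positives.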

What Is a Data Leakage Prevention Policy?

Data loss prevention (DLP) tools help categorize and protect data. A DLP policy outlines how an organization should implement its DLP tools.

DLP tools classify an organization’s critical and confidential data to help prioritize the data leakage prevention strategy. They isolate policy violations based on the rules specified by the organization or using predefined policy packages.

A DLP policy usually addresses the requirements of regulations like the GDPR, HIPAA, and PCI DSS to ensure compliance. When the DLP solution identifies a policy violation, it remediates the issue by applying encryption, sending alerts, or taking other measures to prevent end-users from exposing data.
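The relationship between policy rules and remediation can be sketched as a small rule table. The example below assumes a hypothetical policy format in which each rule pairs a pattern with an action, and the strictest matching action wins. The rule names, patterns, and action names are illustrative only and do not correspond to any regulation's actual requirements or any product's configuration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str  # regular expression that signals a violation
    action: str   # one of "alert", "encrypt", "block"


# Higher number means stricter response (an assumption of this sketch).
SEVERITY = {"alert": 1, "encrypt": 2, "block": 3}

# Hypothetical rules for illustration.
RULES = [
    Rule("us-ssn", r"\b\d{3}-\d{2}-\d{4}\b", "block"),
    Rule("health-term", r"\b(patient|diagnosis)\b", "encrypt"),
    Rule("confidential-label", r"\bconfidential\b", "alert"),
]


def evaluate(text: str, rules: list) -> tuple:
    """Return (action, names of matched rules); 'allow' if nothing matches."""
    matched = [r for r in rules if re.search(r.pattern, text, re.IGNORECASE)]
    if not matched:
        return "allow", []
    strictest = max(matched, key=lambda r: SEVERITY[r.action])
    return strictest.action, [r.name for r in matched]


print(evaluate("Patient SSN 123-45-6789", RULES))
print(evaluate("Confidential roadmap attached", RULES))
```

Returning the full list of matched rules, not just the chosen action, gives auditors a record of why a message was blocked or encrypted.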

Data leakage prevention policies are important for the following reasons:

DLP policies are the basis of compliance — Teams can provide accurate reports for auditing purposes. DLP tools usually focus on a specific industry standard or regulation’s requirements.
Data is often a company’s highest-value asset — Intangible assets like trade secrets, customer information, and organizational strategies may be of higher value than an organization’s physical assets. Losing or exposing this data can damage the organization’s reputation and result in legal and financial penalties.
DLP policies help understand how sensitive data is used — Organizations can identify how end-users and stakeholders use data, enabling them to implement better protection measures, maintain visibility over their data, and control who can use it.

Data Leakage Detection and Prevention Best Practices

Use the following best practices to identify and prevent data breaches and exposures.

Locate Critical Assets and Data

Every organization must know where its sensitive and business-critical data resides. You cannot secure the network if you don’t know the location of your data or how to protect it. This stage involves identifying the number of data assets (quantifying) and determining where all data is.

A consistent, organization-wide data labeling standard helps ensure that everyone labels and understands sensitive data clearly. Use a DLP solution to protect sensitive information in your network and identify data leaks and disruptions.
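As one possible starting point for locating and quantifying sensitive data, the sketch below walks a directory tree and counts matches for a couple of sensitive-data patterns per file. The labels and patterns are assumptions chosen for illustration. A real inventory would also need to cover databases, cloud storage, and binary file formats.

```python
import re
from pathlib import Path

# Hypothetical labels and simplified patterns for illustration.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def inventory(root) -> dict:
    """Map each file under root (as a relative path) to its sensitive-data hit counts."""
    root = Path(root)
    results = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        counts = {}
        for label, pattern in PATTERNS.items():
            hits = len(pattern.findall(text))
            if hits:
                counts[label] = hits
        if counts:  # only files containing at least one match are reported
            results[path.relative_to(root).as_posix()] = counts
    return results
```

Calling `inventory("/path/to/share")` on a hypothetical file share would return a dictionary such as `{"hr/staff.txt": {"email": 2, "us_ssn": 1}}`, which answers both questions from this step: where the sensitive data is and how much of it there is.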

Encrypt Your Data

Data encryption involves translating data into a different format or code so only users with valid passwords or decryption keys can access it. Encryption is an important measure for preventing hackers from reading confidential data during a data breach. However, while encryption mitigates the impact of data leakage, it is not sufficient to block a data breach.

Implement Endpoint Protection

Endpoints are devices connected to the corporate network, enabling data transfers. Most organizations today embrace remote working options and have a growing number of endpoints exposing the system to the Internet, making endpoint protection more difficult.

All devices connected to the network present a security risk. A malicious actor could easily compromise an endpoint device and infiltrate your network if you don't adequately secure all endpoints.

One way to prevent data leaks is to educate employees about endpoint security risks, reducing the risk of employee negligence that allows attackers to break through security controls.

Evaluate the Vendor Security Posture

Working with a third-party vendor means accepting its security vulnerabilities. Conduct a third-party risk assessment before choosing a vendor to identify business risks and plan the right mitigation strategy. Your organization is responsible for continuing to monitor the security posture of each vendor to identify new third-party vulnerabilities.

The cybersecurity landscape is constantly changing, so vendors must regularly update their systems to ensure compliance and keep up with emerging threats. However, ensuring compliance is your responsibility; you must check every third-party vendor.
