Businesses large and small depend on access to information in order to make smarter, data-driven decisions. And much of that data is personal, sensitive, or confidential. So how can you balance this demand for big data with the need to protect the individuals whom this data describes?
When it comes to personally identifiable information (PII), there are good reasons to keep it securely under lock and key. The term "PII" refers to any data that can uniquely identify a person on its own, or that can reveal that person's identity when combined with other data.
In this article, we discuss data masking, one popular technique for protecting PII, and why you should always mask PII when you're working with sensitive information.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
What is Data Masking?
The term data masking (also known as data obfuscation) refers to a collection of techniques for pseudonymizing or anonymizing sensitive information. Which technique you use depends on the nature of the data and the use case involved. Techniques for data masking include:
Nulling: A blank or placeholder character "nulls out" sensitive data. For example, a nine-digit Social Security number converts into the representation "XXX-XX-1957" to let the user verify that the SSN is correct while only exposing the last 4 digits.
Substitution: Instead of placeholder characters, a randomly selected dummy value replaces the data. For instance, a customer's first and last name changes to "Jane Doe."
Blurring: Small amounts of "noise" slightly "blur," or alter, the original data, making it much harder to trace a value back to the original individual.
Scrambling: This randomly scrambles or shuffles the data. So, the name "Laura" and the number "7189" might become "Raalu" and "8917," respectively.
Encryption: Data converts into an encoded format through the use of an encryption key. The encrypted data is unreadable and meaningless for anyone without the decryption key that converts it back to the original format.
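As a rough illustration of the first four techniques, here is a minimal Python sketch. The function names, the pool of dummy names, and the noise range are all assumptions made for this example, not part of any particular product. Encryption is left out because it normally relies on a dedicated cryptography library rather than hand-written code.

```python
import random

def null_out(ssn: str, visible: int = 4, placeholder: str = "X") -> str:
    """Nulling: replace every digit except the last `visible` ones."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            seen += 1
            # Keep only the trailing digits; separators such as "-" pass through.
            out.append(ch if seen > total_digits - visible else placeholder)
        else:
            out.append(ch)
    return "".join(out)

def substitute(_real_name: str, rng: random.Random) -> str:
    """Substitution: ignore the real value and return a dummy one."""
    dummy_names = ["Jane Doe", "John Roe", "Alex Poe"]  # hypothetical pool
    return rng.choice(dummy_names)

def blur(value: float, rng: random.Random, max_noise: float = 2.0) -> float:
    """Blurring: add a small random offset to a numeric value."""
    return value + rng.uniform(-max_noise, max_noise)

def scramble(text: str, rng: random.Random) -> str:
    """Scrambling: shuffle the characters of the value."""
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)

# Seeded only so the demo is repeatable; `random` is not a secure source
# of randomness and is used here purely for illustration.
rng = random.Random(42)
print(null_out("123-45-1957"))  # XXX-XX-1957
print(substitute("Laura Smith", rng))
print(blur(34.0, rng))
print(scramble("7189", rng))
```

Note that scrambling a short value can occasionally return the original order, which is one reason it is usually combined with other techniques.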
In addition to using these different data masking methods, you can also perform data masking statically or dynamically:
- Static data masking rewrites the stored data set itself, so the original values cannot be recovered from the masked data.
- Dynamic data masking applies on the fly when data moves out of its original location (e.g. after an API call). The stored data stays unchanged, and the values that come back depend on the user's access level.
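The difference between the two modes can be sketched in a few lines of Python. Everything here is hypothetical: the record layout, the "admin" role name, and the helper functions are assumptions chosen to keep the example small.

```python
import copy

RECORDS = [
    {"name": "Laura Smith", "ssn": "123-45-1957"},
]

def mask_ssn(ssn: str) -> str:
    """Hide everything except the last four characters of the SSN."""
    return "XXX-XX-" + ssn[-4:]

def static_mask(records: list[dict]) -> list[dict]:
    """Static masking: build the data set that will be stored or shared.

    The returned rows no longer contain the original SSNs, so nothing
    downstream can recover them from this data set.
    """
    masked = copy.deepcopy(records)
    for row in masked:
        row["ssn"] = mask_ssn(row["ssn"])
    return masked

def read_record(row: dict, role: str) -> dict:
    """Dynamic masking: leave stored data alone and mask at read time."""
    if role == "admin":  # hypothetical privileged role
        return dict(row)
    return {**row, "ssn": mask_ssn(row["ssn"])}

print(static_mask(RECORDS))
print(read_record(RECORDS[0], "analyst"))
print(read_record(RECORDS[0], "admin"))
```

In the dynamic case the same stored record yields different views, so the access-control check becomes the piece that has to be trusted.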
Data Masking for PII
Although data masking can work on any type of data, it's most valuable for personally identifiable information. Whereas some PII data can directly identify an individual (e.g. credit card numbers, phone numbers, driver's license numbers, etc.), other types of PII can combine to piece together an individual's identity.
A study by Latanya Sweeney, who went on to found the Data Privacy Lab now based at Harvard, found that 87 percent of people in the U.S. are likely to be uniquely identifiable with just three pieces of personal data: gender, date of birth, and five-digit ZIP code.
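To see how this kind of re-identification risk can be checked on your own data, the sketch below counts what fraction of records are unique for a given combination of fields. The sample records and the `unique_fraction` helper are made up for illustration and have nothing to do with the study's actual data or method.

```python
from collections import Counter

# Made-up sample records; none of them describe real people.
PEOPLE = [
    {"gender": "F", "dob": "1984-03-12", "zip": "02138"},
    {"gender": "F", "dob": "1984-03-12", "zip": "02138"},
    {"gender": "M", "dob": "1990-07-01", "zip": "02139"},
    {"gender": "F", "dob": "1975-11-30", "zip": "94105"},
]

def unique_fraction(rows, keys=("gender", "dob", "zip")):
    """Return the share of rows whose combination of `keys` is unique."""
    combos = [tuple(row[k] for k in keys) for row in rows]
    counts = Counter(combos)
    unique_rows = sum(1 for combo in combos if counts[combo] == 1)
    return unique_rows / len(rows)

print(unique_fraction(PEOPLE))                   # 0.5
print(unique_fraction(PEOPLE, keys=("gender",))) # 0.25
```

Two of the four sample rows are unique on all three fields, while only one is unique on gender alone. Dropping or coarsening fields is one way to lower the share of records that stand out.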
Data security and data governance regulations such as the European Union's GDPR (General Data Protection Regulation) and California's CCPA govern how organizations can handle the PII of individuals under their jurisdiction. Other regulations, such as HIPAA and PCI DSS, are specifically for compliance in certain industries and use cases (healthcare and payment card information, respectively).
These regulations place strict constraints on how you can store and work with PII, and masking can ease some of them. According to GDPR's Recital 26, the regulation does not apply to data that is anonymous (i.e. that can no longer be used to identify an individual person). The same recital treats pseudonymized data that could still be linked back to a person with additional information as personal data, so only masking that cannot be reversed takes data out of scope.
Since penalties for violating these regulations are harsh (in the case of GDPR, up to 20 million euros or 4 percent of your global annual revenue, whichever is higher), data masking and pseudonymization have become crucial practices for organizations using PII. But that's not the only danger of failing to properly mask PII data: you can also fall victim to a data breach.
According to IBM's "Cost of a Data Breach" report, the average total financial impact of suffering a data breach is over $3.8 million, including downtime, lost business, reputational damage, and lawsuits and payouts to affected consumers. Given the extreme harm that can follow a data breach, data privacy measures such as PII data masking are an essential business best practice. Even if a malicious actor manages to break into your IT environment, the masked data will be nearly worthless.
How Integrate.io Can Help with PII Masking
PII masking protects your business and keeps your sensitive and confidential information away from prying eyes. But there's still one big question: how do you actually put data masking for PII into practice?
The answer lies in robust, sophisticated ETL tools like Integrate.io. Integrate.io is a user-friendly ETL and data integration platform. After extracting your data from one or more sources, you can use the ETL transformation stage to dynamically mask data before loading it into a target location. With more than 100 pre-built third-party connectors, it's easy to build pipelines to your cloud data warehouse or data lake — whether that's AWS, Microsoft Azure, Oracle, or something else.
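To make the idea of masking during the transformation stage concrete, here is a generic extract, transform, load sketch in plain Python. It does not use or represent Integrate.io's API. The CSV source, the `customers` table, and the in-memory SQLite target are stand-ins chosen only so the example runs on its own.

```python
import csv
import io
import sqlite3

# Stand-in source data; a real pipeline would read from a database or API.
SOURCE_CSV = "name,ssn\nLaura Smith,123-45-1957\nOmar Diaz,987-65-4321\n"

def extract(text: str) -> list[dict]:
    """Extract: parse the source rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: mask PII before anything reaches the target."""
    return [
        {"name": "Jane Doe", "ssn": "XXX-XX-" + row["ssn"][-4:]}
        for row in rows
    ]

def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Load: write the already-masked rows to the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, ssn TEXT)")
    conn.executemany("INSERT INTO customers VALUES (:name, :ssn)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT name, ssn FROM customers ORDER BY rowid").fetchall())
# [('Jane Doe', 'XXX-XX-1957'), ('Jane Doe', 'XXX-XX-4321')]
```

Because the masking happens before the load step, the unmasked values never reach the target system in this sketch.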
Ready to protect your most sensitive and valuable data? Get in touch with our team of data experts today for a chat about your business objectives, or to start your 14-day pilot of the Integrate.io platform.