Businesses of all sizes and entire industries rely on analyzing personal data to beat their competitors and better serve their customers. For example, information from your CRM (customer relationship management) software can help you understand how to convert more leads in a particular area or from a specific demographic.
Yet mishandling personally identifiable information (PII) or sensitive data can be a disaster for your organization — not only because of the financial penalties you might face but also because of the reputational damage you may suffer in the eyes of customers and investors.
Pseudonymizing this data can help protect you from the consequences of a data breach. But how do you go about this process? In this article, we discuss why pseudonymizing PII is so important, and the six steps for achieving it.
What is PII?
Personally identifiable information is any data that identifies a specific individual or household, or that can be used to infer their identity. Types of PII include (but are not limited to):
- Identifiers: First, middle and last names; home address; phone number; age and birthday; gender; race or ethnicity; nationality; ID numbers (e.g. Social Security number or passport ID)
- Work and education: Employee or student ID; workplace or school address; years of work or study
- Biometric data: Fingerprints, retina scans, facial recognition data, etc.
- Internet data: Browsing history, search history, IP addresses, mobile app activity, geolocation data
Why Should You Pseudonymize PII?
Organizations that handle PII should reinforce data security by pseudonymizing it wherever the true data values are not needed. That way, even if the information falls into the wrong hands, it is much harder to tie back to a real person.
Insufficiently pseudonymized PII is susceptible to "re-identification," a process that would allow a malicious party (e.g. after a data breach) to connect the dots to uncover an individual's identity. According to Harvard's Data Privacy Lab, 87 percent of people in the U.S. are identifiable with just three data points: ZIP code, gender, and date of birth.
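To see how "connecting the dots" works in practice, the short Python sketch below counts how many rows in a data set are unique on ZIP code, gender, and date of birth. The records, field names, and the `unique_fraction` helper are all made up for illustration; a real audit would run against your own data and would likely consider more quasi-identifiers.

```python
from collections import Counter

# Hypothetical, made-up records with names already removed.
records = [
    {"zip": "02139", "gender": "F", "dob": "1985-03-14", "diagnosis": "asthma"},
    {"zip": "02139", "gender": "F", "dob": "1985-03-14", "diagnosis": "flu"},
    {"zip": "02139", "gender": "M", "dob": "1990-07-02", "diagnosis": "diabetes"},
    {"zip": "94105", "gender": "F", "dob": "1978-11-30", "diagnosis": "migraine"},
]

# Assumption: these three fields are the quasi-identifiers we care about.
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Return the share of rows whose quasi-identifier combination is unique."""
    if not rows:
        return 0.0
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique_rows = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique_rows / len(rows)


# Two of the four rows can be singled out by these three fields alone.
print(unique_fraction(records))  # 0.5
```

Any row counted as unique here could be matched against an outside source (a voter roll, for example) that lists the same three fields next to a name.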
Regulations such as the European Union's GDPR (the General Data Protection Regulation) and California's CCPA (California Consumer Privacy Act) control how organizations use PII. For example, according to the GDPR:
- "Personal data" is defined as "any information relating to an identified or identifiable natural person," who is also known as the "data subject."
- "Pseudonymisation" is defined as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information."
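- According to GDPR's Recital 26, the regulation's data protection principles do not apply to anonymous data (data that can no longer be used to identify an individual). Pseudonymized data that could be attributed to a person again by using additional information still counts as personal data.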
GDPR compliance is necessary for any organization that handles the personal data of people in the EU, while the CCPA sets comparable requirements for businesses that handle the personal information of California residents. Some industries also have their own data privacy regulations. In the U.S., HIPAA, for example, governs how healthcare organizations manage and process sensitive patient information.
6 Steps for Data Pseudonymization
We've established that pseudonymizing your PII is important, but how do you actually pseudonymize or anonymize your data? Note that the terms "pseudonymization" and "anonymization" are sometimes used interchangeably, but they mean different things:
- In pseudonymization, artificial identifiers replace an individual's real identifiers.
- In anonymization, the identifiers are removed from the data set or irreversibly altered, so that the data can no longer be linked back to an individual.
- The term "de-identification" is a general term for either or both processes.
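The difference between the two terms is easier to see side by side. The sketch below is a minimal illustration, not a complete solution: `pseudonymize` swaps an identifier for a keyed hash (one common way to generate an artificial identifier), while `anonymize` simply drops the field. The function names, the `email` field, and the hard-coded key are assumptions made for the example.

```python
import hashlib
import hmac

# Assumption: in practice this key would live in a secrets manager, stored
# separately from the data. It is hard-coded here only for illustration.
SECRET_KEY = b"example-key-kept-separately"


def pseudonymize(record, id_field="email", key=SECRET_KEY):
    """Replace the identifier with a keyed hash (an artificial identifier)."""
    digest = hmac.new(key, record[id_field].encode("utf-8"), hashlib.sha256).hexdigest()
    out = dict(record)  # copy so the original record is left untouched
    out[id_field] = digest[:16]  # shortened for readability
    return out


def anonymize(record, id_field="email"):
    """Drop the identifier entirely, so no stable link to the person remains."""
    return {k: v for k, v in record.items() if k != id_field}


customer = {"email": "jane@example.com", "plan": "pro"}
print(pseudonymize(customer))  # same email always maps to the same pseudonym
print(anonymize(customer))     # {'plan': 'pro'}
```

Because the keyed hash is deterministic, records for the same person can still be joined after pseudonymization, and anyone holding the key can recompute the mapping. Note also that dropping a single field does not by itself make a data set anonymous if other fields can still single a person out.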
One important note: before you start the pseudonymization process, you should remove duplicate, inaccurate and out-of-date information from your data sets. Deleting data that is no longer useful, rather than pseudonymizing it, will save you time, money, and effort.
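You should also appoint someone to take organization-wide responsibility for data protection. Under the GDPR, the "data controller" is the person or organization that determines the purposes and means of processing personal data, and some organizations must also designate a data protection officer (DPO) to oversee compliance. Appointing a figure to take responsibility for data privacy and cybersecurity is an essential component of good data management.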
Below are 6 data pseudonymization and data anonymization techniques:
1. Data masking: Data masking "masks" sensitive information with random or dummy characters. One example of data masking is replacing all but the last four digits of a 16-digit credit card number, e.g. "XXXX XXXX XXXX 7534." Data masking seeks to make the masked data unrecognizable to outsiders while keeping it as recognizable as possible to people who have access to the original data set.
2. Data scrambling: Data scrambling mixes or obfuscates data fields (e.g. the last name "Williams" might become "Wsliailm"). Data scrambling alone is unlikely to be enough protection for PII, since a savvy attacker might be able to unscramble it.
3. Data encryption: Data encryption translates information into an encoded format with the help of an encryption algorithm. Unlike data masking, encryption is reversible; the encrypted data is incomprehensible to anyone without the corresponding decryption key, which allows conversion of the data back to its original format.
4. Data tokenization: Data tokenization replaces PII and sensitive data with random strings of characters, known as "tokens." Each token serves as a unique reference back to the original data, which is stored in a "token vault."
5. Data blurring: Data blurring "blurs" the original data (i.e. changes or perturbs it slightly by adding small amounts of noise). For example, the day, month, or year of a person's birthday might be shifted slightly, or a few digits of a phone number could be changed.
6. Data bucketing: Data bucketing "buckets" individual values together. For instance, a person with an age of 27 might have this value replaced with "20s."
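Several of the techniques above can be sketched in a few lines of Python using only the standard library. Treat this as a rough illustration under stated assumptions rather than production code: every function and class name here is hypothetical, the token vault is just an in-memory dictionary, and the `random` module is not suitable for security-sensitive randomness. Encryption is left out because it calls for a vetted cryptography library rather than hand-rolled code.

```python
import random
import secrets


def mask_card(number, visible=4):
    """Data masking: hide all but the last `visible` digits, keeping spaces."""
    total_digits = sum(ch.isdigit() for ch in number)
    seen = 0
    out = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - visible else "X")
        else:
            out.append(ch)
    return "".join(out)


def scramble(value, rng):
    """Data scrambling: shuffle the characters of a field."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


class TokenVault:
    """Data tokenization: hand out random tokens and remember the originals."""

    def __init__(self):
        self._vault = {}  # token -> original value (in-memory for illustration)

    def tokenize(self, value):
        token = secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, token):
        return self._vault[token]


def blur_year(year, rng, max_shift=2):
    """Data blurring: add a small random offset to a numeric value."""
    return year + rng.randint(-max_shift, max_shift)


def bucket_age(age):
    """Data bucketing: replace an exact age with its decade."""
    return f"{(age // 10) * 10}s"


rng = random.Random(42)  # seeded only so the demo is repeatable
vault = TokenVault()
token = vault.tokenize("123-45-6789")

print(mask_card("4111 1111 1111 7534"))  # XXXX XXXX XXXX 7534
print(scramble("Williams", rng))          # some permutation of the letters
print(token, "->", vault.detokenize(token))
print(blur_year(1990, rng))               # a year between 1988 and 1992
print(bucket_age(27))                     # 20s
```

In a real system the vault would be a separately secured data store, and the amount of noise or the bucket width would depend on how much accuracy your analysis can give up.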
How Integrate.io Can Help with PII
We've gone over how to pseudonymize your PII — but how do you build a system that can create pseudonymized data and anonymized data? Integrate.io can help.
Integrate.io is a feature-rich data integration platform that makes it easy to run secure ETL (Extract, Transform, Load) workloads. During ETL, you can configure transformations that automatically pseudonymize and anonymize your data before loading it into the target data warehouse. We fully comply with data privacy regulations such as GDPR, CCPA, and HIPAA, so your PII enjoys protection throughout the ETL process.
Want to learn more about how Integrate.io can help protect your PII? Get in touch with our team of data experts today for a chat about your business needs and objectives, or to start your 14-day pilot of the Integrate.io platform.