Essential Cloud ETL Data Security Features
There are many cloud-based ETL providers on the market, each offering a range of attractive features. For example, Integrate.io offers benefits like:
- High-speed transformations on a staging server
- Automated integration with most major production systems and data repositories
- No-code data pipeline automation
- 24-hour support and error recovery
But security is the most crucial aspect of any ETL solution. If a vendor can’t offer a full suite of security options, then it’s worth shopping around for someone you trust.
Key data security features
As discussed in the previous section, there are certain things that you should verify about each vendor, such as SOC compliance, physical security and reputation.
It’s also a good idea to look at the data security features they offer to users. The most important ones are:
- Secure login: Your team will access the cloud ETL service through a web interface. This interface should use a secure connection and offer strong authentication features, including two-factor authentication (2FA) and suspicious activity detection. Also check whether the vendor supports single sign-on (SSO).
- SSH/Reverse SSH tunnel: The best ETL vendors will allow you to connect your data sources without exposing them to the public internet. This usually involves an SSH tunnel, or a reverse SSH tunnel if you can’t open an inbound port. Integrate.io supports both SSH and reverse SSH.
- Non-persistent data: ETL should transport your data from A to B with no records kept in between. This means no copies, no archives, no logs: nothing that might inadvertently create a risk of a data breach. Look for a service like Integrate.io that guarantees the non-persistence of all data passing through the pipeline.
- Data encrypted in transit and at rest: Within the ETL process itself, data is sometimes at rest and sometimes moving between locations. The vendor should be able to guarantee robust encryption for data in transit and data at rest throughout the ETL process.
- Regular penetration testing: Vendors that maintain SOC 2 compliance typically commission penetration tests on a regular schedule. Request the penetration test reports when you sign up, and keep reviewing the latest report each year for as long as you use the service.
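To make the difference between an SSH tunnel and a reverse SSH tunnel concrete, here is a minimal sketch that builds the two OpenSSH commands. The host names, user names, and port numbers are hypothetical placeholders, and the exact setup steps for any particular vendor will differ; the only real pieces are the standard OpenSSH options `-N`, `-L`, `-R`, and `ExitOnForwardFailure`.

```python
import shlex


def forward_tunnel_cmd(ssh_server, db_host, db_port, local_port):
    """Build an `ssh -L` command.

    The machine running ssh listens on local_port and relays traffic
    through ssh_server to db_host:db_port. This needs inbound SSH access
    to ssh_server.
    """
    return [
        "ssh", "-N",                        # -N: forward ports only, run no remote command
        "-o", "ExitOnForwardFailure=yes",   # exit if the port forward cannot be set up
        "-L", f"{local_port}:{db_host}:{db_port}",
        ssh_server,
    ]


def reverse_tunnel_cmd(tunnel_server, db_host, db_port, remote_port):
    """Build an `ssh -R` command.

    Run this from inside your own network. It makes an outbound connection
    to tunnel_server, which then listens on remote_port and relays traffic
    back to db_host:db_port. No inbound port has to be opened on your side.
    """
    return [
        "ssh", "-N",
        "-o", "ExitOnForwardFailure=yes",
        "-R", f"{remote_port}:{db_host}:{db_port}",
        tunnel_server,
    ]


if __name__ == "__main__":
    # All names below are made up for illustration.
    print(shlex.join(forward_tunnel_cmd(
        "etl@bastion.example.com", "db.internal", 5432, 15432)))
    print(shlex.join(reverse_tunnel_cmd(
        "tunnel@tunnel.vendor.example", "db.internal", 5432, 25432)))
```

The reverse form is the one to reach for when your firewall policy forbids inbound connections, because the session is always initiated from your side.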
Security through data transformation
ETL can also improve your overall level of data security by offering transformation functions that protect sensitive data.
- Field-level encryption: Field-level encryption means that sensitive fields are encrypted before the data leaves your network and stay encrypted while it is outside. Decryption is impossible without the key, which you hold on your side. Should anyone intercept or access data while it’s outside of your network, they won’t be able to decrypt it. Integrate.io offers field-level encryption using the AWS Key Management Service (KMS), and you can use this to encrypt data at any stage in the ETL process.
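- Hashing: Hashing is a one-way cryptographic function that replaces sensitive data with a fixed-length value that reveals nothing about the original. The output looks random, but the same input always produces the same hash, so hashed fields can still be matched and joined. For instance, you can configure your ETL to replace Social Security numbers with their hashes. Because short, predictable values like these can be guessed by hashing every possible input, a salted or keyed hash is the safer choice.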
- Masking: Masking is commonly used in testing and analytics scenarios, where your team might need large volumes of representative data, but they don’t need genuine personal information. An ETL masking layer will produce an arbitrary value that meets requirements but doesn’t expose personal information. For example, the ETL platform could replace Social Security numbers with a random 9-digit number.
- Obfuscation: Obfuscation is a way of hiding data values that is often reversible. For instance, the ETL may replace certain values with codes from a lookup table. That lookup table later makes it possible to restore the original values.
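The hashing, masking, and obfuscation transformations above can be sketched with the Python standard library. This is an illustration under stated assumptions, not how any particular ETL platform implements them: the function and class names are hypothetical, the hash is a keyed HMAC-SHA256, and the lookup table lives in memory, whereas a real pipeline would keep both the key and the table in a secured store.

```python
import hashlib
import hmac
import secrets


def hash_value(value, key):
    """Hashing: one-way keyed hash (HMAC-SHA256).

    The same value and key always give the same 64-character hex digest,
    so hashed fields can still be joined, but the digest cannot be
    reversed to recover the value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_ssn(_original):
    """Masking: return a random 9-digit string unrelated to the original."""
    return "".join(secrets.choice("0123456789") for _ in range(9))


class LookupObfuscator:
    """Obfuscation: swap values for codes held in a lookup table.

    Whoever holds the table can restore the original values.
    """

    def __init__(self):
        self._code_by_value = {}
        self._value_by_code = {}

    def obfuscate(self, value):
        # Reuse the existing code so a value always maps to one code.
        if value not in self._code_by_value:
            code = "tok_" + secrets.token_hex(8)
            self._code_by_value[value] = code
            self._value_by_code[code] = value
        return self._code_by_value[value]

    def restore(self, code):
        return self._value_by_code[code]


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumption: in practice, load this from a secret store
    record = {"name": "Ada", "ssn": "123-45-6789", "plan": "gold"}
    table = LookupObfuscator()

    safe_record = {
        "name": record["name"],
        "ssn_hash": hash_value(record["ssn"], key),  # joinable, not reversible
        "ssn_masked": mask_ssn(record["ssn"]),       # realistic shape, fake value
        "plan": table.obfuscate(record["plan"]),     # reversible with the table
    }
    print(safe_record)
    print(table.restore(safe_record["plan"]))
```

The three techniques trade off differently: hashing keeps fields comparable without being reversible, masking keeps only the shape of the data, and lookup-based obfuscation stays reversible for whoever controls the table.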
ETL can help to minimize risk to data in transit by hiding or removing sensitive information. This is the ultimate data security strategy – if someone does intercept or access the data, they won’t find anything of value.