Every data-driven business is terrified of the prospect of a data breach. Exposing sensitive data could mean reputational damage, loss of clients, and heavy fines under emerging privacy laws.
But every data-driven business also wants to make use of its data. Business intelligence (BI) platforms allow anyone to build complex and detailed dashboards that help them understand the organization’s current state.
How do you resolve this tension? One approach is to build a privacy-first data structure.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
Table Of Contents
- What Is Privacy by Design?
- What Are the Principles of Privacy by Design?
- How Can Restructuring Help BI Compliance?
- Integrate.io: Privacy in Your Bones
What Is Privacy by Design?
Privacy by Design (PbD) is a concept that dates back to the '90s, and it has become popular again thanks to GDPR. Article 25 of the GDPR, “Data protection by design and by default,” specifically requires data controllers to build data protection into their processing from the outset.
The PbD philosophy involves thinking about data security and privacy right from the beginning rather than at the end. Many data architects and engineers start with the project requirements, then later try to implement privacy solutions.
Privacy by design means starting out with the question, “how do we keep data safe?” When you have a privacy-oriented structure, it’s easier to stay safe when deploying new applications — such as a BI platform.
What Are the Principles of Privacy by Design?
Privacy by Design follows seven core principles:
Proactive, Not Reactive
PbD architecture is about anticipating future issues and building remedies into the structure. PbD doesn’t wait for threats to appear — it prevents threats from becoming an issue.
Privacy as the Default Setting
Users tend to stick with the default settings. If privacy is an option, then most people won’t choose it. PbD flips the equation by treating privacy as the default and allowing people to choose to be more open.
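As a rough illustration of what "privacy as the default" can look like in code, here is a minimal Python sketch. The `SharingPreferences` class and its field names are hypothetical and not taken from any particular product; the point is only that every sharing option starts switched off and the user has to opt in.

```python
from dataclasses import dataclass


@dataclass
class SharingPreferences:
    """Hypothetical per-user settings; every option defaults to the private choice."""
    share_with_partners: bool = False
    marketing_emails: bool = False
    profile_public: bool = False


# A new user who never opens the settings page stays fully private.
default_prefs = SharingPreferences()

# Being more open is an explicit decision by the user.
opted_in = SharingPreferences(marketing_emails=True)
```

Because the defaults live in one place, a new option added later is private until someone deliberately decides otherwise.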
Privacy Embedded into Design
In PbD, privacy is not an additional step in the data process. Instead, privacy is the data process. The designers aim to deliver solutions that provide security and functionality in equal measure.
Positive-Sum, Not Zero-Sum
Privacy can add value to a system by streamlining data flows and helping to improve data quality. PbD aims for this kind of win-win outcome rather than treating privacy measures as a necessary price to pay.
Full Lifecycle Protection
Exposure of private data is always bad, no matter where it occurs. PbD takes a whole-of-life view of data, from the moment of ingestion until deletion.
Visibility and Transparency
All stakeholders should understand how data is stored and processed. When there are plenty of eyes on the process, any privacy issues will soon come to light.
Keep It User-Centric
Users are the ultimate authority in a PbD system. They can ask you to amend or delete any information that relates to them, and they can be confident that you’ll follow their orders.
The user-centric principle is at the heart of GDPR and similar privacy laws around the world. If your users have full control of their data, there’s a good chance you’ll stay compliant.
How Can Restructuring Help BI Compliance?
BI poses a very specific compliance problem. How do you hide personal data from analysts, while also empowering those analysts to create meaningful reports?
A privacy-by-design restructure can help achieve this goal. Here’s what you need to do.
Conduct a Full Data Audit
First, you’ll need to get a picture of your current state. This involves doing a full sweep of your network to clarify your current processes. Pay attention to:
- Data sources: Production systems like your CMS will hold substantial amounts of personal information.
- Data repositories: Your data lakes and data warehouses are full of rich historical data, which may or may not contain sensitive information.
- Integration: Data has to move from one system to another, which it might do via a data pipeline. You might also have direct one-to-one connections between systems.
- BI platforms: Your BI tools might access sources and repositories directly. They might also store copies of data on their own local servers.
Once you’ve mapped out this architecture, you should understand where data originates, where it ends up, and how it travels between those points.
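One way to record the result of an audit like this is as a small inventory that you can query. The sketch below is only an illustration: the `DataAsset` class, the asset names, and the flows between them are all made up for the example. It follows data flows from one source to find every system that source feeds.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One system found during the audit (hypothetical structure)."""
    name: str
    kind: str  # "source", "repository", "integration", or "bi"
    holds_personal_data: bool
    sends_to: list[str] = field(default_factory=list)


def downstream_of(assets: dict[str, DataAsset], start: str) -> set[str]:
    """Return the names of every asset reachable from `start` via data flows."""
    seen: set[str] = set()
    stack = list(assets[start].sends_to)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(assets[name].sends_to)
    return seen


# Example inventory; names and flows are illustrative only.
inventory = {
    asset.name: asset
    for asset in [
        DataAsset("cms", "source", True, ["pipeline"]),
        DataAsset("pipeline", "integration", True, ["warehouse"]),
        DataAsset("warehouse", "repository", True, ["bi_dashboard"]),
        DataAsset("bi_dashboard", "bi", False),
    ]
}

# Every system that ultimately receives data originating in the CMS.
exposed = downstream_of(inventory, "cms")

# Of those, the ones currently holding personal data.
personal = sorted(n for n in exposed if inventory[n].holds_personal_data)
```

With the map in this form, "where does CMS data end up?" becomes a lookup rather than a guess.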
Identify and Categorize Sensitive Information
There are different levels of sensitivity, and these require different processes. Each organization has its own taxonomy, but you’ll generally end up with tiers like this:
- High sensitivity: Would cause substantial damage if leaked. This includes market-sensitive internal information or personal details like Social Security numbers or bank details.
- Medium sensitivity: Information that might cause some damage if leaked, like customer names and contact details.
- Low sensitivity: Information that won’t cause much damage, like internal memos or a customer’s vehicle registration number.
These categories can vary according to circumstances, and laws like GDPR and CCPA have defined data categories. The important thing is to have a consistent system that applies to all data.
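A consistent system can be as simple as a single lookup that every pipeline shares. The following Python sketch assumes a hypothetical mapping of column names to the three tiers above; the column names are invented for illustration, and your own taxonomy will differ. Unknown columns are treated as high sensitivity so that nothing slips through unclassified.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical column-to-tier mapping; adapt to your own taxonomy.
COLUMN_TIERS = {
    "social_security_number": Sensitivity.HIGH,
    "bank_account": Sensitivity.HIGH,
    "customer_name": Sensitivity.MEDIUM,
    "email": Sensitivity.MEDIUM,
    "vehicle_registration": Sensitivity.LOW,
}


def classify(column: str) -> Sensitivity:
    """Look up a column's tier; unclassified columns default to HIGH."""
    return COLUMN_TIERS.get(column, Sensitivity.HIGH)


def table_sensitivity(columns: list[str]) -> Sensitivity:
    """A table is as sensitive as its most sensitive column."""
    return max((classify(c) for c in columns), default=Sensitivity.LOW)
```

For example, `table_sensitivity(["email", "vehicle_registration"])` evaluates to `Sensitivity.MEDIUM`, because the email column sets the level for the whole table.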
Normalize Data Tables
Relational database structure can impact data privacy. For instance, if your Order table contains customer data, then BI analysts might see that information.
You can reduce this risk with full normalization. That means building a relational database structure where all data entities have their own tables and are linked by keys. Your Order table won’t have any sensitive info — just a link to the relevant entry in the Customer table.
If you can’t change the tables in a proprietary system, you can use an ETL (extract, transform, load) process to convert data into a more secure structure and then store it in a data warehouse.
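To make the normalized layout concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`customers`, `orders`, `total`) and the sample rows are assumptions chosen for the example. The orders table carries only a key that points to the customer, so an aggregate report never has to read personal data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Personal data lives only in the customers table.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL
    );

    -- Orders reference a customer by key and hold no personal details.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    );

    INSERT INTO customers VALUES (1, 'Ada Example', 'ada@example.com');
    INSERT INTO orders VALUES (100, 1, 42.50), (101, 1, 10.00);
    """
)

# An analyst-style report that touches only the orders table.
report = conn.execute(
    "SELECT customer_id, COUNT(*), SUM(total) FROM orders GROUP BY customer_id"
).fetchall()

# Column names of the orders table, to confirm nothing personal is stored there.
order_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
```

In a setup like this, you could grant BI users access to `orders` alone and keep `customers` restricted, though how you enforce that depends on your database's permission model.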
Mask and Obfuscate Personal Data Where Possible
ETL can also help to remove or hide sensitive information within your database. When you use ETL, data passes through a transformation layer. Here, you can apply data masking rules, or you can delete private information entirely.
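A transformation step of this kind might look something like the Python sketch below. It is not how any particular ETL product works; the field names, the `SECRET_KEY` placeholder, and the specific rules (drop, pseudonymize, partially mask) are all assumptions made for illustration. Keyed hashing with the standard `hmac` module replaces a name with a stable token, so analysts can still group records by customer without seeing who the customer is.

```python
import hashlib
import hmac

# Placeholder only; in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

# Fields removed entirely before loading (hypothetical rule).
DROP_FIELDS = {"social_security_number"}


def pseudonymize(value: str) -> str:
    """Replace a value with a stable keyed hash, truncated to 16 hex characters."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def transform(record: dict) -> dict:
    """Apply the masking rules to one record on its way to the warehouse."""
    out = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # delete private information entirely
        if key == "customer_name":
            out["customer_key"] = pseudonymize(value)
        elif key == "email":
            out[key] = mask_email(value)
        else:
            out[key] = value
    return out


masked = transform(
    {
        "customer_name": "Ada Example",
        "email": "ada@example.com",
        "social_security_number": "000-00-0000",
        "order_total": 42.5,
    }
)
```

Note that pseudonymized data can still count as personal data under laws like GDPR if the key allows re-identification, so the key needs the same protection as the original values.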
An ETL data pipeline will extract data from your sources and then load it into a repository, like a data warehouse. Your BI platform connects to this repository and finds the information it requires.
If the data in the warehouse is obfuscated, your BI analysts cannot expose personal data through it, because they never have access to that data in the first place. And yet, they can still use BI tools to create useful reports.
This is the ultimate goal of privacy by design.
Integrate.io: Privacy in Your Bones
The right tools can make this job much easier. Integrate.io is a powerful no-code ETL that connects your data sources to a secure repository. You can transform and obfuscate at will, and you can make data directly available to most popular BI platforms.
Best of all, Integrate.io understands the importance of compliance in the age of CCPA and GDPR. Our solution is designed to help you meet BI compliance requirements and to keep your data safe, always.
Want to see Integrate.io in action? Get a 14-day demo of Integrate.io.