Data access is an approach to data management that organizes the way people read, write and edit data within an organization.
As an organization amasses data, it quickly runs into some common data management challenges. These issues revolve around questions like:
- Who should we allow to access data?
- Who should we prevent from accessing data?
- How do we implement these rules?
- How do we monitor and log data transactions?
Data access policies help to address these questions in a consistent and transparent way. The term “data access” covers all the processes, rules, and best practices that govern who can read, write, and edit data, and how they do so.
Basic Principles of Data Access
When organizations are putting together data governance frameworks, they have to manage a variety of concerns, some of which seem to contradict each other.
There are three primary concerns that organizations always have to consider when forming data access policies. This is the CIA Triad:
- Confidentiality: No one should be able to access data unless authorized to do so.
- Integrity: The system should never allow a data operation that will cause errors or data loss.
- Availability: Whenever someone has a legitimate business need, they should have unfettered access to data.
Addressing these concerns is often a delicate balancing act. There has to be a careful equilibrium between security and ease of use.
How Do Companies Approach Data Access?
Each organization devises a data access policy that suits its specific needs. These policies often emerge over time, and they can evolve as the business grows.
When building a data access policy, companies will typically walk through the following steps:
Categorize the Data
Not all data is the same. There are different categories, and each of these categories will have its own data access policy. The main categories are:
PII: Personally Identifiable Information is sensitive information about actual people, such as their names, addresses, or social security numbers. Companies may have a regulatory responsibility to protect this data. As such, this data requires tight access controls.
Sensitive business information: Leaked internal information can threaten a company’s position. This might include unpublished financial records or analytics results. Such data calls for a very strict access policy.
Low-risk data: Some data may not present a major security risk, such as pseudonymized customer data or publicly available company information. A more relaxed access policy might be appropriate here.
System information: This is data that other systems generate automatically, such as network logs and error reports. This information rarely requires strong access controls.
Data access is rarely one-size-fits-all. Typically, organizations build a flexible policy that suits various circumstances.
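As a minimal sketch of how the four categories above could drive different access policies, the following Python maps each category to a set of controls. The dataset names, the control names (`requires_approval`, `log_reads`), and their values are hypothetical, chosen only for illustration.

```python
from enum import Enum


class DataCategory(Enum):
    PII = "pii"
    SENSITIVE_BUSINESS = "sensitive_business"
    LOW_RISK = "low_risk"
    SYSTEM = "system"


# Hypothetical policy table: the control names and values are illustrative
# assumptions, not requirements taken from any regulation.
ACCESS_POLICIES = {
    DataCategory.PII: {"requires_approval": True, "log_reads": True},
    DataCategory.SENSITIVE_BUSINESS: {"requires_approval": True, "log_reads": True},
    DataCategory.LOW_RISK: {"requires_approval": False, "log_reads": True},
    DataCategory.SYSTEM: {"requires_approval": False, "log_reads": False},
}

# Hypothetical mapping from dataset names to categories.
DATASET_CATEGORIES = {
    "customer_addresses": DataCategory.PII,
    "unpublished_financials": DataCategory.SENSITIVE_BUSINESS,
    "pseudonymized_orders": DataCategory.LOW_RISK,
    "network_logs": DataCategory.SYSTEM,
}


def policy_for(dataset: str) -> dict:
    """Return the policy for a dataset, defaulting to the strictest category."""
    category = DATASET_CATEGORIES.get(dataset, DataCategory.PII)
    return ACCESS_POLICIES[category]
```

In this sketch, an uncategorized dataset falls back to the PII policy, so a dataset that has not been reviewed yet gets the tightest controls rather than the loosest.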
Review Compliance Requirements
Regulations can have a tremendous impact on data access policy. For instance, Europe’s GDPR rules specify that employees can only access PII when they have a legitimate business purpose. The law also restricts international data transfers, which could affect data access if an organization uses a cloud service based abroad.
In general, companies need to contemplate the following questions:
- What local laws impact our data access policy? (e.g., CCPA)
- What industry laws impact our data access policy? (e.g., HIPAA)
- Are we transacting in areas with tougher laws? (for example, GDPR applies to American companies that do business with European customers)
- How can we anticipate new laws and future-proof our data access policy?
Build a Centralized Data Structure
Data access management is tricky, but it’s easier with centralized data. For instance, picture a company with a dozen discrete systems. That company may need to establish a dozen individual data access policies to cover each one of those systems.
Alternatively, companies can store all data in a central repository, such as a data warehouse. They would usually implement this by using an Extract, Transform, Load (ETL) process. The ETL will pull data from each source, integrate it, and then load it to a central location.
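This approach is perhaps the most direct way to address all three parts of the CIA Triad. The integration step standardizes the data, which supports integrity, and a single repository makes the data easier to find, which supports availability. Companies can also limit access to the data warehouse, which helps ensure confidentiality.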
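The Extract, Transform, Load flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production pipeline: the two source systems (`CRM_ROWS`, `BILLING_ROWS`) are hypothetical in-memory records, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source systems, represented here as in-memory records.
CRM_ROWS = [
    {"id": 1, "email": "Ada@Example.com "},
    {"id": 2, "email": "bob@example.com"},
]
BILLING_ROWS = [
    {"customer_id": 1, "balance": "12.50"},
    {"customer_id": 2, "balance": "0"},
]


def extract():
    """Pull raw rows from each source system."""
    return CRM_ROWS, BILLING_ROWS


def transform(crm_rows, billing_rows):
    """Normalize emails and join balances onto customers by id."""
    balances = {row["customer_id"]: float(row["balance"]) for row in billing_rows}
    return [
        (row["id"], row["email"].strip().lower(), balances.get(row["id"], 0.0))
        for row in crm_rows
    ]


def load(conn, rows):
    """Write the integrated rows to the central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, email TEXT, balance REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    crm_rows, billing_rows = extract()
    load(conn, transform(crm_rows, billing_rows))


# An in-memory SQLite database stands in for the warehouse in this sketch.
warehouse = sqlite3.connect(":memory:")
run_etl(warehouse)
```

Once the data sits in one table, a single access policy on the warehouse can replace separate policies on each source system.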
Grant Role-Based Access
So who gets to access the data? This is the biggest question concerning data access, and it gets harder to answer as an organization grows. If a company has five staff members, then a database administrator can set personal access levels. When there are 5,000 employees, that’s no longer practical.
Role-based access is the most elegant solution to this problem. The administrators create a set of roles based on job title, seniority, and other factors. They then assign each user to one of these roles. If the user changes positions, the admins don’t manually reconfigure their permissions. Instead, they just assign them a new role.
This helps get the data access balance right. Take customer data as an example. Salespeople and service agents will both need to see this data, but they might look at different subsets. With role-based access, you can configure data access so that service agents can view active customers but not unconverted leads.
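The role-based model above can be sketched as two lookup tables: one from roles to the datasets they may view, and one from users to roles. The role names, user names, and dataset names below are hypothetical and follow the salesperson and service agent example.

```python
# Hypothetical roles and the datasets each role may view.
ROLE_PERMISSIONS = {
    "sales": {"active_customers", "unconverted_leads"},
    "service_agent": {"active_customers"},
}

# Hypothetical users, each assigned to exactly one role.
USER_ROLES = {"dana": "sales", "eli": "service_agent"}


def can_view(user: str, dataset: str) -> bool:
    """Check access through the user's role; unknown users get no access."""
    role = USER_ROLES.get(user)
    return dataset in ROLE_PERMISSIONS.get(role, set())


def change_role(user: str, new_role: str) -> None:
    """Move a user to a new role instead of editing individual permissions."""
    if new_role not in ROLE_PERMISSIONS:
        raise ValueError(f"unknown role: {new_role}")
    USER_ROLES[user] = new_role
```

Here `can_view("eli", "unconverted_leads")` is false while Eli is a service agent, and becomes true after `change_role("eli", "sales")`, without anyone touching per-user permissions.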
Log Data Transactions
The last element of data access is logging. Organizations should build their data infrastructure in a way that offers visibility and accountability. When someone performs an unauthorized data action, the system should have a log of exactly what happened.
This is another area where ETL makes a difference. ETL can power a data pipeline that connects each database. It can also help to keep track of transactions that occur within the ETL pipeline so that admins have a paper trail if something goes wrong. This is an important step in ensuring a strong, functional data access policy.
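A minimal sketch of transaction logging, assuming a simple allow-list of permitted actions: every attempt is recorded, whether it was allowed or not, so that admins can later review what happened. The `ALLOWED` set, the user and dataset names, and the in-memory `AUDIT_LOG` list are hypothetical; a real system would write to durable, append-only storage.

```python
import json
from datetime import datetime, timezone

# Hypothetical in-memory log; stands in for durable, append-only storage.
AUDIT_LOG = []

# Hypothetical allow-list of (user, action, dataset) combinations.
ALLOWED = {("eli", "read", "active_customers")}


def record_access(user: str, action: str, dataset: str) -> bool:
    """Log every attempt, allowed or denied, and return the decision."""
    allowed = (user, action, dataset) in ALLOWED
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "dataset": dataset,
        "allowed": allowed,
    }
    AUDIT_LOG.append(json.dumps(entry))
    return allowed


def denied_attempts() -> list:
    """Return the logged attempts that were not authorized."""
    return [entry for entry in map(json.loads, AUDIT_LOG) if not entry["allowed"]]
```

Logging denied attempts alongside successful ones is a design choice in this sketch: the denied entries are what give admins a trail to follow when someone tries an unauthorized action.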