Data enrichment is the process of completing partial records or expanding existing ones, either by merging in data from another source or by otherwise filling in blank fields.
What is the Benefit of Data Enrichment?
Data can describe a real-world entity, such as a customer or an employee. However, that data may not always be exhaustive for each entity.
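Consider a customer database whose records hold some, but not all, of the following fields: full name, email address, phone number, and zip code.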
Each of these records is incomplete to some extent, which can hamper outbound marketing campaigns and limit the organization’s ability to perform detailed analytics.
Data enrichment fills in all relevant fields so that each database record is complete. In this instance, the result would be a customer record with full name, email address, phone number, and zip code.
Data can also be enriched by adding new fields. In the example above, the data could be expanded by adding a full postal address to each customer record.
The data enrichment process can apply to anything that is described in data. At the end of the enrichment process, each record should contain more detailed information about the entity it describes.
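As a rough illustration of both kinds of enrichment described above (filling blank fields and adding a new field), here is a minimal Python sketch. The field names, the sample values, and the `enrich` helper are all hypothetical and exist only for this example.

```python
from typing import Dict, Optional

# Hypothetical record shape: field name -> value, where None or "" marks a blank field.
Record = Dict[str, Optional[str]]


def enrich(target: Record, new_data: Record) -> Record:
    """Return a copy of `target` with blank fields filled and new fields added."""
    enriched = dict(target)
    for field, value in new_data.items():
        if value in (None, ""):
            continue  # the new source has nothing useful for this field
        # Fill the field only if it is absent or blank; existing values are kept.
        if enriched.get(field) in (None, ""):
            enriched[field] = value
    return enriched


# Illustrative data only.
customer = {"full_name": "A. Example", "email": None, "phone": "", "zip_code": "00000"}
extra = {"email": "a@example.com", "phone": "555-0100", "postal_address": "1 Example Street"}

enriched = enrich(customer, extra)
```

Here `email` and `phone` are completed, `postal_address` is added as a new field, and the original `customer` dictionary is left unchanged.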
What is the Data Enrichment Process?
Data enrichment requires two data sources: the target source, and the source of the new data.
New data can be acquired in three ways:
- Direct: For personal data, the data subject can provide any missing information. For example, a customer can respond to a survey or complete their online profile, and this information will be added to the CMS.
- Internal: Internal databases such as the CMS, ERP, digital services, or other production systems might hold data that can be used for enrichment. Once identified, this data can be exported, cleansed, and merged with the target data source.
- External: Some enterprises may choose to purchase data from a third party. This data will typically be delivered as a data file such as JSON or CSV.
Once new data is received, the organization must cleanse it, removing any corrupted data or invalid entries. If the data is from an external source, it must be validated to ensure its accuracy.
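The cleansing step could look something like the following sketch, which reads a CSV delivery and drops rows that are clearly unusable. The column names (`customer_id`, `email`) and the validity rules are assumptions made for this example; a real data set would need its own rules, and the email pattern here is deliberately loose.

```python
import csv
import io
import re

# Deliberately loose email check, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def cleanse(rows):
    """Yield normalized rows, skipping any without an ID or a plausible email."""
    for row in rows:
        customer_id = (row.get("customer_id") or "").strip()
        email = (row.get("email") or "").strip().lower()
        if not customer_id or not EMAIL_RE.match(email):
            continue  # drop corrupted or invalid entries
        yield {"customer_id": customer_id, "email": email}


# Hypothetical third-party delivery, inlined here instead of read from a file.
raw_csv = (
    "customer_id,email\n"
    "1,Ann@Example.com\n"
    "2,not-an-email\n"
    ",missing-id@example.com\n"
)

clean_rows = list(cleanse(csv.DictReader(io.StringIO(raw_csv))))
```

Only the first row survives: the second has an invalid email and the third has no customer ID.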
After the new data has been cleaned up, it’s ready to merge with the existing database. This step can be performed before the Extract, Transform, Load (ETL) process to ensure that any production systems have the most up-to-date information.
Before beginning the merge process, the data owner should decide how to handle conflicts. It might be best to overwrite the existing record or ignore the new entry, depending on the reliability of the data source. If the data enrichment results in a new column being added to an existing table, the data owner will need to ensure that this new field is accurately referenced in views and queries.
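One way to make the conflict decision explicit is to pass it into the merge as a policy. The sketch below is an illustration under assumed names (`ConflictPolicy`, `merge_record`), not a prescribed implementation: blank fields are always filled, and the policy only applies when both sources hold different non-blank values.

```python
from enum import Enum


class ConflictPolicy(Enum):
    OVERWRITE = "overwrite"          # trust the new data source
    KEEP_EXISTING = "keep_existing"  # trust the existing record


def merge_record(existing, incoming, policy):
    """Merge `incoming` into a copy of `existing`; return (merged, conflicting_fields)."""
    merged = dict(existing)
    conflicts = []
    for field, new_value in incoming.items():
        if new_value in (None, ""):
            continue  # nothing to merge for this field
        old_value = merged.get(field)
        if old_value in (None, ""):
            merged[field] = new_value  # blank field: fill it, no conflict
        elif old_value != new_value:
            conflicts.append(field)  # both sources disagree
            if policy is ConflictPolicy.OVERWRITE:
                merged[field] = new_value
    return merged, conflicts


# Illustrative data only.
existing = {"email": "old@example.com", "phone": ""}
incoming = {"email": "new@example.com", "phone": "555-0100"}

merged_new, conflicts_new = merge_record(existing, incoming, ConflictPolicy.OVERWRITE)
merged_old, conflicts_old = merge_record(existing, incoming, ConflictPolicy.KEEP_EXISTING)
```

Returning the list of conflicting fields lets the data owner review disagreements regardless of which policy was applied.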
Enriched data should be tagged as such for auditing purposes. The data owner needs to keep a changelog with a full record of the data sources used, especially when dealing with personal information.
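A minimal sketch of tagging and changelog keeping might look like this. The `enriched_fields` tag, the changelog entry layout, and the `apply_with_audit` helper are assumptions for illustration; in practice the changelog would normally live in a durable store rather than an in-memory list.

```python
from datetime import datetime, timezone


def apply_with_audit(record, field, value, source, changelog):
    """Set record[field] and log the change together with its data source."""
    changelog.append({
        "record_id": record["customer_id"],
        "field": field,
        "old_value": record.get(field),
        "new_value": value,
        "source": source,
        "changed_at": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = value
    # Tag the record so enriched fields can be told apart from original data.
    record.setdefault("enriched_fields", {})[field] = source


# Illustrative data only.
changelog = []
customer = {"customer_id": "1", "email": None}
apply_with_audit(customer, "email", "ann@example.com", "customer_survey", changelog)
```

After the call, the record carries both the new value and a tag naming where it came from, and the changelog holds the previous value for auditing.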