Extract, Load, Transform (ELT) technology makes it easy for organizations to pull data from databases, applications, and other sources, and move it into a data lake. But companies pay for this convenience in many ways. ELT solutions can have a negative impact on data privacy, data quality, and data management.
What is ELT?
ELT technology is a process that handles large-scale data extraction, loading, and transformation. Organizations adopt this type of solution as part of a data pipeline that supports their analytics and business intelligence by moving data into a data lake. Once the data moves into the data lake, its transformation occurs on an ad hoc basis.
How Does ELT Work?
ELT solutions use a three-step process for the data’s journey from the source to the data store.
Extract: Because ELT data is moving into a data lake, it doesn’t need to go through any changes before doing so. This characteristic enables ELT to work with raw, unstructured, and semi-structured data of all types. Many organizations have data spread throughout databases, cloud solutions, applications, and many other places. Business intelligence tools cannot get a full picture of the organization’s data with this configuration. The ELT tool connects with each data source and extracts the relevant information directly.
Load: Following the extraction, ELT tools take the data and move it to the organization’s data lake. Since the data is untransformed before it loads into the data store, organizations cannot use data warehouses unless they are only working with structured data. Massive volumes of data move along this pipeline, as the lack of restrictions on what it can store opens up big data functionality. Typically, this process is semi- or fully automated, and data loads to the data lake as it collects in the sources.
Transform: Organizations transform data only after it has landed in the data lake, and only when they choose to. Given the massive data volumes moving into the data lake, data teams can pick the sets they want to work with in their business intelligence tools. In addition, as new data types develop, data managers can store the information even if they’re not able to use it with their current tools.
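The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `SOURCES` dictionary stands in for real databases and applications, a plain list stands in for the data lake, and every function name here is hypothetical rather than part of any ELT product.

```python
import json

# Hypothetical source systems; in practice these would be databases or APIs.
SOURCES = {
    "crm": [{"id": 1, "name": " Ada Lovelace ", "plan": "pro"}],
    "billing": [{"customer_id": 1, "amount": "49.00"}],
}


def extract(sources):
    """Pull every record from every source without filtering."""
    for source_name, records in sources.items():
        for record in records:
            yield source_name, record


def load(data_lake, extracted):
    """Store records unchanged, tagged only with the source they came from."""
    for source_name, record in extracted:
        data_lake.append({"source": source_name, "raw": json.dumps(record)})


def transform_on_demand(data_lake, source_name):
    """Parse and tidy one source's records at analysis time."""
    rows = []
    for entry in data_lake:
        if entry["source"] != source_name:
            continue
        record = json.loads(entry["raw"])
        # Strip stray whitespace from string fields; leave other values alone.
        rows.append(
            {key: value.strip() if isinstance(value, str) else value
             for key, value in record.items()}
        )
    return rows


data_lake = []
load(data_lake, extract(SOURCES))
crm_rows = transform_on_demand(data_lake, "crm")
```

Note that the whitespace cleanup happens only when `transform_on_demand` is called. The copy sitting in the data lake stays raw, which is the defining trait of ELT.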
Why Do Organizations Use ELT?
The biggest draws of ELT are that organizations don’t need to be selective about the data they extract and they can load it into the data lake quickly. As the amount of data companies generate grows year over year, ELT solutions help organizations consolidate data into a centralized store for their data science teams. They also don’t need to assign resources to filter through the data before extracting it, and even raw data can flow into the data lake.
The ELT world might sound enticing, but it has a dark side that causes many problems for organizations. Here are three big reasons ELT is a bad idea:
1. ELT Is Bad at Data Privacy
As consumers are becoming more aware of data collection and usage, data privacy is becoming a critical issue for many organizations. When it comes to maintaining data privacy, ELT falls short in a number of ways.
Sensitive data comes in many forms. It includes names, addresses, social security numbers, medical information, credit card numbers, and other personal information. This info may live in enterprise resource planning tools, customer relationship management platforms, and other data sources throughout the organization. Since ELT technology typically extracts all available data with little to no pre-filtering, sensitive data may mix in with other data types.
By extracting this sensitive data without transforming it, organizations could fall out of compliance with data privacy regulations. A lack of compliance may result in fines and other penalties being levied against the company, and these costs can be high.
However, if organizations completely drop sensitive data from the extraction process, they may lose out on valuable insights. Without a way to transform this information, such as through a data masking process, ELT tools limit how organizations can work with the data they collect.
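As a rough illustration of what a data masking step might look like, the sketch below hides Social Security numbers in free text and replaces an email address with a salted hash so records can still be joined. The field names (`email`, `notes`), the salt value, and the function names are all assumptions made for this example. A salted hash is pseudonymization rather than full anonymization, so whether it satisfies a given regulation is a separate question.

```python
import hashlib
import re

# Matches text shaped like a US Social Security number, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssn(text):
    """Hide all but the last four digits of anything shaped like an SSN."""
    return SSN_PATTERN.sub(lambda match: "***-**-" + match.group()[-4:], text)


def pseudonymize(value, salt):
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def mask_record(record, salt="example-salt"):
    """Return a copy of the record with sensitive fields masked."""
    masked = dict(record)
    masked["email"] = pseudonymize(record["email"].lower(), salt)
    masked["notes"] = mask_ssn(record["notes"])
    return masked


record = {"email": "Ada@example.com", "notes": "SSN 123-45-6789 on file", "plan": "pro"}
masked = mask_record(record)
```

Because the same email always maps to the same digest, analysts can still count or join customers without seeing the underlying address.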
2. ELT Tanks Data Quality
Data quality affects a variety of business aspects, from the customer information used by front-line workers to the decisions made by leadership. Since ELT tools use a data pipeline that transfers data from the source to the data store with no steps in-between, organizations cannot address data quality issues before the information reaches the data lake.
A lack of data cleansing results in more work for data science teams and anyone else who depends on this resource to do their job. While the extraction and loading process is fast, getting insights from the data takes longer because cleansing and transformation have to take place for every ad hoc analysis performed.
The ability to work with every data type can also quickly lead to a data lake filled with massive data sets. Organizations may want to keep irrelevant data “just in case” and avoid any filtering process in the ELT pipeline. Because of this, poor-quality data blends in with the rest of the information and ends up slowing down the analysis process.
Organizations may store customer information in many places. Those with data silos may have hundreds of duplicate data sets among their tools. This data takes up a lot of space and requires substantial preparation before it’s usable.
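To make the preparation work concrete, here is a minimal sketch of merging duplicate customer records from two silos. It assumes the email address is a reliable matching key, which is often not true in practice, and the record layout and function names are hypothetical.

```python
def normalize_key(record):
    """Build a comparison key from the email, ignoring case and surrounding whitespace."""
    return record["email"].strip().lower()


def deduplicate(records):
    """Keep the first record seen per normalized email, filling gaps from later copies."""
    merged = {}
    for record in records:
        key = normalize_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        for field, value in record.items():
            # Only fill fields that the kept record is missing.
            if merged[key].get(field) in (None, ""):
                merged[key][field] = value
    return list(merged.values())


crm = [{"email": "ada@example.com", "name": "Ada Lovelace", "phone": ""}]
support = [{"email": " Ada@Example.com ", "name": "Ada Lovelace", "phone": "555-0100"}]
customers = deduplicate(crm + support)
```

Here the two copies collapse into one record that keeps the CRM email and picks up the phone number from the support tool. Real-world matching usually needs fuzzier rules than an exact key.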
Since ELT transformations take place on an as-needed basis, the data science team may be duplicating work. Larger teams may perform the same transformations on data sets over time, which lowers productivity and slows the discovery of actionable insights. Duplicated work can also lead to worker frustration and disengagement.
Poor-quality data has other long-term consequences. If customers have to repeat the same information every time they interact with a company, it could impact the organization’s reputation with its audience. And customer service representatives will have to deal with frustrated people, which may take up more time than the typical call.
Slow insights from a data set could also make it impossible to act on opportunities. If the market shifts or a start-up disrupts the industry, organizations need to know how it is going to impact them as soon as possible. Delayed insights can make it difficult to compete.
3. ELT Complicates Data Storage and Management
Indiscriminately extracting every bit of data available to an organization creates a downside: all of it has to be stored and managed. Additional data sources are being introduced constantly, especially with the Internet of Things putting sensors in countless new places.
Organizations need to pay for data lakes capable of supporting their current and future scale, which can drive data storage costs up significantly. They also need to consider how much space the transformed versions of the data will take up.
Not every person on the data science team needs identical access to the data sets in the data lake. However, the sheer scale and unsorted nature of the data make it challenging to create data access policies. Users may have to sort through data that isn’t relevant to their work or, even worse, wind up accessing sensitive information that they shouldn’t be looking at.
Data governance policies need to account for all data types that get pulled into the data lake. For an organization with hundreds or thousands of data sources, this task can be daunting. Trying to stay on top of the latest developments can be a full-time job all on its own. The IT management team also needs to ensure that these policies are being followed and make changes as needed to better support the analysts’ needs.
ELT lacks granular control over the data pipeline, so it takes a one-size-fits-all approach that’s poorly suited to the varying needs of an organization’s databases and applications. Information that is not useful to the company merges with the rest of the data sets.
As time goes on and the data lake increases in size, auditing data usage and access becomes more difficult. When a new data privacy regulation becomes law, trying to bring the organization into compliance may require substantial resources and cause disruption to normal operations.
Organizations that choose to go the ELT route may also miss a cost-saving opportunity. They could use the resources that go into managing the data lake and ELT process elsewhere to better serve their objectives. Since ELT is a less mature technology than alternative options, IT decision-makers also have fewer workarounds available.
Alternatives to ELT
These three issues make ELT a bad choice for organizations that want to get the most out of their data. Extract, Transform, Load (ETL) technology sounds similar to ELT, but it offers many advantages, with none of the weaknesses that ELT introduces.
Just like ELT, ETL starts with an extraction step. What happens after this step in ETL is much different, however. Rather than moving the data directly to a data store, ETL data pipelines pass it through a transformation step.
Organizations have fine-tuned control over the data as it goes through the pipeline, allowing them to mask sensitive data, cleanse poor-quality data, change formats, and perform other activities. When the data finally loads, it’s already prepared for use. So data scientists and other specialists do not need to wait for the transformation, and they can quickly surface important insights.
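The sketch below shows the ETL ordering in miniature: each record is masked, cleansed, and reformatted before anything is written to the data store. It is an assumption-laden toy, not a description of how Integrate.io or any other ETL tool works internally. The sample orders, the `orders` table, and the function names are invented for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw records as they might arrive from a source system.
raw_orders = [
    {"order_id": "1001", "card": "4111111111111111", "amount": "49.00", "date": "03/15/2024"},
    {"order_id": "1002", "card": "5500000000000004", "amount": "", "date": "03/16/2024"},
]


def transform(record):
    """Mask, cleanse, and reformat one record; return None if it fails validation."""
    if not record["amount"]:
        return None  # cleanse: drop rows with no amount
    masked_card = "*" * 12 + record["card"][-4:]  # mask: keep last four digits
    iso_date = datetime.strptime(record["date"], "%m/%d/%Y").date().isoformat()
    return (int(record["order_id"]), masked_card, float(record["amount"]), iso_date)


def run_etl(records, conn):
    """Transform every record first, then load only the rows that passed."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, card TEXT, amount REAL, order_date TEXT)"
    )
    rows = [row for row in map(transform, records) if row is not None]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = run_etl(raw_orders, conn)
stored = conn.execute("SELECT * FROM orders").fetchall()
```

The full card number never reaches the data store, and the incomplete order is rejected in the pipeline instead of being discovered later during analysis.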
ETL supports greater data pipeline complexity than ELT does. This is particularly useful for organizations with complex analytics needs. The learning curve may appear to be steeper with ETL pipelines when compared to ELT’s simple extract-and-load-everything approach, but modern ETL tools such as Integrate.io provide a user-friendly ETL pipeline builder that supports users at every step of the way.
Integrate.io’s no-code and low-code options also open up data pipelines to more than just the data science team. Business users of all technical skillsets can create data pipelines that support their analysis needs.
Get hands-on with Integrate.io and learn more about its advantages over ELT when you check out our 14-day demo.