Preparing data for Tableau generally requires an ETL solution. The problem is that most ETL solutions are extremely expensive and difficult to use – or they lack the sophisticated data transformation capabilities that your use-case requires.
Integrate.io is different. Integrate.io is a cloud-native, enterprise-grade ETL platform that empowers anyone to quickly create sophisticated ETL pipelines between any source and destination. Even if you’re a data integration beginner, Integrate.io provides unlimited support from a dedicated integration specialist – so you’ll have an expert available for guidance whenever you need it.
In this guide, we’ll show you how to prepare data for Tableau using Integrate.io in five quick steps. But first, we’ll start with a brief overview of data preparation for Tableau and why it's necessary.
- Overview of Data Preparation for Tableau
- Preparing Data for Tableau: A Step-by-Step Integrate.io Guide
- Conclusion
Overview of Data Preparation for Tableau
Tableau is an enterprise-grade business intelligence solution that offers a wide range of data analytics and visual presentation features. With its easy-to-use, point-and-click interface, Tableau empowers you to analyze and explore data via beautiful dashboards – then share the insights you find with decision-makers and the rest of your team.
While Tableau offers many advanced features, making your data available to the platform requires data preparation through a data integration (ETL/ELT) process. You can prepare data for Tableau by developing an automated ETL pipeline with Integrate.io.
After setting up your ETL pipeline (or “package”), Integrate.io runs the package at preset intervals according to the custom schedule you define. On each run, Integrate.io will (1) extract data from the source; (2) transform it in Integrate.io’s transformation layer according to your data cleaning and manipulation requirements; and (3) load the results into the data warehouse. From there, Tableau can connect to the data warehouse, access the data, and analyze it to produce the visual metrics you need.
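If it helps to see those three phases spelled out, below is a minimal, purely illustrative Python sketch of the extract → transform → load sequence. The sample rows, column names, and output file are hypothetical stand-ins for what the Integrate.io package handles for you without any code:

```python
import pandas as pd

# 1) Extract: the Integrate.io source component does this for you; a tiny
#    in-memory sample stands in for rows pulled from Google Analytics.
raw = pd.DataFrame({
    "ga_date": ["20200701", "20200702", "20200702"],
    "ga_eventAction": ["form_view", "form_submit", "form_view"],
    "ga_totalEvents": [120, 45, 98],
})

# 2) Transform: clean and reshape the data to suit the analysis
#    (here, parsing the raw date strings into real dates).
prepared = raw.assign(
    ga_date=lambda d: pd.to_datetime(d["ga_date"], format="%Y%m%d")
)

# 3) Load: write the prepared table where the BI tool can read it
#    (a local file stands in for the real data warehouse destination).
prepared.to_csv("prepared_events.csv", index=False)
print(prepared)
```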
Ultimately, the data preparation actions you implement in Integrate.io will reflect the analyses that you need to perform. For example, your data preparation for Tableau might need to:
- Filter the data on certain columns
- Pivot and aggregate the data based on different timeframes
- Normalize formatting to support Tableau’s ability to read/analyze the information
- Remove PHI/PII data for compliance purposes before loading it into a data warehouse
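None of this requires writing code in Integrate.io, but if it helps to see the logic spelled out, the compact pandas sketch below illustrates each of those four operations on a made-up events table (all column values are hypothetical):

```python
import pandas as pd

events = pd.DataFrame({
    "ga_date": ["20200701", "20200708", "20200708"],
    "ga_eventAction": ["form_view", "form_submit", "newsletter_click"],
    "ga_totalEvents": [120, 45, 98],
    "user_email": ["a@example.com", "b@example.com", "c@example.com"],  # PII
})

# 1) Filter the data on certain columns (keep only the funnel events we care about).
events = events[events["ga_eventAction"].isin(["form_view", "form_submit"])].copy()

# 2) Normalize formatting so the dates are read consistently downstream.
events["ga_date"] = pd.to_datetime(events["ga_date"], format="%Y%m%d")

# 3) Pivot/aggregate the data based on a timeframe (weekly totals here).
weekly_totals = events.groupby(events["ga_date"].dt.isocalendar().week)["ga_totalEvents"].sum()

# 4) Remove PII before anything is loaded into the data warehouse.
events = events.drop(columns=["user_email"])

print(weekly_totals)
```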
Now let’s explore an example in which we use Integrate.io to develop a complete data preparation pipeline for Tableau.
Preparing Data for Tableau: A Step-by-Step Integrate.io Guide
The following sections are a step-by-step guide to preparing data for Tableau using Integrate.io.
1) Understand Your Requirements
The first step in preparing data for Tableau is to understand your requirements. For the purposes of this guide, we will use the following example scenario:
Your business sells a unique software product, and your website includes a sales funnel that offers a free trial of the software. To sign up for the trial, potential customers fill out a web form and schedule an appointment to speak with a representative.
You want to measure the success of this sales funnel by pulling data from Google Analytics and sending it to Tableau. This Google Analytics data includes a range of metrics on how website users interact with your website and web form sales funnel.
Decision-makers want you to aggregate this data by Day, Month, Week, and Year-to-Date. They want to investigate different customer actions within the web page funnel, and whether those actions result in the successful completion of product “Intro” and product “Demo” sales calls. By analyzing the data with Tableau, decision-makers can visualize and identify bottlenecks in the sales funnel and experiment with changes to make it more successful.
Now that we understand the goals and requirements of the data analysis, we can develop a rough idea for the data preparation components the Integrate.io dataflow should implement. Generally speaking, we can divide the process into the following steps:
- Connect to the data source: Establish a connection to the data source (Google Analytics in this case).
- Extract information: Select the specific columns of data to extract, and extract the data into your ETL data engine.
- Filter transformations: Use “Filter” transformations to filter the data on different columns and parse out the rows of data the analysis needs.
- Select transformations: Use “Select” transformations to “pivot” the table in a way that creates new columns of data based on different timeframes (like Day, Month, Week, and Year-to-Date).
- Aggregate and Sort transformations: Use “Aggregate” transformations to add up totals for the new Day, Month, Week, and Year-to-Date columns created in the previous step. Lastly, use a “Sort” transformation to order the data table by date.
- Load the data into the data warehouse: Load the transformed data into the data warehouse so Tableau can read it.
In the following sections, we’ll use Integrate.io to develop a data transformation workflow that mirrors the above steps.
2) Create a Connection for the Data Source
Integrate.io offers 200+ native connectors so you can connect with popular data sources including Google Analytics. It also offers a Universal REST API connector so you can connect to more diverse API endpoints. To connect to a data source with a native connector, log into Integrate.io, and perform the following actions. First (1) select the “Connections” icon in the upper left to navigate to the library of connections. Next (2) select “New Connection”:
From here, select the data source type that you want to use. In this case, we’re selecting the “Google Analytics” connector.
Complete the form that appears (next image). First (1) name the connection. Then (2) enter your username and (3) your password. If the source is a database, you will need to enter the hostname details as well.
3) Create a New Data Pipeline (Create a Package)
Now you’re ready to create the data pipeline, which Integrate.io calls a “package” and which contains all of the ETL steps. First (1) select the “Packages” icon in the upper left of the dashboard to navigate to the transformation workflow tool. Then (2) select “New Package” in the upper right:
A package configuration form will appear (next image). Complete this form by (1) naming the package (we have named it “Google Analytics”). Then (2) indicate the type of package (workflow or dataflow). In this example, we are creating a “Dataflow,” which is a data pipeline with three types of components (Data source connections, Transformations, and Destinations). A “Workflow” package allows you to orchestrate a sequence of dataflow packages and set rules/dependencies for when they run. After selecting the package type (3) click the “Create Package” button in the lower right:
(i) Choose and Configure the Data Source
In this step, you will add the first component (the source) to the dataflow package. Integrate.io allows you to extract data from multiple sources within the same dataflow, but for this example, we only use one. Open the package you just created. Click “Add Component”:
Select “Google Analytics”:
The new source component will appear. Select this component to open the setup wizard:
When setting up the source component, you need to give it a name. We named it “google_analytics_1.”
In the next step, “Choose input connection,” you will identify the data source (we used the Google Analytics connection that we set up above). After clicking “Next,” you can set up “Source properties.”
“Source properties” identifies the properties of the data you want to extract. In this case, we have selected a custom date range of July 9, 2019, to July 13, 2020:
The next step is “Select input fields.” Here you will select the Google Analytics data fields to extract. There are thousands of Google Analytics data fields. Use the Dimensions, Metrics, and Meta Data tabs to search for the fields that your ETL process needs. Click the “+” icon next to any field to add it to the “selected fields” list.
You can preview the raw data you're extracting by clicking the “refresh” button in the lower right. Once complete, click “Save.”
4) Configure Your Data Pipeline
Integrate.io has 15+ ready-made, no-code data transformation components and 200+ transformation functions that you can apply to the data. We recommend preparing the data as a series of individual transformations organized into separate components. This will help you stay organized, and it will make performance tuning easier later.
We also recommend starting with one or more Filter components. This allows you to narrow the data down to the rows you need, column by column, based on the specific values you define. It also makes your data table smaller and easier to manage, and improves ETL performance. In these initial transformations, you may also want to use one of Integrate.io’s encryption or hashing functions to secure any PHI/PII data in accordance with your industry’s compliance standards.
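Integrate.io’s built-in functions handle the encryption or hashing for you, but as a rough illustration of the idea, this small Python sketch replaces a hypothetical email column with its SHA-256 hash so the raw value never reaches the data warehouse:

```python
import hashlib

import pandas as pd

customers = pd.DataFrame({
    "customer_email": ["a@example.com", "b@example.com"],  # hypothetical PII column
    "ga_totalEvents": [12, 7],
})

def sha256_hash(value: str) -> str:
    """Return a one-way SHA-256 digest of the input string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Replace the raw PII with its hash before the data moves downstream.
customers["customer_email"] = customers["customer_email"].map(sha256_hash)
print(customers)
```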
If you’re familiar with the general logic of SQL, Integrate.io’s available transformations (Select, Sort, Join, Limit, Cross Join, etc.) will probably make sense to you. If you’re not sure which transformations to choose or how to set them up, you can always chat with your dedicated Integrate.io integration specialist for immediate assistance.
(i) Create Two Filter Transformation Components
For the reasons stated above, the first two transformations will be “Filter” components. Add the first Filter component to the pipeline by hovering over the source component. Click the “+” that appears in the blue area below the component icon:
Choose the type of transformation from the window that appears – in this case, “Filter”:
Next, you will see a form that lets you specify the details of the Filter component. First (1) label the component (we labeled it “get_invitee_funnels”). Then (2) identify the column and the desired rows by naming specific field values. In this case, we are filtering on the “ga_eventAction” column, keeping all the rows where “ga_eventAction” “text equals” one of four specific values (see image). When finished, click “Save.”
*In the above image, each of the filtered values represents a step customers take in the sales funnel while they schedule an appointment with a company representative to try the software. By filtering these rows, you’re giving Tableau the specific data it needs to identify any steps where customers are dropping out of the funnel. Decision-makers can use the Tableau analysis of the data to brainstorm solutions to any bottlenecks.
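If you prefer to think about this filter in code, the pandas sketch below shows the same “keep rows whose ga_eventAction exactly matches one of several values” logic. The event names here are hypothetical placeholders – the real four values come from your own Google Analytics event tracking:

```python
import pandas as pd

events = pd.DataFrame({
    "ga_eventAction": ["funnel_step_1", "funnel_step_2", "unrelated_event",
                       "funnel_step_3", "funnel_step_4"],
    "ga_totalEvents": [500, 310, 42, 120, 85],
})

# Hypothetical stand-ins for the four funnel events selected in the Filter
# component; keep only rows whose ga_eventAction exactly matches one of them.
funnel_steps = ["funnel_step_1", "funnel_step_2", "funnel_step_3", "funnel_step_4"]
invitee_funnels = events[events["ga_eventAction"].isin(funnel_steps)]
print(invitee_funnels)
```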
Next, we’ll add a second Filter component to the pipeline that filters the data further. Click the “+” below the Filter component we just created. Again, click “Filter” in the menu that appears:
In this second transformation, you’ll filter the data further, but on a different column. First (1) label the component (we labeled it “filter_demo_calls”). Next (2) choose the values that identify the rows you want to keep from that column. We have selected “Intro” and “Demo.”
Note that we have also selected the “regex” operator (see next image). The regex operator performs a close match on the terms “Intro” and “Demo” instead of an exact match. This allows for variations and typos.
*Note that the second filter refined the dataset further to produce an even smaller table of information. The resulting table focuses on the specific data that pertains to customer funnels leading to product Introduction and Demo calls. This gives Tableau an easier-to-analyze dataset and makes the entire process more performant.
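In code terms, the regex match behaves like a “contains” test rather than an exact comparison. The sketch below illustrates the idea; the ga_eventLabel column name is a hypothetical stand-in, since this second Filter component works on a different column than the first:

```python
import pandas as pd

calls = pd.DataFrame({
    "ga_eventLabel": ["Product Intro Call", "Product Demo call", "Newsletter signup"],
    "ga_totalEvents": [40, 25, 300],
})

# A regex match keeps any row that *contains* "Intro" or "Demo", so minor
# variations in naming still make it through, unlike an exact-match filter.
pattern = r"Intro|Demo"
demo_calls = calls[calls["ga_eventLabel"].str.contains(pattern, case=False, regex=True)]
print(demo_calls)
```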
(ii) Select Components: Pivot the Table and More
SQL developers like to “pivot tables” to derive and highlight specific information in a dataset. Pivoting the table means that we will create new columns of data from existing data. In this case, the new columns will show aggregated values for different time periods (Day, Month, Week, and Year-to-Date). By pivoting the table, we give Tableau the data it needs to create timeframe-specific graphs and visuals. To do this, we will add a Select transformation component to the pipeline and use Integrate.io’s Expression Editor to define the nature of the pivot.
Ultimately, we will add two “Select” components. The first Select component includes the pivot functions described above. The second Select component, unlike other table-level transformation components, is a field-level transformation component. It includes more nuanced transformations to manipulate the new column data that the pivots created.
First, we’ll create both components. (1) Click the “+” below the previous component in the pipeline, and choose the “Select” icon at the top to add the Select component. (2) Repeat this to add a second “Select” component.
Click on the first component to configure the pivots (next image). First (1) name the component. We named it “pivot.” Next (2) search for the column you want to select. Then (3) click the icon to write the pivot in the Integrate.io Expression Editor. You’ll repeat this process for every new column you want to create. In this example component, there are five pivots, but we will only look at the one indicated by arrow “3.”
After clicking where Arrow 3 indicates above, you’ll open the Expression Editor. The Expression Editor allows you to search for cut-and-paste functions that will define the pivot. In this case, we have pivoted on the “ga_eventAction” column to create a new column for the row value “invitee_event_type_page.” We have named the new column “ga_totalEvents.”
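If you want a rough mental model of what each pivot expression does, here it is in pandas: create a new column that carries the event count only where ga_eventAction matches the chosen value. Only the “invitee_event_type_page” value comes from the example above; the sample data and resulting column name are made up, and the actual expression you write in the Expression Editor will look different:

```python
import numpy as np
import pandas as pd

events = pd.DataFrame({
    "ga_date": ["20200701", "20200701", "20200702"],
    "ga_eventAction": ["invitee_event_type_page", "other_event",
                       "invitee_event_type_page"],
    "ga_totalEvents": [30, 12, 25],
})

# New column with the event count wherever ga_eventAction equals
# "invitee_event_type_page", and 0 everywhere else. One new column per
# funnel step is what the five pivots in the Select component produce.
events["invitee_event_type_page"] = np.where(
    events["ga_eventAction"] == "invitee_event_type_page",
    events["ga_totalEvents"],
    0,
)
print(events)
```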
Click ‘Save,’ and move on to the second Select component.
As for the second Select component, we have named it “Select_1.” There are seven transformation functions in this component. Without going into a technical description of each, the following images show you how we configured the second Select component to manipulate the new column data which was created by the Pivot component:
After clicking the Expression Editor icon (green arrow above), you can see the Expression Editor functions for the selected transformation. This particular transformation converts the “ga_date” column into a different format that allows Tableau to identify the “sequential week number” within a particular year. Your data table will now have a column showing which week of the year each row of data belongs to:
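As a rough illustration of that date manipulation, the pandas sketch below derives a sequential week number from a ga_date column. The sample dates are made up, and the exact function used inside Integrate.io’s Expression Editor will differ:

```python
import pandas as pd

events = pd.DataFrame({
    "ga_date": ["20200106", "20200413", "20201228"],  # YYYYMMDD-style date strings
    "ga_totalEvents": [10, 20, 30],
})

# Parse the raw date string, then derive the ISO week number within the year
# so Tableau can group and plot the data week by week.
parsed = pd.to_datetime(events["ga_date"], format="%Y%m%d")
events["week_of_year"] = parsed.dt.isocalendar().week
print(events)
```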
(iii) Aggregate and Sort Components
There are only two more transformation components left for the pipeline. First (1) we will create an “Aggregate” component (the equivalent of GROUP BY in SQL) to add up values for the new columns created in the last section. Then (2) we will create a “Sort” component (the equivalent of ORDER BY in SQL). We’ll add these components the same way we added the previous ones.
Both of these components are relatively simple to set up. In the next two screenshots, you’ll see how to configure them. Here’s how we configured the Aggregate (GROUP BY) component:
Here’s how we configured the Sort (ORDER BY) component:
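If you think of these two components in code, they boil down to a GROUP BY followed by an ORDER BY. Here is a hypothetical pandas sketch of that logic, using made-up sample data and column names modeled on the pivoted table from the previous step:

```python
import pandas as pd

events = pd.DataFrame({
    "ga_date": ["20200702", "20200701", "20200701"],
    "invitee_event_type_page": [25, 30, 0],  # pivoted funnel-step column
    "ga_totalEvents": [25, 30, 12],
})

# Aggregate: add up the pivoted columns per date (GROUP BY in SQL terms).
daily_totals = (
    events.groupby("ga_date", as_index=False)[["invitee_event_type_page", "ga_totalEvents"]]
          .sum()
)

# Sort: order the resulting table by date (ORDER BY in SQL terms).
daily_totals = daily_totals.sort_values("ga_date")
print(daily_totals)
```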
We have now finished the process of configuring the dataflow components for all of the data preparation tasks. In the next section, we will add the “Destination” component that loads the prepared data into the data warehouse.
(iv) Connect to the Destination
In this step, we will connect to the destination data warehouse that Tableau reads. Do this by adding a new component to the pipeline and selecting the data warehouse you intend to use. In this case, we’re sending the data to Google BigQuery:
The next three images show the three steps in the wizard that configures the BigQuery destination. In the first image, we named the component “Bigquery_destination” and selected our test_BQ destination connection:
In the next image, we define the name of the source table (which is the data table that results from the previous data pipeline):
Lastly, we map the input to appropriate target table columns. You can use the “find” feature to locate the appropriate fields:
With this final connection to the data warehouse established, you can run the package to load the data, and you can schedule the data to refresh automatically according to your minute-by-minute, daily, weekly, or monthly needs.
Once the data has loaded into the data warehouse, Tableau will be able to analyze it and produce interactive visualizations and graphs so decision-makers can understand the data.
Conclusion
This concludes the tutorial on how to prepare data for Tableau. After reading this guide, you should have a clearer picture of what it’s like to build a sophisticated ETL pipeline for Tableau with Integrate.io.
As you have seen, the way you prepare data for Tableau depends entirely on the use-case – i.e., the nature of your data and the types of analyses you want Tableau to perform. Ultimately, a powerful yet easy-to-use ETL platform like Integrate.io can help you build automated data pipelines like the one above in a matter of hours. This dataflow will then run on autopilot – according to your schedule – to give your team the metrics they need for more strategic business decisions.
Remember, if you ever get stuck using Integrate.io, your dedicated Integrate.io integration specialist is always available to take the reins and help you set up the data pipelines you need.
Want to try Integrate.io for yourself? Contact our team for a free Integrate.io trial!