What should you know before getting started with ETL in Heroku? In this article, we provide a comprehensive guide to using Heroku ETL.
What is Heroku?
Heroku is a cloud PaaS (platform as a service) solution for developers to rapidly and efficiently build, deploy, monitor, and scale their applications. Originally intended for use with the Ruby programming language, Heroku now includes support for languages such as Java, Scala, Python, PHP, and Node.js. Salesforce, a cloud provider of CRM (customer relationship management) software, acquired Heroku for $212 million in 2010.
One of the greatest benefits of using Heroku is that the platform automatically handles server, database, and infrastructure management, freeing you from having to deal with these technical concerns yourself. In addition, Heroku offers managed data services, including Heroku Postgres, Heroku Redis, and Apache Kafka on Heroku.
What is ETL?
The ETL (extract, transform, load) process is the most widely used form of enterprise data integration. ETL is a three-stage process:
- First, information is extracted from one or more data sources (e.g., files, databases, websites, SaaS applications, etc.).
- This raw information is then transformed to remove out-of-date or duplicate data and to fit the schema of the target location.
- Finally, the transformed data is loaded into a centralized data warehouse or data lake.
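The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the `customers` table, and the cutoff date are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3
from datetime import date


def extract():
    """Extract: return raw rows from a source (hard-coded, hypothetical data)."""
    return [
        {"id": 1, "email": "ann@example.com", "updated": "2024-03-01"},
        {"id": 1, "email": "ann@example.com", "updated": "2024-03-01"},  # duplicate
        {"id": 2, "email": "bob@example.com", "updated": "2019-01-15"},  # out of date
        {"id": 3, "email": "CAT@EXAMPLE.COM", "updated": "2024-02-20"},
    ]


def transform(rows, cutoff):
    """Transform: drop duplicates and stale rows, normalize to the target schema."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen:
            continue  # duplicate record
        if date.fromisoformat(row["updated"]) < cutoff:
            continue  # older than the cutoff
        seen.add(row["id"])
        cleaned.append((row["id"], row["email"].lower(), row["updated"]))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, email TEXT, updated TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    load(conn, transform(extract(), cutoff=date(2023, 1, 1)))
    return conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

In a real pipeline, `extract` would read from a live source and `load` would write to your warehouse, but the shape of the three functions would stay much the same.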
Creating automated ETL pipelines is a good way to establish solid data governance for your organization. By performing ETL at regular intervals, you can ensure that your business intelligence and analytics workloads always have the freshest, most accurate information when they pull from your enterprise data warehouse.
Do You Need Heroku ETL?
As mentioned above, Heroku is a subsidiary of Salesforce, which means that it’s easy to integrate your Salesforce CRM data with a Heroku database. In particular, many Heroku customers use the Heroku Connect add-on, which synchronizes your Salesforce data with Heroku Postgres.
But migrating your Salesforce data into Heroku Postgres is just one piece of the puzzle when it comes to data integration. What about integrating this data with the rest of your IT infrastructure? That’s where ETL for Heroku comes in.
Some potential use cases of Heroku ETL are:
- Salesforce 360: In Salesforce, a “360-degree customer view” refers to a unified, complete picture of an individual customer that contains all of the data about that customer you have available. Your Salesforce CRM may contain a great deal of customer data, but does it have everything you know about that customer? Using Heroku ETL can help you combine Salesforce CRM data with other enterprise data, such as from your ERP software or fulfillment system.
- Replication and backups: ETL is a robust, mature way to perform data replication and back up your valuable enterprise data. With the right ETL tool, you can easily build data pipelines between Salesforce and Heroku on one end, and a data warehouse such as Amazon Redshift or Snowflake on the other end.
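To make the 360-degree idea concrete, here is a minimal Python sketch of combining CRM contacts with ERP order data. The field names (`customer_id`, `name`, `amount`) and the function `build_customer_360` are hypothetical; a real pipeline would first need to agree on a shared customer key across both systems.

```python
def build_customer_360(crm_rows, erp_rows):
    """Merge CRM contacts with ERP orders, keyed on a shared customer ID."""
    # Group ERP orders by customer so each contact can be enriched in one pass.
    orders_by_customer = {}
    for order in erp_rows:
        orders_by_customer.setdefault(order["customer_id"], []).append(order)

    unified = []
    for contact in crm_rows:
        orders = orders_by_customer.get(contact["customer_id"], [])
        unified.append({
            "customer_id": contact["customer_id"],
            "name": contact["name"],
            "order_count": len(orders),
            "total_spend": sum(o["amount"] for o in orders),
        })
    return unified


if __name__ == "__main__":
    crm = [{"customer_id": "C1", "name": "Ann"}, {"customer_id": "C2", "name": "Bob"}]
    erp = [
        {"customer_id": "C1", "amount": 120},
        {"customer_id": "C1", "amount": 80},
    ]
    for row in build_customer_360(crm, erp):
        print(row)
```

Contacts with no matching orders are kept with zero totals, so the unified view never silently drops a customer.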
If you want to do ETL in Heroku, it’s a good idea to use a relational database like Heroku Postgres. ETL for NoSQL databases is more challenging because the data is often semi-structured or unstructured. Data from NoSQL (i.e., non-relational) databases is usually stored in a data lake rather than a data warehouse, and developers often use ELT (extract, load, transform) for this data instead of ETL, loading the data first and transforming it afterward.
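The difference between ETL and ELT is mostly a matter of ordering. As a rough sketch, the Python below loads raw records untouched and only then transforms them with SQL inside the target. The `raw_events` and `event_counts` tables and the `run_elt` function are hypothetical, and in-memory SQLite stands in for a data lake or warehouse.

```python
import sqlite3


def run_elt(raw_events):
    """Load raw rows first, then transform them inside the target with SQL."""
    conn = sqlite3.connect(":memory:")

    # Load: store the records exactly as they arrived.
    conn.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)", raw_events)

    # Transform: clean and aggregate afterward, using the target's own engine.
    conn.execute(
        """
        CREATE TABLE event_counts AS
        SELECT LOWER(TRIM(user_id)) AS user_id, COUNT(*) AS n
        FROM raw_events
        GROUP BY LOWER(TRIM(user_id))
        """
    )
    return conn.execute(
        "SELECT user_id, n FROM event_counts ORDER BY user_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt([(" U1", "click"), ("u1 ", "view"), ("u2", "click")]))
```

Because the raw table is preserved, the transformation can be rewritten and rerun later without extracting the data again.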
How to Get Started Using Heroku ETL
We’re assuming that you’re already set up with Heroku if you want to get started with Heroku ETL. Otherwise, check out our tutorial “How Do I Use Heroku?”.
Once you’ve got Heroku running, the next question is: what Heroku database are you using? Your choice of Heroku database will affect the ETL tools and methodologies available to you.
- Heroku Postgres is the most popular choice among Heroku customers. It is a managed SQL database service built on PostgreSQL, the open-source relational database management system.
- If you prefer a non-relational database, you might instead choose Heroku Redis. Redis is an open-source in-memory data structure store that can be used to implement a NoSQL key-value database.
- Heroku can also run Apache Kafka, a streaming data processing platform for sending messages between “producers” and “consumers.”
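Whichever database you choose, an ETL job first needs its connection details. Heroku Postgres exposes these to your app through the `DATABASE_URL` config var. The sketch below shows one way to split such a URL into connection settings using only the Python standard library. The helper name `parse_database_url` and the fallback URL are hypothetical, and the resulting dictionary would still need to be passed to a PostgreSQL driver of your choice.

```python
import os
from urllib.parse import unquote, urlsplit


def parse_database_url(url):
    """Split a postgres:// URL into the pieces a database driver expects."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # default PostgreSQL port
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),  # decode %-escaped characters
        "dbname": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # The fallback URL is a placeholder for local experimentation only.
    url = os.environ.get("DATABASE_URL", "postgres://user:secret@localhost:5432/mydb")
    settings = parse_database_url(url)
    # Avoid printing the password.
    print({key: value for key, value in settings.items() if key != "password"})
```

Reading the URL from the environment at run time, rather than hard-coding it, means the job keeps working if the credentials are rotated.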
If you’re using Heroku Postgres, there’s no better Heroku ETL tool than Integrate.io. Integrate.io is a powerful, feature-rich ETL and data integration solution with more than 100 pre-built integrations, including Heroku and Salesforce.
Heroku offers an Integrate.io add-on for sending data to, and extracting data from, Heroku Postgres. In addition, Integrate.io can support any service that exposes a REST API, so even systems without a direct interface to Heroku Postgres can work with Integrate.io, Heroku, and Salesforce.
Once Integrate.io has your Heroku data in hand, you can send it to your choice of target location, including:
- Data warehouses such as Amazon Redshift, Snowflake, and Google BigQuery.
- Relational and non-relational databases such as Microsoft SQL Server, MySQL, and MongoDB.
- Cloud storage such as Amazon S3.
- Web services such as Google Analytics, HubSpot, and Mixpanel.
- File storage.
Want to see Integrate.io in action working with Heroku ETL? We’ve written up a quick tutorial in our article “Heroku Data Transfer is Easy with Integrate.io,” which walks you through the various commands and configurations you’ll need to get started.
In an Integrate.io review on the business software review website G2, data engineer Dhruv K. says that Integrate.io “works great with our Salesforce and Heroku Postgres”:
“We are using Integrate.io to pull data from our Salesforce org in full and incremental fashion into our Heroku Postgres database… Integrate.io has an easy drag-and-drop component and tons of connectors. Amazing support teams which typically reply in a few hours, and have a good understanding of the tool and how the issue can be resolved.”
Still on the fence? Try Integrate.io out for yourself. Talk to the Integrate.io team to get set up with a 7-day pilot of our ETL platform.
How Integrate.io Can Help with Heroku ETL
The Integrate.io platform’s drag-and-drop interface makes it easy for anyone to set up enterprise-class data integration pipelines to their cloud data warehouse. Ready to learn how Integrate.io will fit perfectly with your Heroku environment? Get in touch with our team of data experts today for a chat about your business needs and objectives, or to start your 7-day pilot of the Integrate.io platform.