The Workflow feature is an advanced offering in the Integrate.io platform. For users who want to combine various dataflow packages via a user-friendly interface, Workflows let you arrange your inter-dependent ETL tasks with conditional logic.
In this post, we will cover what Workflows are and how they can help you manage inter-dependent ETL tasks.
- What are Integrate.io Workflows?
- Workflow Main Features
- Example Workflow Use Case
What are Integrate.io Workflows?
Conceptually, a "Workflow" is a special type of package that lets you arrange tasks in the desired order of execution. A Workflow "task" can be either a pre-existing dataflow package or a SQL query operation. The drag-and-drop interface makes Integrate.io Workflows simple to configure while having a powerful impact on your data.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
Workflow Main Features
1) Trigger Different Execution Paths Based on Execution Status
In a Workflow, you can add and connect two tasks based on the execution relationship between them. In other words, you can execute different paths based on the execution status of the preceding tasks. Here's a simple example:
The execution statuses supported by Integrate.io are "on success," "on failure," and "on completion." When you run the above Workflow, `dataflow_task` (which runs a dataflow package) is run first. The `post_completion_query` is executed once the task completes, and either `post_failure_query` or `post_success_query` is executed depending on the success or failure of the task.
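This execution-status logic can be modeled in a few lines of code. The sketch below is purely illustrative (it is not the Integrate.io API): each edge between tasks carries a condition, and a downstream task fires only when its condition matches the outcome of the preceding task. The task names mirror the example above.

```python
# Illustrative model of conditional execution paths (not the Integrate.io API).
# Each edge carries a condition -- "on_success", "on_failure", or
# "on_completion" -- and fires only when it matches the preceding task's outcome.

def run_workflow(first_task, edges):
    """Run first_task, then fire each downstream task whose edge condition
    matches the outcome. Returns the names of the tasks that ran."""
    ran = [first_task.__name__]
    try:
        first_task()
        outcome = "success"
    except Exception:
        outcome = "failure"
    for condition, task in edges:
        # "on_completion" fires regardless of success or failure
        if condition == "on_completion" or condition == f"on_{outcome}":
            task()
            ran.append(task.__name__)
    return ran

# Hypothetical stand-ins for the tasks in the example above
def dataflow_task():         pass   # succeeds
def post_completion_query(): pass
def post_success_query():    pass
def post_failure_query():    pass

edges = [
    ("on_completion", post_completion_query),
    ("on_success", post_success_query),
    ("on_failure", post_failure_query),
]
print(run_workflow(dataflow_task, edges))
# dataflow_task succeeds, so post_completion_query and post_success_query run
```

If `dataflow_task` raised an exception instead, `post_completion_query` and `post_failure_query` would run, matching the "on completion" and "on failure" paths.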
2) Add Execution Constraints Based on the Execution Status of Preceding Tasks
In Workflows, you can add pre-conditional logic to any task based on the preceding tasks' execution status.
You can add any/all execution constraints (logical ORs and ANDs) to the tasks. In the Workflow defined above, `dataflow_task_2` will only execute if both `dataflow_task_1` and `sql_query_1` run successfully. If we had chosen the OR constraint for `dataflow_task_2`, it would have run if either `dataflow_task_1` or `sql_query_1` executed successfully.
Features 1 and 2 combined can model complex execution profiles where we want our packages to be executed in a specific order (A before B; B and C before D; etc.), and where their execution depends on the success or completion of each other (run B if A fails, run D if both B and C succeed, etc.).
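The any/all constraint check itself is just a logical AND or OR over predecessor statuses. Here is a small illustrative sketch (again, not the Integrate.io API) using the task names from the example:

```python
# Illustrative sketch of any/all execution constraints (not the Integrate.io API).

def constraint_met(statuses, predecessors, mode="all"):
    """statuses maps task name -> 'success' or 'failure'.
    mode='all' requires every predecessor to have succeeded (logical AND);
    mode='any' requires at least one to have succeeded (logical OR)."""
    results = [statuses.get(p) == "success" for p in predecessors]
    return all(results) if mode == "all" else any(results)

# dataflow_task_2 depends on dataflow_task_1 and sql_query_1
statuses = {"dataflow_task_1": "success", "sql_query_1": "failure"}
print(constraint_met(statuses, ["dataflow_task_1", "sql_query_1"], mode="all"))  # False
print(constraint_met(statuses, ["dataflow_task_1", "sql_query_1"], mode="any"))  # True
```

With the ALL constraint, one failed predecessor blocks `dataflow_task_2`; with ANY, a single success is enough.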
3) Pass Dynamic Variables to be Used in Tasks
Just like dataflow packages, the Workflow containing them has its own variable scope. If a variable is defined at the dataflow package level, its value gets used; otherwise, Integrate.io looks up the variable at the Workflow level. These Workflow variables can be provided during Workflow creation and can be overridden when executing them through a job or a schedule.
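This package-over-workflow lookup behaves like layered dictionaries: the package-level scope is consulted first, with the workflow-level scope as the fallback. A minimal sketch, using hypothetical variable names:

```python
# Sketch of the two-level variable lookup described above: a package-level
# value shadows a workflow-level one; the workflow value is the fallback.
from collections import ChainMap

workflow_vars = {"batch_size": 100, "region": "us-east-1"}  # hypothetical workflow variables
package_vars = {"batch_size": 500}                          # package-level override

# ChainMap consults package_vars first, then falls back to workflow_vars
resolved = ChainMap(package_vars, workflow_vars)
print(resolved["batch_size"])  # 500 -- the package-level value wins
print(resolved["region"])      # us-east-1 -- falls back to the workflow level
```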
SQL query tasks in a Workflow can return values (query results), which can be saved to Workflow variables containing a scalar value. This value can then be passed to successor query tasks or dataflow package tasks.
In the Workflow above, the first SQL query task checks for any active promotions that are not yet published. If such a promotion exists, its identifier is stored in the variable `active_promotion_id` (by default, the first column value in the first row of the returned results is stored). This identifier is then passed as a dataflow package variable to the `publish_promo` package. Finally, if the package execution is successful, a query task marks the promotion as published in the database. This query uses the same identifier to locate the promotion in the database.
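The pattern can be sketched end to end in plain Python with SQLite standing in for the database. This is illustrative only (not Integrate.io code), and the `promotions` table schema and `publish_promo` stand-in are assumptions:

```python
# Illustrative sketch of the promotion example: a query task captures the first
# column of the first returned row as a variable, a downstream task consumes it,
# and a follow-up query marks the promotion as published.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE promotions (id INTEGER, active INTEGER, published INTEGER)")
conn.execute("INSERT INTO promotions VALUES (7, 1, 0), (8, 0, 0)")

# Query task: find an active, unpublished promotion
row = conn.execute(
    "SELECT id FROM promotions WHERE active = 1 AND published = 0"
).fetchone()
active_promotion_id = row[0] if row else None  # first column, first row

def publish_promo(promo_id):
    """Stand-in for the publish_promo dataflow package."""
    return f"published promotion {promo_id}"

if active_promotion_id is not None:
    print(publish_promo(active_promotion_id))
    # Follow-up query task: mark the same promotion as published
    conn.execute("UPDATE promotions SET published = 1 WHERE id = ?",
                 (active_promotion_id,))
```

In an actual Workflow, the capture, hand-off, and follow-up query are configured visually rather than coded, but the data flow is the same.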
Example Workflow Use Case
An e-commerce company that uses Integrate.io runs periodic marketing campaigns and promotions to improve their business. To kickstart a campaign, they must run a few preparation tasks, and they may want to run some reconciliation tasks after it is over.
Here's one possible way they could define their Workflow:
Apart from the useful Workflow features discussed previously, here are some of the other ways in which Integrate.io helped the e-commerce company:
- The tasks mentioned above can be self-contained dataflow packages in Integrate.io. This leads to a modular and maintainable setup where you can distribute responsibilities efficiently to different teams.
- The Workflow arranges these tasks to be run in the correct order while abstracting the implementation complexities. It provides a bird's eye view of how different systems will interact.
- These packages are versioned, so you can switch back to a previous version in case of modeling issues.
- You can monitor the execution status of workflows in the Integrate.io dashboard, get notifications on failures, and change their schedule.
- Maintaining such a setup on your own can be very time- and resource-consuming. It involves managing database connections, maintaining cron expressions and multiple processes, modeling the dependencies, adding an alert system, etc. Integrate.io offers all of this through an easy-to-use low-code interface that can blend into your existing workflows.
Here's a short video that implements a simplified version of the above use-case:
Conclusion
A simple but powerful feature, workflows allow you to combine various dataflows from within the user-friendly Integrate.io interface. For more information on how to build workflows, refer to our Creating a Workflow document or contact our support team for assistance.