Along with the functionality to make HTTP requests, Integrate.io provides various Curl functions and advanced features that can be beneficial in certain use cases. This article covers the Curl functions and features in addition to providing a step-by-step demonstration.
- When to Use the Integrate.io Curl Feature
- How to Use the Integrate.io Curl Feature
- Curl Functions and Use Cases
- Table: Integrate.io Curl Functions and Use Cases
- Step-by-Step Example
- Final Thoughts
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
When To Use the Integrate.io Curl Feature
The Curl feature in Integrate.io lets you perform an HTTP request and use the returned response in your data pipeline. You can use the Curl feature to do any of the following:
How To Use the Integrate.io Curl Feature
The Curl feature is available through a set of functions that can be used inside any field expression. Here's an example:
In the screenshot above, base_url is a package variable that stores the host address of the API. We invoke a function called Curl with appropriate arguments to make the HTTP GET request and store the returned results aliased by the name response.
The response can be further parsed using Integrate.io functions. For example, if the endpoint returns a JSON string of the form:
{
"life_time_value": 400,
"is_active": true
}
We can use the expression JsonStringToMap(response)#'life_time_value' to extract the life_time_value field out of the JSON response.
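To make the lookup concrete, here is a rough Python equivalent of what the expression does. Python is used purely for illustration; this is not how Integrate.io evaluates expressions, and the variable names are placeholders.

```python
import json

# Raw JSON string, standing in for the value aliased as `response`.
response = '{"life_time_value": 400, "is_active": true}'

# Roughly what JsonStringToMap(response) does: parse the string into a map.
response_map = json.loads(response)

# Roughly what #'life_time_value' does: look up a single key in that map.
life_time_value = response_map["life_time_value"]
print(life_time_value)  # 400
```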
Curl Functions and Use Cases
In this section, we discuss the various functions of the Curl feature in detail as well as explore some of their use cases.
Simple HTTP Requests
The Curl function provides all the basic functionalities to prepare and execute an HTTP request. The function signature looks something like this:
Curl(url, method[, headers[, request_body[, username[, password]]]])
The first two arguments of this function (url and method) are mandatory whereas the other parameters are optional. Here's an example using all the arguments:
Curl(
'https://test-app.com/customers/',
'POST',
'{"Accept":"text/json"}',
'{"name":"Satwik","age":23}',
'some_username','some_password'
)
- The url argument needs to be specified with the protocol (both http and https are supported).
- The HTTP method can be any of GET, PUT, POST, and DELETE.
- The headers can be specified in a key-value format similar to a JSON object.
- The request body can be passed in a raw string format.
- Finally, you can specify username and password for Basic authentication.
The returned value of the Curl function is a map object consisting of three keys:
- status - HTTP response code
- body - The response body in raw string format
- headers - A map object of response headers
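As a way to picture how the six arguments relate to an ordinary HTTP request, the following Python sketch builds (but never sends) a request from the same string arguments. It is an illustration only: build_request is a hypothetical helper, and nothing here describes how Integrate.io implements Curl internally.

```python
import base64
import json
import urllib.request


def build_request(url, method, headers="", request_body="", username="", password=""):
    """Build (but do not send) an HTTP request from Curl-style string arguments."""
    # Headers arrive as a JSON-like string, e.g. '{"Accept":"text/json"}'.
    header_map = json.loads(headers) if headers else {}
    # Basic authentication is a base64-encoded "username:password" header.
    if username:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        header_map["Authorization"] = f"Basic {token}"
    # The request body is passed through as a raw string.
    data = request_body.encode() if request_body else None
    return urllib.request.Request(url, data=data, headers=header_map, method=method)


request = build_request(
    "https://test-app.com/customers/",
    "POST",
    '{"Accept":"text/json"}',
    '{"name":"Satwik","age":23}',
    "some_username",
    "some_password",
)
print(request.get_method(), request.full_url)
print(sorted(request.header_items()))
```

A response to such a request would then be summarized by the three keys listed above: status, body, and headers.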
The Curl function covers all the basic functionality you'd need to make a simple HTTP request. The following sections cover more advanced functions that can be used effectively for certain use-cases.
Making Paginated Requests
The CurlWithPagination function enables you to make multiple HTTP requests for paginated APIs and collect the responses from the server. This is particularly useful when the API you are fetching data from holds more data than it can return in a single response.
Here's the function signature of the CurlWithPagination function:
CurlWithPagination(url, method[, headers[, request_body[, username[, password[, pagination_scheme[, sleep_interval[, max_pages]]]]]]])
The function supports all the arguments of the basic Curl function along with some optional arguments for making the paginated requests in the desired way. Here's an example request:
CurlWithPagination(
'https://test-app.com/customers',
'GET',
'', '', '', '',
'LinkHeader',
100,
10000)
The above snippet makes at most 10,000 GET requests, starting from the specified URL and waiting 100 ms between requests. The location of each subsequent page is read from the Link header of the previous response.
The function supports the following pagination scheme values:
- Automatic (default): Integrate.io will detect the pagination method by domain. Here's the list of domains with pagination support. If you'd like Integrate.io to support some other endpoint for pagination, feel free to reach out to our support.
- NoPagination: Makes the function behave similarly to the Curl function.
- LinkHeader: Integrate.io fetches pages as per the specifications in RFC 5988.
Unlike the Curl function, the CurlWithPagination function returns a bag of map objects (one for each paginated request) instead of a single map object. You can collate these map objects in the desired format using various Integrate.io functions.
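The LinkHeader scheme can be pictured with a short Python sketch. This is an assumption-laden illustration of Link-header pagination in general, not Integrate.io's implementation: next_page_url, fetch_all_pages, and the fake PAGES data are all hypothetical, and the parser handles only simple Link headers.

```python
import re
import time

# Matches one link-value: <target URL> followed by its ;-separated parameters.
LINK_VALUE = re.compile(r'<([^>]+)>\s*((?:;\s*[^;,]+)*)')


def next_page_url(link_header):
    """Return the target of the rel="next" link, or None if there is none."""
    for match in LINK_VALUE.finditer(link_header or ""):
        target, params = match.group(1), match.group(2)
        if re.search(r'rel\s*=\s*"?[^";,]*\bnext\b', params):
            return target
    return None


def fetch_all_pages(fetch, url, sleep_interval_ms=0, max_pages=None):
    """Follow rel="next" links, collecting one response map per page."""
    responses = []
    while url is not None and (max_pages is None or len(responses) < max_pages):
        response = fetch(url)
        responses.append(response)
        url = next_page_url(response["headers"].get("Link"))
        if url is not None and sleep_interval_ms:
            time.sleep(sleep_interval_ms / 1000)
    return responses


# Fake three-page API so the sketch runs without a network.
BASE = "https://test-app.com/customers"
PAGES = {
    BASE: {"status": 200, "body": "[1, 2]",
           "headers": {"Link": f'<{BASE}?page=2>; rel="next"'}},
    f"{BASE}?page=2": {"status": 200, "body": "[3, 4]",
                       "headers": {"Link": f'<{BASE}?page=3>; rel="next", <{BASE}>; rel="first"'}},
    f"{BASE}?page=3": {"status": 200, "body": "[5]", "headers": {}},
}

pages = fetch_all_pages(PAGES.__getitem__, BASE)
print([page["body"] for page in pages])
```

The result is a list with one response map per page, which mirrors the bag of map objects described above.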
Polling the API
The CurlPoll function enables you to make HTTP requests in a continuous fashion until either a regular expression is matched or a specified timeout limit is reached. This function is particularly useful with asynchronous tasks where you need to wait for the status to change or a response to arrive from the endpoint. For example, you might have an endpoint to trigger the summary report for your orders, and the data processing for the endpoint happens asynchronously. Such APIs usually have an endpoint to fetch the status / final response which you can poll frequently.
Here's the function signature:
CurlPoll(regex_string, interval, timeout, url, method[, headers[, request_body[, username[, password]]]])
The function supports all the arguments of the basic Curl function along with three mandatory arguments (regex_string, interval, timeout) for specifying the polling conditions. Here's an example snippet:
CurlPoll(
'status="(completed)',
1000,
60000,
'https://test-app.com/customers/5/upgrade_plan',
'GET',
'{"Accept":"text/json"}',
'','','')
The above function will make a GET request to the specified URL every 1000 ms (1 second). The polling will terminate either when the response matches the regular expression (that is, the status is completed) or after 60000 ms (1 minute) have elapsed. The returned value of the CurlPoll function is similar to that of the Curl function: it contains the full details (status, body, and headers) of the response that matched the regular expression, or of the last response in case of a timeout.
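The polling behavior described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Integrate.io's code: poll and fake_fetch are hypothetical names, and the sketch assumes the regular expression is tested against the response body.

```python
import re
import time


def poll(fetch, regex_string, interval_ms, timeout_ms, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the body matches regex_string or the timeout passes."""
    pattern = re.compile(regex_string)
    deadline = clock() + timeout_ms / 1000
    while True:
        response = fetch()
        # Stop on a match, or return the last response once time is up.
        if pattern.search(response["body"]) or clock() >= deadline:
            return response
        sleep(interval_ms / 1000)


# Fake endpoint that reports "pending" twice before completing.
bodies = iter(['status="pending"', 'status="pending"', 'status="completed"'])
calls = []


def fake_fetch():
    calls.append(1)
    return {"status": 200, "body": next(bodies), "headers": {}}


# sleep is replaced with a no-op so the example finishes instantly.
result = poll(fake_fetch, 'status="(completed)', 1000, 60000, sleep=lambda seconds: None)
print(result["body"], len(calls))
```

Passing sleep and clock as parameters keeps the loop testable without real waiting; a real poller would use the defaults.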
Receiving a Binary Response
The BinaryCurl function is used to make HTTP requests that return a response in binary format. This function is particularly useful when the API returns file(s). For example, an API might return a compressed .gz file, which you can then write to disk using the File Storage Destination component.
The signature of the BinaryCurl function is exactly the same as that of the Curl function:
BinaryCurl(url, method[, headers[, request_body[, username[, password]]]])
Here's an example request:
BinaryCurl(
'https://test-app.com/customers/5/archive.gz',
'GET',
'', '',
'some_username','some_password'
)
The returned value of BinaryCurl is similar to that of the Curl function. However, the body key in the returned value has binary data.
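The reason a binary body needs separate handling can be shown with a small Python sketch. It is only an illustration with made-up data: the compressed bytes stand in for a downloaded .gz file, and the temporary path is hypothetical.

```python
import gzip
import os
import tempfile

# Stand-in for the binary `body` of a response to a request for a .gz file.
body = gzip.compress(b"id,name\n5,Satwik\n")

# Binary data must stay as bytes: gzip content is not valid UTF-8 text.
try:
    body.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Write the bytes unchanged to disk.
path = os.path.join(tempfile.mkdtemp(), "archive.gz")
with open(path, "wb") as archive:
    archive.write(body)

# Reading the file back through gzip recovers the original content.
with gzip.open(path, "rb") as archive:
    restored = archive.read()
print(decodable, restored)
```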
Authentication Using Integrate.io Connections
The CCurl method is used to make authenticated requests using an Integrate.io connection. This function can help you simplify authentication for services for which an Integrate.io connection can be defined. For example, if you want to make HTTP requests to your Shopify store to create a new order, you can define a Shopify connection and then use it to handle authentication on its own rather than dealing with it yourself in the Curl function.
Here's the signature of CCurl:
CCurl(url, [method, [headers, [request_body, [connection_id]]]])
Note that, unlike the Curl method, this function doesn't have username and password arguments. Instead, it takes the Integrate.io connection ID for authentication.
Here's an example request:
CCurl(
'http://test-app.com/customers/',
'POST','{"Accept":"text/json"}',
'{"name":"satwik","age":23}',
'my_connection_12'
)
In this example, my_connection_12 is the connection ID, which you can find on the connections dashboard. The returned value of the CCurl function is the same as that of the Curl function.
Other Functions
There are a few more functions that combine the previously discussed features with authentication through an Integrate.io connection:
- CCurlWithPagination: Used to make paginated requests while using an Integrate.io connection for authentication.
- CCurlPoll: Used to make polling requests while using an Integrate.io connection for authentication.
- BinaryCCurl: Used to make requests that return a binary response while using an Integrate.io connection for authentication.
One minor thing to note is that functions that use Integrate.io connections for authentication return null when you validate a package whose variables use them, or when you call them in X-console. However, they work as expected during actual job runtime.
Table: Integrate.io Curl Functions and Use Cases
To help you pick the right function for your use-case, refer to the following table:
| Function | Make HTTP request | Basic Auth | Auth using Integrate.io connections | Retrieve Binary Data | Handle Paginated data | Poll response |
|---|---|---|---|---|---|---|
| Curl | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| BinaryCurl | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| CurlWithPagination | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| CurlPoll | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| CCurl | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| BinaryCCurl | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| CCurlWithPagination | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ |
| CCurlPoll | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ |
Step-By-Step Example
Let's consider a common use-case of migrating data to a new service through an API. Customer data exists in a database table, and that data needs to be pushed to an external analytics service after some preprocessing. The preprocessing step involves calling an internal API endpoint (/customer/summary/<int:customer_id>) to enrich the existing data. This API returns a JSON response of the following form:
{
"email": "satwik@example.org",
"life_time_value": 546,
"is_active": true,
... // More such fields
}
We are required to migrate only the customers that are "active". After enrichment, the data needs to be passed to the external service through a REST API to populate customer data. We'll have to use the Curl function to invoke that API passing the data in JSON format. Finally, it is required to save the status of migration in the database for audit purposes.
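Before building the package, it can help to see the requirements as plain control flow. The Python sketch below is an illustration only, not the Integrate.io solution: migrate_customers, fetch_summary, push_customer, the sample records, and the 201 status are all hypothetical stand-ins for the database source, the two API calls, and the audit table.

```python
import json


def migrate_customers(customers, fetch_summary, push_customer):
    """Enrich each customer, keep the active ones, push them, record statuses."""
    audit_rows = []
    for customer in customers:
        # Enrichment: parse the JSON body returned by the internal API.
        summary = json.loads(fetch_summary(customer["id"]))
        # Filter: only active customers are migrated.
        if not summary.get("is_active"):
            continue
        # Transfer: send the enriched record as a JSON body.
        payload = json.dumps({**customer, **summary})
        status = push_customer(payload)
        # Audit: keep the HTTP status for each migrated customer.
        audit_rows.append({"id": customer["id"], "status": status})
    return audit_rows


# Fake data and fake API calls so the sketch runs without a network.
customers = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
summaries = {
    1: '{"email": "a@example.org", "life_time_value": 546, "is_active": true}',
    2: '{"email": "b@example.org", "life_time_value": 10, "is_active": false}',
}
sent = []


def fake_push(payload):
    sent.append(json.loads(payload))
    return 201  # hypothetical status code returned by the external API


rows = migrate_customers(customers, summaries.__getitem__, fake_push)
print(rows)
```

Each stage in the loop corresponds to one requirement above: enrichment, the active-customer filter, the POST to the external service, and the audit record.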
Solution
Final Thoughts
Integrate.io's Curl function provides multiple options for use with third-party APIs in a variety of scenarios. By referring to the table provided and accessing the video demonstration, you now have a better understanding of Integrate.io's Curl feature and its possibilities.
To see how Integrate.io's Curl function can apply to your specific use case, schedule an introductory call with our support team to see a demo and get a 14-day risk-free trial of the Integrate.io platform.
Video Transcript
Hi, in this video I'll use Integrate.io's curl function to migrate customer data to an external API after some preprocessing.
Let's begin by creating a dataflow package called migrate_customers_data.
Next, let's add a Database source component to fetch the customer data from the database.
I'll select the database connection, provide the table name, and then use the data preview to select the fields that I want. I'll fix names or data types wherever required.
Next, I'll add a select component called enrich_data. Here, I'll choose Autofill to get all the fields. Additionally, I'll call an internal API to get additional data for each customer.
I'll add the expression to fetch the data from an API. I have used a SPRINTF function to create the URL. The host of the API is present in the internal_api_host package variable, which I'll define later along with other package variables. I've given the API response body the alias of enrichment_response.
Next I'll add a select component to extract relevant fields from the enrichment_response.
Now, I want to filter only the customers that are active with the help of a filter component.
Okay, now, I have to make a POST request to the external API to actually transfer the data, and save the status of the request.
The expression for the POST request is similar to that of the enrichment API call. The only difference is that I have to create a JSON body with the help of the TOMAP and ToJson functions, since this is a POST request. I'm aliasing the status of the response as status.
Finally, I want to write this status to a database table for logging purposes. I'll just add a database destination component to do that.
I'll add table name, set operation type as update and write, select ID as primary key, and save it.
Alright, let's save and validate our pipeline.
Ah right, we haven't defined the package variables so far. Let's do that right away.
Awesome, validation is complete, let's run a job and check if everything works.
Let's wait for the job to finish.
Okay, it's done. As you can see from the dashboard, 9 records were written to the database. Let's verify if that's the case in our database too.
Yes, 9 rows. So everything looks perfect. Thanks for watching this quick video about Integrate.io's curl function!