In this post, we'll discuss the Rest API Connector in Integrate.io, which is a powerful batteries-included component that can help you quickly fetch data from the services providing an HTTP-based API. Apart from the necessary features like being able to make API requests with the configurable parts (body, URL, and headers), the REST API connector provides a lot of advanced features like pagination schemes, rate limiting, JSON path expression, and more! We'll discuss all of these features in the post, then present a video example.
When to Use the REST API Connector
The REST API connector is useful when the data you're trying to fetch is available through the HTTP endpoints, and you'd want to use that data in the pipeline you're designing with Integrate.io. This is usually the case when:
- You are trying to fetch data from a third-party service like Shopify, Youtube, or Instagram.
- You have internal systems where the database layer isn't directly exposed, and the data can only be fetched from the application layer through an API.
The REST API connector is designed with all the principles of a RESTful architecture in mind, but your architecture needs not to be truly RESTful to be able to use the connector.
How to Use the REST API Connector
Using Integrate.io's REST API connector interface is intuitive and straightforward. It involves the following steps:
- Providing the authentication details (if any)
- Specifying the necessary components of an HTTP request (Headers, URL, body, and the Method type)
- Selecting the fields you want to propagate from the API response.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
Features of Integrate.io's REST API Connector
1) Authorization
Some APIs require authentication to verify the identity of the invoker. Common strategies to do authentication of API calls are:
-
The 'Basic' HTTP Authentication Scheme: You pass the user password details in an encoded format through the Authorization header.
- API Keys: A unique string is assigned to you that needs to be passed in the API calls for the server to authenticate.
-
OAuth: Another system can generate an authentication token on behalf of the user and can use it to make an authenticated Request to the server.
The REST API connector can handle all three scenarios. Furthermore, Integrate.io integrates with some of the most frequently used third-party services on our platform (YouTube, Shopify, Instagram, etc.) so that you can simply create a connection and leave the rest to Integrate.io.
2) GET and POST Requests
The REST API connector covers both the HTTP methods (GET and POSTS) to fetch information from the server. You can provide all the necessary information for constructing the HTTP request through the interface. The interface lets you:
-
Add custom headers in the key-value format. For example, most mature APIs support the Accept header, or if the API requires you to pass a particular header with a specific value for authentication, this is where you can pass such values.
- Define the Request URL with the URL parameters and the Request body in a raw format.
Another nice feature is that you can maintain information in package level variables and use them in headers, body, and URL. For example, you can save details like username, API key in the package variable, and simply substitute them at all other places using the $var_name
syntax
. The benefit of following this style is that you'll only have to maintain information at a single place, and the package variables become your source of truth. Moreover, if the same package is later used in a workflow, you can even pass this variable from the workflow level scope to package level scope.
3) Response Parsing
The REST API connector can parse the JSON response format (the most commonly used response format) along with line-delimited JSON (where you have one JSON object in each line). The parser can also handle raw and line-delimited raw Responses that can be further used to parse other response formats. For instance, the raw parser returns the data with three fields:
-
status_code: The HTTP status code of the Response
-
body: The response body
-
headers: The response headers
If your response body is in a different format (let's say XML), then you can use Integrate.io functions like XPath or XPathToBag to extract the relevant parts in the subsequent Select components in your pipeline.
Advanced Features of Integrate.io's REST API Connector
1) Pagination Schemes
When retrieving data in large amounts, it is typical for any API to incorporate a paging mechanism. The commonly used methods for pagination are:
- Providing a page number parameter
- Providing an offset and limit parameters
- Using Link headers in the HTTP response
You'll be able to fetch data using any of these methods in the REST connector. There's also an option for an Automatic pagination scheme to detect the pagination scheme to use by the URL host/path. Here's a list of endpoints where the REST API connector can handle pagination right out of the box. If you'd like Integrate.io to support some other endpoint for pagination, please contact our support team for assistance.
2) Rate Limiting
Some APIs you use might have a daily usage limit or might only allow a certain number of requests in a specified time interval, or maybe the server resources are limited, and you might not want to overwhelm the server with lots of Requests. The REST API connector provides a provision to add sleep intervals or add a strict upper-limit to the paginated API calls. Furthermore, if you have an API quota that renews after a specified interval, you can attach this package to an Integrate.io schedule to run it periodically on its own.
3) JSON Path Expression for the Base Response
If the response type is JSON, it's a common use-case to explore only specific fields of interest from the base level response. To make such selection convenient, the REST API connector lets you specify JSON path expressions for base-level records. The path expressions support arrays, and you can select nested objects as well. For example, if the information you want to extract from the API response lies in array form within the "data" attribute, you can specify a path expression like $.data[*]
to ignore other top-level information while specifying the schema. You can try the JSONPath Online Evaluator to figure out the right JSONPath expression for your data.
4) Specifying the Schema Interactively
The REST API connector provides an interactive way to preview the data, select (or filter) relevant fields, and assign them a value type to build the schema. The connector automatically performs a test API call to get a sample of actual data for preview and automatically detects the data types of response fields (like datetime, long, boolean). The collection-level field types like MAP and BAG can further be explored/flattened in subsequent pipeline components using various Integrate.io functions.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
REST API Connector Example
Let's go through a real-world like example illustrating some of the features of REST API connector discussed above,
Setup
I have an API that returns orders information for a hypothetical e-commerce company. The API uses the Basic Authentication scheme. To fetch the API data, I have to make a GET HTTP request on the /orders path. The API supports these two query params:
-
since
parameter in the GET request, whose value is an ISO-formatted date timestamp. The purpose of this field is only to retrieve the orders after the specified time.
-
limit
parameter to specify the number of objects to receive in every paginated call (default is 50)
For pagination, the API follows the Link header standard, where the information about the last/next/previous page are contained in the Link header. The API only allows 10 calls per minute and has a maximum of 1000 requests in a day. If I specify the Content-Type header (or the Accept header) value as application/json, the API returns the Response of the following form for the orders endpoint:
{
"orders": [
{...}, // these objects in themselves are nested.
{...}
]
}
What I want is to selectively fetch some data and write that out to a Postgres database.
Solution
The solution will include a REST API source component and a Database destination component with all the configurations done as required.
The trickiest part to understand will be incremental loading, for which we'll define a package variable called last_updated to:
ToString(ToDate(CASE WHEN
(COALESCE($_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP,'')=='' OR
$full_load==1) THEN '1900-01-01T00:00:00Z' ELSE
$_PACKAGE_LAST_SUCCESSFUL_JOB_SUBMISSION_TIMESTAMP
END),'yyyy-MM-dd\\'T\\'HH:mm:ss').
This variable is used as the value for the since URL parameter of the API request to fetch the data generated only after the last run. If it seems too complex at first, you can find an explanation about the method in the docs.
Video Tutorial
Here's a video walking you through the above solution step-by-step.
For a full video transcript, see below.
Conclusion
In this post, we discussed the features of Integrate.io's REST API connector, when and how you can use them with the help of an example. If you plan to use the REST API connector in your project, refer our documentation for all the technical details or reach out to our support team for further assistance.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
Full Video Transcript
Hi! In this video, we'll fetch data from a fictitious order API of an e-commerce company using Integrate.io's EST API connector and save that into a SQL database.
Let's begin by creating a blank dataflow package. I'll name my package ecom-orders-api.
Next, let me add a new REST API component. I'll name it ecom_api.
I'll add the Basic authentication credentials for API authentication.
Next, I'll add the URL for the API and the content-type header value.
Since the API uses link headers for pagination, I'll set the pagination scheme to "Link Headers".
I'll also specify the rate limit and the sleep interval for the paginated requests to the API so that my test server is not overwhelmed. For now, I'll set the maximum paginated requests to only 10 for quick testing and the sleep interval to 6 seconds.
Next, I'll add a base level JSON path expression to capture only the items in the "orders" array from the base response.
Now, in the "Select input fields" section, the fields are automatically detected by Integrate.io along with their data types; I have just selected the relevant ones with the help of the Data Preview pane. We can also provide alias names to the fields if we like.
Okay, we're mostly done with the API part. Let's add the database component now.
I have my Postgres database already configured in my Integrate.io connections so I'll use that right away.
I'll set the table name to just integrate.io, in the operation type I'll select "Merge with existing data using update and insert" since I"d want to load data incrementally in future using the "since" parameter of the API. We'll see this shortly, but first, let's specify the target table columns.
I'll simply Auto-fill and specify the 'id' column as the primary key.
Okay, now let's switch to the REST API component for a second. I'll substitute variables with some package variables so that it's easier to maintain them at a single place.
The URL has three variables; api_url which is the base url of my api, limit_size which is the number of records a single page will return, and finally the "since" parameter which we'll use for incremental loading of data when running the package frequently.
Now let's add values for all these variables,
The value of last_updated variables is constructed using some Integrate.io functions, it basically contains the last run timestamp of the package. You can read more about this in the "moving data incrementally" docs.
Alright, everything is set! Let's save and validate.
Okay now, let's run the package on a cluster.
We can track the progress of the job in the jobs dashboard. When it's done, I can check my database to see if the records are updated.
As you can see the records were updated in a new table and everything looks perfect.
This is how you can use the Integrate.io REST API connector to fetch and save your data. Thanks for watching!