There are a few standard structured data formats, and no shortage of debate over which of them is more advantageous. Integrate.io lets users process both JSON and XML with ease. This article walks through an example showing the functions that facilitate XML processing on Integrate.io.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
Overview and Resources
For this demonstration, here is the link to the sample XML file we will be processing: https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms762271(v=vs.85)
The file has the XML structure shown in the image below:
The Integrate.io functions XPath and XPathToBag are key to the processing of this data. Let's examine these with a data pipeline.
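To make the difference between a multi-match lookup and a single-value lookup concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. It is only an analogy for what XPathToBag and XPath do, not Integrate.io's implementation. The XML is an abbreviated excerpt modeled on the linked sample file, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt modeled on the linked sample file (an assumption for
# illustration; the real file has more books and more fields per book).
SAMPLE_XML = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
  </book>
</catalog>"""

# fromstring() returns the root element, which here is <catalog>.
root = ET.fromstring(SAMPLE_XML)

# ElementTree paths are relative to the element they are called on, so
# "./book" from the <catalog> root plays the role of '/catalog/book'.
# findall() returns every match, loosely comparable to a bag of books.
books = root.findall("./book")
print(len(books))

# findtext() returns the text of the first match only, loosely comparable
# to pulling one scalar value out with a single XPath.
first_title = root.findtext("./book/title")
print(first_title)

# Inspect the shape of one book: its id attribute and child element names.
child_tags = [child.tag for child in books[0]]
print(books[0].get("id"), child_tags)
```

Note that ElementTree supports only a subset of XPath, which is why the sketch uses a relative path rather than the absolute '/catalog/book' form used in the pipeline.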
Setting up the Integrate.io Data Pipeline
The following list explains the components of the Integrate.io pipeline, in order:
1. XML_Source: The XML file from the link shared above is copied to a cloud storage location and read using the File Storage Source component.
2. XPathToBag: This step calls the XPathToBag function to match the XPath '/catalog/book'. This fetches all the books under <catalog> </catalog> as a Bag datatype. For example: XPathToBag(data,'/catalog/book')
3. Flatten_Books: Uses the Flatten() function to split the bag into individual records, so that each record holds a single <book> </book> element.
4. XPath: In this step, the XPath function retrieves the individual elements of each book. Here is a peek into the component with the XPath set up for the above <book> </book> structure:
For additional reference on XPath and examples, refer to an XPath evaluator such as freeformatter.com.
5. Destination: The individual fields processed from the XML are stored in a destination. In this example, it is a BigQuery table.
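The steps above can be sketched outside of Integrate.io in plain Python. This is a rough analogy under stated assumptions, not how the platform implements them: the helper names xpath_to_bag and book_to_row are hypothetical, the XML is an abbreviated excerpt modeled on the linked sample file, and the destination is replaced by a simple in-memory list of rows.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt modeled on the linked sample file (illustrative only).
CATALOG_XML = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
  </book>
</catalog>"""

# Child elements to pull out of each book (assumed field list).
FIELDS = ("author", "title", "genre", "price", "publish_date")


def xpath_to_bag(xml_text, path):
    """Hypothetical stand-in for step 2: return every element matching
    `path` as a serialized XML string, collected in a list (the 'bag')."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(el, encoding="unicode") for el in root.findall(path)]


def book_to_row(book_xml):
    """Hypothetical stand-in for step 4: extract the fields of one <book>
    fragment into a flat dictionary."""
    book = ET.fromstring(book_xml)
    row = {"id": book.get("id")}  # the id is an attribute, not a child element
    for name in FIELDS:
        row[name] = book.findtext(name)  # None if the element is missing
    return row


# Step 2: "./book" relative to the <catalog> root stands in for '/catalog/book'.
bag = xpath_to_bag(CATALOG_XML, "./book")

# Steps 3 and 4: one record per book, then one flat row per record.
rows = [book_to_row(book_xml) for book_xml in bag]

# Step 5 would load `rows` into a table; here they are simply printed.
for row in rows:
    print(row)
```

All values come out as strings here; casting price to a number or publish_date to a date would be a separate step before loading into a typed table.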
The following image depicts some example records from the output:
Parsing XML from a file or an API response into a tabular structure is key to making the data available for lookups, and blending it with other datasets can facilitate further data analysis.
Summary
Many enterprise systems consume and output XML data, and because XML is a trusted document-based format for information transfer, XML-based files and APIs come up often as use cases. Stop by and explore the functionality for processing structured data formats on Integrate.io. For more individualized instruction and information, contact us to book a risk-free demo.