
How to ingest CSV files

Using Data Pipeline to ingest data

When you need to take in data from an external source, the first option you should consider is to use the platform's Data Pipeline component.

This enables you to define the data source, transform the data to match fields in your tables, and then insert or update the data in the specified table. A pipeline can be set up to handle streaming data or static files.

A good practical example is ingesting Copp Clark holiday data that arrives in CSV files.

info

You can also use the Data Pipeline component to send out data to external systems. This works the same way in reverse, taking data from tables in your Genesis application and transforming it into the format required by the target system.

Here we are going to focus on incoming data in static files.

Transforming data

To create a Data Pipeline for ingesting data, you create a project-name-data-pipeline.kts file to define a pipeline that maps data from an external source (a database or file) to tables in your application.

Each Data Pipeline must define three things:

  • source specifies the location of the incoming data
  • operator parses the source and maps the data to fields in the application's database; typically, you transform the column headings of the incoming files into the relevant fields in the target table
  • sink is the destination of the transformed data within your application; this can be a file or a queue, for example, and you can use it to update the data further and store it somewhere else, such as a different database or a log
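
Putting these together, a minimal pipeline sketch might look like the one below. The calls mirror the example later on this page; the pipeline name, file name, Holiday entity and its fields are illustrative assumptions, as is the DbOperation.Insert wrapper around the entity - check the reference documentation for the exact input shape your platform version's dbSink expects.

pipelines {
    pipeline("HOLIDAY_CSV_PIPELINE") {
        // source: poll a local folder (configured in system definition) for the CSV file
        source(camelSource {
            location = getDefaultLocalFileCamelLocation(
                systemDefinition.get("FolderLocation").get(),
                "holidays.csv" // illustrative file name
            )
        })
        // operator: decode the raw CSV, assuming each decoded row behaves like a
        // map of column name to value, then map each row onto the target table entity
        .split(csvRawDecoder())
        .map { row ->
            // Holiday is a hypothetical generated table entity; adjust the
            // field mapping to your own table
            DbOperation.Insert(Holiday {
                countryCode = row["CountryCode"].toString()
                holidayName = row["HolidayName"].toString()
            })
        }
        // sink: write the resulting operations into the application database
        .sink(dbSink())
    }
}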

Examples

To show how to use Data Pipeline to ingest data, we have provided an example application. This provides different scenarios, based on incoming data in Copp Clark holiday files.

These start at the most basic case and cover increasing levels of complexity.

Throughout, the code in the application includes detailed comments explaining the steps.

Download, view and run

The examples are within one complete example application, which includes a front end, so you can run the application and see the data.

You can clone the repo to see the code - which includes comments at the key points to highlight what is being specified - and run the application.

Setting up a pipeline in your own app

How to create a similar app

For instructions on creating an app similar to our example, see the readme file in the repository.

When you want to create a Data Pipeline in your own application, there are other things you need to do in addition to creating the pipeline files themselves.

Processes.xml

Every process in your app must be configured in the application's processes.xml file; see the howto-csv-ingress-processes.xml file for a simple example.

At minimum, you need to:

  • include the genesis-pal-datapipeline module in the process definition
  • add the global.genesis.pipeline package
  • specify your pipeline script file

For example:

<processes>
    <process name="MYAPP_MANAGER">
        ...
        <module>genesis-pal-datapipeline</module>
        <package>global.genesis.pipeline</package>
        <script>myapp-simple-data-pipelines.kts</script>
        ...
    </process>
</processes>

We would then add code into the data pipeline file like this:

pipelines {
    pipeline("MY_PIPELINE") {
        // source: read the file from the folder configured in system definition
        source(camelSource {
            location = getDefaultLocalFileCamelLocation(
                systemDefinition.get("FolderLocation").get(),
                "FileName"
            )
        })
        .map { LOG.info("Triggered MY_PIPELINE"); it }
        // operator: split the file into individual CSV rows,
        // then map each row to the target table's fields
        .split(csvRawDecoder())
        .map { row ->
            ...
        }
        // sink: write the result to the database
        .sink(dbSink())
        .onCompletion { ... }
    }
}

System definition

Every application also has a main configuration file called genesis-system-definition.kts.

Check this file and make sure that the application knows where to source the files from.
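
For example, the pipeline above reads its folder from a FolderLocation property, which you could define along these lines (the property name and path are illustrative assumptions; your application's file may already define similar items):

systemDefinition {
    global {
        // folder that the pipeline polls for incoming CSV files
        item(name = "FolderLocation", value = "/home/myapp/run/inbound/")
    }
}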

Testing

info

Go to our Testing page for details of our testing tools and the dependencies that you need to declare.

To test the Data Pipeline set-up in your app:

  • Details to follow shortly. Thank you for your patience.

Full technical details

You can find full technical details of how to use the Data Pipeline in our reference documentation.