Automate OCI Data Integration Pipelines with Parameters
During the development of ETL processes, one of the main goals is to have your processes fully automated and avoid manual intervention.
In this case we are going to use OCI Data Integration parameters and the PL/SQL SDK to achieve this automation.
What problem do we want to solve?
We have a customer whose pipeline needs to receive, as parameters, the source connection, a filter condition (such as a country code), and the target table name.
The main requirement is to have multiple pipelines running at the same time without duplicating data or loading the same table twice. To achieve that, we need a unique ID to track which tables were loaded. This unique ID will be the task run key.
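For reference, a minimal control table to support this tracking could look like the sketch below. The table and column names are my own assumptions for illustration, not a fixed schema.

-- Hypothetical control table used to track which table each task run loads.
-- Table and column names are illustrative; adapt them to your own schema.
CREATE TABLE etl_control (
  table_name    VARCHAR2(128) NOT NULL,         -- target table being loaded
  country_code  VARCHAR2(10),                   -- filter condition (country code)
  task_run_key  VARCHAR2(200),                  -- unique ID of the OCI DI task run
  status        VARCHAR2(30),                   -- e.g. RUNNING, SUCCESS, ERROR
  updated_at    TIMESTAMP DEFAULT SYSTIMESTAMP  -- last time this row was touched
);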
How can we solve this?
This is just an example of one pipeline that can be parameterized and run in parallel.
1 — The first task (LOAD_ETL_PARAMETERS) receives 2 parameters (TABLE_NAME and CODE): the source table name and the country code. The procedure returns the target table name and the country code as outputs.
It is very important to set the values of these parameters with pipeline parameters: you must add 2 new parameters and select the option “Assign a parameter”.
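As a rough sketch, LOAD_ETL_PARAMETERS could be implemented along these lines. The rule for deriving the target table name (an STG_ prefix plus the country code) is only an assumption for illustration.

-- Hypothetical implementation of LOAD_ETL_PARAMETERS.
-- The target-table naming rule below is an assumption, not the customer's logic.
CREATE OR REPLACE PROCEDURE load_etl_parameters (
  p_table_name        IN  VARCHAR2,  -- source table name (TABLE_NAME parameter)
  p_code              IN  VARCHAR2,  -- country code (CODE parameter)
  p_target_table_name OUT VARCHAR2,  -- output: target table name
  p_code_out          OUT VARCHAR2   -- output: country code passed through
) AS
BEGIN
  -- Illustrative rule: prefix the source name and append the country code
  p_target_table_name := 'STG_' || p_table_name || '_' || p_code;
  p_code_out          := p_code;
END load_etl_parameters;
/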
2 — The second task (UPDATE_ETL_PARAMETERS) updates the control table with the table name, task run key, code, and status for the data flow that will be executed in the next step. The table name and code come from the previous task, the task run key comes from the system parameters, and the status is hardcoded to “RUNNING”.
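A sketch of UPDATE_ETL_PARAMETERS, writing into the etl_control table assumed earlier:

-- Hypothetical implementation of UPDATE_ETL_PARAMETERS.
-- Records which target table this task run is loading, with status RUNNING.
CREATE OR REPLACE PROCEDURE update_etl_parameters (
  p_table_name   IN VARCHAR2,  -- target table name from the previous task
  p_code         IN VARCHAR2,  -- country code from the previous task
  p_task_run_key IN VARCHAR2   -- task run key from the system parameters
) AS
BEGIN
  INSERT INTO etl_control (table_name, country_code, task_run_key, status, updated_at)
  VALUES (p_table_name, p_code, p_task_run_key, 'RUNNING', SYSTIMESTAMP);
  COMMIT;
END update_etl_parameters;
/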
3 — The Data Flow (integration task) receives the 2 parameters from the previous operator (target table name and code). Create a new parameter (PAR_SOURCE_PIPELINE) and assign it to SOURCE_PARAM.
4 — In the last step (UPDATE_LOAD_STATUS), we update the control table. To do that, we list all task runs, search for the task run key to get the status of the pipeline, and update the table with all the details from the previous data flow.
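The control-table side of UPDATE_LOAD_STATUS could be sketched as below. Here I assume the final status has already been retrieved from the task run list and is passed in as an argument.

-- Hypothetical UPDATE_LOAD_STATUS: the row is matched on the task run key
-- recorded in step 2. The status lookup itself is assumed to happen upstream.
CREATE OR REPLACE PROCEDURE update_load_status (
  p_task_run_key IN VARCHAR2,  -- task run key recorded in step 2
  p_status       IN VARCHAR2   -- final status from OCI DI (e.g. SUCCESS, ERROR)
) AS
BEGIN
  UPDATE etl_control
     SET status     = p_status,
         updated_at = SYSTIMESTAMP
   WHERE task_run_key = p_task_run_key;
  COMMIT;
END update_load_status;
/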
How can we start the pipeline?
We can use the PL/SQL SDK for OCI Data Integration to start the pipeline, and the pipeline parameters can be set dynamically.
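For example, a call could look like the sketch below. The procedure di_task_execute_params comes from David Allan’s blog (linked at the end); its exact signature is not reproduced here, so the argument names and list types are assumptions, and the OCID and parameter values are placeholders.

BEGIN
  -- Illustrative invocation only: argument names are assumptions, see
  -- David Allan's blog for the actual di_task_execute_params signature.
  di_task_execute_params(
    p_workspace_id => 'ocid1.disworkspace.oc1...',  -- workspace OCID (placeholder)
    p_application  => 'MY_APPLICATION',             -- application key (placeholder)
    p_task         => 'MY_PIPELINE_TASK',           -- published pipeline task (placeholder)
    p_param_names  => sys.odcivarchar2list('PAR_TABLE_NAME', 'PAR_CODE'),
    p_param_values => sys.odcivarchar2list('CUSTOMERS', 'US')
  );
END;
/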
As you can see, we are assigning values to the parameters we created in our main pipeline.
You should see 2 pipeline executions.
The code for di_task_execute_params and for creating the credential can be found on David Allan’s blog.
All the code is available on my GitHub page.
I hope this helps. If you have any doubt, please reach out to me through LinkedIn or Medium.