OCI Data Integration and Oracle Data Safe— Masking sensitive data between environments
The main goal of this article is to use OCI Data Integration and Oracle Data Safe to load and mask sensitive data between environments.
OCI Data Integration is a fully managed, multi-tenant service that helps data engineers and ETL developers with common extract, transform, and load (ETL) tasks such as ingesting data from a variety of data assets; cleansing, transforming.
Oracle Data Safe is a unified control center for your Oracle databases which helps you understand the sensitivity of your data, evaluate risks to data, mask sensitive data, implement and monitor security controls, assess user security, monitor user activity, and address data security compliance requirements.
Prerequisites
- Get familiarized with Oracle Data Safe, check out this live lab;
- Get familiarized with OCI Data Integration, check out this live lab;
- 2 Autonomous Databases;
The use case I’m reproducing in this article is to load data from ATP (Production) and insert into ATP (Development), but protecting sensitive data.
The Pipeline has been developed as can be easily re-used for any table and avoid dependencies with data safe objects. As you will see we create all the objects with need from data safe in real time, in that way you don’t need to hard-coded the Rest APIs.
Let me start by explaining the main Pipeline:
- Step 1 — Data Loader task, where we can load one or multiple tables from source to target. More info about Data Loader tasks;
- Step 2 — Call Data Safe Rest API to create Sensitive Data Model;
- Step 3 — Expression to get the sensitive data model ID to be pass to step 4 as a parameter;
- Step 4 — Call Data Safe Rest API to create Masking Policy;
- Step 5 — Expression to get the masking policy ID to be pass to a new Pipeline in step 6;
- Step 6 — Call the Pipeline Mask Target — I had to include an auxiliary pipeline since I need to pass the same parameter twice to two different Rest Tasks.
- Step 1 — This pipeline receives as input parameter the Mask Policy ID to be used in next 2 steps;
- Step 2 — Call Data Safe Rest API to add columns to the specified masking policy from the associated sensitive data model ;
- Step 3 — Call Data Safe Rest API to Mask Data in Target;
- Step 4 — Waits for data masking process, to finish the pipeline.
Rest Tasks APIs configuration
In the first pipeline in step 2 we call the rest API to create a sensitive model. Let’s see how to configure it:
HTTP Method: POST
URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/sensitiveDataModels
This rest API must contain some parameters in the body:
{"compartmentId": "ocid1.compartment.oc1..aaaaaaaaatiwok3gmtljmci......", "targetId": "ocid1.datasafetargetdatabase.oc1.iad.abuwcljsjmgmda......", "displayName": "ExampleSD", "schemasForDiscovery": ["HCM1","ADMIN"], "isIncludeAllSensitiveTypes": "true", "isSampleDataCollectionEnabled": "true"}
The target ID, it is the database you registered for data safe. You can find it here:
The schemasForDiscovery are the schemas you want to include in the sensitive model.
In the step 3 we get sensitive ID model from the response payload to be used in masking policy:
CONCAT(CONCAT('"', CAST(json_path(CREATESENSITIVEDATAMODEL.SYS.RESPONSE_PAYLOAD, '$.id') AS String)), '"')
Step 4 — Rest API Masking Policy
HTTP Method: POST
URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/sensitiveDataModels
This rest API must contain some parameters in the body:
{"compartmentId": "ocid1.compartment.oc1..aaaaaaaaatiwok3gmtljmciocx7.......a", "columnSource": { "columnSource": "SENSITIVE_DATA_MODEL", "sensitiveDataModelId": ${MODEL_ID}}, "displayName": "ExampleMaskPolicy"}
I created a parameter (MODEL_ID) to receive the value from step 3 (the sensitive model ID).
Step 5 — Expression to get the masking policy ID
CAST(json_path(MASKINGPOLICIES.SYS.RESPONSE_PAYLOAD, ‘$.id’) AS String)
Let’s go to the step 6 that is a pipeline task.
Inside of this pipeline we have 4 steps:
Step 1 — Create parameter in expression
This parameter will receive the value from the main pipeline.
Step 2 — Rest API add masking columns
HTTP Method: POST
Request:
{}
The API uses a parameter with the value from Step 1 (mask id policy).
Since it is a post method in the request, just add an empty json to avoid any error.
Assign the value from expression to the task. Go to configuration and select “Assign a parameter”.
Step 3— Masking data
HTTP Method: POST
Request:
{"targetId": "ocid1.datasafetargetdatabase.oc1.iad.abuwcljsj......."}
The API uses a parameter with the value from Step 1 (mask id policy).
Assign the value from expression to the task. Go to configuration and select “Assign a parameter”.
Step 4 — Wait for masking data request
HTTP Method: GET
In order to have the task running and not finish immediately once receive the response we have to configure a polling condition and a success condition:
The task will be running until the status is on response payload is different from SUCCEEDED, once the status changes to SUCCEEDED the task is finished.
Result:
I hope this can help you. If you have any doubt, please reach out to me through LinkedIn or Medium.