OCI Data Integration and Oracle Data Safe — Masking sensitive data between environments

Eloi Lopes
Jul 21, 2022

The main goal of this article is to use OCI Data Integration and Oracle Data Safe to load and mask sensitive data between environments.

OCI Data Integration is a fully managed, multi-tenant service that helps data engineers and ETL developers with common extract, transform, and load (ETL) tasks such as ingesting data from a variety of data assets, then cleansing, transforming, and loading it.

Oracle Data Safe is a unified control center for your Oracle databases which helps you understand the sensitivity of your data, evaluate risks to data, mask sensitive data, implement and monitor security controls, assess user security, monitor user activity, and address data security compliance requirements.

Prerequisites

  • Get familiar with Oracle Data Safe; check out this live lab;
  • Get familiar with OCI Data Integration; check out this live lab;
  • Two Autonomous Databases (one as source, one as target);

The use case I’m reproducing in this article loads data from an ATP instance (Production) into another ATP instance (Development) while protecting sensitive data.

The pipeline has been developed so it can easily be reused for any table and avoids dependencies on Data Safe objects. As you will see, we create all the Data Safe objects we need at run time, so you don’t need to hard-code values into the Rest API calls.

Let me start by explaining the main Pipeline:

Main Pipeline
  • Step 1 — Data Loader task, where we can load one or more tables from source to target. More info about Data Loader tasks;
  • Step 2 — Call Data Safe Rest API to create Sensitive Data Model;
  • Step 3 — Expression to get the sensitive data model ID to be passed to step 4 as a parameter;
  • Step 4 — Call Data Safe Rest API to create Masking Policy;
  • Step 5 — Expression to get the masking policy ID to be passed to a new pipeline in step 6;
  • Step 6 — Call the Pipeline Mask Target — I had to include an auxiliary pipeline since I need to pass the same parameter twice to two different Rest Tasks.
Auxiliary Pipeline to mask target
  • Step 1 — This pipeline receives as input parameter the Mask Policy ID to be used in next 2 steps;
  • Step 2 — Call Data Safe Rest API to add columns to the specified masking policy from the associated sensitive data model;
  • Step 3 — Call Data Safe Rest API to Mask Data in Target;
  • Step 4 — Waits for the data masking process to finish before completing the pipeline.

Rest Task API configuration

In step 2 of the first pipeline we call the Rest API that creates a sensitive data model. Let’s see how to configure it:

HTTP Method: POST

URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/sensitiveDataModels

This Rest API must contain some parameters in the body:

{
  "compartmentId": "ocid1.compartment.oc1..aaaaaaaaatiwok3gmtljmci......",
  "targetId": "ocid1.datasafetargetdatabase.oc1.iad.abuwcljsjmgmda......",
  "displayName": "ExampleSD",
  "schemasForDiscovery": ["HCM1","ADMIN"],
  "isIncludeAllSensitiveTypes": "true",
  "isSampleDataCollectionEnabled": "true"
}

The target ID is the database you registered with Data Safe. You can find it here:

The schemasForDiscovery are the schemas you want to include in the sensitive model.
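Outside of OCI Data Integration, the same request can be assembled in a few lines. The sketch below is illustrative only: the helper name is mine, and the region endpoint and OCIDs are placeholders you would replace with your own.

```python
import json

# Data Safe endpoint for the region (placeholder: us-ashburn-1)
DATA_SAFE = "https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201"

def build_create_sdm_request(compartment_id, target_id, schemas,
                             display_name="ExampleSD"):
    """Assemble the URL and JSON body for the create-sensitive-data-model call."""
    url = f"{DATA_SAFE}/sensitiveDataModels"
    body = {
        "compartmentId": compartment_id,
        "targetId": target_id,
        "displayName": display_name,
        "schemasForDiscovery": schemas,        # schemas to scan for sensitive data
        "isIncludeAllSensitiveTypes": "true",  # discover all sensitive types
        "isSampleDataCollectionEnabled": "true",
    }
    return url, json.dumps(body)
```

In a Rest Task the body goes straight into the request payload; outside OCI Data Integration you would still need to sign the request with your OCI credentials before POSTing it.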

In step 3 we get the sensitive data model ID from the response payload, to be used in the masking policy:

CONCAT(CONCAT('"', CAST(json_path(CREATESENSITIVEDATAMODEL.SYS.RESPONSE_PAYLOAD, '$.id') AS String)), '"')
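For reference, here is the same extraction in Python (the function name is mine). Note that the expression wraps the OCID in double quotes on purpose: the value is substituted verbatim into the JSON body of the next Rest Task, so it has to arrive as a quoted JSON string.

```python
import json

def extract_model_id(response_payload: str) -> str:
    """Pull the sensitive data model OCID out of the response payload
    and wrap it in double quotes, mirroring the CONCAT expression."""
    model_id = json.loads(response_payload)["id"]
    return f'"{model_id}"'
```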

Step 4 — Rest API Masking Policy

HTTP Method: POST

URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/maskingPolicies

This Rest API must contain some parameters in the body:

{
  "compartmentId": "ocid1.compartment.oc1..aaaaaaaaatiwok3gmtljmciocx7.......a",
  "columnSource": {
    "columnSource": "SENSITIVE_DATA_MODEL",
    "sensitiveDataModelId": ${MODEL_ID}
  },
  "displayName": "ExampleMaskPolicy"
}

I created a parameter (MODEL_ID) to receive the value from step 3 (the sensitive model ID).

Step 5 — Expression to get the masking policy ID

CAST(json_path(MASKINGPOLICIES.SYS.RESPONSE_PAYLOAD, '$.id') AS String)

Let’s go to step 6, which is a pipeline task.

Inside of this pipeline we have 4 steps:

Step 1 — Create parameter in expression

This parameter will receive the value from the main pipeline.

Step 2 — Rest API add masking columns

HTTP Method: POST

URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/maskingPolicies/${MASKING_POLICY_ID}/actions/addMaskingColumnsFromSdm

Request:

{}

The API uses a parameter that receives the value from Step 1 (the masking policy ID).

Since it is a POST method, just add an empty JSON body to the request to avoid any error.

Assign the value from expression to the task. Go to configuration and select “Assign a parameter”.

Step 3 — Masking data

HTTP Method: POST

URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/maskingPolicies/${MASKING_POLICY_ID}/actions/mask

Request:

{"targetId": "ocid1.datasafetargetdatabase.oc1.iad.abuwcljsj......."}

The API uses a parameter that receives the value from Step 1 (the masking policy ID).

Assign the value from expression to the task. Go to configuration and select “Assign a parameter”.
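Put together, the two Rest Tasks of this auxiliary pipeline boil down to the following pair of requests. The helper is a sketch of mine; the region endpoint and OCIDs are placeholders.

```python
# Data Safe endpoint for the region (placeholder: us-ashburn-1)
DATA_SAFE = "https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201"

def aux_pipeline_requests(masking_policy_id: str, target_id: str):
    """Return the (method, url, body) triples for steps 2 and 3."""
    add_columns = (
        "POST",
        f"{DATA_SAFE}/maskingPolicies/{masking_policy_id}"
        "/actions/addMaskingColumnsFromSdm",
        {},  # POST with an empty JSON body
    )
    mask = (
        "POST",
        f"{DATA_SAFE}/maskingPolicies/{masking_policy_id}/actions/mask",
        {"targetId": target_id},
    )
    return [add_columns, mask]
```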

Step 4 — Wait for masking data request

HTTP Method: GET

URL: https://datasafe.us-ashburn-1.oci.oraclecloud.com/20181201/workRequests?compartmentId=ocid1.compartment.oc1..aaaaaaaaatiwok3gmt&limit=1&operationType=MASKING_JOB&sortOrder=DESC&sortBy=STARTTIME

In order to keep the task running rather than finishing immediately once it receives a response, we have to configure a polling condition and a success condition:

The task keeps polling while the status in the response payload is different from SUCCEEDED; once the status changes to SUCCEEDED, the task finishes.
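The same polling logic, sketched in Python: `fetch` is a hypothetical stand-in for the signed GET request to the workRequests endpoint, and the interval and poll limit are my own assumptions, not values from the Rest Task.

```python
import json
import time

def wait_for_masking(fetch, interval_seconds=30, max_polls=120):
    """Poll the latest MASKING_JOB work request until it reaches SUCCEEDED."""
    for _ in range(max_polls):
        work_requests = json.loads(fetch())  # endpoint returns a JSON array
        status = work_requests[0]["status"]  # newest first (sortOrder=DESC)
        if status == "SUCCEEDED":
            return status
        if status == "FAILED":
            raise RuntimeError("masking job failed")
        time.sleep(interval_seconds)
    raise TimeoutError("masking job did not finish in time")
```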

Result:

Data in source database
Data in target database

I hope this can help you. If you have any doubt, please reach out to me through LinkedIn or Medium.


Eloi Lopes

Opinions expressed are solely my own and do not express the views or opinions of my employer Oracle. https://www.linkedin.com/in/eloilopes/