Sunday, January 08, 2017

DRM Data Loading Automation using ODI

The Oracle Hyperion Data Relationship Management application is a pretty flexible tool, and most activities that can be done manually can also be automated (using ETL tools like ODI, Informatica PowerCenter, etc.). Recently one of my readers presented me with a business scenario which is pretty interesting yet tricky. Yes, it involves a request to automate a manual process, as you might have already guessed.

The requirement goes as below:

The source systems are a DWH and an HRMS. The master data will arrive through staging tables that feed DRM. Each source system sends the full dataset every month to the staging table, from where DRM has to pick it up. The comparison results (interim changes) should first go to the business user in an email, and once the user replies OK, the changes should be incorporated in DRM and then published to downstream applications.

To proceed with this activity, first and foremost we need to be familiar with DRM action scripts - which will involve generating Add, Insert and Move scripts. In addition, having an ETL tool in our environment (like ODI, Informatica PowerCenter, etc.) is beneficial - otherwise we can also use SQL to achieve our purpose, as we will see next.

Approach 1:

Since the source systems DWH and HRMS send the full dataset every month, there needs to be a mechanism to detect the changes arriving at the staging tables on the DRM end. This is where the Changed Data Capture (CDC) feature of ODI can come in handy; any equivalent tool will offer something similar. The same change detection can also be achieved using standard SQL, so that records that have been changed or inserted can be flagged (say U/I) accordingly.
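As a rough illustration of the SQL-based change detection (table and column names here are assumptions, not the actual DRM staging schema), the full monthly extract can be compared against the previous snapshot and flagged U/I like this - sketched with SQLite standing in for the staging database:

```python
import sqlite3

# Minimal sketch: compare this month's full extract (stg_current) against
# the previous snapshot (stg_previous); flag new rows 'I', changed rows 'U'.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE stg_current  (member TEXT PRIMARY KEY, parent TEXT, flag TEXT);
CREATE TABLE stg_previous (member TEXT PRIMARY KEY, parent TEXT);
""")
cur.executemany("INSERT INTO stg_previous VALUES (?, ?)",
                [("N1", "TOP"), ("N2", "TOP")])
cur.executemany("INSERT INTO stg_current (member, parent) VALUES (?, ?)",
                [("N1", "TOP"), ("N2", "N1"), ("N3", "TOP")])

# New members: present now, absent in the previous snapshot -> 'I'
cur.execute("""
UPDATE stg_current SET flag = 'I'
WHERE member NOT IN (SELECT member FROM stg_previous)
""")
# Changed members: present in both loads but with a different parent -> 'U'
cur.execute("""
UPDATE stg_current SET flag = 'U'
WHERE member IN (SELECT p.member FROM stg_previous p
                 WHERE p.member = stg_current.member
                   AND p.parent <> stg_current.parent)
""")
conn.commit()
changes = cur.execute(
    "SELECT member, flag FROM stg_current WHERE flag IS NOT NULL ORDER BY member"
).fetchall()
print(changes)  # N2 moved under N1 -> 'U'; N3 is brand new -> 'I'
```

Unchanged members (N1 above) keep a NULL flag and are simply ignored by the downstream steps.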

This set of records can be sent as an email (with an attachment if the data volume is too big) to the respective business users for approval. Let's say we have a field called "APPROVED" in our table, defaulted to N for all these records. Once approval comes from the users, the IT team manually sets this flag to Y for the approved changes. Until then these records sit idle in the staging table without propagating to DRM. This manual flagging step unfortunately cannot be avoided, since there is no integration yet between DRM and the email server.

Now, once the records are flagged Y (say 8 out of 10 got approved, so we have 8 Y's and 2 N's), they are considered for the next steps of processing: creating the DRM Action Scripts, i.e. the Add and Move scripts. Assuming the DRM version and hierarchy already exist, once the scripts are formatted as required they are ready to be loaded to DRM. If the target txt/csv action script file is not on a server where DRM can read it, it has to be SFTP'd to that location. Alternatively we can fetch the data from the action script table directly on the DRM server and then schedule Windows batch scripts to load the action script. There are quite a few ways this can be done, and it all depends on the environment setup we have in place.
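For illustration, generating the action script lines from the approved records could be sketched as below - note the pipe-delimited layout, the column order and the action names are assumptions for demonstration; the real format must match your DRM version's action script specification:

```python
# Illustrative sketch only: the exact DRM action script layout depends on
# your DRM version and configuration; this pipe-delimited format is assumed.
def build_action_lines(rows):
    """rows: (flag, version, hierarchy, member, parent) tuples, where
    flag is 'I' for new members and 'U' for moved members."""
    lines = []
    for flag, version, hier, member, parent in rows:
        # New members become Add actions, moved members become Move actions
        action = "Add" if flag == "I" else "Move"
        lines.append("|".join([action, version, hier, member, parent]))
    return lines

approved = [
    ("I", "Current", "Entity", "N3", "TOP"),  # new member under TOP
    ("U", "Current", "Entity", "N2", "N1"),   # existing member moved under N1
]
for line in build_action_lines(approved):
    print(line)
```

The resulting lines would then be written to the txt/csv file that DRM's batch client consumes.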

Approach 2:

If we want to keep the DRM staging table untouched by manual updates of the Y and N flags, we can fine-tune our approach into a control-file based design, where the approved member names are kept in a file that is used for lookup. So if N1 and N2 are approved, the file will contain data like the below (layout reconstructed from the description that follows):

X,N1
X,N2
Here X denotes the batch ID (yes, we need to create a "BATCH_ID" field in the DRM staging table, populated from a sequence-generator type of ID for every record flagged U/I in that specific load run) - which should be unique for every load.

Let's see why we need this BATCH_ID field. If tomorrow our load runs and detects a change for the same member N1, and N1 is already in the file, the change would get processed without waiting for approval. Since the batch ID in the DRM staging table changes with every run, unapproved changes of the same member cannot flow through: the next steps check that both the BATCH_ID and the member name are present in the control file. Instead of a file, this can also be a small table with only two columns, on which the IT team has update/delete privileges to manage it daily.
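A minimal sketch of this lookup, assuming the control file holds one "batch id,member name" pair per line:

```python
# Sketch of the control-file approval check: a change flows through only
# when BOTH the current batch id and the member name appear in the file.
def load_approvals(lines):
    """Parse '<batch id>,<member name>' lines into a set of pairs."""
    approved = set()
    for line in lines:
        line = line.strip()
        if line:
            batch_id, member = line.split(",", 1)
            approved.add((batch_id, member))
    return approved

# Stand-in for the control file contents: batch 101 approvals for N1 and N2
control_file = ["101,N1", "101,N2"]
approvals = load_approvals(control_file)

def is_approved(batch_id, member, approvals):
    return (batch_id, member) in approvals

print(is_approved("101", "N1", approvals))  # True: approved in batch 101
print(is_approved("102", "N1", approvals))  # False: new batch, no fresh approval
```

Because every load run carries a fresh BATCH_ID, a stale entry for N1 in the file cannot approve a later, unreviewed change to N1.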

This approach gives the IT team safer control and avoids the risk of accidentally modifying sensitive application tables or objects. In any case, working with DRM is an extremely cautious activity day-in and day-out, where a simple typo can have widespread implications for multiple downstream systems - so minimizing risk and avoiding manual errors should be part of any design.

How do you prefer to automate your DRM data loading processes?

Friday, November 04, 2016

Oracle BI Apps dynamic hierarchy filtering in ODI incremental load plan

Oracle Business Intelligence Applications often sources data from various systems, and it is often required to restrict or allow various dimension information during the loading processes. There are a variety of ways this can be done; let's walk through a few possible mechanisms for achieving it in the Oracle Data Integrator incremental load plan with the maximum amount of flexibility.

Our source system can be a number of applications like Oracle E-business Suite, Oracle Data Relationship Management, custom data warehouses, custom Essbase cubes, etc. To process the transaction data from each of these source systems, we need to first process the dimensional data from them.

While loading to Oracle BI Apps, there are a few integration points which need to be closely monitored. The first area to focus on is the table W_GL_SEGMENT_DH (loaded by the objects in the folder SIL_GLSegmentDimensionHierarchy), which holds all the hierarchy information. This table gets loaded from the table W_GL_SEGMENT_DHS, so we can put a filter based on the top node of the hierarchy tree we want to exclude - our filter will look like:
    WHERE W_GL_SEGMENT_DHS.HIERARCHY_ID NOT IN ('<<top node name>>')
This will ensure that all the members in the hierarchy under the node <<top node name>> are excluded while loading to W_GL_SEGMENT_DH.

While sourcing from Oracle E-Business Suite, if we are using custom interfaces to source the hierarchy information, it becomes essential that the filtering process is robust, so that all the intended members are filtered out (or in). This calls for a flexible dynamic filtering process instead of hard-coding all the individual member names. A hierarchical query lets us recursively traverse the whole hierarchy tree starting from the top node - anchored as below and combined with a CONNECT BY PRIOR clause that joins each child back to its parent (the exact columns depend on the value set tables in use). Using the resulting values we can easily implement the filter in our design.
START WITH PARENT_FLEX_VALUE IN ('<<top node name>>')

While sourcing hierarchy from Oracle Data Relationship Management, the first step is to connect our DRM application to a database and then export the DRM hierarchy to a database table (unless you love to work with files more) with the values TOP_NODE, NODE_NAME, PARENT as the mandatory fields. Then, while sourcing from this table, we can similarly use our filter with a CONNECT BY query as above. We can also use DRM Properties as flags and then use the values of those flags (Y or N or some other value) from the table as part of our filtering, but that's usually required for more complex scenarios.
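The traversal can be sketched against such an exported table with a recursive query - shown here with SQLite's WITH RECURSIVE, the ANSI equivalent of Oracle's CONNECT BY ... START WITH (the sample rows are made up; the column names follow the export layout described above):

```python
import sqlite3

# Sketch: traverse the exported DRM hierarchy table starting from a top node.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE drm_hier (TOP_NODE TEXT, NODE_NAME TEXT, PARENT TEXT)")
cur.executemany("INSERT INTO drm_hier VALUES (?, ?, ?)", [
    ("TOP", "N1", "TOP"),
    ("TOP", "N2", "N1"),
    ("TOP", "N3", "TOP"),
])
# Recursive CTE: seed with the children of 'TOP', then repeatedly join each
# row's PARENT back to the members already collected.
members = [r[0] for r in cur.execute("""
WITH RECURSIVE tree(NODE_NAME) AS (
    SELECT NODE_NAME FROM drm_hier WHERE PARENT = 'TOP'
    UNION ALL
    SELECT h.NODE_NAME FROM drm_hier h JOIN tree t ON h.PARENT = t.NODE_NAME
)
SELECT NODE_NAME FROM tree ORDER BY NODE_NAME
""")]
print(members)  # ['N1', 'N2', 'N3'] - the whole tree under TOP
```

In Oracle the same result comes from CONNECT BY PRIOR NODE_NAME = PARENT START WITH PARENT = '<<top node name>>', which is what the incremental load plan filter would use.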

While sourcing data from an Essbase cube, first we have to identify the full hierarchy information. Based on that information we can implement our filtering process. We will be using Essbase Report Script to extract the Essbase outline information using IDESCENDANTS - this will return the values of all the members in the hierarchy starting from the top node. Once the output of this Report Script is stored in a database table, it can be easily used to fetch our required information by placing the filter to include or exclude the members of a tree. Again, I always prefer keeping everything as dynamic and flexible as possible to minimize manual effort in the future in case of any changes - so CONNECT BY is my go-to choice.

While sourcing hierarchy data from a custom data warehouse, things cannot get more exciting. We are blessed with all the freedom in the world to optimize our design as much as we want. I prefer designing and creating custom mappings for each dimension with different target dimension tables, which provides a lot of ease of maintenance and troubleshooting in the long run. Each target table will contain all the hierarchy information for the specific dimension along with its corresponding keys. Then we can load these keys to the fact table as usual. Now, while sourcing each of the dimensions, we can choose to include or exclude specific trees in the hierarchy. The same logic using CONNECT BY is used here, but needs to be repeated for each specific dimension mapping, since we have separate target tables for each dimension as part of the design.

In each of the above cases, we can see the value for <<top node name>> is critical for the dynamic filtering to work. It's better to keep the value(s) of the top node(s) as flexible as possible so that any future change can be absorbed with minimal impact. This can be done by storing the values of the relevant top nodes in a table, and refreshing this table from a file as the first step of our ODI incremental load plan. The file contains the values of the top nodes, so the file becomes our single point of maintenance - using it we can control whether to include or exclude entire hierarchy trees or specific values while loading into our Oracle BI Apps warehouse. This sort of flexibility no doubt comes at the cost of extra effort during the initial development phase, but surely proves to be a very decent and robust solution in the long run.
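The control-table idea can be sketched as below, with SQLite standing in for the warehouse; the control table name and sample values are assumptions for illustration:

```python
import sqlite3

# Sketch: the control file lists top nodes to exclude; the first load plan
# step refreshes a control table from it, and the mapping filters against
# that table instead of hard-coded member names.
top_node_file = ["US_TOTAL", "EMEA_TOTAL"]  # stands in for the file contents

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE w_top_node_ctrl (TOP_NODE TEXT PRIMARY KEY)")
cur.execute("DELETE FROM w_top_node_ctrl")  # truncate-and-reload on each run
cur.executemany("INSERT INTO w_top_node_ctrl VALUES (?)",
                [(n,) for n in top_node_file])

cur.execute("CREATE TABLE w_gl_segment_dhs (HIERARCHY_ID TEXT, MEMBER TEXT)")
cur.executemany("INSERT INTO w_gl_segment_dhs VALUES (?, ?)", [
    ("US_TOTAL", "US_EAST"), ("APAC_TOTAL", "JAPAN"),
])
# The mapping filter now reads the control table: editing the file (and
# rerunning the refresh step) is the only maintenance needed.
kept = cur.execute("""
SELECT MEMBER FROM w_gl_segment_dhs
WHERE HIERARCHY_ID NOT IN (SELECT TOP_NODE FROM w_top_node_ctrl)
""").fetchall()
print(kept)  # only members outside the excluded trees survive
```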

How do you prefer to control your hierarchy data?

Wednesday, November 02, 2016

Managing BI Services using ODI

Oracle Data Integrator is one of the widely used tools for data loading into the Oracle Business Intelligence data warehouse. Often, due to frequent data updates in the source systems, the data needs to be refreshed in the data warehouse during the daytime, while users might still be trying to run reports. This no doubt calls for a mechanism to restrict user data access during the refresh activity to avoid dirty reads.

We will be using the WebLogic Scripting Tool (WLST) to achieve our objective. At the start of the load plan that loads the data from source to target, we will add a step that brings down the services so that BI cannot be accessed by the users. At the end of the load plan we will add a step to bring the services back up. This also ensures that if the load fails, the services stay down.

To Stop:

We will be creating an ODI OS command that calls our wrapper script with the parameter value stop, allowing us to stop the BI services.

The wrapper in turn invokes the control script under the Middleware home, passing stop as its parameter:

$MW_HOME/oracle_common/common/bin/ stop

The parameters required by the control script can be passed and set via other supporting (usually ".py") files.

To Start:

We will be creating an ODI OS command that calls our wrapper script with the parameter value start, allowing us to start the BI services.

The wrapper in turn invokes the control script under the Middleware home, passing start as its parameter:

$MW_HOME/oracle_common/common/bin/ start

The values for start and stop are passed over from the calling ODI OS command. Using variables for stop, start and the Middleware home gives us increased flexibility: these variables can be refreshed using ODI variables in the package from a database control table. Thus this code runs fine in every environment without requiring any change, since the values specific to each instance are already set in our database control table.
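A minimal sketch of the dispatch logic, in Python for illustration - the control script path and the function name here are assumptions, not the actual wrapper:

```python
import subprocess

# Hypothetical sketch: the ODI OS command calls this with "start" or "stop",
# and we dispatch to the Middleware control script. With dry_run=True the
# command line is returned instead of being executed.
def bi_services(action, control_script, dry_run=False):
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    cmd = [control_script, action]
    if dry_run:
        return " ".join(cmd)
    return subprocess.call(cmd)  # exit code of the control script

# control_script would come from the database control table in practice
print(bi_services("stop", "/u01/oracle_common/common/bin/control.sh",
                  dry_run=True))
```

In practice the control script path and action would be substituted by ODI variables, so the same package runs unchanged in every environment.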

Another advantage of having the start and stop mechanism in ODI is utilizing the daily execution statistics to predict the completion time of the job. The execution time tends to vary over a period of time and often shows a pattern, i.e. the job might take longer on specific days of the month, or might be increasing by a few minutes every week. These patterns can feed an ETC (estimated time of completion) calculator procedure that simply calculates the expected end time of the job using our custom algorithm. This information is often extremely handy in crunch situations and puts the static history data to good use for all. There goes some small AI stuff again!
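One simple way such an ETC calculator could work is to average the durations of past runs that started on the same weekday - the history values below are made up for illustration; in practice they would come from the ODI repository's run logs:

```python
from datetime import datetime, timedelta

# Toy ETC sketch: estimate today's completion time from the average duration
# of past runs that started on the same weekday.
history = [
    # (start time, duration in minutes)
    (datetime(2016, 10, 3, 2, 0), 95),    # Mondays run long
    (datetime(2016, 10, 10, 2, 0), 100),
    (datetime(2016, 10, 4, 2, 0), 60),    # Tuesdays are lighter
    (datetime(2016, 10, 11, 2, 0), 62),
]

def estimate_completion(start, history):
    # Keep only runs that started on the same weekday as today's run
    same_day = [mins for s, mins in history if s.weekday() == start.weekday()]
    avg = sum(same_day) / len(same_day)
    return start + timedelta(minutes=avg)

start = datetime(2016, 10, 17, 2, 0)      # a Monday
etc = estimate_completion(start, history)
print(etc)  # 2016-10-17 03:37:30 - the Monday average of 97.5 minutes
```

A real procedure could refine this with trend terms (the "few minutes longer every week" pattern), but the weekday average alone already gives a useful estimate.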

How do you automate your BI service related activities and downtime maintenance?

Sunday, October 23, 2016

Sqoop Import and Export tables from Hive to Oracle Database

Exporting and importing table data between an Oracle database and Hive is one of the most common activities in the world of Hadoop. It is essential to get a few basics sorted out for a seamless first-time integration, so as to avoid various parsing and loading errors.

We will be doing the below activities sequentially so as to cover all the integration points between Oracle database, Sqoop, HDFS and Hive.

    Step 1: Extract data from a source Oracle database table to Hadoop file system using Sqoop
    Step 2: Load the above Sqoop extracted data into a Hive table
    Step 3: Use Hive query to generate a file extract in the Hadoop file system
    Step 4: Load the generated file in Step 3 to a new target Oracle database table

Step 1: Sqoop import data from Oracle database to HDFS

Our first task is to identify our source Oracle database table, and then use Sqoop to fetch the data from this table to HDFS.

sqoop import --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database SID>>" --password "<<database password>>" --username "<<database username>>" --table "<<source schema name>>.<<source database table>>" --columns "<<column1>>,<<column2>>" -m 1 --target-dir "<<HDFS path>>" --verbose

It is interesting to observe that Sqoop expects a primary key on the source Oracle database table so it can split the import across mappers. Otherwise we get the error "Error during import: No primary key could be found for table". If we want to skip assigning a key we can include the parameter -m 1, which runs the import with a single mapper (alternatively, --split-by can name a column to split on).

Step 2: Load the above Sqoop extracted data to a Hive table

Assuming we already have a table created in Hive, we will load the file created in Step 1 into the Hive table using the below syntax.

LOAD DATA INPATH '<<HDFS path>>' INTO TABLE <<hive table name>>;

Step 3: Export a file using Hive query to be consumed by Sqoop

Now that we have the data in our Hive table, we will use the below command to create a file extract, via a custom Hive query, in the target HDFS path. The field delimiter (',' below) can be changed according to our requirement, but it must then match the delimiter used in Step 4. (The ROW FORMAT clause on INSERT OVERWRITE DIRECTORY is supported from Hive 0.11 onwards; older versions write Hive's default Ctrl-A delimiter.)

INSERT OVERWRITE DIRECTORY '<<HDFS directory>>/<<file name>>'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM <<hive table name>> WHERE <<condition>>;

Step 4: Load data from Hive table exported file to Oracle database table

The below command will use the above Hive-exported file (from the same HDFS path) to load our target Oracle database table.

sqoop export --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database SID>>" --password "<<database password>>" --username "<<database username>>" --table "<<target schema name>>.<<target database table>>" --input-fields-terminated-by ',' --input-lines-terminated-by '\n' --export-dir "<<HDFS directory>>/<<file name>>" --input-null-string "\\\\N" --input-null-non-string "\\\\N" --verbose

The --input-fields-terminated-by and --input-lines-terminated-by options tell Sqoop how to parse the records, and the --input-null-string and --input-null-non-string options handle the NULLs (which Hive writes as \N) while exporting to the target database table. Without these we might encounter the error "Can't parse input data".

Thus we have successfully loaded a table from an Oracle database to Hive, and back from Hive to Oracle, using Sqoop. We can query our Oracle and Hive databases to check whether the data loaded fine. How do you prefer to load your data between these two systems?

Saturday, October 22, 2016

How ODI IKM SQL to Hyperion Essbase (METADATA) influences data loading

Oracle Data Integrator integrates with Essbase for metadata as well as data loading using different Knowledge Modules. Each KM provides a range of options for us to customize the loading as we want. The IKM SQL to Hyperion Essbase (METADATA) is usually the starting point when we begin our activities: first we load the metadata and get the outline ready, then we can load the actual data.

The standard practice when using the IKM SQL to Hyperion Essbase (METADATA) is to create an ODI interface for each dimension using this KM, and provide the values of the parameters (refer to page 4 for the full list) in each interface as applicable. We need to create a rule file for each of the dimensions in Essbase and provide those rule file names as the value for the RULES_FILE parameter. In case the data needs to be loaded in a specific order, we can use the ORDER_BY clause accordingly.

So we can create an ODI package with 6 different interfaces, each with its own rule file, loading the corresponding dimension hierarchy in the outline. The value for RULE_SEPARATOR needs to be set correctly here, else the outline will not reflect any updates and the interface will not work. The value of RESTRUCTURE_DATABASE defines what happens to the data after the metadata load. If we are clearing out all the data prior to our metadata loading via MaxL, then the default value KEEP_ALL_DATA makes no difference; otherwise we can use KEEP_INPUT_DATA, KEEP_LEVEL0_DATA or DISCARD_ALL_DATA as per our requirement.

The Essbase cube outline refresh plays a very important role in the daily life of the cube. If the loading process or the parent-child loading sequence is not set correctly, members can fall off during the loading phase. For example, if we try to load a child before its parent, the child will fail to get loaded, and the parent will be left without its child at the end of the load. It becomes even more difficult to track such scenarios if LOG_ENABLED is left at its default value of false.
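The parent-before-child requirement is essentially a generation-by-generation ordering, which can be sketched as below (the member data is made up for illustration):

```python
# Sketch of why ORDER_BY matters: sort the member records so that every
# parent is emitted before its children, walking the tree one generation
# at a time from the root.
records = [
    ("N2", "N1"),   # (member, parent) - this child arrives before its parent
    ("N1", "TOP"),
    ("N3", "N1"),
]

def parent_first(records, root="TOP"):
    by_parent = {}
    for member, parent in records:
        by_parent.setdefault(parent, []).append((member, parent))
    ordered, frontier = [], [root]
    while frontier:
        node = frontier.pop(0)          # breadth-first walk from the root
        for member, parent in by_parent.get(node, []):
            ordered.append((member, parent))
            frontier.append(member)     # this member's children come later
    return ordered

print(parent_first(records))  # N1 comes out before its children N2 and N3
```

Feeding the source query an equivalent ordering (e.g. by generation or level number) via the ORDER_BY option avoids the dropped-member scenario described above.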

The LOG_FILE_NAME and ERROR_LOG_FILENAME options can prove very beneficial in such cases. It is always recommended to generate a log during the loading phases, unless we have a strong data quality control check in place. An incorrect outline build can drastically affect the subsequent data loading process, leading to multiple record rejections due to missing members in the outline. This can lead to confusion and data mismatches across different data sources - a nightmare for developers, more so if insufficient logging is in place.

While loading the dimension members, we can also use the ODI interfaces to load the Alias values of each member. If we have more than one Alias table, we can use multiple interfaces and rule files to populate the values accordingly. We just have to make sure each rule file points to the correct Alias table in the path below:

    Rule File → Dimension Build Settings → Global Settings → Global Properties → Update Alias Table → <<Alias Table Name>>

So we can use identical ODI interfaces, with different values in the Alias field in the Target and different rule file values in the flow properties, to load them.

The ODI interfaces sometimes give the error 'Cannot open cube outline' during metadata loading. This is often caused by a parallel running job in an incomplete status, preventing our interface from taking a lock on the outline. Or it can be due to a developer who has locked the outline in edit mode and forgotten to unlock it, again preventing our ODI interface from accessing the Essbase outline in edit mode. In such cases we need to identify the lock, release it, and then restart our ODI process.

This IKM is pretty peaceful otherwise and keeps doing its tasks quietly over time. How do you use your IKMs to refresh the Essbase metadata?