Friday, November 04, 2016

Oracle BI Apps dynamic hierarchy filtering in ODI incremental load plan

Oracle Business Intelligence Applications often sources data from various systems and it is often required to restrict or allow various dimension information during the loading processes. There are a variety of ways this can be done, let's take a walk through of few possible mechanisms of how this can be achieved in the Oracle Data Integrator incremental load plan with the maximum amount of flexibility.

Our source system can be a number of applications like Oracle E-business Suite, Oracle Data Relationship Management, custom data warehouses, custom Essbase cubes, etc. To process the transaction data from each of these source systems, we need to first process the dimensional data from them.

While loading to Oracle BI Apps, there a few integration points which need to be closely monitored. The first area to focus on is the table W_GL_SEGMENT_DH (loaded by the objects in the folder SIL_GLSegmentDimensionHierarchy) which holds all the hierarchy information. This table gets loaded from the table W_GL_SEGMENT_DHS, so we can put a filter based on the top node of the hierarchy tree we want to exclude - so our filter will look like:
    WHERE W_GL_SEGMENT_DHS.HIERARCHY_ID NOT IN ('<<top node name>>')
This will ensure that all the members in the hierarchy under the node <<top node name>> are excluded while loading to W_GL_SEGMENT_DH.

While sourcing from Oracle E-business Suite, if we are using custom interfaces to source the hierarchy information, it becomes essential to make sure that the filtering process is robust to make sure all the intended members are filtered out (or in). This calls for a flexible dynamic filtering process instead of putting in all the individual member names. Here in the below query we will be able to recursively traverse the whole hierarchy tree starting from the top node. Then using these values we can easily implement in our design.
START WITH PARENT_FLEX_VALUE IN ('<<top node name>>')

While sourcing hierarchy from Oracle Data Relationship Management, the first step is to connect our DRM application to a database and then export the DRM hierarchy to a database table (unless you love to work with files more) with the values TOP_NODE, NODE_NAME, PARENT as the mandatory fields. Then, while sourcing from this table, we can similarly use our filter with a CONNECT BY query as above. We can also use DRM Properties as flags and then use the values of those flags (Y or N or some other value) from the table as part of our filtering, but that's usually required for more complex scenarios.

While sourcing data from an Essbase cube, first we have to identify the full hierarchy information. Based on that information we can implement our filtering process. We will be using Essbase Report Script to extract the Essbase outline information using IDESCENDANTS - this will return the values of all the members in the hierarchy starting from the top node. Once the output of this Report Script is stored in a database table, it can be easily used to fetch our required information by placing the filter to include or exclude the members of a tree. Again, I always prefer keeping everything as dynamic and flexible as possible to minimize manual effort in the future in case of any changes - so CONNECT BY is my go-to choice.

While sourcing hierarchy data from a custom data warehouse, things cannot get more exciting. We are blessed with all the freedom in the world to optimize our design as much we want. I prefer designing and creating custom mappings for each dimension with different target dimension tables which provides a lot of ease of maintenance and troubleshooting in the long run. Each target table will be containing all the hierarchy information for the specific dimension along with it's corresponding keys. Then we can load these keys to the fact table as usual. Now, while sourcing each of the dimensions, we can choose to include or exclude specific trees in the hierarchy. The same logic using CONNECT BY will be used here, but need to be repeated for each specific dimension mapping - since we have separate target tables for each dimension as part of design and future troubleshooting optimization.

In each of the above cases, we can see the value for <<top node name>> is critical for the dynamic filtering to work. It's better to keep the value(s) of the top node(s) as much flexible as possible so that any future change can be absorbed with minimal impact. This can be done by storing the values of the relevant top nodes in a table. And refreshing this table from a file at the first step of our ODI incremental load plan. The file will be containing the values of the top nodes, so our only single point of maintenance becomes the file - using this file we can control whether to include or exclude entire hierarchy trees or specific values while loading into our Oracle BI Apps warehouse. This sort of flexibility no doubt comes at the cost of a lot of effort during the initial development phase, but surely proves to be a very decent and robust solution in the long run.

How do you prefer to control your hierarchy data?

Wednesday, November 02, 2016

Managing BI Services using ODI

Oracle Data Integrator is one of the widely used tools for data loading into the Oracle Business Intelligence data warehouse. Often it happens that due to frequent data updates in the source systems, the data needs to be refreshed in the data warehouse during the daytime, while users might still be trying to run reports for their reporting purpose. This no doubt calls for a mechanism to restrict the user data access during the refresh activity to avoid dirty reads.

We will be using the WebLogic Scripting Tool to achieve our objective. At the start of the load plan that loads the data from source to target, we will add a step that will bring down the services so that BI cannot be accessed by the users. Again at the end of the load plan we will add a step to bring the services up. This will also help us ensure that if the load fails the services will continue to stay down.

To Stop:

We will be creating an ODI OS command with the below script that will allow us to stop the BI services: stop

This script will call the script with the variable value as stop. The content of will be:

$MW_HOME/oracle_common/common/bin/ stop

The parameters required for can be passed and set via other supporting (usually ".py") files as described here.

To Start:

We will be creating an ODI OS command with the below script that will allow us to start the BI services: start

This script will call the script with the variable value as start. The content of will be:

$MW_HOME/oracle_common/common/bin/ start

The values for start and stop are passed over from Using the above variables for stop, start and Middleware home further means we have an increased flexibility and these variables can be refreshed using ODI variables in the package from a database control table. Thus this code will run fine in every environment without requiring any change, since the values are already set specific to each instance in our database control table with the appropriate values.

Another advantage of having the start and stop mechanism through ODI is utilizing the daily executing statistics to predict the completion time of the job. Usually the execution time tends to vary over a period of time and often shows a pattern, i.e. the job might take longer on specific days of the months, or might be increasing by a few minutes every week. These can be considered in a ETC (estimated time of completion) calculator procedure that can simply calculate the expected end time of the job using our custom algorithm. This information is often extremely handy in crunch situations and puts the static history data to good use for all. There goes some small AI stuff again!

How do you automate your BI service related activities and downtime maintenance?

Sunday, October 23, 2016

Sqoop Import and Export tables from Hive to Oracle Database

Exporting and Importing table data from Oracle database to Hive and vice-versa is one of the most common activities in the world of Hadoop. It is essential to get sorted out on a few basics for seamless first time integration so as to avoid various parsing and loading errors.

We will be doing the below activities sequentially so as to cover all the integration points between Oracle database, Sqoop, HDFS and Hive.

    Step 1: Extract data from a source Oracle database table to Hadoop file system using Sqoop
    Step 2: Load the above Sqoop extracted data into a Hive table
    Step 3: Use Hive query to generate a file extract in the Hadoop file system
    Step 4: Load the generated file in Step 3 to a new target Oracle database table

Step 1: Sqoop import data from Oracle database to Hive table

Our first task is to identify our source Oracle database table, and then use Sqoop to fetch the data from this table to HDFS using Sqoop.

SQOOP IMPORT --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<source schema name>>.<<source database table>>" --columns "<<column1>>,<<column2>>" -m 1 --target-dir "<<HDFS path>>" --verbose

It is interesting to observe that we need to identify a primary key for the source Oracle database table. Else we will get the error "Error during import: No primary key could be found for table". If we want to skip assigning a key we can include the highlighted parameter -m 1.

Step 2: Load the above Sqoop extracted data to a Hive table

Assuming we already have a table created in Hive, we will load the file created in Step 1 into the Hive table using the below syntax.

LOAD DATA INPATH '<<HDFS path>>' INTO TABLE <<hive table name>>;

Step 3: Export a file using Hive query to be consumed by Sqoop

Now that we have the data in our Hive table, we will use the below command to create a file using a custom Hive query, in the green highlighted path. The delimiter highlighted in yellow can be changed according to our requirement - but accordingly it must be changed in Step 4 also where it's highlighted in yellow.

INSERT OVERWRITE DIRECTORY '<<HDFS directory>>/<<file name>>'
select * from <<hive table name>> where <<condition>>;

Step 4: Load data from Hive table exported file to Oracle database table

The below command will use the above Hive exported file (from the same green highlighted path) to load our target Oracle database table.

SQOOP EXPORT --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<target schema name>>.<<target database table>>" --input-fields-terminated-by ',' --input-lines-terminated-by '\n' --export-dir "<<HDFS directory>>/<<file name>>" --input-null-string "\\\\N" --input-null-non-string "\\\\N" --verbose

The orange and blue highlighted sections above helps reading the records while exporting to the target database table. Else we might sometimes encounter the error "Can't parse input data".

Thus we have successfully loaded a table from Oracle database to Hive and again back from Hive to Oracle database, using Sqoop. We can query our Oracle and Hive databases to check the data if it is loaded fine. How do you prefer to load your data between these two systems?

Saturday, October 22, 2016

How ODI IKM SQL to Hyperion Essbase (METADATA) influences data loading

Oracle Data Integrator integrates with Essbase for metadata as well as data loading using different Knowledge Modules. Each of the KMs provides a range of options for us to customize the loading as we want. The IKM SQL to Hyperion Essbase (METADATA) is usually the starting point when we begin our activities, since first we will load the metadata and get the outline ready, then we can load the actual data.

The standard practice of the using the IKM SQL to Hyperion Essbase (METADATA) is to create an ODI interface for each dimension using this KM, and provide the values of the parameters (refer page 4 for full list) in each of the interfaces as applicable. As observed below, we need to create a rule file for the different dimensions in Essbase and provide those rule file names as the value for the parameter RULES_FILE. In case we need the data to be loaded in a specific order, we can use the ORDER_BY clause accordingly.

So we can create an ODI package with 6 different interfaces, each with it's own rule file and loading the corresponding dimension hierarchy in the outline. The value for RULE_SEPARATOR need to be set correctly here, else the outline will not reflect any updates and the interface will not work. The value for RESTRUCTURE_DATABASE defines what to do after the metadata loading. If we are clearing out all the data prior to our metadata loading via MaxL, then the default value KEEP_ALL_DATA does not make a difference. Else we can use the values KEEP_INPUT_DATA or KEEP_LEVEL0_DATA or DISCARD_ALL_DATA as per our requirement.

The Essbase cube outline refresh plays a very important role in the daily life of the cube. If the loading process or parent child loading sequence is not set correctly, the members can fall off during the loading phase. For example, if we try to load the child before it's parent, the child will fail to get loaded. And then the parent will stay without it's child at the end of the load. It becomes even more difficult to track such scenarios if we have LOG_ENABLED as the default false value.

The LOG_FILE_NAME and the ERROR_LOG_FILENAME can prove to be very beneficial in such cases. It is always recommended to generate a log during our loading phases, unless we have a strong data quality control check in place. Incorrect outline build can drastically affect the subsequent data loading process, leading to multiple records rejections due to missing members in the outline. This can lead to confusion and data mismatches across different data sources, thus causing a nightmare for developers, more so if insufficient logging is in place.

While loading the dimension members, we can also use the ODI interfaces to load the Alias values of each of the members. If we have more than one Alias, we can accordingly use multiple interfaces and rule files to populate the values accordingly. Only thing is we have to make sure each of the rule files point to the correct Alias in the path below:

    Rule File → Dimension Build Settings → Global Settings → Global Properties → Update Alias Table → <<Alias Table Name>>

So we can use identical ODI interfaces with different values in the Alias field in the Target and different rule files values in the flow properties to load them.

The ODI interfaces during meta data loading sometimes gives the error 'Cannot open cube outline'. This is often caused by a parallel running job which is in incomplete status, thus preventing our interface to have a lock on the outline. Or it can be due to a developer who has locked the outline in edit mode but forgotten to unlock it - thus again preventing our ODI interface to get access to the Essbase outline in Edit mode. In such cases we need to identify the lock and then release it, then restart our ODI process.

This IKM is pretty peaceful otherwise and keeps doing it's tasks quietly over time. How do you use your IKMs to refresh the Essbase metadata?

Friday, October 21, 2016

Analyzing ODI data load using IKM SQL to Hyperion Essbase (DATA)

Oracle Data Integrator provides a range of Knowledge Modules to integrate with and process data from and load to various applications, and it is no different with Oracle Essbase. The Knowledge Module IKM SQL to Hyperion Essbase (DATA) loads the data from a Oracle database to an Essbase cube - and has various parameters (refer page 5 for full list) to customize it as per our requirement.

Now we need to have the data ready in our Oracle database table to be used as source, usually having all the dimensions (like Time, Account, Department, etc) and then the fact value (Actuals) as the base columns (this is the minimum requirement). We can have more than one fact value (say Budget data), these can be loaded to the Essbase cube using the same source table or another different table, whichever is convenient.

Next we select the Essbase cube as our target - say Sample.Sample. So now we are ready to create the ODI interface to load our data. We drag the Oracle table datastore in our source and the Cube datastore in our target. We will select the IKM SQL to Hyperion Essbase (DATA) as KM in our flow properties for the Target as shown below. It is important to observe each of the parameters closely for this ODI interface to behave exactly as expected to.

The default values usually suffices, but to optimize our loading, it is advisable to play around a little with the values for COMMIT_INTERVAL and the RULES_FILE. It can significantly enhance the performance of the loading process. Also note that the value 0 for MAXIMUM_ERRORS_ALLOWED does not signify the loading stops even for one error, it's just the opposite, here 0 signifies infinity. So the process will ignore all errors and succeed always, even if it is unable to load any record. The log file configured for the variable ERROR_LOG_FILENAME can often come in handy in such cases if the ODI Operator log or the Essbase application log do not provide any fruitful information.

The CLEAR_DATABASE option can be set to true if we are doing a full refresh every time. Also, this same activity can be achieved through MaxL (using ODI OS Commnad to call the MaxL script) for some more granular control.

The CALCULATION_SCRIPT option is a very handy option and is really useful when we need to run a calculation script immediately after our loading process completes. Thus we can skip a MaxL step by incorporating the calculation script within our ODI interface.

One of the common issues faced during the loading process is records getting rejected due to unknown member or missing member. This can be taken care by some robust data quality control check which is often overlooked and not given due importance - but it definitely saves a lot of time and effort in the long run.

Another commonly faced error which does not get captured in the log explicitly is the incorrect value of RULE_SEPARATOR. The default value is comma, but sometimes due to cloning or  migration issues, when we migrate the ODI interfaces with missing references errors, the values set for the IKM gets lost, and returns to the default comma. In such cases, it is best to migrate the interfaces again from higher instances (like Production) after the missing reference errors are fixed. This missing reference error after cloning or migration can happen due to missing KMs, or due to different internal ids for the same ODI objects and KMs between different ODI repositories. If re-migration becomes impossible, we need to re-configure the values for the IKM all over again, and at that time need to make sure we use the correct value for RULE_SEPARATOR.

We can also work on improving the performance of the data loading by customizing the FETCH_SIZE parameter - this parameter basically determines how many rows are read from the source at a time. John Goodwin has a nice article here regarding this.

What's your experience with the IKM SQL to Hyperion Essbase (DATA)?