Sunday, October 23, 2016

Sqoop Import and Export tables from Hive to Oracle Database

Exporting and importing table data between an Oracle database and Hive is one of the most common activities in the world of Hadoop. It is essential to get a few basics sorted out for a seamless first-time integration, so as to avoid various parsing and loading errors.

We will be doing the below activities sequentially so as to cover all the integration points between Oracle database, Sqoop, HDFS and Hive.

    Step 1: Extract data from a source Oracle database table to Hadoop file system using Sqoop
    Step 2: Load the above Sqoop extracted data into a Hive table
    Step 3: Use Hive query to generate a file extract in the Hadoop file system
    Step 4: Load the generated file in Step 3 to a new target Oracle database table

Step 1: Sqoop import data from Oracle database to Hive table

Our first task is to identify our source Oracle database table, and then use Sqoop to fetch the data from this table into HDFS.

sqoop import --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<source schema name>>.<<source database table>>" --columns "<<column1>>,<<column2>>" -m 1 --target-dir "<<HDFS path>>" --verbose

It is interesting to observe that Sqoop needs to identify a primary key on the source Oracle database table; otherwise we will get the error "Error during import: No primary key could be found for table". If we want to skip assigning a key, we can include the parameter -m 1, which tells Sqoop to use a single mapper and removes the need for a split column.
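As a sketch, the alternative to -m 1 is supplying an explicit split column, so that Sqoop can still parallelize the import of a table without a primary key (the placeholders follow the same convention as above):

```
sqoop import --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<source schema name>>.<<source database table>>" --split-by "<<numeric column>>" -m 4 --target-dir "<<HDFS path>>"
```

Here -m 4 asks for four parallel mappers, each importing a slice of the table based on the range of values in the split column.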

Step 2: Load the above Sqoop extracted data to a Hive table

Assuming we already have a table created in Hive, we will load the file created in Step 1 into the Hive table using the below syntax.

LOAD DATA INPATH '<<HDFS path>>' INTO TABLE <<hive table name>>;

Step 3: Export a file using Hive query to be consumed by Sqoop

Now that we have the data in our Hive table, we will use the below command to create a file from a custom Hive query in the given HDFS path. The field delimiter can be changed according to our requirement - but it must then be changed in Step 4 also, where the Sqoop export expects the same delimiter.

INSERT OVERWRITE DIRECTORY '<<HDFS directory>>/<<file name>>'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
select * from <<hive table name>> where <<condition>>;

Step 4: Load data from Hive table exported file to Oracle database table

The below command will use the above Hive exported file (from the same HDFS path) to load our target Oracle database table.

sqoop export --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<target schema name>>.<<target database table>>" --input-fields-terminated-by ',' --input-lines-terminated-by '\n' --export-dir "<<HDFS directory>>/<<file name>>" --input-null-string "\\\\N" --input-null-non-string "\\\\N" --verbose

The --input-fields-terminated-by, --input-lines-terminated-by and --input-null-* options above help Sqoop parse the records while exporting to the target database table. Without them we might sometimes encounter the error "Can't parse input data".

Thus we have successfully loaded a table from an Oracle database to Hive and back again from Hive to Oracle, using Sqoop. We can query our Oracle and Hive databases to verify that the data loaded fine. How do you prefer to load your data between these two systems?

Saturday, October 22, 2016

How ODI IKM SQL to Hyperion Essbase (METADATA) influences data loading

Oracle Data Integrator integrates with Essbase for metadata as well as data loading using different Knowledge Modules. Each of the KMs provides a range of options for us to customize the loading as we want. The IKM SQL to Hyperion Essbase (METADATA) is usually the starting point when we begin our activities, since first we load the metadata and get the outline ready, and then we can load the actual data.

The standard practice with the IKM SQL to Hyperion Essbase (METADATA) is to create an ODI interface for each dimension using this KM, and provide the values of the parameters (refer to the KM documentation for the full list) in each of the interfaces as applicable. We need to create a rule file for each of the different dimensions in Essbase and provide those rule file names as the value for the parameter RULES_FILE. In case we need the data to be loaded in a specific order, we can use the ORDER_BY clause accordingly.

So we can create an ODI package with 6 different interfaces, each with its own rule file, loading the corresponding dimension hierarchy in the outline. The value for RULE_SEPARATOR needs to be set correctly here, else the outline will not reflect any updates and the interface will not work. The value for RESTRUCTURE_DATABASE defines what to do after the metadata loading. If we are clearing out all the data prior to our metadata loading via MaxL, then the default value KEEP_ALL_DATA does not make a difference. Else we can use the values KEEP_INPUT_DATA, KEEP_LEVEL0_DATA or DISCARD_ALL_DATA as per our requirement.
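If we do clear the data beforehand, the MaxL step is a one-liner - a sketch, assuming the application and database are named Sample and Basic:

```
alter database Sample.Basic reset data;
```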

The Essbase cube outline refresh plays a very important role in the daily life of the cube. If the loading process or the parent-child loading sequence is not set correctly, members can fall off during the loading phase. For example, if we try to load a child before its parent, the child will fail to get loaded, and the parent will remain without its child at the end of the load. It becomes even more difficult to track such scenarios if LOG_ENABLED is left at its default value of false.

The LOG_FILE_NAME and ERROR_LOG_FILENAME parameters can prove to be very beneficial in such cases. It is always recommended to generate a log during our loading phases, unless we have a strong data quality control check in place. An incorrect outline build can drastically affect the subsequent data loading process, leading to multiple record rejections due to missing members in the outline. This can lead to confusion and data mismatches across different data sources, thus causing a nightmare for developers, more so if insufficient logging is in place.

While loading the dimension members, we can also use the ODI interfaces to load the Alias values of each of the members. If we have more than one Alias table, we can use multiple interfaces and rule files to populate the values accordingly. We just have to make sure each of the rule files points to the correct Alias table in the path below:

    Rule File → Dimension Build Settings → Global Settings → Global Properties → Update Alias Table → <<Alias Table Name>>

So we can use identical ODI interfaces, with different values in the Alias field in the Target and different rule file values in the flow properties, to load them.

The ODI interfaces sometimes give the error 'Cannot open cube outline' during metadata loading. This is often caused by a parallel running job in an incomplete status, preventing our interface from acquiring a lock on the outline. Or it can be due to a developer who has locked the outline in edit mode but forgotten to unlock it - again preventing our ODI interface from accessing the Essbase outline in edit mode. In such cases we need to identify the lock, release it, and then restart our ODI process.

This IKM is pretty peaceful otherwise and keeps doing its tasks quietly over time. How do you use your IKMs to refresh the Essbase metadata?

Friday, October 21, 2016

Analyzing ODI data load using IKM SQL to Hyperion Essbase (DATA)

Oracle Data Integrator provides a range of Knowledge Modules to integrate with, process data from, and load data to various applications, and it is no different with Oracle Essbase. The Knowledge Module IKM SQL to Hyperion Essbase (DATA) loads data from an Oracle database to an Essbase cube - and has various parameters (refer to the KM documentation for the full list) to customize it as per our requirement.

Now we need to have the data ready in our Oracle database source table, usually with all the dimensions (like Time, Account, Department, etc.) and the fact value (Actuals) as the base columns - this is the minimum requirement. We can have more than one fact value (say Budget data); these can be loaded to the Essbase cube using the same source table or a different one, whichever is convenient.

Next we select the Essbase cube as our target - say Sample.Sample. So now we are ready to create the ODI interface to load our data. We drag the Oracle table datastore into our source and the cube datastore into our target, and select the IKM SQL to Hyperion Essbase (DATA) as the KM in the flow properties for the Target. It is important to observe each of the parameters closely for this ODI interface to behave exactly as expected.

The default values usually suffice, but to optimize our loading it is advisable to play around a little with the values for COMMIT_INTERVAL and RULES_FILE; they can significantly enhance the performance of the loading process. Also note that the value 0 for MAXIMUM_ERRORS_ALLOWED does not mean the loading stops at even one error - it is just the opposite: here 0 signifies infinity. So the process will ignore all errors and always succeed, even if it is unable to load any record. The log file configured for ERROR_LOG_FILENAME can often come in handy in such cases, when the ODI Operator log or the Essbase application log do not provide any fruitful information.

The CLEAR_DATABASE option can be set to true if we are doing a full refresh every time. The same activity can also be achieved through MaxL (using an ODI OS Command to call the MaxL script) for some more granular control.

The CALCULATION_SCRIPT option is really handy when we need to run a calculation script immediately after our loading process completes. Thus we can skip a MaxL step by incorporating the calculation script within our ODI interface.

One of the common issues faced during the loading process is records getting rejected due to unknown or missing members. This can be taken care of by a robust data quality control check - often overlooked and not given due importance, but it definitely saves a lot of time and effort in the long run.

Another commonly faced error, which does not get captured in the log explicitly, is an incorrect value of RULE_SEPARATOR. The default value is comma, but sometimes when we migrate ODI interfaces carrying missing-reference errors (after cloning or migration), the values set for the IKM get lost and revert to the defaults. In such cases, it is best to migrate the interfaces again from the higher instances (like Production) after the missing-reference errors are fixed. These missing-reference errors after cloning or migration can happen due to missing KMs, or due to different internal IDs for the same ODI objects and KMs between different ODI repositories. If re-migration becomes impossible, we need to re-configure the values for the IKM all over again, and at that time make sure we use the correct value for RULE_SEPARATOR.

We can also improve the performance of the data loading by customizing the FETCH_SIZE parameter - it determines how many rows are read from the source at a time. John Goodwin has a nice article on this topic.

What's your experience with the IKM SQL to Hyperion Essbase (DATA)?

Tuesday, October 18, 2016

ODI Automation of Smart View Manual Activities

Oracle Data Integrator provides multiple features to automate almost all manual data extraction and transformation related activities. One such scenario is using Smart View to manually fetch data from the Essbase cubes, maybe weekly or monthly. Often such cases require changing the values of the time dimension over the entire scope of the template (maybe different values for current year and last year, or current month and last month, in adjacent columns, etc.) and then retrieving the data to meet our needs. This can become time consuming and prone to manual errors.

This is where ODI comes in really handy with the KM called LKM Hyperion Essbase DATA to SQL. First we need to create our report script (say X.rep) within Essbase to fetch the data we require. This report script is the first layer of data extraction and must include all the relevant information and records. It need not be formatted exactly as per the requirement, but it should extract all the data for the subsequent processes to act on.

Since the report script contains all the filters for the time dimension and other parameters, it is essential to keep it dynamic to avoid manual intervention. Let's save the report script content in a table T1, and instead of hard-coding the time value, name it "DummyPeriod". Then, using an ODI interface, the first step is to create the report script with the actual value of "DummyPeriod". Using the KM IKM SQL to File Append, I use the table T1 as my source and the file X.rep as my target. In the target, I use the REPLACE function to replace "DummyPeriod" with my actual period value (this value can be refreshed by an ODI variable). In the flow properties for this IKM, the value for TRUNCATE must be set to True - else the report script will get appended to its old content every time it is created by the ODI interface. Thus the report script gets created dynamically every time with dynamically generated values.
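The target mapping for that interface boils down to a single expression - a sketch, where T1's column name and the ODI variable name are assumptions:

```
REPLACE(T1.SCRIPT_TEXT, 'DummyPeriod', '#PERIOD_VALUE')
```

At execution time ODI substitutes #PERIOD_VALUE with the refreshed variable value, so the file X.rep is written with the real period instead of the placeholder.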

Now, in a new ODI interface, a datastore matching the report script's number of output columns and data types will act as the source, and a database staging table will act as our target. The LKM Hyperion Essbase DATA to SQL will be used here. It has the flow control parameters EXTRACTION_QUERY_TYPE and EXTRACTION_QUERY_FILE (refer to the KM documentation); our values in this case will be ReportScript and X respectively. Within this interface, we can map the fields from the source to the target and use any filter, join or transformation as required.

Once the data is loaded in the target database staging table, we can implement standard ETL transformations to modify the data as per our required formatting. Then we can unload this data in a delimited file as required. Once the file is ready, it can be zipped and sent over as an email attachment to the intended recipients. It can also be sent via SFTP to any other server and path if required, thus completely eliminating any human intervention.

So now we can save our valuable time to catch some Pokemons! How do you automate your Smart View activities?

Saturday, October 15, 2016

ODI Trigger Essbase MaxL Dynamically with no hard-coding

The Oracle Data Integrator tool provides a flexible, if somewhat customized, option to call and execute MaxL scripts to perform various operations on our Essbase cubes. More importantly, this approach is completely independent of the environment we run our ODI code in: a simple control table contains the server information that the ODI code refers to in order to run the MaxL scripts in that specific Essbase instance.

So the first step is to create my control table with all the different values that will be used by my ODI code. This control table will be present in all the instances (Development, Test, Production), each with its own values as applicable for that environment. Usually the server admins will have all this information, and the table might only have read access in the Development instance. In higher instances the developers might not be allowed read access on this table, since it will contain sensitive information like the user credentials. The below parameters will be stored in the control table:

    1. Server Directory (where I will find my script)
    2. MaxL Directory (where I have saved my custom MaxL scripts)
    3. The name of my Essbase Application
    4. The name of my Essbase Database (maybe same as Essbase Application name)
    5. Essbase Server Name
    6. Essbase Login User Name with full privilege
    7. Essbase Login Password for the above user

Now, once I have all the required values, it is time to start creating my ODI package. I will need 7 variables to fetch the values of the above 7 parameters. Then I will use an ODI command with the below syntax to run my MaxL script.
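A minimal sketch of that ODI command, assuming the standard startMaxl.sh wrapper around essmsh and passing the values refreshed from the control table (the variable names are illustrative):

```
#SERVER_DIR/startMaxl.sh #MAXL_DIR/<<maxl script name>>.mxl #APP_NAME #DB_NAME #ESSBASE_SERVER #ESSBASE_USER #ESSBASE_PWD
```

Everything after the script name reaches the MaxL shell as the positional parameters $1 to $5.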


The MaxL script now will need to start with the below lines to accept the above values:

    /* Initialize variables */
    SET app =$1;
    SET db =$2;
    SET esbs =$3;
    SET user =$4;
    SET pwd =$5;

    /* Login to the server */
    login "$user" "$pwd" on "$esbs";
    iferror 'loginFailed';

Then the MaxL script can proceed to do its own activities (like executing calculations) on the cube as usual. That's all there is to this short, flexible process! How do you prefer to execute your MaxL scripts?

Friday, October 14, 2016

ODI Performance Tuning using indexes and keys

The Oracle Data Integrator load plans involve extracting and loading large volumes of data and then transforming them as per requirements. Often the volume of data becomes a bottleneck and the execution timings go beyond acceptable limits. That is when some of the age-old practices of performance improvement using keys and indexes come in.

The intermediate C$ work tables and staging tables used in ODI are often ignored as part of the tuning process. It is not always necessary that the Extract and Load phases end with the same data set in the Source and the Target staging areas. Let's say I have a unique key column (X) in my final target table that holds all the data after transformation. This unique key (X) is a sequence number that increases gradually in the source system. Since my requirement during the extract phase is to fetch only the incremental data, I filter the source records where the source key is greater than the max(X) of my final target table. So I do not need to fetch all the records, only the subset that meets the 'greater than' criterion.

Often, due to millions of records in the source table (for example the table XLA_AE_LINE_ACS in the XLA schema in EBS), the process takes an absurd amount of time. This is when we implement an index on column X in the source system, and the same ODI load performance improves dramatically. It is relevant to note that my custom ODI code filters only on column X, and no other join or column is involved here.
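As a sketch with assumed table and column names, the extract filter and the supporting index look like this:

```
-- the incremental extract: fetch only records beyond the target's high-water mark
SELECT src.*
  FROM source_table src
 WHERE src.x > (SELECT MAX(tgt.x) FROM final_target_table tgt);

-- a plain B-tree index on the filter column in the source system
CREATE INDEX source_table_x_idx ON source_table (x);
```

Since the filter is a simple range predicate on a single monotonically increasing column, the index lets the database avoid a full table scan of the source.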

How do you prefer to tune your custom ODI codes?

Thursday, October 13, 2016

Troubleshooting Hive in Hortonworks Virtualbox Sandbox

While getting started with Hadoop, the sandbox provided by Hortonworks is an easy-to-use starting point. Once we download the sandbox appliance, we can easily import it into our existing VirtualBox Manager. This assumes we already have VM VirtualBox Manager installed on our machine and the machine meets the minimum system requirements.

The import will create a Virtual Disk Image in the path specified during the import process. I prefer to keep it on my external storage to save space, since I have more than one such sandbox deployed.

So now that we have imported the appliance successfully and logged in using the default credentials (root / hadoop), I get an error when I try to use hive inside the sandbox.

There is also a browser portal that can be used to access the sandbox. Here again I get stuck at the initialization screen for an indefinite amount of time. It shows initializing but does not progress, and the CPU and memory consumption stay the same for a long period of time.

After shutting down the sandbox VM, re-logging in and repeating the steps, the login finally worked fine, and I am able to enter the Hive prompt and execute a few simple statements.

Creating a sample database and schema worked fine as expected.

Creating a dummy sample table also worked fine inside the sandbox. Thus we are ready with Hive in the sandbox to try out our next steps.
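The statements run inside the sandbox were along these lines (the names are illustrative):

```
CREATE DATABASE IF NOT EXISTS sampledb;
USE sampledb;
CREATE TABLE IF NOT EXISTS dummy_sample (id INT, name STRING);
```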

Essbase data quality control using ODI

Oracle Essbase is an excellent tool for handling large volumes of data and doing complex calculations very fast. But it is interesting to observe how good the data is after we load it into Essbase. Seldom do we validate the error logs or the Essbase server logs generated during the load process, and even more rarely do we set the process to fail for minor issues. It is under these circumstances that the quality of the data needs to come under scrutiny.

Using the ODI IKM SQL to Hyperion Essbase (DATA) we load the data to the Essbase cubes daily. But it might so happen that the transactional data does not get loaded into the cubes because the outline build was not successful. Due to a missing member (say in the Account dimension) in the outline, all transactional data corresponding to that specific Account number will get rejected, since there is simply no placeholder for the data to go into. This missing member in the outline can happen for multiple reasons - the cube outline build process failed, the source tables or files for the outline build did not have the correct data, DRM did not have the missing member in its hierarchy, etc.

The impact of this Essbase data discrepancy is profound. The data between database reports and Essbase reports can drastically differ, giving inconsistent results and causing more confusion. Also, data between different environments (say Dev, Test or Prod) can start varying for the same historic month - obviously due to the outline differences causing the data loads to behave differently, succeeding for some and failing for others.

This scenario can be avoided by keeping a separate step in the transactional data loading process: a quality control check to make sure that all the segment codes for which we intend to load data to the Essbase cubes already exist in the outline. Only if they do should we proceed with the loading phase (to avoid the data falling off during loading) - else, even if a single member is missing in the outline, trigger an alert to take the necessary remedial action. In case you are wondering how to run a compare check against the outline, it's easy - either use a customized report script with IDESCENDANTS to export a dump of all the members for that specific dimension, or use the table (or file) that was used as the source to build the outline. Either of these can serve as the outline data reference to compare with the transactional data.
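A sketch of such a quality control check, with assumed table and column names - any rows returned mean the load should be halted and an alert raised:

```
SELECT DISTINCT stg.account_code
  FROM stg_transaction_data stg
MINUS
SELECT ref.member_name
  FROM outline_member_reference ref;
```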

How do you make sure your Essbase data quality is perfect?

Wednesday, October 12, 2016

Type 2 Slowly Changing Dimension - How ORA_HASH (mis) fits in

ORA_HASH is one of my favorite functions for generating a hash value for a record and using it in a variety of ways. But there are certain circumstances where even a useful function like ORA_HASH has its limitations. Let's take a look at how this function can behave while implementing SCD2.
Consider a table where SCD2 is implemented. The logic calculates the ORA_HASH value of each record using the values in columns Field 1, Field 2 and Field 3. If the value is new, it marks the existing record with "Active Flag" N and inserts the new record with "Active Flag" Y. Thus, for record R2, the record having V6 is marked Y, since on Day 5 the hash value changed for that record id.

But note what happens for record R1. On Day 2, the hash value changed (from 100 to 200) due to a different value (V4) in Field 3 - so the new record got inserted with hash value 200, flagged Y, and the previous one got tagged N. But on Day 3, when Field 3 of R1 changes back to V3 (from V4), its hash value is again back to 100. The logic checks whether the new hash value (100) is different from the existing value (200) of the R1 record tagged Y, and flags the record with hash value 200 as N. Now I have to be careful to make sure the old hash value 100, which already resides in the table with a flag of N, does not interfere with the new record (also having hash value 100).

Several scenarios can happen in this case if not handled properly:

    1. We can have duplicates of the record (with hash value 100) both flagged as N since the hash value is already existing
    2. We can have duplicates of the record (with hash value 100) both flagged as Y since the hash value is already existing and it's also the latest
    3. The new record with old hash value 100 might not get inserted at all (most popular error) since hash value 100 is already existing - thus the full set of R1 stays tagged as N with no Y
    4. The new record with old hash value 100 can get inserted but subsequently can get tagged N as the hash value is old - thus the full set of R1 stays tagged as N with no Y
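One way to stay safe is to compare the incoming hash only against the row currently flagged Y, never against the full history. A hedged sketch in Oracle SQL, where the table and column names are assumptions:

```
-- close the current row only when the hash of the incoming record differs
MERGE INTO scd2_target t
USING (SELECT src.record_id,
              ORA_HASH(src.field1 || '~' || src.field2 || '~' || src.field3) AS new_hash
         FROM source_table src) s
   ON (t.record_id = s.record_id AND t.active_flag = 'Y')
 WHEN MATCHED THEN UPDATE
      SET t.active_flag = 'N'
    WHERE t.hash_value <> s.new_hash;

-- then insert the new version for every record whose current row was just closed;
-- the insert must be driven by the state of the 'Y' row, never by whether the
-- hash value already exists somewhere in the table's history
```

Checking the N-flagged history for an existing hash is exactly what produces the duplicate and missing-Y scenarios listed above.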

All of these can be overcome with some additional logic. It's not that we cannot use ORA_HASH to implement SCD2 - in fact it is very handy - just that a little more caution is required to cover all the scenarios. Where do you like to use ORA_HASH in your code?

Tuesday, October 11, 2016

ODI Invoke Web Services with dynamic values

The ODI Invoke Web Service utility is extremely handy and lots of users are already using it in their systems. The tool allows us to invoke an operation on a web service on the specified port. Once we provide all the required parameters, we can use the ODI tool OdiInvokeWebService to perform our required web service operation.

There are credentials to be provided when we use the command OdiInvokeWebService, along with the parameters defined in Oracle's guidelines.
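As a sketch, an invocation typically carries the service location, operation and request payload alongside the credentials; the exact parameter names vary between ODI releases, so verify them against the Oracle documentation for your version:

```
OdiInvokeWebService "-URL=http://<<server_name>>:<<port>>/<<service>>?WSDL" "-PORT_TYPE=<<port type>>" "-OPERATION=<<operation>>" "-XML_REQUEST=<<request payload>>" "-RESPONSE_FILE=<<response file path>>"
```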


The tricky part is how we encrypt or hide our credentials from all users who have access to the environment. Using OdiInvokeWebService as a tool in a Package will not help us hide them. Instead, we will use an ODI Procedure to achieve this.

First we create an ODI procedure and select the technology ODI Tools in the Source tab. Next, in the Topology, we create a new Data Server (say X, under the File technology) and save in this Topology connection the username and password we want to use when calling the Web Service. In the field "Host (Data Server)", we provide the <<server_name>>:<<port>> details. It is interesting to note that even incorrectly entered credentials will pass "Test Connection" here. So we should not be misled by it; instead we should make sure we can log in to the Web Service or Analytics or BI Publisher URL using these credentials, and then use them in this Data Server X. In X, we create a new Physical Schema with some path in the server (we may or may not use it later). Then we tag this Physical Schema to a Logical Schema Y in the default context Global.

Now, in the ODI procedure's Target tab, we select the technology used for creating the Data Server X (File in our case) and then select the corresponding Logical Schema Y. Then in the Source tab we can use the below expressions in place of the credentials. They will automatically retrieve the values we have saved in the Data Server X.

<%=odiRef.getInfo("SRC_USER_NAME")%> - This will dynamically fetch the username
<@=snpRef.getInfo("SRC_PASS") @> - This will dynamically fetch the password
<%=odiRef.getInfo("SRC_DSERV_NAME")%> - This will dynamically retrieve the instance URL

Thus, for Dev, Test or Prod, we do not need to change the code with the above 3 pieces of information every time. The same code will dynamically retrieve these details as long as we keep the Topology updated with the instance-specific details. The Topology thus provides incredible flexibility to Developers and Administrators and saves a lot of effort if used wisely. Do you have any ideas for using the Topology in more interesting ways?

Sunday, October 09, 2016

ODI Executions Parametrization - Pros and Cons

The Oracle Data Integrator scenarios and load plans provide a range of flexibility when it comes to executions. We can select the Context and Log Level, and provide values as input parameters as required - all of which adds to the versatility of this tool. But parametrization comes with its own advantages and disadvantages. Various factors work in our favor and can help us a lot when used the right way; on the other hand, it can also lead us down a challenging path that we should learn to tread carefully.

Pros of Parametrization:

Flexibility - At the heart of the power of ODI is its adaptability and flexibility, and during the execution of scenarios and load plans we can make maximum use of it. We can use variables in packages and procedures and pass the values of these declared variables while executing the scenarios. These scenarios, when used in load plans, provide the same variables as prompts to pass values to. Thus we can execute the same code for different variable values, giving us a range of options and dynamism while running it.

Durability - Code that can run to meet different scenarios has a higher chance of meeting changing business requirements in the long run, and is thus more durable than a static version built for one specific scenario. So it is often preferable to keep things flexible, so that the maximum number of known cases can be covered by a single piece of code that takes inputs from the user.

Integrability - Code that can be parameterized to handle multiple scenarios can often be integrated as part of another executable scenario or load plan, or even with itself. Meaning, we can integrate the same scenario multiple times in a load plan with different sets of parameters. Thus, using a single component, we are able to accommodate multiple requirements by re-using it more than once in one load plan. The load plan, when executed successfully, will give us the outputs (say files, or data loaded to different tables) for both sets of parameters that were used.

Maintainability - A crucial factor that determines the importance of a parameterized code is its maintainability. The code might involve a few more variables than a static version would have needed, but if we look at the bigger picture it's definitely worth it. The number of scenarios the code can handle is equivalent to the number of static codes we did not need to build, saving a bulk of the development effort and keeping the Production instance less cluttered. It provides a single trigger point for all the different scenarios it can cover.

Flow Control - The parameters that we use in our code determine the path the job takes while executing. For each different set of parameters the path can be different, giving us a range of options. Thus we have finer control and the ability to decide the path we want the job to take as per the situation and requirement of the moment. This level of control would not be possible without parameters, and it often proves extremely useful when we quickly need the code to do something that is rarely requested.

Cons of Parametrization:

With great power comes great responsibility. With highly flexible code in our hands, we need to make absolutely sure it will never fail. This calls for extensive rounds of testing, even for test cases that might feel irrelevant.

Branch Testing - When a code version accepts multiple parameters, there is an array of paths the same code might traverse for each combination of the parameters. To make sure the code is fully correct, exhaustive testing needs to be done to verify that each path behaves as expected and the results meet expectations. This is a must even if it demands more testing effort, to have confidence the code will act as expected even with the lesser-used parameter combinations.

Boundary Value Testing - A common scenario with ODI parameters is passing a range of values, say a start_value and end_value of some dimension such as time. In such cases it is crucial to check that the endpoints (in our case, months) work as expected: the first and last month in the range should behave just like the other months. And if a user mistakenly enters reversed values (start_val in place of end_val and vice versa), that should work as well.
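A minimal sketch of guarding the endpoints, assuming months are passed as YYYYMM strings (the format and function names are hypothetical, not from any real load plan):

```python
# Normalize a (start, end) range so reversed parameter values still work.
def normalize_range(start_val, end_val):
    # Swap the endpoints if the user entered them in reverse order
    if start_val > end_val:
        start_val, end_val = end_val, start_val
    return start_val, end_val

def months_in_range(start_val, end_val, months):
    start_val, end_val = normalize_range(start_val, end_val)
    # Endpoints are included: the first and last month must behave
    # exactly like the months in the middle
    return [m for m in months if start_val <= m <= end_val]

months = ["201601", "201602", "201603", "201604"]
# Both orderings give the same result, and both endpoints are covered
print(months_in_range("201602", "201604", months))
# → ['201602', '201603', '201604']
print(months_in_range("201604", "201602", months))
# → ['201602', '201603', '201604']
```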

Decision Coverage Testing - When we use parameters in an Oracle Data Integrator package, we are inevitably using variables - lots of them - to handle all our scenarios. This creates several true/false paths, each of which needs thorough testing so we do not inadvertently fall into a trap we never foresaw. Worse, an untested code path can end up doing something nasty (say, removing data, which might force a re-run of the loading processes) that takes hours to fix.

Fuzz Testing - Even after extensive Branch Testing and Decision Coverage Testing, it is important to take a user's perspective and run the parameterized code with random combinations of values, to confirm that all such combinations fall within already-tested scenarios and do nothing unexpected. It is also necessary to document every scenario captured - in the future, you do not want to suddenly wake up at night wondering whether something was tested, right?
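A fuzz run over hypothetical parameter names and values might be sketched as follows (the job body is a stand-in for the parameterized code, not real ODI logic):

```python
import random

# Stand-in for the parameterized code under test
def job(params):
    if params["region"] not in ("APAC", "EMEA", "AMER"):
        raise ValueError("unknown region")
    return f"{params['load_type']}-{params['region']}"

# Hypothetical parameter domains
values = {
    "load_type": ["FULL", "INCR"],
    "region": ["APAC", "EMEA", "AMER"],
}

random.seed(7)  # reproducible fuzz run
tested = set()  # document every scenario that was exercised
for _ in range(20):
    combo = {k: random.choice(v) for k, v in values.items()}
    job(combo)  # must not raise for any valid combination
    tested.add(tuple(sorted(combo.items())))

print(f"{len(tested)} distinct scenarios covered")
```

Keeping the `tested` set (or writing it out) gives the documentation trail the paragraph above asks for.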

Thus we can create something really useful, while staying cautious that we do not hurt ourselves with something so powerful that it starts overshadowing us. Machines and AI may well be the future, but we want to make them flawless - to avoid anything like what we saw in Isaac Asimov's fiction!

Friday, October 07, 2016

DRM Single Physical Hierarchy Multiple Logical Hierarchies

The Oracle Data Relationship Management application provides tremendous flexibility in managing the master data for an organization. It has amazing customization options, and its integrability with other Oracle products makes it even more useful.

Here I will explain how easily we can solve a complex business requirement using DRM. The requirement is to provide 3 different hierarchies, as txt files, to 3 different downstream applications for the same segment, say Account. Each downstream application will receive a different Account hierarchy from DRM - i.e. the parent-child relationships will differ across the 3 applications.

No, I am not going to create 3 different DRM versions and use exports to generate the text files from each version. I will be using DRM properties and a few custom formulae to achieve my goal using a single DRM version. Let's say the version is called V1. I have 3 Defined properties named FlagHier1, FlagHier2 and FlagHier3 - as the names suggest, these will be used as flags to identify which DRM nodes belong to which hierarchy.

Then I do the usual steps of assigning the 3 properties to a property category and then I open the hierarchy. In the hierarchy, say I have defined the nodes as below:

In Version V1,
   Nodes H1N1, H1N2, H1N3 are marked Yes in FlagHier1
   Nodes H2N1, H2N2, H2N3 are marked Yes in FlagHier2
   Nodes H3N1, H3N2, H3N3 are marked Yes in FlagHier3

They are arranged in a single physical hierarchy in version V1 in this parent-child relationship below:
TopNode -> H1N1 -> H2N1 -> H1N2 -> H3N1 -> H1N3 -> H2N2 -> H3N2 -> H2N3 -> H3N3

In the DRM export outputs, each node must belong to its correct parent in the corresponding hierarchy - the parent name in the text file cannot be the name of a node from a different hierarchy. This is achieved by the Calculated Properties below, which use the 3 flag properties to compute a node's parent name.

Formula for Hier1ParentName:

Formula for FindH1ParentName:
AncestorProp(=,Custom.FlagHier1,Yes, 0,Custom.NodeName)

Formula for Hier2ParentName:

Formula for FindH2ParentName:
AncestorProp(=,Custom.FlagHier2,Yes, 0,Custom.NodeName)

Formula for Hier3ParentName:

Formula for FindH3ParentName:
AncestorProp(=,Custom.FlagHier3,Yes, 0,Custom.NodeName)

What the above formula does is keep traversing the hierarchy upward, starting from the Current Node (denoted by the boolean value 0, i.e. false - passing 1 would start at the top node instead, which is not what we want here). When it finds a "Yes" value for the specified property, it returns the NodeName of that node - our desired parent name.

Now, I will need to create 3 exports as below using the different properties as mentioned:

For Hierarchy 1, child and parent will be: NodeName, Hier1ParentName
Use Filter: Custom.FlagHier1=Yes

For Hierarchy 2, child and parent will be: NodeName, Hier2ParentName
Use Filter: Custom.FlagHier2=Yes

For Hierarchy 3, child and parent will be: NodeName, Hier3ParentName
Use Filter: Custom.FlagHier3=Yes
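Assuming AncestorProp walks upward from the current node as described, the three exports can be simulated with a short sketch over the example chain above (Python is used here only to illustrate the logic, not how DRM evaluates it):

```python
# Physical hierarchy from the example: a single parent-child chain
chain = ["TopNode", "H1N1", "H2N1", "H1N2", "H3N1",
         "H1N3", "H2N2", "H3N2", "H2N3", "H3N3"]
parent = {child: par for par, child in zip(chain, chain[1:])}

# Nodes marked Yes in each flag property
flags = {
    "FlagHier1": {"H1N1", "H1N2", "H1N3"},
    "FlagHier2": {"H2N1", "H2N2", "H2N3"},
    "FlagHier3": {"H3N1", "H3N2", "H3N3"},
}

def hier_parent(node, flag):
    # Emulates AncestorProp(=, Custom.FlagHierX, Yes, 0, Custom.NodeName):
    # walk up from the current node until an ancestor is flagged Yes
    p = parent.get(node)
    while p is not None:
        if p in flags[flag]:
            return p          # first flagged ancestor = logical parent
        p = parent.get(p)
    return ""                 # no flagged ancestor above this node

def export(flag):
    # Filter Custom.FlagHierX = Yes, then emit NodeName, HierXParentName
    return [(n, hier_parent(n, flag)) for n in chain if n in flags[flag]]

print(export("FlagHier1"))
# → [('H1N1', ''), ('H1N2', 'H1N1'), ('H1N3', 'H1N2')]
```

Note that H1N2's logical parent comes out as H1N1 even though its physical parent in the chain is H2N1.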

After running the first export (let's name the file Hier1.txt), I will see in my file 3 rows:

Note how the second row reflects the correct parent (H1N1) of H1N2, even though in the physical hierarchy it is H2N1. The other exports can be verified similarly.

That's all! Pretty flexible right?

Wednesday, October 05, 2016

Oracle BI Apps Incremental Load Plan Performance Tuning

The Oracle Business Intelligence Applications integration with Oracle General Ledger comes with an out-of-the-box load plan that Oracle provides for extracting from the source and loading into the BI data warehouse. However, this load plan leaves a lot of scope for improvement. I got the opportunity to analyze it in detail, and some of the findings are really helpful.

The first question that comes to mind is whether all the steps are essential - and yes, they are. In fact, some of the practices followed in the load plan (like gathering stats, updating W_ETL_LOAD_DATES for every scenario, etc.) are critical for successful loads and long-term ease of maintenance.

Before the load plan was tuned, the average execution time was around 90 minutes; after the changes, it has come down to 70 minutes. It is worth mentioning that some custom loads also ran as part of the load plan, so the tuning involved tweaking them too. Execution times depend on data volume, so keep that in mind when reading these numbers.

The first change was to run the SDE Dimension and SDE Fact loading phases in parallel. This Source Dependent Extract (SDE) step can easily reduce the elapsed time with no impact to the overall process: dimension data and fact data continue to be extracted from the source system independently of each other. The dimension loading completes very quickly (it usually does, unless you have a massive hierarchy of segment values), while the fact loading finishes in close to 40-45 minutes (this might vary with your data volume).

The second recommended change concerns custom loads, in case you have them stitched into the incremental load plan. Many organizations and systems append custom loads at the end, since these loads depend on the DW load and must wait for it to complete. The Post Load Process (PLP) runs after the Source Independent Load Objects (SILOS) phase has completed. Any custom load that uses the DW fact tables can start in parallel with the PLP process - there is no need to wait for PLP to complete, and no negative impact in not waiting, since the fact data is already loaded. Thus we save some more time with the parallelism kicking in.

What are your ideas of performance tuning?

Monday, October 03, 2016

DRM API Adapter URL not responding

The Oracle Data Relationship Management Web Services use the API Adapter to communicate with the DRM server. This Adapter forms an integral part of this Master Data Management tool, enabling it to integrate and connect with other systems.

The first step is to make sure the Adapter is configured properly in the DRM Configuration Console. After opening the console, navigate to the "Host Machines" tab and then to "API Adapter Hosts". The default port number 5240 is usually fine; however, the host name localhost is often not suitable - I prefer replacing it with something more meaningful, such as the server name. After restarting the DRM services, the api-adapter-service should show up under the running processes tab.

On visiting the API Adapter URL now, even from outside the server, it should return the XML data fine. We are then ready to use this URL for integration purposes. For more details, check out the Oracle DRM API Guide.
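A quick way to verify the adapter endpoint responds is sketched below - the host, port and URL are placeholders to substitute with the values from your own Configuration Console:

```python
import urllib.request

# Hedged sketch: probe the API Adapter URL and confirm it answers.
# The URL passed in is an assumption - use your configured host/port.
def adapter_responds(url, timeout=10):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(512).decode("utf-8", errors="replace")
            # A healthy adapter endpoint returns an XML payload
            return resp.status == 200 and "<" in body
    except OSError:
        return False  # unreachable, refused, or timed out

# Example call (hypothetical host name):
# adapter_responds("http://drmserver:5240/")
```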

Sunday, October 02, 2016

DRM Essbase Integration to refresh outline

The Oracle Data Relationship Management application has consistently been the leading master data management utility for businesses over the years, and the ease with which it integrates with other systems has been one of its best features. DRM integration with Hyperion Essbase - to refresh the Essbase cube outlines with updated metadata - can be accomplished in a variety of ways.

Approach using ODI:

Let's say I have an Account hierarchy in DRM, and a Hyperion Planning application that has the Account dimension and relies on DRM for the latest metadata. To start with, there is a list of properties that need to be present in DRM already, with values set accordingly - these are the properties Essbase will require in the next steps:

Sample list of properties: Account, Parent, Alias: Default, Valid For Consolidations, Data Storage, Two Pass Calculation, Description, Formula, UDA, Smart List, Data Type, Account Type, Time Balance, Skip Value, Exchange Rate Type, Variance Reporting, Source Plan Type, Plan Type, Aggregation

The name Account represents a dimension; it can be replaced as required by other dimension names like Department, Location, etc. Using an external connection from DRM, we can save all the values from DRM into an Oracle database table. Then, after creating a rule file and associating it with the Essbase cube outline, the metadata in the outline can be updated using the ODI Knowledge Module IKM SQL to Hyperion Essbase (METADATA). The name of the rule file needs to be supplied as an IKM parameter value.

Approach using flat files:

If I do not have ODI in my environment, then let's start with saving the DRM export with all the dimension property values to a csv or txt file. The file has to be in a format Essbase can read, usually comma-delimited, and it will contain the same properties mentioned above. Once the fresh file with the latest metadata is ready, it can be loaded through a rule file to update my Essbase cube outline. The fields in the flat file must map correctly to the rule file for the refresh process to work.
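A minimal sketch of producing such a comma-delimited file is shown below - the column subset, the sample member and the file name are all assumptions for illustration, not values from a real DRM export:

```python
import csv

# A subset of the property list above, in the order the rule file expects
columns = ["Account", "Parent", "Alias: Default", "Data Storage",
           "Two Pass Calculation", "Data Type", "Aggregation"]

rows = [
    # One hypothetical sample member; real rows come from the DRM export
    ["Net Income", "Account", "Net Income Total", "Store", "No",
     "Currency", "+"],
]

with open("account_metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)   # field order must match the rule file
    writer.writerows(rows)
```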

There may be Essbase property requirements specific to your cube, perhaps related to length or special characters. The DRM properties can be adjusted and updated accordingly, or - as a long-term solution - validated in real time using custom DRM validations.

Thus, once the metadata is refreshed using either of the above approaches, the Essbase outline will reflect the latest master data exported from DRM.

Evernote and how it transformed my digital life

As digital data has grown over the years and our dependency on gadgets has risen, there came a need for something more than simply storing files in different folders in the cloud. The concept of note-taking thus evolved and has made steady progress over the last 4-5 years. I had Evernote installed on my phablet since March 2014 but never really used it until 2016.

When I started using it bit by bit for small amounts of information, I began to realize its significance. Today my Evernote Premium account holds 649 notes in 14 notebooks, with 80 tags categorizing them. The major driving factor for taking up the Evernote Premium membership - even while already having OneDrive with an Office 365 subscription - was fine-tuning my daily routine and re-organizing my digital life in the best possible fashion.

Taking quick notes the moment they need to be taken, or jotting down quick ideas, is now a breeze - more so with the Evernote widgets that allow one-tap access on smartphones. Going paperless is easier than ever with Evernote's smart camera: it is really good at scanning documents and provides different modes to save them as Photo, Document (my favorite), Color Document, Post-it Note or Business Card.

Evernote can search text inside PDFs, scanned documents and image files, which is extremely useful for me. I have saved many of my e-books and documents in Evernote simply because they are much more useful there: I can search all the documents at once instead of opening them individually or searching by file name. It is as if I am searching all the information I have ever read at once - this feature alone is worth the price.

On the left you can see my tag tree; under each bold tag I have multiple child tags. Tags are a very handy and powerful tool for organizing all the notes. Most of my notes go to my notebook called "Personal"; the ones I am yet to organize go to ".Inbox" (note the dot at the front - it keeps the notebook at the top, since special characters are given priority in Evernote's sorting). I also have a notebook for my e-books. So notebooks are not my main means of organization - the tags keep my stuff organized.

Going paperless has been a priority for me over the last few months, so all bills and receipts are now online. The ones that are not, but deserve to be saved, end up in front of the Evernote smart camera. This has greatly reduced the clutter at my home and office. More than that, finding information among the thousands of receipts accumulated over the months is now much faster than before.

Evernote also integrates nicely with IFTTT. The best part is that I keep a backup of my smartphone using IFTTT recipes. All my text messages are saved to a "Texts" notebook - just in case I need to revisit or search them after a few years. All my blog articles are automatically saved to a notebook called "Insight Crunch" as I publish them. My iTunes and Amazon receipts go automatically to the notebook "Bills and Checks", and any screenshots I take on my smartphone are saved to a separate notebook.

Overall, note-taking is going to shape the future of the personal cloud in the years to come. How do you keep your stuff organized?

Saturday, October 01, 2016

OdiSqlUnload and the Possibilities

Oracle Data Integrator is one of the best forward-looking data transformation and loading tools, with a variety of features for the end user. As an ELT tool, ODI provides many inbuilt features for data processing across different technologies.

One of the most useful I find is the ODI tool OdiSqlUnload, which can be used in multiple ways in different scenarios for different purposes. To use this feature, create a new ODI procedure and, in the Source tab, select the technology "ODI Tools". Then, using the below syntax, we can create files with the listed customization options:
  • Dynamic output file path
  • Dynamically fetched schema username
  • Dynamically fetched schema password
  • File content is decided by the SQL query (the logical schema for the SQL to be executed in needs to be set in the Target tab of the ODI procedure)
OdiSqlUnload "-FILE=<%=odiRef.getSchemaName("DIRECTORY_FILEOUT", "D")%>/Output.txt" "-DRIVER=<%=odiRef.getInfo("SRC_JAVA_DRIVER")%>" "-URL=<%=odiRef.getInfo("SRC_JAVA_URL")%>" "-USER=<%=odiRef.getInfo("SRC_USER_NAME")%>" "-PASS=<%=odiRef.getInfo("SRC_ENCODED_PASS")%>" "-FILE_FORMAT=VARIABLE" "-FIELD_SEP=<<field separator>>" "-ROW_SEP=<<row separator>>" "-DATE_FORMAT=yyyy/MM/dd HH:mm:ss" "-CHARSET_ENCODING=ISO8859_1" "-XML_CHARSET_ENCODING=ISO-8859-1"
select <<column1>>, <<column2>>, <<column3>> from <<schema_name>>.<<table_name>>

Now that we can successfully generate files from ODI, let's look at the different ways this can be leveraged to cater to various scenarios:
  • Create and attach files to emails via the OdiSendMail tool
  • Generate error logs for ODI Load Plans and notify administrators
  • Create multiple files with different delimiter formats and data content in a single procedure by adding multiple steps
  • Create files in a single path using data from different physical sources using different logical schemas in different steps in a single procedure
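For illustration only - OdiSqlUnload itself runs inside ODI - the underlying pattern (run a query, write each row to a delimited file) can be sketched like this, with sqlite3 standing in for the source connection and a hypothetical table name:

```python
import csv
import sqlite3

# Sketch of the unload pattern: execute a SQL query and write the
# result set to a delimited file, header row first.
def sql_unload(conn, query, out_path, field_sep=";"):
    cur = conn.execute(query)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=field_sep)
        writer.writerow([d[0] for d in cur.description])  # column names
        writer.writerows(cur)                             # data rows

# Hypothetical in-memory source table for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])
sql_unload(conn, "SELECT id, name FROM src", "Output.txt")
```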
Can you think of any more use of this cool tool?