Sunday, October 23, 2016

Sqoop Import and Export tables from Hive to Oracle Database

Exporting and Importing table data from Oracle database to Hive and vice-versa is one of the most common activities in the world of Hadoop. It is essential to get sorted out on a few basics for seamless first time integration so as to avoid various parsing and loading errors.

We will be doing the below activities sequentially so as to cover all the integration points between Oracle database, Sqoop, HDFS and Hive.

    Step 1: Extract data from a source Oracle database table to Hadoop file system using Sqoop
    Step 2: Load the above Sqoop extracted data into a Hive table
    Step 3: Use Hive query to generate a file extract in the Hadoop file system
    Step 4: Load the generated file in Step 3 to a new target Oracle database table

Step 1: Sqoop import data from Oracle database to Hive table

Our first task is to identify our source Oracle database table, and then use Sqoop to fetch the data from this table to HDFS using Sqoop.

SQOOP IMPORT --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<source schema name>>.<<source database table>>" --columns "<<column1>>,<<column2>>" -m 1 --target-dir "<<HDFS path>>" --verbose

It is interesting to observe that we need to identify a primary key for the source Oracle database table. Else we will get the error "Error during import: No primary key could be found for table". If we want to skip assigning a key we can include the highlighted parameter -m 1.

Step 2: Load the above Sqoop extracted data to a Hive table

Assuming we already have a table created in Hive, we will load the file created in Step 1 into the Hive table using the below syntax.

LOAD DATA INPATH '<<HDFS path>>' INTO TABLE <<hive table name>>;

Step 3: Export a file using Hive query to be consumed by Sqoop

Now that we have the data in our Hive table, we will use the below command to create a file using a custom Hive query, in the green highlighted path. The delimiter highlighted in yellow can be changed according to our requirement - but accordingly it must be changed in Step 4 also where it's highlighted in yellow.

INSERT OVERWRITE DIRECTORY '<<HDFS directory>>/<<file name>>'
select * from <<hive table name>> where <<condition>>;

Step 4: Load data from Hive table exported file to Oracle database table

The below command will use the above Hive exported file (from the same green highlighted path) to load our target Oracle database table.

SQOOP EXPORT --connect "jdbc:oracle:thin:@<<database host>>:<<database port number>>:<<database service name>>" --password "<<database password>>" --username "<<database username>>" --table "<<target schema name>>.<<target database table>>" --input-fields-terminated-by ',' --input-lines-terminated-by '\n' --export-dir "<<HDFS directory>>/<<file name>>" --input-null-string "\\\\N" --input-null-non-string "\\\\N" --verbose

The orange and blue highlighted sections above helps reading the records while exporting to the target database table. Else we might sometimes encounter the error "Can't parse input data".

Thus we have successfully loaded a table from Oracle database to Hive and again back from Hive to Oracle database, using Sqoop. We can query our Oracle and Hive databases to check the data if it is loaded fine. How do you prefer to load your data between these two systems?

Saturday, October 22, 2016

How ODI IKM SQL to Hyperion Essbase (METADATA) influences data loading

Oracle Data Integrator integrates with Essbase for metadata as well as data loading using different Knowledge Modules. Each of the KMs provides a range of options for us to customize the loading as we want. The IKM SQL to Hyperion Essbase (METADATA) is usually the starting point when we begin our activities, since first we will load the metadata and get the outline ready, then we can load the actual data.

The standard practice of the using the IKM SQL to Hyperion Essbase (METADATA) is to create an ODI interface for each dimension using this KM, and provide the values of the parameters (refer page 4 for full list) in each of the interfaces as applicable. As observed below, we need to create a rule file for the different dimensions in Essbase and provide those rule file names as the value for the parameter RULES_FILE. In case we need the data to be loaded in a specific order, we can use the ORDER_BY clause accordingly.

So we can create an ODI package with 6 different interfaces, each with it's own rule file and loading the corresponding dimension hierarchy in the outline. The value for RULE_SEPARATOR need to be set correctly here, else the outline will not reflect any updates and the interface will not work. The value for RESTRUCTURE_DATABASE defines what to do after the metadata loading. If we are clearing out all the data prior to our metadata loading via MaxL, then the default value KEEP_ALL_DATA does not make a difference. Else we can use the values KEEP_INPUT_DATA or KEEP_LEVEL0_DATA or DISCARD_ALL_DATA as per our requirement.

The Essbase cube outline refresh plays a very important role in the daily life of the cube. If the loading process or parent child loading sequence is not set correctly, the members can fall off during the loading phase. For example, if we try to load the child before it's parent, the child will fail to get loaded. And then the parent will stay without it's child at the end of the load. It becomes even more difficult to track such scenarios if we have LOG_ENABLED as the default false value.

The LOG_FILE_NAME and the ERROR_LOG_FILENAME can prove to be very beneficial in such cases. It is always recommended to generate a log during our loading phases, unless we have a strong data quality control check in place. Incorrect outline build can drastically affect the subsequent data loading process, leading to multiple records rejections due to missing members in the outline. This can lead to confusion and data mismatches across different data sources, thus causing a nightmare for developers, more so if insufficient logging is in place.

While loading the dimension members, we can also use the ODI interfaces to load the Alias values of each of the members. If we have more than one Alias, we can accordingly use multiple interfaces and rule files to populate the values accordingly. Only thing is we have to make sure each of the rule files point to the correct Alias in the path below:

    Rule File → Dimension Build Settings → Global Settings → Global Properties → Update Alias Table → <<Alias Table Name>>

So we can use identical ODI interfaces with different values in the Alias field in the Target and different rule files values in the flow properties to load them.

The ODI interfaces during meta data loading sometimes gives the error 'Cannot open cube outline'. This is often caused by a parallel running job which is in incomplete status, thus preventing our interface to have a lock on the outline. Or it can be due to a developer who has locked the outline in edit mode but forgotten to unlock it - thus again preventing our ODI interface to get access to the Essbase outline in Edit mode. In such cases we need to identify the lock and then release it, then restart our ODI process.

This IKM is pretty peaceful otherwise and keeps doing it's tasks quietly over time. How do you use your IKMs to refresh the Essbase metadata?

Friday, October 21, 2016

Analyzing ODI data load using IKM SQL to Hyperion Essbase (DATA)

Oracle Data Integrator provides a range of Knowledge Modules to integrate with and process data from and load to various applications, and it is no different with Oracle Essbase. The Knowledge Module IKM SQL to Hyperion Essbase (DATA) loads the data from a Oracle database to an Essbase cube - and has various parameters (refer page 5 for full list) to customize it as per our requirement.

Now we need to have the data ready in our Oracle database table to be used as source, usually having all the dimensions (like Time, Account, Department, etc) and then the fact value (Actuals) as the base columns (this is the minimum requirement). We can have more than one fact value (say Budget data), these can be loaded to the Essbase cube using the same source table or another different table, whichever is convenient.

Next we select the Essbase cube as our target - say Sample.Sample. So now we are ready to create the ODI interface to load our data. We drag the Oracle table datastore in our source and the Cube datastore in our target. We will select the IKM SQL to Hyperion Essbase (DATA) as KM in our flow properties for the Target as shown below. It is important to observe each of the parameters closely for this ODI interface to behave exactly as expected to.

The default values usually suffices, but to optimize our loading, it is advisable to play around a little with the values for COMMIT_INTERVAL and the RULES_FILE. It can significantly enhance the performance of the loading process. Also note that the value 0 for MAXIMUM_ERRORS_ALLOWED does not signify the loading stops even for one error, it's just the opposite, here 0 signifies infinity. So the process will ignore all errors and succeed always, even if it is unable to load any record. The log file configured for the variable ERROR_LOG_FILENAME can often come in handy in such cases if the ODI Operator log or the Essbase application log do not provide any fruitful information.

The CLEAR_DATABASE option can be set to true if we are doing a full refresh every time. Also, this same activity can be achieved through MaxL (using ODI OS Commnad to call the MaxL script) for some more granular control.

The CALCULATION_SCRIPT option is a very handy option and is really useful when we need to run a calculation script immediately after our loading process completes. Thus we can skip a MaxL step by incorporating the calculation script within our ODI interface.

One of the common issues faced during the loading process is records getting rejected due to unknown member or missing member. This can be taken care by some robust data quality control check which is often overlooked and not given due importance - but it definitely saves a lot of time and effort in the long run.

Another commonly faced error which does not get captured in the log explicitly is the incorrect value of RULE_SEPARATOR. The default value is comma, but sometimes due to cloning or  migration issues, when we migrate the ODI interfaces with missing references errors, the values set for the IKM gets lost, and returns to the default comma. In such cases, it is best to migrate the interfaces again from higher instances (like Production) after the missing reference errors are fixed. This missing reference error after cloning or migration can happen due to missing KMs, or due to different internal ids for the same ODI objects and KMs between different ODI repositories. If re-migration becomes impossible, we need to re-configure the values for the IKM all over again, and at that time need to make sure we use the correct value for RULE_SEPARATOR.

We can also work on improving the performance of the data loading by customizing the FETCH_SIZE parameter - this parameter basically determines how many rows are read from the source at a time. John Goodwin has a nice article here regarding this.

What's your experience with the IKM SQL to Hyperion Essbase (DATA)?

Tuesday, October 18, 2016

ODI Automation of Smart View Manual Activities

Oracle Data Integrator provides multiple features to automate almost all manual data extraction and transformation related activities. One such scenario is using Smart View to manually fetch data from the Essbase cubes, maybe weekly or monthly. Often such cases require changing the values of time dimension over the entire scope of the template (maybe to different values for current year and last year, or current month and last month, in adjacent columns, etc) and then retrieve the data to meet our needs. So it might become time consuming and prone to manual errors.

This is where ODI comes really handy with the KM called LKM Hyperion Essbase DATA to SQL. First we need to create our report script (say X.rep) within Essbase to fetch the data we require. This report script is the first layer of data extraction and must include all the relevant information and records. It might not necessarily be formatted exactly as per requirement but should extract all the data for the next processes to act on this retrieved data.

Since the report script is containing all the filters for time dimension and other parameters - it is essential to keep it dynamic to avoid manual intervention. Lets save my report script content in a table T1 and instead of hard-coding the time value, I name it "DummyPeriod". Then using an ODI interface, my first step is to create the report script with the actual value of "DummyPeriod". Using the KM IKM SQL to File Append, I use the table T1 as my source and the file X.rep as my target. In the target, I use the REPLACE function to replace "DummyPeriod" with my actual period value (this value can get refreshed by a ODI variable). In the Flow properties for this IKM, the value for TRUNCATE must be set to True - else the Report Script will get appended with it's old content every time I create the report script using the ODI interface. Thus the report script gets created dynamically every time with dynamically generated values.

Now, in a new ODI interface, a datastore which matches the report script's number of output columns and data-types will act as the Source, and a database staging table will act as our Target. The LKM Hyperion Essbase DATA to SQL will be used here. It has the flow control parameters EXTRACTION_QUERY_TYPE and  EXTRACTION_QUERY_FILE (refer page 9). Our values in this case will be ReportScript and X respectively. Within this interface, we can map the fields from the source to the target and use any filter or join or transformation as required.

Once the data is loaded in the target database staging table, we can implement standard ETL transformations to modify the data as per our required formatting. Then we can unload this data in a delimited file as required. Once the file is ready, it can be zipped and sent over as an email attachment to the intended recipients. It can also be sent via SFTP to any other server and path if required, thus completely eliminating any human intervention.

So now we can save our valuable time to catch some Pokemons! How do you automate your Smart View activities?

Saturday, October 15, 2016

ODI Trigger Essbase MaxL Dynamically with no hard-coding

The Oracle Data Integrator tool provides a flexible option, though customized, to call and execute MaxL scripts to perform various operations on our Essbase cubes. And more importantly, this approach using ODI is completely independent of the environment we are going to run our ODI codes in. A simple control table will contain the server information that will be referred by the ODI code to run the MaxL scripts in that specific Essbase instance.

So the first step is to create my control table with all the different values that will be used by my ODI code. This control table will be present in all the instances like Development Test Production, each with it's own unique values as applicable for that environment. Usually the server admins will be having all these information and the table might only have read access in Development instance. In higher instance the developers might not be allowed to have read access on this table since it will contain sensitive information like the user credentials. The below are the parameters that will be stored in the control table:

    1. Server Directory (where I will find my script)
    2. MaxL Directory (where I have saved my custom MaxL scripts)
    3. The name of my Essbase Application
    4. The name of my Essbase Database (maybe same as Essbase Application name)
    5. Essbase Server Name
    6. Essbase Login User Name with full privilege
    7. Essbase Login Password for the above user

Now, once I have all the required values, it is time to start creating my ODI package. I will need 7 variables to fetch the values of the above 7 parameters. Then I will use an ODI command with the below syntax to run my MaxL script.


The MaxL script now will need to start with the below lines to accept the above values:

    /* Initialize variables */
    SET app =$1;
    SET db =$2;
    SET esbs =$3;
    SET user =$4;
    SET pwd =$5;

    /* Login to the server */
    login "$user" "$pwd" on "$esbs";
    iferror 'loginFailed';

Then the MaxL script can proceed to do it's own activities (like executing calculations) on the cube as usual. That's all to this short flexible process! How do you prefer to execute your MaxL scripts?