Wednesday, December 19, 2018

Unix: Find Lines Containing a Specific String, then Cut and Insert Them after Lines Containing Another Specific String

The world of data integration and transformation gets more exciting every day: data fuels intelligent applications, and transformations pave the way to cleaner and leaner data. Let's dive into some sorting and ordering transformations using shell scripting, which will subsequently be triggered via ODI. We have an interesting scenario where we have to identify all the records containing the string "Parmesan Cheese", and, after identifying the unique identifier of each, cut that record and paste it immediately after the "Filet Mignon" record having the same unique identifier. Let's see a quick example below:

Base Dataset:
"X1","Y1","Z1","Parmesan Cheese","Yummy","Delicious"
"X1","Y1","Z1","Grilled Salmon","Amazing","Tender"
"X1","Y1","Z1","Filet Mignon","Juicy","Exquisite"
"X2","Y2","Z2","Parmesan Cheese","Yummy","Delicious"
"X2","Y2","Z2","Grilled Salmon","Amazing","Tender"
"X2","Y2","Z2","Filet Mignon","Juicy","Exquisite"

Required Dataset:
"X1","Y1","Z1","Grilled Salmon","Amazing","Tender"
"X1","Y1","Z1","Filet Mignon","Juicy","Exquisite"
"X1","Y1","Z1","Parmesan Cheese","Yummy","Delicious"
"X2","Y2","Z2","Grilled Salmon","Amazing","Tender"
"X2","Y2","Z2","Filet Mignon","Juicy","Exquisite"
"X2","Y2","Z2","Parmesan Cheese","Yummy","Delicious"

The Unix script below will process the data as per our required logic. First it will create a lookup file, lookup.txt, containing all the records having "Parmesan Cheese":

"X1","Y1","Z1","Parmesan Cheese","Yummy","Delicious"
"X2","Y2","Z2","Parmesan Cheese","Yummy","Delicious"

In summary, for each record read into getEntireRecord from this lookup file, the script takes the unique identifier getUniqueRecIdentifier ("X1","Y1","Z1") and finds the line number lineNumOfFiletMignon (3) of the "Filet Mignon" record having the same identifier. Now we know where to insert the "Parmesan Cheese" record getEntireRecord: the insertion point lineNumToInsertParmesanCheese is the next line, so we add one (3 + 1 = 4).

Full logic:

filename="file.txt"
lkpfilename="lookup.txt"

grep 'Parmesan Cheese' "$filename" > "$lkpfilename"

while read -r readLine
do
    # First 14 characters hold the unique identifier, e.g. "X1","Y1","Z1"
    getUniqueRecIdentifier="$(cut -c1-14 <<<"$readLine")"
    getEntireRecord="$readLine"
    generateSameIdFiletMignon="$getUniqueRecIdentifier"',"Filet Mignon"'
    lineNumOfFiletMignon="$(grep -nF "$generateSameIdFiletMignon" "$filename" | head -n 1 | cut -d: -f1)"
    lineNumToInsertParmesanCheese=$((lineNumOfFiletMignon + 1))
    # sed's i command inserts BEFORE the addressed line; if the "Filet Mignon"
    # record is the last line of the file, append with $a instead
    if [ "$lineNumToInsertParmesanCheese" -le "$(wc -l < "$filename")" ]; then
        sed -i "${lineNumToInsertParmesanCheese}i $getEntireRecord" "$filename"
    else
        sed -i "\$a $getEntireRecord" "$filename"
    fi
    lineNumToBeDeleted="$(grep -nF "$getEntireRecord" "$filename" | head -n 1 | cut -d: -f1)"
    sed -i "${lineNumToBeDeleted}d" "$filename"
done < "$lkpfilename"

rm "$lkpfilename"
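As a quick sanity check, the same cut-and-insert flow can be run end to end on the sample dataset above. This is a self-contained bash sketch assuming GNU sed for the -i flag; it uses sed's a (append) command, which places text directly after the "Filet Mignon" line - a compact equivalent of inserting at the following line number that also works when that line is the last one in the file:

```shell
#!/bin/bash
# Recreate the base dataset from the post
cat > file.txt <<'EOF'
"X1","Y1","Z1","Parmesan Cheese","Yummy","Delicious"
"X1","Y1","Z1","Grilled Salmon","Amazing","Tender"
"X1","Y1","Z1","Filet Mignon","Juicy","Exquisite"
"X2","Y2","Z2","Parmesan Cheese","Yummy","Delicious"
"X2","Y2","Z2","Grilled Salmon","Amazing","Tender"
"X2","Y2","Z2","Filet Mignon","Juicy","Exquisite"
EOF

filename="file.txt"
grep 'Parmesan Cheese' "$filename" > lookup.txt

while read -r readLine
do
    # First 14 characters hold the unique identifier, e.g. "X1","Y1","Z1"
    getUniqueRecIdentifier="$(cut -c1-14 <<<"$readLine")"
    # Line number of the "Filet Mignon" record with the same identifier
    lineNumOfFiletMignon="$(grep -nF "$getUniqueRecIdentifier"',"Filet Mignon"' "$filename" | head -n 1 | cut -d: -f1)"
    # Append the "Parmesan Cheese" record right after that line
    sed -i "${lineNumOfFiletMignon}a $readLine" "$filename"
    # Remove the original (now duplicate) "Parmesan Cheese" record
    lineNumToBeDeleted="$(grep -nF "$readLine" "$filename" | head -n 1 | cut -d: -f1)"
    sed -i "${lineNumToBeDeleted}d" "$filename"
done < lookup.txt

rm lookup.txt
cat file.txt
```

Running it prints the required dataset shown earlier.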

Detailed Explanation: To identify lineNumOfFiletMignon we use grep -n as seen below, with head -n 1 to take the first match for the specific combination, even though we know it will give only one record in our case. Then cut -d: -f1 extracts the first colon-delimited field, which is the Unix line number.

lineNumOfFiletMignon="$(grep -nF "$generateSameIdFiletMignon" "$filename" | head -n 1 | cut -d: -f1)"
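To see why cut -d: -f1 works here, note the shape of grep -n output: each match is prefixed with its line number and a colon. A tiny illustration with a throwaway file:

```shell
# Three-line demo file (hypothetical, just to show the output format)
printf '%s\n' 'alpha' 'beta' 'gamma' > demo.txt
grep -n 'gamma' demo.txt                 # prints: 3:gamma
grep -n 'gamma' demo.txt | cut -d: -f1   # prints: 3
```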

Now we are adding one to lineNumOfFiletMignon to get lineNumToInsertParmesanCheese.

lineNumToInsertParmesanCheese=$((lineNumOfFiletMignon + 1))

Once we have identified lineNumToInsertParmesanCheese, we can use sed -i with the line number where we want the record, the i (insert) command, the record string, and the file name. Note that sed's i command inserts before the addressed line, which is why we added one above. Since we iteratively store each entire record from lookup.txt in getEntireRecord, we use that same variable in the sed -i command.

sed -i "${lineNumToInsertParmesanCheese}i $getEntireRecord" "$filename"

After we do the above, we are left with a duplicate of the original "Parmesan Cheese" record that has to be deleted. Its line number is calculated in lineNumToBeDeleted by searching for the entire record string getEntireRecord, which was retrieved from lookup.txt.

lineNumToBeDeleted="$(grep -nF "$getEntireRecord" "$filename" | head -n 1 | cut -d: -f1)"

The duplicate original line is removed by the sed -i command below, where we provide the line number lineNumToBeDeleted followed by d for deletion, and then the file name.

sed -i "${lineNumToBeDeleted}d" "$filename"

Then at the end we can safely delete our lookup file, which was happily storing all the "Parmesan Cheese" records for us until now!

The above activity could also be done with an Excel macro, but considering maintenance and scalability factors, it is clear which option to choose.

Sunday, August 26, 2018

ODI BI Apps Machine Learning Power-Up

Like many millennials, one of my favorite video games was Mario. The most satisfying thing about Super Mario was the power-up mushrooms that gave him added abilities to keep conquering world after world. Similarly, today, when data and information are created at an accelerating rate, outstripping the ability of humans to keep up, it becomes imperative for enterprise operations to enable a digital workforce to achieve demonstrable gains in efficiency and productivity.

Operational analytics, grounded in subjective experience, is indeed very useful - but it leans towards descriptive analytics, based on what has already happened in the past. Predictive analytics opens up a whole new world where we can design algorithms to detect complex patterns and provide powerful insights to predict the future. The more powerful our mathematical algorithms, and the more robust our datasets, the better our statistical and strategic inferences become.

With a journey that started a few months back with analyzing and synthesizing vast amounts of logs generated by Oracle BI Apps, Oracle Hyperion Essbase, and Oracle Data Integrator, it's fascinating to see how unprecedented levels of efficiency and quality can be achieved today by transcending conventional performance tradeoffs. Let's coin the term Intelligent Process Automation (IPA) here - since it would not be fair to navigate this picturesque landscape without getting a deep feel for the next-gen tools forming the core of this cognitive technical process.

How does IPA fit in our ODI BI Apps power-up? Well, wait for it - let's put it out there in as crisp and munchy a way as possible (reminding me of chocolate chip cookies...umm..). We get to know the answers to all the following questions today, in near real time. When does the application encounter "ODI-10188: Error while login from OPSS" due to authentication issues, causing critical production ODI jobs to fail? When does the application face errors due to "Unable to create connection to LDAP", which creates fatal scenarios in complex running processes? When does the application face errors like "LDAP response read timed out", which causes ODI jobs or online OBIEE reports to error out? Can our IPA model figure out what went wrong by itself and let me know?
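Before any learning enters the picture, the raw ingredient is simply spotting these error signatures in the logs as they stream in. A minimal shell sketch - the log file name and its lines below are hypothetical stand-ins, not actual ODI agent output:

```shell
# Hypothetical extract of an ODI agent log
cat > odi_agent.log <<'EOF'
2018-08-20 01:12:03 INFO  Session 42 started
2018-08-20 01:14:55 ERROR ODI-10188: Error while login from OPSS
2018-08-20 01:15:02 ERROR Unable to create connection to LDAP
2018-08-20 02:03:41 ERROR ODI-10188: Error while login from OPSS
EOF

# Count occurrences of each known signature and report the last time it fired
for sig in 'ODI-10188' 'Unable to create connection to LDAP' 'LDAP response read timed out'
do
    hits=$(grep -c "$sig" odi_agent.log)
    last=$(grep "$sig" odi_agent.log | tail -n 1 | cut -d' ' -f1-2)
    echo "$sig|$hits|$last"
done
```

A real model would feed these counts and timestamps into its feature stream rather than echoing them.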

Now, let's see what happens when we create a model that will continuously "teach" our "agent" to "learn" from the stream of situational data, analyze it, and respond to complex queries. What happens when we inject decision-making capabilities to enhance our "agent", such that it is able to learn and adapt with time? We start getting answers to all the following questions. How stable does the system look? Since applications and jobs running fine does not necessarily indicate everything is fine, should we be aware of any "indicators" that can give us predictive information about the future state? Why is the application or system behaving the way it is now? Which teams need to be involved right away when the system behaves in a specific pattern? Can the system auto-heal given a specific scenario and then share that information? When can we anticipate a specific good-news scenario that happened in the past? How can we predict a major upcoming issue that has happened before? How close are we to reaching our specific target figures?

Thus, with the interplay of these concepts and technologies, it's fascinating to see how we are able to create strategic assets, helping us achieve unprecedented levels of efficiency, control, quality, and most importantly speed - which is definitely poised to transform the existing workforce, with radically enhanced response times and, of course, reduced operational risks.

Tuesday, July 10, 2018

SNP_SESSION - The Data Analyst’s Dream Table of Oracle Data Integrator

The Oracle Business Intelligence Applications stack provides an array of tools during the implementation, and each of them comes with its rich set of features. The awesomeness comes when we get to experience several business use cases and scenarios, analyze the metrics and data, interpret them along the lines of the business process, and ultimately, when we encounter product limitations, discover amazing ideas to make our lives easier - making breakthroughs through innovative, complex ideas.

All of us are aware of the repository table SNP_SESSION in ODI - the unattractive component that shows lots of rows and numbers and dates, and often helps only to find one specific piece of information before we are done with it. In an environment where several overnight incremental loads consume around 6 hours daily, ODI generates lots of logs and data and writes to all of the repository tables, including session-level details in SNP_SESSION. All information about every session - rows processed, rows inserted, rows updated, period and filter variables used, duration taken, start time, end time - is logged in SNP_SESSION.

To understand the prowess of SNP_SESSION, we need to ask a few questions first, and then the train of thoughts and discoveries can follow. For a session I am interested in, what is its behavior over the last 4 months? Does it have any pattern during specific periods? Does it have any relation with other sessions' attributes? Does the data volume or duration of a different session influence it? Since rows processed do not always proportionately impact session durations, does a % variance of a different session impact the session I am interested in, and to what extent? In today's load, can I find which scenarios from the past repeated today, say with similar data volume or % variance in duration? Can I foresee untold information, or what is going to happen as the loads progress, by real-time analysis of the data?

It has been very exciting to learn over the last few weeks that all of the above questions can be answered, largely by using SNP_SESSION, with some help from SNP_LP_INST and SNP_LPI_STEP. We have implemented a solution which is now in its final testing phase with live data, and which will heavily complement manual human monitoring activities - by providing root causes before the impact happens, and by providing additional insights into the application which otherwise often get overlooked due to the vastness of the system.

We have calculated the weighted average of each session duration, load plan by load plan, over the last several months, with an algorithm that took a long time to develop after a lot of brainstorming. Then came the perilous task of calculating the standard deviation of the weighted samples so as to help gauge the accuracy of our analysis, but it finally happened! Next came analyzing the data volume - with NB_ROW, NB_INS, and NB_UPD already available in SNP_SESSION and waiting for us. Comparing today's volume with the weighted average for the corresponding session itself started giving insights, but we wanted more. We asked what next, what if, why now, what then - and each metric opened up new paths before us to explore.
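To make the volume check concrete, here is a toy version in shell - a sketch with made-up numbers: the real solution reads columns such as NB_ROW from SNP_SESSION via SQL, and the simple recency weighting below stands in for our actual (more involved) algorithm:

```shell
# Hypothetical history of NB_ROW for one session, oldest first
history="120000 125000 118000 130000"
today=210000

# Recency-weighted average: weights 1,2,3,... favor the newest samples
avg=$(awk -v vals="$history" 'BEGIN {
    n = split(vals, v, " ")
    for (i = 1; i <= n; i++) { sum += i * v[i]; w += i }
    printf "%d", sum / w
}')

# Integer percentage deviation of today versus the weighted average
variance_pct=$(( (today - avg) * 100 / avg ))
echo "weighted avg: $avg, today: $today, variance: ${variance_pct}%"
if [ "$variance_pct" -gt 50 ]; then
    echo "ALERT: today's volume deviates sharply from its weighted average"
fi
```

In the real solution this comparison runs per session during the load, and the alert feeds the analysis emails described below.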

Each field of SNP_SESSION gave rise to almost 3-4 metrics of its own, leading to real-time daily calculation and analysis of the datasets during executions, and of the impact on the parent load plan. During the execution of each load, we receive insightful emails consisting of a detailed analysis of the load - and a similar day in history if today's scenario matched it, what happened then, and how the day went with the consequent activities.

But again, we needed a single indicating factor, giving rise to the calculation of probability - a single figure indicating the likelihood that each of our real-time predictions will actually be realized. Hence more brainstorming followed, and finally now every email gets tagged with a probability that makes it so much more meaningful. It is so true in today's world: data is only useful when we know how to process it to our benefit - and that will truly continue to become more and more the case in future!

Saturday, June 09, 2018

Sensational Sequoia

The exquisite Sierra Nevada, on the magnificent slopes of California, is home to some of the oldest living things on earth – the dazzling Giant Sequoia trees, some as old as 3,500 years. The stunning facts about these exquisite trees were as overwhelming to me as a 10-year-old as they are today – and the long dream of experiencing these breathtaking wildernesses came true a few days back.

Driving via Three Rivers, Visalia, up the wild, fascinating roads to the Giant Forest is an experience in itself that takes some time to sink in. The gradual change in the landscape around us as we approach is magnificent; the appearance of the trees, the heights, the feeling of suddenly shrinking is so electrifying – one just gets dazed.

The role of forest fires in the lifecycle of the Sequoia trees is interesting. Due to the reduction in the number of natural forest fires today, the Park Service conducts controlled fires to remove competing vegetation and allow Sequoia seedlings, which otherwise have difficulty, to germinate. The Giant Forest Museum is a storehouse of some gripping figures and facts – the Sentinel tree, a statuesque presence over 250 feet high, greets all the visitors as we enter the museum.

As we went down from the parking lot via the Sherman Tree Trail, we could feel the energy and excitement around us – the weather felt perfect for visitors, not a surprise given that the Sequoias thrive best in humid climates. When we reached it, standing in front of the General Sherman Tree, the magnificence and awe were too much to absorb, and it took an hour of lingering around to really believe it was happening. A part of me felt complete. For me, it was always more than a tree. That day I realized it for real. The largest living tree – the all-so-familiar impressive artistry that we have craved to see for so long after reading about it in books – right before us.

The enchanting walks around Sequoia forest, the rousing emotions, the captivating views, the grace and elegance of surviving thousands of years withstanding fires and storms, converted it into a spellbinding magical forest – resulting in an indelible weekend.

The wildlife comprises hundreds of black bears, rattlesnakes, mountain lions, and some friendly rodents. The visitor guide below is really well versed in all the different situations that our curious minds wandered to, and it was interesting to read.

Friday, June 08, 2018

Coinbase and the Blockchain Revolution

As we step into the age of new technologies that have the power to revolutionize the very meaning of currency and fundamental aspects of society, it's evident that such a transition will take time to be absorbed into our lives at a mass scale. Some notable authorized organizations that are the driving force today, like Coinbase and Facebook, help bring an air of assurance and credibility into this world, which is otherwise often torn apart by hacking and other mishaps.

The concept of Coinbase as the most user-friendly, interface-driven product appealed to people so much that millions have subscribed to it over the last few years. But with the growing volume of interest in BTC, XRP, ETH, XLM, LTC, ETC, BCH, EOS, TRX, ADA, XMR, and other cryptocurrencies, it's natural that delays in deposits, withdrawals, and transfers will start to pile up and create frustration among users unless addressed regularly. Even with offerings like Coinbase Vault, Trezor, Electrum, Robinhood, and Exodus, the landscape is only as good as its processing time, and speed is thus a prerequisite for cryptocurrencies to enter our daily lives as live currencies. The sudden variance in the volume of user hits due to rises and drops in the market price of a cryptocurrency does not seem likely to calm down soon, and a strong infrastructure is needed to support this craze.

The form below will capture a snapshot of your experience and add it to the consolidated summary of all users.

The summary of all the recorded data is represented schematically below to give an overview of the nature of all user experiences.


Sunday, March 18, 2018

Overcoming the Excel Pivot Table Count Distinct Values Challenge

Microsoft Excel quickly becomes a powerful tool to dive deep into the sea of data and form perceptions while generating interesting data models. Recently, in the middle of such an exciting activity, came a moment where we were stuck with a not-so-latest version of Excel, and thus missing the oh-so-lovely built-in Distinct Count option for a Pivot Table. Yes, it's a deal-breaker when we cannot avoid a pivot and also desperately don't want to create a separate standalone table or formula to count the number of distinct values for a combination.

Say we need to find, for each Attrib_1 value (column B), how many distinct IDs (column A) exist. We can see that AX and BY are repeated in rows 5 and 9, so we need to tag their duplicate occurrences with a 0.

In the first approach, column E, we check whether the row number of each row equals the first occurrence of the unique combination (column D) we are looking for:


In the second approach, column F, we check whether the counted value for the unique combination (column D) we are looking for exceeds 1, in which case it's a duplicate and tagged 0. The range of this formula expands like $D$2:D3, $D$2:D4, $D$2:D5 as it goes down, so the COUNTIF function counts from the top down. This needs a tad more effort to type and create than the former.
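The same first-occurrence tagging can be sanity-checked outside Excel. Here is an awk sketch over the equivalent of columns A and B - the data rows are reconstructed from the description above (AX and BY duplicated in sheet rows 5 and 9):

```shell
# Columns A (ID) and B (Attrib_1); the 4th and 8th data rows are the duplicates
cat > data.csv <<'EOF'
A,X
B,X
C,X
A,X
B,Y
A,Y
C,Z
B,Y
EOF

# Tag the first occurrence of each ID+Attrib combination with 1 (else 0),
# then sum the tags per Attrib_1 value to get the distinct count
awk -F, '{ key = $1 FS $2
           tag = (key in seen) ? 0 : 1
           seen[key] = 1
           dist[$2] += tag }
         END { for (a in dist) print a "=" dist[a] }' data.csv | sort
```

This prints X=3, Y=2, Z=1 - the same distinct counts the pivot chart reports.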


Now, if we select values from the purple slicer buttons above, we can see how vibrantly the pivot chart tells us that the count of distinct combinations for X is 3 (i.e. AX, BX, CX), for Y is 2 (i.e. BY, AY), and for Z is 1 (i.e. CZ). Once the data is ready, we can use ODI or any integration tool to further process this intelligent dataset.

Saturday, January 06, 2018

wss_username_token_service_policy use case for Oracle EBS and DRM Integration

The year 2017 was an incredible year, with tremendous ups and equal downs at both a professional and personal level. However, it has again helped me garner some very fruitful insights about everything around me and plan a few things better ahead. This is not to undermine any of the other years, like 2016 or 2015; it is just that the impact of some of the incidents, and of the decisions I made, in 2017 will be changing my life forever. Let's see what 2018 has in store! Wishing you a very happy new year ahead!

Web services play an important role in the authentication process for the EBS and DRM metadata integration. A few months back, during the DRM repository movement, we came across a few challenges with the MDS schema database host info, which enlightened a few areas and paved the way for some more personal study. After the initial setup, once the oracle-epm-drm-webservices WSDL is up and running fine, we need to attach a security policy to this application. This ensures that clients like the program "Load Segment Values and Hierarchies" request a system-generated token for the user (say EbsIntegrationUser) from the WebLogic Server, which can be passed to DRM. DRM can then validate that token with OID to verify authentication instead of requiring a username/password.

Oracle Web Services Manager (OWSM) needs to be deployed first in the same EPM server and domain where the DRM web service is deployed. The database repository schema name for OWSM is set to a different value and usually ends with *_MDS, which corresponds to the Metadata Services schema.

Once done, the new policy needs to be created in WebLogic under Farm_EPMSystem > WebLogic Domain > EPM System > Web Services > Policy Sets. Then, in Step 3, "Add Policy References", we need to select and assign wss_username_token_service_policy.

The detailed steps to be followed can be referred to here. There are other policies that can be used depending on the scenario faced; however, for this specific integration, an authentication token suffices. Here are some more details related to authentication and the use of web services.

The ultimate test is to make sure the token name is visible in the WSDL URL. If the attachment of the policy is done correctly, it will be reflected in the URL. Otherwise, there is another approach to manually attach the policy, which is more of a workaround and done only in exceptional scenarios - something we faced a few months back.
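A scripted version of that ultimate test might look like the following. This is a sketch: the curl URL in the comment uses hypothetical host, port, and path placeholders, and here we grep a saved hypothetical WSDL fragment instead of hitting a live server:

```shell
# Against a live server this would be something like:
#   curl -s "http://epmhost:port/oracle-epm-drm-webservices/...?wsdl" > drm_ws.wsdl
# Offline, a hypothetical saved WSDL fragment stands in for the real response:
cat > drm_ws.wsdl <<'EOF'
<wsp:Policy wsu:Id="wss_username_token_service_policy">
  <sp:SupportingTokens>
    <sp:UsernameToken/>
  </sp:SupportingTokens>
</wsp:Policy>
EOF

# The token name showing up in the WSDL indicates the policy is attached
if grep -q 'UsernameToken' drm_ws.wsdl; then
    echo "wss_username_token_service_policy is attached"
else
    echo "policy missing - consider the manual attachment workaround"
fi
```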