Friday, April 16, 2021

Whole to part panorama

Any new initiative brings with it a fair amount of data to start with. When we set out to derive insights, evaluating the available raw data becomes the only activity for days. Fields stop making sense in relation to one another, if they ever did, once we challenge them enough, but that scrutiny often enables precise modeling within the context.

It all began with a few spreadsheets holding lots of related data from their respective areas. The discovery activity started with all the enthusiasm and energy of past activities. When we dived deeper to build the whole picture, we started to struggle to put the individual parts together. It seemed more complex than we had anticipated once we added moving and rolling aspects to the current key measures. The more we focused on deriving the key metrics, the more the data fought back.

It then became a transformative journey in which working backward from the goal, for a change, proved helpful. We added variables and factors along the way to account for precision, moving forward and backward along the timelines, until the result looked stable enough. A few metrics that did not look convincing at first proved useful in improving the accuracy of our insights.

A data discovery journey is not without its fair share of hurdles, but it gets more exciting when we are able to create something more than what existed before and more than what we hoped to achieve. Having overturned the conventional part-to-whole relationships and stereotypes, it was incredibly satisfying to finish painting the final picture.

Thursday, February 11, 2021

The Elusive Paradox

Curiosity to derive a metric, however simple or complex it may be, requires a clear objective and precision right from the initial stages. Add to that the modern examples of flashy dashboards and infinite interactivity, and we have all the ingredients of a recipe that can go to either extreme. All that dazzles and sparkles need not help visualize data in the best possible way.

In our efforts to derive insights from an ocean of data, all that we managed in the first few weeks was plain text, then some more fields of plain text, and then more such fields of plain text. It was only when we were halfway through our intense data prepping process that we gradually started to realize the opportunities ahead of us, and how many more calculations and fields we needed to implement the aesthetics in our insights. It was interesting to learn from some of the pros how the landscape has changed in exactly these phases of data processing over the years, and how we approach similar scenarios differently today.

Some of the easily derivable figures gave us the straightforward metrics, but the rest was up to the questions that were posed to us. Answers that had seemed impossible with what was available slowly started to take shape and led to more intuitive insights that had not been evident before. To our delight, it was then time to choose our visuals, and once the effectiveness of each variety and classification had been assessed, we were able to narrow them down to the few that best suited our pages of topics.

Some of the most decisive areas of study are often about where we are headed, contrary to the usual focus on comparisons and forecasting magnitudes of change. These are areas where we can identify specific combinations of values to focus on for further investigation. The elusive areas of interest get tougher to detect amidst the noise, and that is when modern concepts of data processing help us step up the game.

Friday, November 06, 2020

Catch 22 Paradigms

It seldom comes as a surprise when a fascinating vision of deriving insights starts with tons of bytes of text and numbers that look anything but structured. An engagement gets all the more exciting when working with a pool of bright data professionals, and it is a great pleasure to interact with a thoughtful team discussing subtle intricacies and popular industry challenges and solutions. The tougher the path became, the more exciting the discussions became, often running for hours.

If we push data long enough, it will take whatever shape or form we are looking for. This is a scenario we would like to stay away from: we want to stay unbiased and prep the data to its fullest usability. The analysis of raw data can be called the most painful and time-consuming step of all. Something that seems a mere non-classifier can be an excellent piece when combined with another field, or perhaps with a few more. There is no guarantee that the data will learn to talk to us easily, so we went down this path for weeks, applying all our data prepping concepts and techniques along the way. With quite a few tools at our disposal, and picking up a few more along the way, things slowly took a form that finally seemed insightful enough to all of us.

However, did we force the data to project this insight? Or is the insight a natural outcome of the way we processed the data? Would we still get the same answer from our data had we processed it differently? Was our question biased in some way unknown to us even after so much analysis? Are traditional, obvious methodologies so ingrained in us that we tend to apply them erroneously where we shouldn’t? Did we wrap up cleaning the data too early, before we had learned more? This was one of the main challenges that we faced, leading to iterative cycles of workshops and giving rise to deeper questions in every iteration. Often it seemed dark, without any possibility of light around. What works in one place will very likely not work in another, even when things look almost identical; we learned this the hard way.

A situation like this is extremely satisfying and yet challenging after weeks of brainstorming, and it leads to more learning and more nerve-wracking workshops. There were hours when anomalies seemed valuable and omitting outliers felt like a sin, when we felt the data blend into us as we strove for the perfect insight with every tougher question presented to us. Those hours finally helped us arrive at an analysis rich enough to satisfy most of us. While plenty of items still remain unexplored, strategic solutions such as these are quite a step forward in the vast expanse of the data world.

Tuesday, August 04, 2020

Counting hails in the hailstorm

The true essence of technological advances comes with our understanding of our surroundings getting better with the aid of tools that never existed before. As analytics continues to evolve at a blistering pace, it brings with it the ability to make decisions that affect the lives of millions around us. Out of seeming nothingness we get concrete patterns that could only have been imagined a few decades ago.

As interesting scenarios continue to emerge in a changing superfluous landscape, the veil is lifted gradually as we dig deeper using instruments ready to redefine our future. Our profound ignorance often becomes bluntly evident as we start to passionately navigate the curves of the charts, allowing us to gradually achieve a level of awareness at which we learn to be amused rather than shocked. The Tableau visualization below of Coronavirus statistics for India shows confirmed cases, cured cases, and deaths across all the states as of Aug 22nd 2020. It is strikingly concerning that, when clicking through the percentages of health stats, we find some states with seemingly low confirmed cases that are not doing too well in overall death percentage. Conversely, states that appear dangerously high in the rankings of confirmed cases are often doing comparatively well considering their cured percentages. The analytics below are best viewed on a larger screen.
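The cured and death percentages discussed here can be sketched as simple shares of confirmed cases. This is a minimal illustration, not the calculation used in the Tableau workbook: the function name `outcome_percentages` and the state names are hypothetical, and the counts are made-up placeholders rather than real statistics.

```python
def outcome_percentages(confirmed, cured, deaths):
    """Return (cured %, death %) as shares of confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * cured / confirmed, 100.0 * deaths / confirmed


# Hypothetical placeholder counts for illustration only, NOT real statistics.
states = {
    "State A": {"confirmed": 200_000, "cured": 170_000, "deaths": 2_000},
    "State B": {"confirmed": 20_000, "cured": 12_000, "deaths": 600},
}

for name, s in states.items():
    cured_pct, death_pct = outcome_percentages(
        s["confirmed"], s["cured"], s["deaths"]
    )
    print(f"{name}: cured {cured_pct:.1f}%, deaths {death_pct:.1f}%")
```

In this made-up example, State B has a tenth of State A's confirmed cases yet a higher death percentage and a lower cured percentage, which is the kind of contrast that ranking by confirmed cases alone would hide.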


Monday, July 13, 2020

Interacting with Enhanced Data Interpretations

As we traverse one of the most uncertain times in our history toward a new future where things may never allow us to be the same, informed decision-making in the age of data analytics can go a long way in helping us see the unseen that is often right in front of us. Correlation, causality, and related dimensions that would otherwise be difficult to interpret surface easily when seen in the right context.

In the eastern part of India, the second most populous country in the world, lies the diverse state of West Bengal, with a population of nearly 100 million and a land area of 34,267 mi². To put that into perspective, that translates to nearly ¼ of the US population in an area that is ¹⁄₁₁₀ the size of the US land area, a population density about 28 times higher. Upholding the safety protocols at this juncture will need prolific planning and execution, as we all try to overcome the Coronavirus pandemic together.
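The density comparison follows directly from the two fractions quoted above. As a quick check, using only those rounded fractions and not fresh population or area figures (the variable names are illustrative):

```python
from fractions import Fraction

# Rounded shares taken from the paragraph above, not recomputed from census data.
population_share = Fraction(1, 4)   # West Bengal population / US population
area_share = Fraction(1, 110)       # West Bengal area / US area

# Density is population divided by area, so the density ratio is the
# population share divided by the area share.
density_ratio = population_share / area_share
print(float(density_ratio))
```

The result is 27.5, which the text rounds to 28.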

The schematics below were created using data from Wikipedia, which currently holds active case counts as of mid-June 2020. The intensity of the colors represents the number of active cases in comparison to other districts, and is almost always proportionate to the population of the respective district. The population figures are from Census 2011 but still provide a rough comparative summary of the districts. Deselecting the top districts from the District dropdown below starts to reveal more distinguishable comparative shades among the remaining districts.
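Why deselecting the top districts helps can be sketched with a simple relative color scale. This assumes the shade is a linear share of the largest count currently shown, which may not match how the actual visualization maps colors; the function name `shade_intensities`, the district names, and the counts are all hypothetical placeholders.

```python
def shade_intensities(active_cases):
    """Scale each district's count to 0..1 relative to the largest count shown."""
    peak = max(active_cases.values())
    if peak == 0:
        return {district: 0.0 for district in active_cases}
    return {district: count / peak for district, count in active_cases.items()}


# Hypothetical placeholder counts for illustration only, NOT the Wikipedia figures.
cases = {"District A": 2000, "District B": 1000, "District C": 50, "District D": 20}

# With every district selected, the small districts are nearly indistinguishable.
all_shades = shade_intensities(cases)

# Deselecting the two top districts rescales the shades of the rest.
toppers = {"District A", "District B"}
rest = {d: n for d, n in cases.items() if d not in toppers}
rest_shades = shade_intensities(rest)

for district in rest:
    print(district, round(all_shades[district], 3), round(rest_shades[district], 3))
```

With all four districts shown, the two smaller ones sit at the pale end of the scale within a couple of percent of each other. Once the toppers are removed, the scale is recomputed over the remaining districts and the difference between them becomes visible.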

Creating a geomap with useful dimensions overlaid helps strengthen the visualization. It took some digging around and fine-tuning, as there wasn't a readily available dataset with accurate latitudes and longitudes to plot the required districts. Now, along with the headquarters of each district and the containment zone coloring from the June data, we have enhanced visibility of the current scenario.

The charts above are best viewed on a larger screen. The latest figures can be found on these curated sites for West Bengal and India.