Above the Noise

Distilling insights and key lessons from new concepts and current events. In this post, our partner, Jon Mayes, explores the factors you should understand when deciding between traditional and data science tools.

To anyone paying attention, there is an absolute gold rush as organizations rapidly build up their “data science” capability. Along the way, the term has stretched to include anything remotely analytical, and in doing so, it has come to signify nothing.

As someone whose technical career spanned the evolution of the term, I’d like to offer my take on how we can bring clarity to the tools and frameworks that have come to be included in the big tent of “data science,” and on how business decision makers should think about them.

To begin, I’d like to propose Decision Science as a term that encompasses any framework that seeks to understand and/or predict individual and aggregate human decisions. For example, an individual human decision would be “I chose to buy a t-shirt after being shown an ad on Instagram,” while an aggregate of human decisions would be how many more t-shirts were purchased after the price was lowered by 10%.

With Decision Science defined, we can break decision science models down into three main paradigms:

  • Social Science Models – Models based on social science theory, such as economics, that impose assumptions on behavior (e.g., rationality) and on relationships;
  • Time Series Models – Models that predict results based (mostly) on past behavior and on time-wise patterns in that behavior, such as seasonality and trends; and
  • Data Science Models – Computational frameworks that produce predictions from data with no structure or relationships imposed.
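To make the contrast concrete, here is a minimal Python sketch with one toy predictor per paradigm, applied to hypothetical t-shirt sales. The function names, the assumed elasticity of -1.5, the season length of 4, and the use of k-nearest neighbours as the stand-in data science model are all illustrative assumptions rather than methods prescribed in this post. Notice how the amount of imposed structure shrinks from one function to the next.

```python
# A minimal sketch of the three paradigms on hypothetical t-shirt sales data.
# All function names, parameters, and numbers are illustrative assumptions.
import math


def social_science_forecast(base_qty: float, base_price: float,
                            new_price: float, elasticity: float = -1.5) -> float:
    """Constant-elasticity demand curve: the functional form and the
    elasticity value are imposed by theory, not learned from data."""
    return base_qty * (new_price / base_price) ** elasticity


def time_series_forecast(history: list[float], season_length: int = 4) -> float:
    """Seasonal naive forecast: predict the value observed one season ago."""
    return history[-season_length]


def data_science_forecast(train_x: list[list[float]], train_y: list[float],
                          query: list[float], k: int = 3) -> float:
    """k-nearest-neighbours regression: no functional form is assumed; the
    prediction is the mean target of the k training points closest to query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical: 1,000 shirts sold at $20; what if the price drops 10% to $18?
    print(round(social_science_forecast(1000.0, 20.0, 18.0)))  # 1171
    # Hypothetical quarterly sales; forecast next quarter from the same quarter last year.
    print(time_series_forecast([10, 20, 30, 40, 12, 22, 31, 41]))  # 12
```

In the first function, a single imposed parameter (the elasticity) explains the entire forecast. In the last, the prediction is simply an average of similar past cases, and there is no parameter to point to when someone asks why.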

You may notice that as we move down the list from Social Science Models to Data Science Models, we are removing “structure” from the model. The benefit is that the model can find patterns that produce useful predictions even when a human can’t define what it should be looking for. Think about an image recognition algorithm that recognizes cats. It’s much easier for a human to recognize a cat on sight than to lay out a set of rules that define one, much like Justice Potter Stewart’s “I know it when I see it” test for obscenity.

The cost of this benefit is that removing structure also limits how much a data science model can explain compared with the other two paradigms. Figure 1 below illustrates the tradeoff between explanation and prediction.

Figure 1: Explanation vs Prediction in Decision Science Models

You can say that you fed the model this data, and with computationally intensive techniques such as Shapley values, you can say which inputs mattered more than others in making a particular prediction. But the model can’t tell you exactly how those inputs affected the prediction.
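To show what Shapley values do and do not tell you, here is a minimal sketch that computes them exactly for a single prediction by enumerating every coalition of inputs. The toy model, its inputs, and the convention of replacing “absent” inputs with a baseline value are illustrative assumptions; practical tools generally rely on approximations or model-specific shortcuts, because the exact computation grows as 2^n in the number of inputs.

```python
# Exact Shapley values for one prediction, by brute-force coalition enumeration.
# The baseline-replacement value function is an assumption; others exist.
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Inputs in the coalition take their actual values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude input i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of input i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical model with two additive effects and one interaction (z0 * z2).
    def toy_model(z: Sequence[float]) -> float:
        return 2 * z[0] + 3 * z[1] + z[0] * z[2]

    phi = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
    print([round(p, 6) for p in phi])  # [2.5, 3.0, 0.5]
```

The attributions sum to the prediction minus the baseline prediction (6 here), and they rank the inputs by importance. But the interaction between z0 and z2 is simply split between them: nothing in the three numbers reveals that an interaction exists, let alone its form.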

A natural question to ask would be, so what? If I’m getting the best prediction possible, then why do I care if I can or can’t explain it? The short answers are Upstream and Downstream Accountability.

For the time being, our bosses, supervisors, clients, and boards are made up of humans. When something does not go to plan, which will inevitably happen, we must often explain why. That is, we need to lay out, in a logical manner, what we did, why we did it, and why it made sense. This is Upstream Accountability.

To understand Downstream Accountability, think about the business decisions that depend on the one you are currently making. Strategy decisions are an easy illustration. If the organization decides to enter a market, knowing why matters to those who must plan production capacity, route-to-market strategy, pricing, and so on, so that the tactical decisions they make align with the strategy.

To take it back to our original example, the decision to show me an ad on Instagram does not really carry upstream or downstream accountability. It’s cheap, and no other decision depends on it. A machine learning algorithm is likely the perfect tool for making that quick decision. That said, the decision to deploy an ML tool in the first place was based on an expected ROI determined by analysis, benchmarking, etc. That decision has both upstream and downstream accountability and requires a more explanatory approach.

When you are deciding upon a framework to solve a particular business problem, we recommend understanding your need for explanation before going all in on any particular method.