
January 21 2014

Decision making under uncertainty

The 2014 Edge Annual Question (EAQ) is out. This year, the question posed to the contributors is: What scientific idea is ready for retirement?

As usual, the EAQ provokes thought and promotes discussion. I have read only a fraction of the responses so far, but I think it is important to highlight a few Edge contributors who answered with a common, and in my opinion very important and timely, theme. The responses that initially caught my attention came from Laurence Smith (UCLA), Gavin Schmidt (NASA), Giulio Boccaletti (The Nature Conservancy), and Danny Hillis (Applied Minds). Had I been asked this question, my contribution for idea retirement would likely align most closely with these four responses: Smith and Boccaletti both want to see the same idea, stationarity, disappear; Schmidt’s response focused on the abolition of simple answers; and Hillis wants to do away with cause-and-effect.

In the age of big data, from a decision-making standpoint, all of these responses address the complex nature of interconnected scientific topics and the search for one-size-fits-all answers. The conclusions all point toward what science is supposed to do in the first place: generate knowledge. Of course, every experiment has the objective of answering a specific question, but each experiment, if performed properly, should fundamentally generate more questions, not just answers. When newly minted PhDs successfully defend their dissertations, they are, for a brief moment, the world’s experts on their particular subjects. The next day, that may no longer hold true. This is progress and should be embraced. If we can apply the findings of a study to address a specific problem, great. But science should be a humbling endeavor, as each day we should realize how much we actually don’t know.

As more and more universities crank out graduates under the data-science rubric, I hope the curriculum stresses that while machine learning and advanced algorithms can uncover new, useful, and novel patterns in large datasets, these same tools and techniques can also be trained to flag when false positives might lead down dark data alleys. This is all part of a proper lens through which to view scientific risk management. Taking a complex adaptive systems approach to data analysis will better prepare decision makers to identify tipping points and non-stationarity (a minimal check for the latter is sketched below), while providing a foundation to continuously challenge assumptions and, at the same time, to embrace the notion of complexity, shifting baselines, and ambiguity.
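To make the non-stationarity point concrete, here is a minimal sketch in Python of one deliberately crude check: compare each new window of a series against the history that preceded it. The function name, window size, and threshold are all hypothetical choices for illustration; a real analysis would reach for formal tests such as the augmented Dickey-Fuller test.

```python
import numpy as np

def rolling_drift(series, window=50, z_thresh=3.0):
    """Flag windows whose mean drifts far from the historical baseline.

    A crude non-stationarity check: score each window's mean against
    the mean of all preceding data, in units of the standard error.
    The threshold is illustrative, not calibrated.
    """
    flags = []
    for start in range(window, len(series) - window + 1, window):
        history = series[:start]
        current = series[start:start + window]
        stderr = history.std(ddof=1) / np.sqrt(window)
        z = abs(current.mean() - history.mean()) / stderr
        flags.append((start, z > z_thresh))
    return flags

# Example: a series whose baseline shifts halfway through.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])
print([start for start, flagged in rolling_drift(data) if flagged])
```

A shifting-baseline series like this one is exactly where assumptions of stationarity quietly fail; the point of the sketch is only that such checks are cheap to run routinely.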

August 19 2013

Data Science for Business

A couple of years ago, Claudia Perlich introduced me to Foster Provost, her PhD adviser. Foster showed me the book he was writing with Tom Fawcett and using in his teaching at NYU.

Foster and Tom have a long history of applying data to practical business problems. Their book, which evolved into Data Science for Business, was different from all the other data science books I’ve seen. It wasn’t about tools: Hadoop and R are scarcely mentioned, if at all. It wasn’t about coding: business students don’t need to learn how to implement machine learning algorithms in Python. It is about business: specifically, it’s about the data analytic thinking that business people need to work with data effectively.

Data analytic thinking means knowing what questions to ask, how to ask those questions, and whether the answers you get make sense. Business leaders don’t (and shouldn’t) do the data analysis themselves. But in this data-driven age, it’s critically important for business leaders to understand how to work with the data scientists on their teams. In today’s business world, it’s essential to understand which algorithms are used for different applications, how statistics are used to create models of human and economic behavior, overfitting and its symptoms, and much more. You might not need to know how to implement a machine learning algorithm, but you do need to understand the ideas the data scientists on your team are using.
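Overfitting in particular is easy to see in a toy example. The sketch below (synthetic data, not drawn from the book) shows the classic symptom: as a polynomial model gets more flexible, its error on the training data keeps falling while its error on held-out data typically bottoms out and then climbs.

```python
import numpy as np

# Symptom of overfitting: training error falls steadily with model
# complexity, but held-out error stops improving and then rises.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(3 * x) + rng.normal(0, 0.2, 40)   # noisy underlying signal

x_train, y_train = x[::2], y[::2]            # even points for fitting
x_test, y_test = x[1::2], y[1::2]            # odd points held out

for degree in (1, 3, 6, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, "
          f"test MSE {test_mse:.3f}")
```

A business leader who can read that output, and ask why the team chose the model complexity it did, is practicing exactly the data analytic thinking the book describes.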

The goal of data science is putting data to work. That’s what Data Science for Business is all about, and the reason I’m excited to see us publishing it. There are many books about data science, and an increasing number of undergraduate and graduate programs in data science. But I haven’t seen anything that teaches data science for the leaders who will be using data to drive their businesses forward.

August 15 2013

The vanishing cost of guessing

If you eat ice cream, you’re more likely to drown.

That’s not true, of course. It’s just that both ice cream and swimming happen in the summer. The two are correlated — and ice cream consumption is a good predictor of drowning fatalities — but ice cream hardly causes drowning.
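The mechanism is easy to simulate. In the sketch below (all numbers invented), ice cream sales and drownings never influence each other; both simply track temperature, and a strong correlation appears anyway.

```python
import numpy as np

# A toy simulation of the ice-cream/drowning effect: a shared driver
# (summer heat) induces a strong correlation between two variables
# that never influence each other. All numbers here are made up.
rng = np.random.default_rng(7)
heat = rng.uniform(0, 35, 365)                       # daily temperature
ice_cream = 10 + 2.0 * heat + rng.normal(0, 5, 365)  # sales track heat
drownings = 0.1 * heat + rng.normal(0, 1, 365)       # so do drownings

r = np.corrcoef(ice_cream, drownings)[0, 1]
print(f"correlation(ice cream, drownings) = {r:.2f}")  # strongly positive
```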

These kinds of correlations are all around us, and big data makes them easy to find. We can correlate childhood trauma with obesity, nutrition with crime rates, and how toddlers play with future political affiliations.

Just as we wouldn’t ban ice cream in the hopes of preventing drowning, we wouldn’t preemptively arrest someone because their diet wasn’t healthy. But a quantified society, awash in data, might be tempted to do so because overwhelming correlation looks a lot like causality. And overwhelming correlation is what big data does best.

It’s getting easier than ever to find correlations. Parallel computing, advances in algorithms, and the inexorable crawl of Moore’s Law have dramatically reduced how much it costs to analyze a data set. Consider an activity we do dozens of times a day, without thinking: a Google search. The search is farmed out to thousands of machines, and often returns hundreds of answers in less than a second. Big data might seem esoteric, but it’s already here.

Google’s search results aren’t the right results; they’re those that are most likely to be related to what you searched for. Similarly, Watson, IBM’s Jeopardy-winning software, mined millions of records to guess at the right answer. Today, an abundance of cheap, simple tools makes it trivial for organizations to guess rather than to know about everything from employee honesty to the spread of disease to the optimal delivery of car parts in a snow-bound city to whether a teenager is pregnant.
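The word “guess” is worth taking literally. In the toy sketch below (fabricated data, illustrative setup), a classifier trained on made-up behavioral counts can emit only a likelihood; whether an organization treats that likelihood as knowledge is a policy choice, not a mathematical one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# "Guessing rather than knowing": a model fitted to synthetic counts
# of two behaviors outputs a probability, not a fact.
rng = np.random.default_rng(1)
X = rng.poisson(3, size=(200, 2)).astype(float)      # two behavior counts
y = X[:, 0] + X[:, 1] + rng.normal(0, 2, 200) > 6    # noisy "outcome"

model = LogisticRegression().fit(X, y)
guess = model.predict_proba([[5.0, 4.0]])[0, 1]
print(f"model's guess: {guess:.0%} likely")          # a likelihood, not a truth
```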

Tomorrow’s data-driven society is both smarter and dumber, more just and more merciless. The ethical implications of this shift are only now becoming clear: at some point, innocent-until-proven-guilty looks a lot like innocent-until-likely-to-be-guilty.

What the big data revolution is really about is predicting the future. Whether it’s choosing the right ad to show a web visitor, or setting the optimal insurance premium, or helping an inner-city student learn better, we crunch reams of data to try to predict what will happen.

Proponents see this as a boon to humanity. Big data makes us smart: we can anticipate a flu outbreak or where charitable donations do the most good. It also makes us just: transparent, open information and the tools to analyze it shine the harsh light of data on corruption, replacing opinions with facts.

On the other hand, critics charge that big data will make us stick to constantly optimizing what we already know rather than thinking outside the box and truly innovating. We’ll rely on machines for evolutionary improvements rather than revolutionary disruption. An abundance of data means we can find facts to support our preconceived notions, polarizing us politically and dividing us into “filter bubbles” of like-minded intolerance. And it’s easy to mistake correlation for causality, leading us to deny someone medical coverage or refuse them employment because of a pattern over which they have no control, taking us back to the racism and injustice of apartheid and redlining.

Big data isn’t a magical tool for predicting the future. It’s not a way to peer into someone’s soul or decide what’s going to happen, even though it’s often frighteningly good at guessing. Just because the cost of guessing is dropping quickly to zero doesn’t mean we should treat a guess as the truth. As we become an increasingly data-driven society, it’s critical that we remember we can no more predict tomorrow with today’s data than we can prevent drowning by banning ice cream.
