
March 04 2013

Untangling algorithmic illusions from reality in big data

Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif., about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, and hammered home the constant need for context in drawing conclusions.

Crawford explored many of these same topics in our interview, which follows.

What research are you working on now, following up on your paper on big data?

Kate Crawford: I’m currently researching how big data practices are affecting different industries, from news to crisis recovery to urban design. This talk was based on that upcoming work, touching on questions of smartphones as sensors, on dealing with disasters (like Hurricane Sandy), and new epistemologies — or ways we understand knowledge — in an era of big data.

When “Six Provocations for Big Data” came out in 2011, we were critiquing the very early stages of big data and social media. In the two years since, the issues we raised are even more prominent.

I’m now looking beyond social media to a range of other areas where big data is raising questions of social justice and privacy. I’m also editing a special issue on critiques of big data, which will be coming out later this year in the International Journal of Communication.

As more nonprofits and governments look to data analysis in governing or services, what do they need to think about and avoid?

Kate Crawford: Governments have a responsibility to serve all citizens, so it’s important that big data doesn’t become a proxy for “data about everyone.” There are two problems here: the first is the question of who is visible and who isn’t represented; the second is privacy, or what I call “privacy practices” — because privacy means different things depending on where and who you are.

For example, the Street Bump app is brilliant. What city wouldn’t want to passively draw on data from all those smartphones out there, a constantly moving network of sensors? But, as we know, a significant percentage of Americans don’t have smartphones, particularly older citizens and those with lower disposable incomes. What happens to their neighborhoods if they generate no data? They fall off the map. To be invisible when governments make resource decisions is dangerous.
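
The sensing idea is simple enough to sketch. Below is a minimal illustration in Python, assuming a stream of (latitude, longitude, vertical acceleration) readings and an arbitrary spike threshold; this is not Street Bump’s actual algorithm:

```python
# Hypothetical sketch of accelerometer-based pothole detection.
# Not Street Bump's actual algorithm; the threshold and data shape are assumed.

BUMP_THRESHOLD = 3.0  # g-force spike treated as a "bump" (assumed value)

def detect_bumps(samples):
    """samples: iterable of (lat, lon, vertical_accel_g) readings from a phone."""
    return [(lat, lon) for lat, lon, accel in samples
            if abs(accel) > BUMP_THRESHOLD]

# Example: readings from one drive; only the 3.8 g spike is reported.
drive = [(42.36, -71.06, 1.0), (42.36, -71.05, 3.8), (42.37, -71.05, 0.9)]
print(detect_bumps(drive))  # [(42.36, -71.05)]
```

Even in this toy, the equity problem is structural: a street no phone ever drives down produces no samples, so it can never be flagged.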

Then, of course, there’s the whole issue of people signing up to be passively tracked wherever they go. People may happily opt into it, but we’d want to be very careful about who gets that data, and how it is protected over the long term — not just five years, but 50 years and beyond. Governments might be tempted to use that data for other purposes, even civic ones, and this has significant implications for privacy and the expectations citizens have for the use of their data.

Where else could such biases apply?

Kate Crawford: There are many areas where big data bias is a problem from a social equity perspective. One of the key ones at the moment is law enforcement. I’m concerned by some of the work that seeks to “profile” areas, and even people, as likely to be involved in crime. It’s called “predictive policing.” We’ve already seen some problematic outcomes when profiling was introduced for plane travel. Now, imagine what happens if you or your neighborhood falls on the wrong side of a predictive model. How do you even begin to correct the record? Which algorithm do you appeal to?
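
A toy simulation makes the worry concrete (the numbers and allocation rule here are assumptions, not any real system’s): if patrols are allocated in proportion to past recorded arrests, and arrests can only be recorded where patrols are, an initial skew in the data sustains itself even when the underlying crime rates are identical:

```python
# Toy feedback loop, not any real predictive-policing system.
# Two districts with identical true crime rates; district A starts with
# more recorded arrests only because it was patrolled more heavily.
import random

random.seed(0)
TRUE_CRIME_RATE = 0.1            # identical in both districts
arrests = {"A": 20, "B": 10}     # the historical record is already skewed

for year in range(5):
    total = sum(arrests.values())
    for district in arrests:
        patrols = round(100 * arrests[district] / total)  # allocate by "risk"
        # Arrests are only recorded where patrols are present to observe crime.
        arrests[district] += sum(random.random() < TRUE_CRIME_RATE
                                 for _ in range(patrols))
    print(year, arrests)
# District A keeps its inflated share: the data records where we looked,
# not where crime actually was.
```

Correcting the record is hard precisely because the skew lives in the historical data the model was trained on, not in any single decision one could appeal.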

What are the things, as David Brooks listed recently, that big data can’t do?

Kate Crawford: There are lots of things that big data can’t do. It’s useful to consider the history of knowledge, and then imagine what it would look like if we only used one set of tools, one methodology for getting answers.

This is why I find people like Gabriel Tarde so interesting — he was grappling with ideas of method, big data and small data, back in the late 1800s.

He reminds us of what we can lose sight of when we go up orders of magnitude and try to leave small-scale data behind — like interviewing people, or observing communities, or running limited experiments. Context is key, and it is much easier to be attentive to context when we are surrounded by it. When context is dissolved into so many aggregated datasets, we can start getting mistaken impressions.

When Google Flu Trends mistakenly predicted that 11% of the US had flu this year, it showed how relying on a big data signal alone can give an exaggerated or distorted result (in that case, more than double the actual figure, which was between 4.5% and 4.8%). Now, imagine how much worse it would be if that data was all that health agencies had to work with.
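
The scale of that miss is easy to check from the figures quoted above:

```python
# Overestimation factor implied by the figures quoted above.
predicted = 0.11                         # the big-data estimate
actual_low, actual_high = 0.045, 0.048   # the actual range
print(predicted / actual_high)  # ~2.3
print(predicted / actual_low)   # ~2.4, i.e. more than double either way
```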

I’m really interested in how we might best combine computational social science with traditional qualitative and ethnographic methods. With a range of tools and perspectives, we’re much more likely to get a three-dimensional view of a problem and be less prone to serious error. This goes beyond tacking a few focus groups onto big datasets; it means conjoining deep, ethnographically informed research with rich data sources.

What can the history of statistics in social science tell us about correlation vs causation? Does big data change that dynamic?

Kate Crawford: This is a gigantic question, and one that could be its own talk! With big datasets, it’s very tempting for researchers to engage in apophenia — seeing patterns where none actually exist — because massive quantities of data can point to a range of correlative possibilities.

For example, David Leinweber showed back in 2007 that data mining techniques could show a strong but spurious correlation between the changes in the S&P 500 stock index and butter production in Bangladesh. There’s another great correlation between the use of Facebook and the rise of the Greek debt crisis.
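
It’s straightforward to see how scale alone produces such findings. A small sketch (numpy assumed; the series here are pure random noise): search enough unrelated series and one of them will appear to track your target:

```python
# With enough candidate series, pure noise yields a "strong" correlation.
import numpy as np

rng = np.random.default_rng(42)
target = rng.standard_normal(52)                # e.g., a year of weekly returns
candidates = rng.standard_normal((10_000, 52))  # 10,000 unrelated noise series

corrs = np.array([np.corrcoef(target, c)[0, 1] for c in candidates])
best = np.abs(corrs).argmax()
print(f"best |r| = {abs(corrs[best]):.2f} (series #{best})")
# Typically prints |r| around 0.5-0.6: impressive-looking, entirely spurious.
```

The butter-in-Bangladesh result is this effect at data-mining scale.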

With big data techniques, some people argue you can get much closer to being able to predict causal relations. But even here, big data tends to need several steps of preparation (data “cleaning” and pre-processing) and several steps in interpretation (deciding which of many analyses shows a positive result versus a null-result).
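
Those preparation and interpretation steps compound the problem. A toy version (assumed setup, using scipy): apply a handful of defensible cleaning rules to data with no real effect, and report only the best-looking analysis:

```python
# Many defensible pre-processing choices on null data. Toy setup, assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a, b = rng.standard_normal(100), rng.standard_normal(100)  # same population

p_values = []
for clip in (None, 2.0, 2.5, 3.0):            # four outlier-removal rules
    for transform in (lambda x: x, np.tanh):  # two plausible transforms
        xa, xb = transform(a), transform(b)
        if clip is not None:
            xa, xb = xa[np.abs(xa) < clip], xb[np.abs(xb) < clip]
        p_values.append(stats.ttest_ind(xa, xb).pvalue)

print(f"smallest p across {len(p_values)} analyses: {min(p_values):.3f}")
```

Reporting the smallest p-value as if it were the only analysis run inflates the false-positive rate well past the nominal 5%.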

Basically, humans are still in the mix, and thus it’s very hard to escape false positives, strained correlations and cognitive bias.

October 15 2012

New ethics for a new world

Since the first of our ancestors chipped stone into a weapon, technology has divided us. Seldom more than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of prejudice masked as science.

On its surface, a data-driven society is more transparent, and makes better use of its resources. By connecting human knowledge, and mining it for insights, we can pinpoint problems before they become disasters, warding off disease and shining the harsh light of data on injustice and corruption. Data is making cities smarter, watering the grass roots, and improving the way we teach.

But for every accolade, there’s a cautionary tale. It’s easy to forget that data is merely a tool, and in the wrong hands, that tool can do powerful wrong. Data erodes our privacy. It predicts us, often with unerring accuracy — and treating those predictions as fact is a new, insidious form of prejudice. And it can collect the chaff of our digital lives, harvesting a picture of us we may not want others to see.

The big data movement isn’t just about knowing more things. It’s about a fundamental shift from scarcity to abundance. Most markets are defined by scarcity — the price of diamonds, or oil, or music. But when things become so cheap they’re nearly free, a funny thing happens.

Consider the advent of steam power. Economist William Stanley Jevons, in what’s known as Jevons’ Paradox, observed that as the efficiency of steam engines increased, coal consumption went up. That’s not what was supposed to happen. Jevons realized that abundance creates new ways of using something. As steam became cheap, we found new ways of using it, which created demand.
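
The paradox has a compact arithmetic core. If demand for the service responds to its effective price with elasticity ε, an efficiency gain f multiplies total resource use by f^(ε−1), so consumption rises whenever ε > 1. A quick sketch with illustrative numbers, not Jevons’s data:

```python
# Illustrative numbers: efficiency doubles, and if demand elasticity
# exceeds 1, total coal use goes UP, not down.
def resource_use(efficiency_gain, elasticity):
    """Relative resource consumption after an efficiency improvement.

    The effective price of the service falls by the efficiency factor,
    demand scales as price**(-elasticity), so resource use (demand
    divided by efficiency) is efficiency_gain ** (elasticity - 1).
    """
    return efficiency_gain ** (elasticity - 1)

for eps in (0.5, 1.0, 1.5):
    print(f"elasticity {eps}: consumption x{resource_use(2.0, eps):.2f}")
# 0.5 -> x0.71 (efficiency saves coal)
# 1.0 -> x1.00 (a wash)
# 1.5 -> x1.41 (the paradox: doubling efficiency raises coal use)
```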

The same thing is happening with data. A report that took a month to run is now just a few taps on a tablet. An unthinkably complex analysis of competitors is now a Google search. And the global distribution of multimedia content that once required a broadcast license is now an upload.

Big data is about reducing the cost of analyzing our world. The resulting abundance is triggering entirely new ways of using that data. Visualizations, interfaces, and ubiquitous data collection are increasingly important, because they feed the machine — and the machine is hungry.

The results are controversial. Journalists rely on global access to data, but also bring a new skepticism to their work, because facts are easy to manufacture. There’s good evidence that we’ve never been as polarized, politically, as we are today — and data may be to blame. You can find evidence to support any conspiracy, expose any gaffe, or refute any position you dislike, but separating truth from mere data is a growing problem.

Perhaps the biggest threat that a data-driven world presents is an ethical one. Our social safety net is woven on uncertainty. We have welfare, insurance, and other institutions precisely because we can’t tell what’s going to happen — so we amortize that risk across shared resources. The better we are at predicting the future, the less we’ll be willing to share our fates with others. And the more those predictions look like facts, the more justice looks like thoughtcrime.
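
The insurance point can be made precise with a standard risk-pooling calculation (the loss probability and size below are assumptions): each member of a pool of n independent risks faces the same expected cost, but an uncertainty that shrinks as 1/√n:

```python
# Risk pooling: per-person variance falls as 1/n. Assumed numbers;
# the point is the scaling, not the figures.
import math

loss_prob, loss_size = 0.01, 100_000   # assumed: 1% chance of a $100k loss
expected = loss_prob * loss_size
variance = loss_prob * (1 - loss_prob) * loss_size ** 2

for n in (1, 100, 10_000):
    per_person_sd = math.sqrt(variance / n)
    print(f"pool of {n:>6}: expected ${expected:,.0f}, +/- ${per_person_sd:,.0f}")
# Pooling works because no one knows who will take the loss; perfect
# prediction removes that uncertainty, and with it the reason to pool.
```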

The human race underwent a huge shift when we banded together into tribes, forming culture and morals to tie us to one another. As groups, we achieved great heights, building nations, conquering challenges, and exploring the unknown. If you were one of those tribesmen, it’s unlikely you knew what was happening — it’s only in hindsight that the shift from individual to group looks radical.

We’re in the middle of another, perhaps bigger, shift, one that’s taking us from physical beings to digital/physical hybrids. We’re colonizing an online world, and just as our ancestors had to create new social covenants and moral guidelines to work as groups, so we have to craft new ethics, rights and laws.

Those fighting for social change have their work cut out for them, because they’re not just trying to find justice — they’re helping to rewrite the ethical and moral guidelines for a nascent, always-on, data-driven species.
