
June 11 2012

Big ethics for big data

As the collection, organization and retention of data have become commonplace in modern business, the ethical implications behind big data have also grown in importance. Who really owns this information? Who is ultimately responsible for maintaining it? What are the privacy issues and obligations? What uses of technology are ethical — or not — when it comes to big data?

These are the questions authors Kord Davis (@kordindex) and Doug Patterson (@dep923) address in "Ethics of Big Data." In the following interview, the two share thoughts about the evolution of the term "big data," ethics in the era of massive information gathering, and the new technologies that raise their concerns for the big data ecosystem.

How do you define "big data"?

Douglas Patterson: The line between big data and plain old data is something that moves with the development of the technology. The new developments in this space make old questions about privacy and other ethical issues far more pressing. What happens when it's possible to know where just about everyone is or what just about everyone watches or reads? From the perspective of business models and processes, "impact" is probably a better way to think about "big" than in terms of current trends in NoSQL platforms, etc.

One useful definition of big data — for those who, like us, don't think it's best to tie it to particular technologies — is that big data is data big enough to raise practical rather than merely theoretical concerns about the effectiveness of anonymization.

Kord Davis: The frequently-cited characteristics "volume, velocity, and variety" are useful landmarks — persistent features such as the size of datasets, the speed at which they can be acquired and queried, and the wide range of formats and file types generating data.

The impact, however, is where ethical issues live. Big data is generating a "forcing function" in our lives through its sheer size and speed. Recently, CNN published a story similar to an example in our book. Twenty-five years ago, our video rental history was deemed private enough that Congress enacted a law to prevent it from being shared, in hopes of reducing misuse of the information. Today, millions of people want to share that exact same information with each other. This is a direct example of how big data's forcing function influences our values.

The influence is a two-way street. Much like the scientific principle that we can't observe a system without changing it, big data can't be used without an impact — it's just too big and fast. Big data can amplify our values, making them much more powerful and influential, especially when they are collected and focused toward a specific desired outcome.

Big data tends to be a broad category. How do you narrow it down?

Douglas Patterson: One way is the anonymization of datasets before they're released publicly, acted on to target advertising, etc. As the legal scholar Paul Ohm puts it, "data can be either useful or perfectly anonymous, but never both."

So, suppose I know things about you in particular: where you've eaten, what you've watched. It's very unlikely that I'm going to end up violating your privacy by releasing the "information" that there's one particular person who likes carne asada and British sitcoms. But if I have that information about 100 million people, patterns emerge that do make it possible to tie data points to particular named, located individuals.
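The pattern Patterson describes can be sketched in a few lines of Python. This is a minimal illustration with entirely synthetic data: count how many records share each combination of seemingly harmless attributes, and any combination that occurs exactly once pins down a single individual.

```python
# Sketch of attribute-combination uniqueness in an "anonymized" release.
# All records below are made up for illustration.
from collections import Counter

# Each tuple: (ZIP code, favorite food, favorite TV genre) -- no names.
records = [
    ("97204", "carne asada", "British sitcoms"),
    ("97204", "ramen",       "British sitcoms"),
    ("97204", "ramen",       "British sitcoms"),
    ("97204", "carne asada", "anime"),
    ("10001", "carne asada", "British sitcoms"),
]

# Count how often each exact combination appears.
combo_counts = Counter(records)

# Combinations seen only once correspond to a single, re-identifiable person.
unique = [combo for combo, n in combo_counts.items() if n == 1]

print(f"{len(unique)} of {len(records)} records are uniquely identifiable")
# prints "3 of 5 records are uniquely identifiable"
```

At the scale of a handful of records most combinations are already unique; at the scale of 100 million records, a few extra attributes per person have the same effect.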

Kord Davis: Another approach is the balance between risk and innovation. Big data represents massive opportunities to benefit business, education, healthcare, government, manufacturing, and many other fields. The risks, however, to personal privacy, the ability to manage our individual reputations and online identities, and what it might mean to lose — or gain — ownership over our personal data are just now becoming topics of discussion, some parts of which naturally generate ethical questions. To take advantage of the benefits big data innovations offer, the practical risks of implementing them need to be understood.

How do ethics apply to big data?

Kord Davis: Big data itself, like all technology, is ethically neutral. The use of big data, however, is not. While the ethics involved are abstract concepts, they can have very real-world implications. The goal is to develop better ways and means to engage in intentional ethical inquiry to inform and align our actions with our values.

There are a significant number of efforts to create a digital "Bill of Rights" for the acceptable use of big data. The White House recently released a blueprint for a Consumer Privacy Bill of Rights. The values it supports include transparency, security, and accountability. The challenge is how to honor those values in everyday actions as we go about the business of doing our work.

Do you anticipate friction between data providers (people) and data aggregators (companies) down the line?

Douglas Patterson: Definitely. For example: you have an accident and you're taken to the hospital unconscious for treatment. Lots of data is generated in the process, and let's suppose it's useful data for developing more effective treatments. Is it obvious that that's your data? It was generated during your treatment, but also with equipment the hospital provided, based on know-how developed over decades in various businesses, universities, and government-linked institutions, all in the course of saving your life. In addition to generating profits, that same data may help save lives down the road. Creating the data was, so to speak, a mutual effort, so it's not obvious that it's your data. But it's also not obvious that the hospital can just do whatever it wants with it. Maybe under the right circumstances, the data could be de-anonymized to reveal what sort of embarrassing thing you were doing when you got hurt, with great damage to your reputation. And giving or selling data down the line to aggregators and businesses that will attempt to profit from it is one thing the hospital might want to do with the data that you might want to prevent — especially if you don't get a percentage.

Questions of ownership, questions about who gets to say what may and may not be done with data, are where the real and difficult issues arise.

Which data technologies raise ethical concerns?

Douglas Patterson: Geolocation is huge — think of the flap over the iPhone's location logging a while back, or how much people differ over whether or not it's creepy to check yourself or a friend into a location on Facebook or Foursquare. Medical data is going to become a bigger and bigger issue as that sector catches up.

Will lots of people wake up someday and ask for a "do over" on how much information they've been giving away via the "frictionless sharing" of social media? As a teacher, I was struck by how little concern my students had about this — contrasted with my parents, who find something pretty awful about broadcasting so much information. The trend seems to be in favor of certain ideas about privacy going the way of the top hat, but trends like that don't always continue.

Kord Davis: The field of predictive analytics has been around for a long time, but the development of big data technologies has increased access to large datasets and the ability to mine and correlate data using commodity hardware and software. The potential benefits are massive. A promising example: longitudinal studies in education can now gather and process far more fine-grained data, and we don't yet know what we might learn, which is precisely the point. Being able to assess a more refined population of cohorts may well turn out to unlock powerful ways to improve education. Similar conditions exist for healthcare, agriculture, and even weather prediction, where more reliable forecasts could reduce the damage from catastrophic weather events.

On the other hand, the availability of larger datasets and the ability to process and query against them makes it very tempting for organizations to share and cross-correlate to gain deeper insights. If you think it's difficult to identify values and align them with actions within a single organization, imagine how many organizations the trail of your data exhaust touches in a single day.

Even a simple, singular transaction, such as buying a pair of shoes online, touches your bank, the merchant card processor, the retail or wholesale vendor, the shoe manufacturer, the shipping company, your Internet service provider, the company that runs or manages the ecommerce engine that makes it possible, and every technology infrastructure organization that supports them. That's a lot of opportunity for any single bit of your transaction to be stored, shared, or otherwise misused. Now imagine the data trail for paying your taxes. Or voting — if that ever becomes widely available.

What recent events point to the future impact of big data?

Douglas Patterson: For my money, the biggest impact is in the funding of just about everything on the web by either advertising dollars or investment dollars chasing advertising dollars. Remember when you used to have to pay for software? Now look at what Google will give you for free, all to get your data and show you ads. Or, think of the absolutely pervasive impact of Facebook on the lives of many of its users — there's very little about my social life that hasn't been affected by it.

Down the road there may be more Orwellian or "Minority Report" sorts of things to worry about — maybe we're already dangerously close now. On the positive side again, there will doubtless be some amazing things in medicine that come out of big data. Its impact is only going to get bigger.

Kord Davis: Regime change efforts in the Middle East and the Occupy Movement all took advantage of big data technologies to coordinate and communicate. Each of those social movements shared a deep set of common values, and big data allowed them to coalesce at an unprecedented size, speed, and scale. If there was ever an argument for understanding more about our values and how they inform our actions, those examples are powerful reminders that big data can influence massive changes in our lives.

This interview was edited and condensed.

Ethics of Big Data — This book outlines a framework businesses can use to maintain ethical practices while working with big data.


November 02 2011

What does privacy mean in an age of big data?

As we do more online — shop, browse, chat, check in, "like" — it's clear that we're leaving behind an immense trail of data about ourselves. Safeguards offer some level of protection, but technology can always be cracked and the goals of data aggregators can shift. So if digital data is and always will be a moving target, how does that shape our expectations for privacy? Terence Craig (@terencecraig), co-author of "Privacy and Big Data," examines this question and related issues in the following interview.

Your book argues that by focusing on how advertisers are using our data, we might be missing some of the bigger picture. What are we missing, specifically?

Terence Craig: One of the things I tell people is I really don't care if companies get more efficient at selling me soap. What I do care about is the amount of information that is being aggregated to sell me soap and what uses that data might be put toward in the future.

One of the points that co-author Mary Ludloff and I tried to make in the book is that the reasons behind data collection have nothing to do with how that data will eventually be used. There's way too much attention being paid to "intrusions of privacy" as opposed to the problem that once data is out there, it's out there. And potentially, it's out there as long as electronic civilization exists. How that data will be used is anybody's guess.

What's your take on the promise of anonymity often associated with data collection?

Terence Craig: It's fundamentally irresponsible for anyone who collects data to claim they can anonymize that data. We've seen the Netflix de-anonymization, the AOL search release, and others. There have been several cases where medical data was released for laudable goals, but that data was de-anonymized rather quickly. For example, the Electronic Frontier Foundation has a piece that explains how a researcher was able to connect an anonymized medical record to former Massachusetts governor William Weld. And in relation to that, a Harvard genome project tries to make sure people understand the privacy risks of participating.
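The kind of linkage attack behind the Weld re-identification can be sketched as a simple join. The records below are entirely made up: an "anonymized" medical release drops names but keeps quasi-identifiers (ZIP code, birth date, sex) that a public voter roll also contains, and matching on those fields recovers the names.

```python
# Sketch of a quasi-identifier linkage attack (all records are fictional).

medical = [  # released "anonymized": names removed, quasi-identifiers kept
    {"zip": "02138", "dob": "1950-05-20", "sex": "M", "diagnosis": "X"},
    {"zip": "02139", "dob": "1962-01-15", "sex": "F", "diagnosis": "Y"},
]

voters = [  # public voter roll: names plus the same quasi-identifiers
    {"name": "J. Doe",   "zip": "02139", "dob": "1962-01-15", "sex": "F"},
    {"name": "A. Smith", "zip": "02144", "dob": "1980-03-02", "sex": "M"},
]

QUASI = ("zip", "dob", "sex")

# Join the two datasets on the shared quasi-identifiers: any medical
# record matching exactly one voter is re-identified.
reidentified = [
    (v["name"], m["diagnosis"])
    for m in medical
    for v in voters
    if all(m[k] == v[k] for k in QUASI)
]

print(reidentified)  # [('J. Doe', 'Y')]
```

The point is that "removing names" does nothing about the fields both datasets share, which is why releases of this shape keep getting de-anonymized.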

If we assume that companies have good will toward their consumers' data — and I'll assume that most large corporations do — these companies can still be hacked. They can be taken advantage of by bad employees. They can be required by governments to provide backdoors into their systems. Ultimately, all of this is risky for consumers.

Assuming that data can't be anonymized and companies don't have malicious plans for our personal data, what expectations can we have for privacy?

Terence Craig: We've moved back to our evolutionary default for privacy, which is essentially none. Hunter-gatherers didn't have privacy. In small rural villages, with multi-generational families sharing huts, privacy just wasn't available.

The question is how do we address a society that mirrors our beginnings, but comes with one big difference? Before, the people who knew the intimate details of our lives were people we had met physically, and they were often related to us. But now the geographical boundary has been erased by the Internet, so what does that mean? And how are we as a society going to evolve to deal with that?

With that in mind, I've given up on the idea of digital privacy as a goal. I think you have to if you want to reap the rewards of being a full participant in a digitized society. What's important is for us to make sure we have transparency from the large institutions that are aggregating data. We need these institutions to understand what they're doing with data and to share that with people so we, in aggregate, can agree whether or not this is a legitimate use of our data. We need transparency so that we — consumers, citizens — can start to control the process. Transparency is what's important. The idea that we can keep the data hidden or private, well ... that horse has left the stable.

What's the role of governments here, both in terms of the data they keep but also the laws they pass about data?

Terence Craig: Basically anything the government collects, I believe should be made available. After all, governments are some of the largest aggregators of data from all sorts of people. They either purchase it or they demand it for security needs from primary collectors like Google, Facebook, and the cell phone companies — the millions of requests law enforcement agencies sent to Sprint in 2008-2009 were a big story we mentioned in the book. So, it's important that governments reveal what they're doing with this information.

Obviously, there's got to be a balance between transparency and operational security needs. What I want is to have a general idea of: "Here's what we — the government — are doing with all of the data. Here's all of the data we've collected through various means. Here's what we're doing with it. Is that okay?" That's the sort of legislation I would like, but you don't see that anywhere at this point.

This interview was edited and condensed.

Privacy and Big Data — This book introduces you to the players in the personal data game, and explains the stark differences in how the U.S., Europe, and the rest of the world approach the privacy issue.

