
September 09 2013

The future of prediction - SSRN

In this research paper, Lynn Wu and Erik Brynjolfsson show how data from search engines such as #google can predict the future of commercial activity accurately and simply. The researchers showed that housing-related search activity on Google for a given location was a better predictor of prices than many other models. And this model could apply to many other markets, such as home appliances... Tags: (...)


August 15 2013

The vanishing cost of guessing

If you eat ice cream, you’re more likely to drown.

That’s not true, of course. It’s just that both ice cream and swimming happen in the summer. The two are correlated — and ice cream consumption is a good predictor of drowning fatalities — but ice cream hardly causes drowning.
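To make that concrete, here is a minimal sketch in Python. All of the numbers are made up for illustration: the temperature range, the coefficients, and the noise levels are assumptions, not real data, and the helper names (`pearson`, `partial_corr`, `simulate`) are hypothetical. Temperature drives both ice cream sales and a toy "drowning index"; neither series affects the other. The raw correlation should still come out strong, and it should mostly disappear once temperature is controlled for with a first-order partial correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order partial correlation)."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(days=1000, seed=0):
    """Made-up data: temperature drives both series; neither affects the other."""
    rng = random.Random(seed)
    temperature = [rng.uniform(0, 35) for _ in range(days)]
    # Assumed, illustrative relationships plus random noise.
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]
    return temperature, ice_cream, drownings


temperature, ice_cream, drownings = simulate()
print("raw correlation:", round(pearson(ice_cream, drownings), 2))
print("controlling for temperature:",
      round(partial_corr(ice_cream, drownings, temperature), 2))
```

The point of the sketch is only that a strong correlation can come entirely from a shared cause. In real data the confounder is rarely this obvious, or even measured.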

These kinds of correlations are all around us, and big data makes them easy to find. We can correlate childhood trauma with obesity, nutrition with crime rates, and how toddlers play with future political affiliations.

Just as we wouldn’t ban ice cream in the hopes of preventing drowning, we wouldn’t preemptively arrest someone because their diet wasn’t healthy. But a quantified society, awash in data, might be tempted to do so because overwhelming correlation looks a lot like causality. And overwhelming correlation is what big data does best.

It’s getting easier than ever to find correlations. Parallel computing, advances in algorithms, and the inexorable crawl of Moore’s Law have dramatically reduced how much it costs to analyze a data set. Consider an activity we do dozens of times a day, without thinking: a Google search. The search is farmed out to thousands of machines, and often returns hundreds of answers in less than a second. Big data might seem esoteric, but it’s already here.

Google’s search results aren’t the right results; they’re those that are most likely to be related to what you searched for. Similarly, Watson, IBM’s Jeopardy-winning software, mined millions of records to guess at the right answer. Today, an abundance of cheap, simple tools makes it trivial for organizations to guess rather than to know about everything from employee honesty to the spread of disease to the optimal delivery of car parts in a snow-bound city to whether a teenager is pregnant.

Tomorrow’s data-driven society is both smarter and dumber, more just and more merciless. The ethical implications of this shift are only now becoming clear: at some point, innocent-until-proven-guilty looks a lot like innocent-until-likely-to-be guilty.

What the big data revolution is really about is predicting the future. Whether it’s choosing the right ad to show a web visitor, or setting the optimal insurance premium, or helping an inner-city student learn better, we crunch reams of data to try to predict what will happen.

Proponents see this as a boon to humanity. Big data makes us smart: we can anticipate a flu outbreak or where charitable donations do the most good. It also makes us just: transparent, open information and the tools to analyze it shine the harsh light of data on corruption, replacing opinions with facts.

On the other hand, critics charge that big data will make us stick to constantly optimizing what we already know, rather than thinking outside the box and truly innovating. We’ll rely on machines for evolutionary improvements, rather than revolutionary disruption. An abundance of data means we can find facts to support our preconceived notions, polarizing us politically and dividing us into “filter bubbles” of like-minded intolerance. And it’s easy to mistake correlation for causality, leading us to deny someone medical coverage or refuse them employment because of a pattern over which they have no control, taking us back to the racism and injustice of apartheid or redlining.

Big data isn’t a magical tool for predicting the future. It’s not a way to peer into someone’s soul or decide what’s going to happen, even though it’s often frighteningly good at guessing. Just because the cost of guessing is dropping quickly to zero doesn’t mean we should treat a guess as the truth. As we become an increasingly data-driven society, it’s critical that we remember we can no more predict tomorrow with today’s data than we can prevent drowning by banning ice cream.

October 26 2011

We're in the midst of a restructuring of the publishing universe (don't panic)

A new book released this week called "Book: A Futurist's Manifesto," by Hugh McGuire (@hughmcguire) and Brian O'Leary (@brianoleary), examines the future of book publishing from an advanced perspective. Beyond pricing and delivery mechanisms, beyond taking print and displaying it on a screen, the authors look at the digital transformation as more than a change in format — as stated in the book's introduction:

The move to digital is not just a format shift, but a fundamental restructuring of the universe of publishing. This restructuring will touch every part of a publishing enterprise — or at least most publishing enterprises. Shifting to digital formats is 'part one' of this changing universe; 'part two' is what happens once everything is digital. This is the big, exciting unknown.

I reached out to the book's co-author Hugh McGuire to examine some of the elements at play in the future of publishing and in the "exciting unknown" of doing things with books that have never before been possible. Our interview follows.

What's the story behind "Book: A Futurist's Manifesto"?

Hugh McGuire: I'd been working on building PressBooks, a digital book production tool designed for publishers, and I wanted to get a real sense of how it worked, hands on. How better than to manage a real publishing project, working with a real publisher, from beginning to end, using PressBooks?

Of course, it made sense to make it a book about the future of books and publishing. So much ink is spilled about that topic, but we wanted to get away from the abstract and right down to the nitty-gritty. We wanted to produce something that would be a handbook you could give to someone starting a publishing house today.

I talked to my friend Brian O'Leary about co-editing with me, and he was on board. With that, I pitched it to Joe Wikert at O'Reilly — he loved the idea, and off we went.

It's been a bit of a challenge, producing a book while simultaneously building the book production tool on which the book is produced, but we've managed ... if a month or two late.

This is a broad question, but what are the major ways digital is changing publishing?

Hugh McGuire: It's more like in what ways isn't digital changing publishing? First, we very quickly dispensed with the pre-Kindle, pre-iPad question of, "Will people read books on screens?" Yes, and the growth curves are spectacular. The publishing world has, in a pretty orderly way, adapted to this change, with digital files now slotting alongside print books in the distribution chain. I think this is just the start, however.

The publishing world has managed the "digital-conversion disruption" pretty well. Publishers make ebooks now as a matter of course, and consumers buy them and read them on a multitude of devices.

What we as an industry haven't managed yet is the "digital-native disruption." What happens when all new books are ebooks, and the majority of books are read on digital devices, most of which are connected to the Internet? This brings with it so many new expectations from consumers, and I think this is where the real disruption in the market will come.

The kinds of disruption there include: speed of the publishing process, reader engagement with content, linking in and out of books, layers of context added to books, and the webification of books. I think the transitions we've seen in the past three years will pale in comparison to what's going to happen to publishing in the next three years.


Which digital tools should publishers focus on?

Hugh McGuire: Publishing is such a strange, conservative business, and I think there is a real hesitancy to invest heavily early on until there is real clarity on what the long-term standards will be. But EPUB is based on HTML, and I think whatever happens, HTML will be with us for the long haul.

So, tools I think publishers need to start working with:

These are the keys to having a successful publishing company that is future-proofed as best as it can be.

Why is metadata important to digital publishing?

Hugh McGuire: Physical bookstores provide a range of crucial services beyond being a place where you can buy books. Stores offer selection, curation, and recommendation. The digital book retail world is very different because it offers nearly unlimited selection. While retailers like Amazon spend a fair bit of energy trying to recommend titles to readers, the task of sifting through and finding books is increasingly left to consumers.

So, having good metadata — which really should be renamed "information about a book" so it's less intimidating — means providing information that will: A) ensure that people looking for your book, or for the kind of content in your book, will find it; and B) help potential buyers of your book decide they want to buy it.

On the web, companies spend lots of time making sure their sites are search engine optimized, so that people looking for those websites (or the information on them) will find them. Attaching good metadata to a book is much like search engine optimization — it's the mechanism you use to make sure your book gets found by the people looking for it.

What will the publishing landscape look like in five years?

Hugh McGuire: In five years:

  • Print is a marginal part of the trade business.
  • There's a huge increase in the number of small publishers of all stripes.
  • There's a massive increase in the number of books on the market.
  • The Big Six publishers will consolidate to become the Big Two or Three.
  • Most writers will continue to have a hard time making a living as writers.
  • Good/successful publishers will be those that provide good APIs to their books.
  • All books will be expected to be connected to the web, allowing linking in and out, and contextual layers of commentary, etc. (Will this be driven by publishers or retailers? To date, retailers have led the way.)
  • The distinction between what you can do with an ebook and what you can do with a website will disappear (and it will seem strange that it ever existed).
  • While books will become more webby, the web will also become more bookish, accommodating more book-like structures in evolving HTML standards.

What's the publishing schedule for "Book: A Futurist's Manifesto"?

Hugh McGuire: The book comes in three parts:

  1. Out now: "Part 1: The Setup" — This addresses what's happening right now in publishing.
  2. Out sometime before Christmas: "Part 2: The Outlook: What Is Next for the Book?" — Given the technology we currently have, what can we expect to see happening with books going forward?
  3. Out in early 2012: "Part 3: The Things We Can Do with Books: Projects from the Bleeding Edge" — Case studies of real publishing projects, technologies, and enterprises working right now at the bleeding edge.

This interview was edited and condensed.


May 05 2011

Strata Week: Will data make stock exchanges unnecessary?

Here are a few of the data stories that caught my eye this week.

Will big data make the stock exchange obsolete?

That's the question asked by Andy Kessler in a story titled "Is Internet Cloud and Big Data Killing Stock Exchanges? The New Network Is Virtual." Kessler points to several interesting forces at play that may be undermining the necessity for stock exchanges, or at least stock exchanges as we know them. Technology, he contends, has rendered the traders shouting out on the NYSE floor obsolete:

[W]e need markets. But we barely need humans and traditional exchanges anymore to implement these markets. And certainly not inside a building, as voices now carry to the far reaches of the globe in 300 milliseconds, and even that's considered too slow. Trading on Wall Street is just plumbing these days. Value is added much further up the food chain. Trades take place on servers in the great data cloud in the sky. A third of trading even takes place in so-called Dark Pools, privately owned servers that match institutional orders without ever revealing the size or price of the order. Technology has rendered the stock exchange as we know it obsolete.

Kessler argues that today, "speed is everything," and even though the "flash crash" of May 2010 demonstrated some of the dangers of automated trading, we are likely moving away from human-operated exchanges.

(See also: Trading on sentiment — Sentiment analysis gives algorithmic trading an edge.)

The data black market: Sony and stolen credit card data

With a number of high-profile database breaches of late, it's no surprise that many of the news stories addressing the topic of data are concerned with consumers' privacy and security. A recent post in The New York Times is no exception. "How Credit Card Data Is Stolen and Sold" looks at the recent attack on the Sony PlayStation Network and the implications of the massive amount of stolen credit card information, not just for consumers but for the black market in such data.

"According to a number of security researchers," writes Nick Bilton, "the sale of stolen information and credit cards often takes place completely underground in secret credit forums, where hackers exchange or sell data. These forums are closed to the public, and people who join the groups are vetted by forum administrators to ensure they are not from law enforcement." The story goes on to suggest that the possible influx of millions of credit card numbers — there have been boasts of over 2.2 million numbers stolen from Sony — will flood the black market and lower prices.

Security breaches certainly need to be taken seriously, but so does the mainstream media's treatment of hackers and marketplaces for data.

(See also: Anatomy of a phish — In light of recent security snafus, it's worth reviewing the basics of phish detection and prevention.)

The predictive power of geography undergraduate students

News that U.S. Special Forces had shot and killed Osama Bin Laden came as a surprise announcement Sunday night. But the location of Bin Laden's hideout may not have seemed like such a shock to UCLA geography professors Thomas Gillespie and John Agnew. Working with a class of undergraduate students, the professors published a paper in 2009 predicting Bin Laden's whereabouts. According to the probabilistic model they created, there was an 88.9% chance that Bin Laden was hiding out in the region in which he was eventually found.

According to Science Insider, the Bin Laden tracking effort initially began as an undergraduate class project. Using remote sensing data from satellites and reporting on Bin Laden's movements since his last known location, students devised a model predicting where he was likely to be. They predicted he would be found in a town, based on a geographical theory called island biogeography. "The theory was basically that if you're going to try and survive, you're going to a region with a low extinction rate: a large town," Gillespie said in the article. "We hypothesized he wouldn't be in a small town where people could report on him."

The undergraduates' project was so well done that Gillespie says he wrote up the results and submitted the paper to the MIT International Review. Other researchers were skeptical of the students' predictions and thought they were "overconfident" in predicting the location down to specific buildings. No word, of course, on whether the Navy SEALs read academic geography journals.

Got data news?

Feel free to email me.

Photo: Bull at the New York Stock Exchange by Walter Rodriguez, on Flickr

