
October 04 2013

Rage Against the Algorithms - The Atlantic
http://www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255

How can we know the biases of algorithms? Through reverse engineering, argues Nicholas Diakopoulos for The Atlantic. At the Wall Street Journal, a team of journalists probed ecommerce platforms to identify cases of dynamic pricing. At The Daily Beast, Michael Keller examined the iPhone's spell-correction feature to find the words missing from its dictionary. For Slate, Nicholas Diakopoulos studied autocompletion algorithms to determine (...)
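
The method these journalists used boils down to treating the algorithm as a black box: feed it systematic inputs and record what comes out. A minimal sketch of that probing loop in Python, where suggest() is a hypothetical stand-in for whatever autocomplete or spell-check service is being audited (the toy vocabulary is mine):

    # Black-box probing of an autocomplete algorithm, in the spirit of
    # the Slate and Daily Beast experiments. suggest() is a hypothetical
    # stand-in: swap in a call to the real service being audited.
    TOY_INDEX = {
        "apple": ["apple pie recipe", "apple stock price"],
        "bomb": [],  # an empty answer hints at filtering or blacklisting
    }

    def suggest(term):
        return TOY_INDEX.get(term, [])

    def probe(seed_terms):
        """Map each seed term to what the algorithm returns for it."""
        return {term: suggest(term) for term in seed_terms}

    results = probe(["apple", "bomb"])
    suppressed = [t for t, hits in results.items() if not hits]
    print("terms the algorithm stays silent on:", suppressed)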

#media #algorithme #bigdata

September 29 2013

Big Data and Due Process - SSRN
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2325784

Researchers Kate Crawford and Jason Schultz propose, in the latest issue of the Boston College Law Review, a new form of #regulation for citizens in the age of Big Data. They start from the observation that, in the era of behavioral targeting, personally identifying information has exploded. The Big Data approach falls outside today's privacy-protection frameworks, and its effect is to marginalize the existing regulatory scheme, all the more so (...)

#bigdata #regulation #droit

August 06 2013

Amazon's boss buys the "Washington Post"
http://www.lemonde.fr/actualite-medias/article/2013/08/05/le-patron-d-amazon-rachete-le-washington-post_3457822_3236.html

The #Washington_Post group announced on Monday, August 5, the sale of its publishing activities, including the daily that bears its name, to #Jeff_Bezos, founder and boss of the online retail group #Amazon, for 250 million dollars. "The buyer is an entity that belongs to Mr. Bezos as an individual, and not Amazon Inc," the group's statement specifies.

And not just the daily paper: http://www.slate.com/blogs/moneybox/2013/08/05/bezos_bought_a_bit_more_than_the_post.html

What #management will take hold in this #presse group? Some hypotheses here: #disruption
http://qz.com/112073/how-things-are-about-to-change-at-the-washington-post-now-that-jeff-bezos-is-in-
http://qzprod.files.wordpress.com/2013/08/e-ink-mobius.jpg

Jeff the #libertarien wants to be reassuring:
http://www.washingtonpost.com/national/jeff-bezos-on-post-purchase/2013/08/05/e5b293de-fe0d-11e2-9711-3708310f6f4d_story.html

I won’t be leading The Washington Post day-to-day.

I wasn't expecting that one this morning... http://seenthis.net/messages/162928

A good occasion to read:

En Amazonie. Infiltré dans le « meilleur des mondes »
http://www.monde-diplomatique.fr/2013/08/RIMBERT/49581

On top of that, I've just seen Costa-Gavras's Le capital (yes, the one with Gad Elmaleh), which is all about #hft and overgrown children who play and play until it all blows up... so I won't even tell you the state that puts me in.

As for Deep Throat, we're still waiting for his reaction.

It's still funny that the paragon of #investigation #journalisme is being bought by one of the masters of commercial #bigdata, in these times of #whistleblowers persecution. Can't wait for the Publicis / Omnicom merger!

July 17 2013

Big data is the best and worst enemy of your brand - FredCavazza.net
http://www.fredcavazza.net/2013/07/15/les-big-data-sont-le-meilleur-et-le-pire-ennemi-de-votre-marque

With big data, aren't we once again trying to substitute old practices for new ones, asks Fred Cavazza, who warns against the overuse of correlations entrusted to technologies you don't control. Tags: internetactu fing internetactu2net #marketing (...)

#bigdata

July 03 2013

affordance.info: Algorithm neutrality and profile relevance are in the same boat.
http://affordance.typepad.com/mon_weblog/2013/07/neutralite-des-algorithmes-et-pertinence-des-profils.html

"Algorithms cannot be neutral," Olivier Ertzscheid reminds us. "The fantasy of neutral algorithms is very close to that of an enlightened dictatorship. If you want neutrality, you need total opacity about the calculation method being applied. But with total opacity, nobody can guarantee that neutrality except the very people who put the #algorithme in place." Opacity, the researcher concludes, therefore cannot be the solution.

"La pertinence des profils est la clé algorithmique (...)

#bigdata

June 25 2012

Four short links: 25 June 2012

  1. Stop Treating People Like Idiots (Tom Steinberg) -- governments miss the easy opportunities to link the tradeoffs they make to the point where the impacts are felt. My argument is this: key compromises or decisions should be linked to from the points where people obtain a service, or at the points where they learn about one. If my bins are only collected once a fortnight, the reason why should be one click away from the page that describes the collection times.
  2. UK Study Finds Mixed Telemedicine Benefits -- The results, in a paper to the British Medical Journal published today, found telehealth can help patients with long-term conditions avoid emergency hospital care, and also reduce deaths. However, the estimated scale of hospital cost savings is modest and may not be sufficient to offset the cost of the technology, the report finds. Overall the evidence does not warrant full scale roll-out but more careful exploration, it says. (via Mike Pearson)
  3. Pay Attention to What Nick Denton is Doing With Comments (Nieman Lab) -- Most news sites have come to treat comments as little more than a necessary evil, a kind of padded room where the third estate can vent, largely at will, and tolerated mainly as a way of generating pageviews. This exhausted consensus makes what Gawker is doing so important. Nick Denton, Gawker’s founder and publisher, Thomas Plunkett, head of technology, and the technical staff have re-designed Gawker to serve the people reading the comments, rather than the people writing them.
  4. Informed Consent Source of Confusion (Nature) -- fascinating look at the downstream uses of collected bio data and the difficulty in gaining informed consent: what you might learn about yourself (do I want to know I have an 8.3% greater chance of developing Alzheimer's? What would I do with that knowledge besides worry?), what others might learn about you (will my records be subpoenable?), and what others might make from the knowledge (will my data be used for someone else's financial benefit?). (via Ed Yong)

June 15 2012

Top Stories: June 11-15, 2012

Here's a look at the top stories published across O'Reilly sites this week.

A reduced but important future for desktop computing
Josh Marinacci says most people will rely on mobile devices to handle their computing needs, but a select and small group of power users will continue to use desktop machines.

Big ethics for big data
"Ethics of Big Data" authors Kord Davis and Doug Patterson explore ownership, anonymization, privacy, and ways to evaluate and establish ethical data practices within an organization.

Stories over spreadsheets
Imagine a future where clear language supplants spreadsheets. In a recent interview, Narrative Science CTO Kris Hammond explained how we might get there.


Data in use from public health to personal fitness
Releasing public data can't fix the health care system by itself, but it provides tools as well as a model for data sharing.


What is DevOps?
NoOps, DevOps — no matter what you call it, operations won't go away. Ops experts and development teams will jointly evolve to meet the challenges of delivering reliable software to customers.


Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif. Save 20% on registration with the code RADAR20.

June 11 2012

Big ethics for big data

As the collection, organization and retention of data has become commonplace in modern business, the ethical implications behind big data have also grown in importance. Who really owns this information? Who is ultimately responsible for maintaining it? What are the privacy issues and obligations? What uses of technology are ethical — or not — when it comes to big data?

These are the questions authors Kord Davis (@kordindex) and Doug Patterson (@dep923) address in "Ethics of Big Data." In the following interview, the two share thoughts about the evolution of the term "big data," ethics in the era of massive information gathering, and the new technologies that raise their concerns for the big data ecosystem.

How do you define "big data"?

Douglas Patterson: The line between big data and plain old data is something that moves with the development of the technology. The new developments in this space make old questions about privacy and other ethical issues far more pressing. What happens when it's possible to know where just about everyone is or what just about everyone watches or reads? From the perspective of business models and processes, "impact" is probably a better way to think about "big" than in terms of current trends in NoSQL platforms, etc.

One useful definition of big data — for those who, like us, don't think it's best to tie it to particular technologies — is that big data is data big enough to raise practical rather than merely theoretical concerns about the effectiveness of anonymization.

Kord Davis: The frequently-cited characteristics "volume, velocity, and variety" are useful landmarks — persistent features such as the size of datasets, the speed at which they can be acquired and queried, and the wide range of formats and file types generating data.

The impact, however, is where ethical issues live. Big data is generating a "forcing function" in our lives through its sheer size and speed. Recently, CNN published a story similar to an example in our book. Twenty-five years ago, our video rental history was deemed private enough that Congress enacted a law to prevent it from being shared in hopes of reducing misuse of the information. Today, millions of people want to share that exact same information with each other. This is a direct example of how big data's forcing function is literally influencing our values.

The influence is a two-way street. Much like the scientific principle that we can't observe a system without changing it, big data can't be used without an impact — it's just too big and fast. Big data can amplify our values, making them much more powerful and influential, especially when they are collected and focused toward a specific desired outcome.

Big data tends to be a broad category. How do you narrow it down?

Douglas Patterson: One way is the anonymization of datasets before they're released publicly, acted on to target advertising, etc. As the legal scholar Paul Ohm puts it, "data can be either useful or perfectly anonymous, but never both."

So, suppose I know things about you in particular: where you've eaten, what you've watched. It's very unlikely that I'm going to end up violating your privacy by releasing the "information" that there's one particular person who likes carne asada and British sitcoms. But if I have that information about 100 million people, patterns emerge that do make it possible to tie data points to particular named, located individuals.
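
Ohm's trade-off is easy to demonstrate with synthetic data: attributes that are individually common become jointly unique as they accumulate. A small illustrative sketch (the attributes and population are invented, not from the interview):

    # Synthetic demo of re-identification risk: two tastes identify
    # almost nobody, but join in ZIP code and birth year and most
    # records become unique. All data is randomly generated.
    from collections import Counter
    import random

    random.seed(0)
    foods = ["carne asada", "pho", "ramen", "falafel"]
    shows = ["British sitcoms", "K-drama", "true crime", "anime"]
    zips = [f"9{i:04d}" for i in range(50)]
    years = list(range(1940, 2000))

    people = [(random.choice(foods), random.choice(shows),
               random.choice(zips), random.choice(years))
              for _ in range(10_000)]

    by_tastes = Counter((f, s) for f, s, _, _ in people)
    by_all = Counter(people)
    print(sum(1 for n in by_tastes.values() if n == 1),
          "of 10,000 people unique by (food, show) alone")
    print(sum(1 for n in by_all.values() if n == 1),
          "of 10,000 people unique by (food, show, ZIP, birth year)")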

Kord Davis: Another approach is the balance between risk and innovation. Big data represents massive opportunities to benefit business, education, healthcare, government, manufacturing, and many other fields. The risks, however, to personal privacy, the ability to manage our individual reputations and online identities, and what it might mean to lose — or gain — ownership over our personal data are just now becoming topics of discussion, some parts of which naturally generate ethical questions. To take advantage of the benefits big data innovations offer, the practical risks of implementing them need to be understood.

How do ethics apply to big data?

Kord Davis: Big data itself, like all technology, is ethically neutral. The use of big data, however, is not. While the ethics involved are abstract concepts, they can have very real-world implications. The goal is to develop better ways and means to engage in intentional ethical inquiry to inform and align our actions with our values.

There are a significant number of efforts to create a digital "Bill of Rights" for the acceptable use of big data. The White House recently released a blueprint for a Consumer Privacy Bill of Rights. The values it supports include transparency, security, and accountability. The challenge is how to honor those values in everyday actions as we go about the business of doing our work.

Do you anticipate friction between data providers (people) and data aggregators (companies) down the line?

Douglas Patterson: Definitely. For example: you have an accident and you're taken to the hospital unconscious for treatment. Lots of data is generated in the process, and let's suppose it's useful data for developing more effective treatments. Is it obvious that that's your data? It was generated during your treatment, but also with equipment the hospital provided, based on know-how developed over decades in various businesses, universities, and government-linked institutions, all in the course of saving your life. In addition to generating profits, that same data may help save lives down the road. Creating the data was, so to speak, a mutual effort, so it's not obvious that it's your data. But it's also not obvious that the hospital can just do whatever it wants with it. Maybe under the right circumstances, the data could be de-anonymized to reveal what sort of embarrassing thing you were doing when you got hurt, with great damage to your reputation. And giving or selling data down the line to aggregators and businesses that will attempt to profit from it is one thing the hospital might want to do with the data that you might want to prevent — especially if you don't get a percentage.

Questions of ownership, questions about who gets to say what may and may not be done with data, are where the real and difficult issues arise.

Which data technologies raise ethical concerns?

Douglas Patterson: Geolocation is huge — think of the flap over the iPhone's location logging a while back, or how much people differ over whether or not it's creepy to check yourself or a friend into a location on Facebook or Foursquare. Medical data is going to become a bigger and bigger issue as that sector catches up.

Will lots of people wake up someday and ask for a "do over" on how much information they've been giving away via the "frictionless sharing" of social media? As a teacher, I was struck by how little concern my students had about this — contrasted with my parents, who find something pretty awful about broadcasting so much information. The trend seems to be in favor of certain ideas about privacy going the way of the top hat, but trends like that don't always continue.

Kord Davis: The field of predictive analytics has been around for a long time, but the development of big data technologies has increased access to large datasets and the ability to mine and correlate data using commodity hardware and software. The potential benefits are massive. A promising example: longitudinal studies in education can now gather and process far more fine-grained data characteristics, and we have no idea what we might learn. Which is precisely the point. Being able to assess a more refined population of cohorts may well turn out to unlock powerful ways to improve education. Similar conditions exist for healthcare, agriculture, and even predicting weather more reliably to reduce the damage from catastrophic natural weather events.

On the other hand, the availability of larger datasets and the ability to process and query against them makes it very tempting for organizations to share and cross-correlate to gain deeper insights. If you think it's difficult to identify values and align them with actions within a single organization, imagine how many organizations the trail of your data exhaust touches in a single day.

Even a simple, singular transaction, such as buying a pair of shoes online, touches your bank, the merchant card processor, the retail or wholesale vendor, the shoe manufacturer, the shipping company, your Internet service provider, the company that runs or manages the ecommerce engine that makes it possible, and every technology infrastructure organization that supports them. That's a lot of opportunity for any single bit of your transaction to be stored, shared, or otherwise misused. Now imagine the data trail for paying your taxes. Or voting — if that ever becomes widely available.

What recent events point to the future impact of big data?

Douglas Patterson: For my money, the biggest impact is in the funding of just about everything on the web by either advertising dollars or investment dollars chasing advertising dollars. Remember when you used to have to pay for software? Now look at what Google will give you for free, all to get your data and show you ads. Or, think of the absolutely pervasive impact of Facebook on the lives of many of its users — there's very little about my social life that hasn't been affected by it.

Down the road there may be more Orwellian or "Minority Report" sorts of things to worry about — maybe we're already dangerously close now. On the positive side again, there will doubtless be some amazing things in medicine that come out of big data. Its impact is only going to get bigger.

Kord Davis: Regime change efforts in the Middle East and the Occupy Movement all took advantage of big data technologies to coordinate and communicate. Each of those social movements shared a deep set of common values, and big data allowed them to coalesce at an unprecedented size, speed, and scale. If there was ever an argument for understanding more about our values and how they inform our actions, those examples are powerful reminders that big data can influence massive changes in our lives.

This interview was edited and condensed.

Ethics of Big Data — This book outlines a framework businesses can use to maintain ethical practices while working with big data.


June 08 2012

Four short links: 8 June 2012

  1. HAproxy -- high availability proxy, cf Varnish.
  2. Opera Reviews SPDY -- thoughts on the high-performance HTTP++ from a team with experience implementing their own protocols. Section 2 makes a good intro to the features of SPDY if you've not been keeping up.
  3. Jetpants -- Tumblr's automation toolkit for handling monstrously large MySQL database topologies. (via Hacker News)
  4. LeakedIn -- check if your LinkedIn password was leaked. Chris Shiflett had this site up before LinkedIn had publicly admitted the leak.

June 07 2012

Four short links: 7 June 2012

  1. Electric Imp -- yet another group working on the necessary middleware for ubiquitous networked devices.
  2. How Big Data Transformed the Dairy Industry (The Atlantic) -- cutting-edge genomics company Illumina has precisely one applied market: animal science. They make a chip that measures 50,000 markers on the cow genome for attributes that control the economically important functions of those animals.
  3. The Curious Case of Internet Privacy (Cory Doctorow) -- I'm with Cory on the perniciousness of privacy-digesting deals between free sites and users, but I'm increasingly becoming convinced that privacy is built into business models and not technology.
  4. Chronoline (Github) -- Javascript to make a horizontal timeline out of a list of events.

June 04 2012

Can Future Advisor be the self-driving car for financial advice?

Last year, venture capitalist Marc Andreessen famously wrote that software is eating the world. The impact of algorithms upon media, education, healthcare and government, among many other verticals, is just beginning to be felt, and with still unfolding consequences for the industries disrupted.

Whether it's the prospect of IBM's Watson offering a diagnosis to a patient or Google's self-driving car taking over on the morning commute, there are going to be serious concerns raised about safety, power, control and influence.

Doctors and lawyers note, for good reason, that their public appearances on radio, television and the Internet should not be viewed as medical or legal advice. While financial advice may not pose the same threat to a citizen as an incorrect medical diagnosis or treatment, poor advice could have pretty significant downstream outcomes.

That risk isn't stopping a new crop of startups from looking for a piece of the billions of dollars paid every year to financial advisors. Future Advisor launched in 2010 with the goal of providing better financial advice through the Internet using data and algorithms. They're competing against startups like Wealthfront and Betterment, among others.

Not everyone is convinced of the validity of this algorithmically mediated approach to financial advice. Mike Alfred, the co-founder of BrightScope (which has liberated financial advisor data itself), wrote in Forbes this spring that online investment firms are wrong about financial advisors:

"While singularity proponents may disagree with me here, I believe that some professions have a fundamentally human component that will never be replaced by computers, machines, or algorithms. Josh Brown, an independent advisor at Fusion Analytics Investment Partners in NYC, recently wrote that 'for 12,000 years, anywhere someone has had wealth through the history of civilization, there's been a desire to pay others for advice in managing it.' In some ways, it's no different from the reason why many seek out the help of a psychiatrist. People want the comfort of a human presence when things aren't going well. A computer arguably may know how to allocate funds in a normal market environment, but can it talk you off the cliff when things go to hell? I don't think so. Ric Edelman, Chairman & CEO of Edelman Financial Services, brings up another important point. According to him, 'most consumers are delegators and procrastinators, and need the advisor to get them to do what they know they need to do but won't do if left on their own'."

To get the other side of this story, I recently talked with Bo Lu (@bolu), one of the two co-founders of Future Advisor. Lu explained how the service works, where the data comes from and whether we should fear the dispassionate influence of our new robotic financial advisor overlords.

Where did the idea for Future Advisor come from?

Lu: The story behind Future Advisor is one of personal frustration. We started the company in 2010 when my co-founder and I were working at Microsoft. Our friends who had reached their mid-20s were really making money for the first time in their lives. They were now being asked to make decisions, such as "Where do I open an IRA? What do I do with my 401K?" As is often the case, they went to the friend who had the most experience, which in this case turned out to be me. So I said, "Well, let's just find you guys a good financial advisor and then we'll do this," because somehow in my mind, I thought, "Financial advisors do this."

It turned out that all of the financial advisors we found fell into two distinct classes. The first were folks who were really nice but who essentially said, in very kind words, "Maybe you'd be more comfortable at the lower-stakes table." We didn't meet any of their minimums. You needed a million dollars, or at least half a million, to get their services.

The other kind of financial advisor, the ones without minimums, immediately started trying to sell my friends term life insurance and annuities. I'm like, "These guys are 25. There's no reason for you to be doing this." Then I realized there was a misalignment of incentives there. We noticed that our friends were making the same small set of mistakes over and over again, such as not having the right diversification for their age and their portfolio, or paying too much in mutual fund fees. Most people didn't understand that mutual funds charged fees and were not being tax efficient. We said, "Okay, this looks like a data problem that we can help solve for you guys." That's the genesis out of which Future Advisor was born.

What problem are you working on solving?

Bo Lu: Future Advisor is really trying to do one single thing: deliver on the vision that high-quality financial advice should be able to be produced cheaply and, thus, be broadly accessible to everyone.

If you look at the current U.S. market of financial advisors and you multiply the number of financial advisors in the U.S. — which is roughly a quarter-million people — by what is generally accepted to be a full book of clients, you'll realize that even at full capacity, the U.S. advisor market can serve only about 11% of U.S. households.
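
That 11% figure is straightforward to sanity-check. A back-of-the-envelope sketch, where the book size and household count are my assumptions rather than Lu's exact inputs:

    # Rough check of the "advisors can serve ~11% of households" claim.
    # Book size and household count are assumed for illustration.
    advisors = 250_000            # "roughly a quarter-million" (from the interview)
    full_book = 50                # assumed client households per advisor
    us_households = 115_000_000   # approximate Census figure, assumed

    served = advisors * full_book
    print(f"households served at full capacity: {served:,}")          # 12,500,000
    print(f"share of U.S. households: {served / us_households:.1%}")  # ~10.9%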

In serving that 11% of U.S. households, the advisory market for retail investing makes about $20 billion. This is a classic market where a service is extremely expensive and, in being so, can only serve a small percentage of the addressable market. As we walked into this, we realized that we're part of something bigger. Look back 60 years: a big problem then was that everyone wanted a color television and they just weren't being manufactured quickly or cheaply enough. Manufacturing scale has caught up to us. Now, everything you want you generally can have, because manufactured things are cheap. Creating services is still extremely expensive and non-scalable. Healthcare as a service, education as a service and, of course, financial advice as a service come to mind. What we're doing is taking information technology, like computer science, to scale a service in the way the electrical engineering of our forefathers scaled manufacturing.

How big is the team? How are you working together?

Bo Lu: The team has eight people in Seattle. It's almost exactly half finance and half engineering. We unabashedly have a bunch of engineers from MIT, which is where my co-founder went to school, essentially sucking the brains out of the finance team and putting them in software. It's really funny, because a lot of the time when we design an algorithm, we actually just sit down and say, "Okay, let's look at a bunch of examples, see what the intuitive decisions of the finance people are, and then try to encode them."

We rely heavily on the existing academic literature in both computational finance and economics because a lot of this work has been done. The interesting thing is that the knowledge is not the problem. The knowledge exists, and it's unequivocal in the things that are good for investors. Paying less in fees is good for investors. Being more tax efficient is good for investors. How to do that is relatively easy. What's hard for the industry for a long time has been to scalably apply those principles in a nuanced way to everybody's unique situation. That's something that software is uniquely good at doing.

How do you think about the responsibility of providing financial advice that traditionally has been offered by highly certified professionals who've taken exams, worked at banks, and are expensive to get to because of that professional experience?

Bo Lu: There's a couple of answers to that question, one of which is the folks on our team have the certifications that people look for. We've got certified financial advisors*, CFAs, which is a private designation on the team. We have math PhDs from the University of Washington on the team. The people who create the software are the caliber of people that you would want to be sitting down with you and helping you with your finances in the first place.

The second part of that is that we ourselves are a registered investment advisor. You'll see many websites that on the bottom say, "This is not intended to be financial advice." We don't say that. This is intended to be financial advice. We're registered federally with the SEC as a registered investment advisor and have passed all of the exams necessary.

*In the interview, Lu said that Future Advisor has "certified financial advisors." In this context, CFA stood for something else: the Future Advisor team includes Simon Moore, a chartered financial analyst, who advises the startup on investing algorithm design.

Where does the financial data behind the site come from?

Bo Lu: From the consumer side, the site has only four steps. These four steps are very familiar to anyone who's used a financial advisor before. A client signs up for the product. It's a free web service, designed to help everyone. In step one, they answer a couple of questions about their personal situation: age, how much they make, when they want to retire. Then they're asked the kinds of questions that good financial advisors ask, such as your risk tolerance. Here, you start to see that we rely on academic work as much as possible.

There is a great set of work out of the University of Kentucky on risk tolerance questionnaires. Whereas most companies just use some questionnaire they came up with internally, we went and scoured literature to find exact questions that were specifically worded — and have been tested under those wordings to yield statistically significant deviations in determining risk tolerance. So we use those questions. With that information, the algorithm can then come up with a target portfolio allocation for the customer.
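
A toy version of that scoring step might reduce the questionnaire to a normalized score and use it to tilt an age-based baseline. This is a sketch with invented weights, not Future Advisor's actual model:

    # Toy questionnaire-to-allocation mapping. The scoring rule, the
    # age heuristic, and the tilt size are all invented for illustration.
    def risk_score(answers):
        """answers: 1-5 Likert responses; higher means more risk-tolerant."""
        return sum(answers) / (5 * len(answers))   # normalize to 0..1

    def target_allocation(age, answers):
        base_stocks = max(0.0, min(1.0, (110 - age) / 100))  # age heuristic
        tilt = (risk_score(answers) - 0.5) * 0.3             # +/- 15 points
        stocks = max(0.10, min(0.95, base_stocks + tilt))
        return {"stocks": round(stocks, 2), "bonds": round(1 - stocks, 2)}

    print(target_allocation(27, [4, 5, 3, 4]))  # {'stocks': 0.92, 'bonds': 0.08}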

In step two, the customer can synchronize or import data from their existing financial institutions into the software. We use Yodlee, which you've written about before. It's the same technology that Mint used to import detailed data about what you already hold in your 401K, in your IRA, and in all of your other investment accounts.

Step three is the dashboard. The dashboard shows your investments at a level that makes sense, unlike current brokerages, where when you log in they tell you how much money you have, list the funds you hold, and show how much they've changed in the last 24 hours of trading. We answer four questions on the dashboard.

  1. Am I on track?
  2. Am I well-diversified for this goal?
  3. Am I overpaying in hidden fees in my mutual funds?
  4. Am I as tax efficient as I could be?

We answer those four questions and then, in the final step of the process, we give algorithmically generated, step-by-step instructions for improving your portfolio. This includes specific advice like "sell this many shares of Fund X to buy this many shares of Fund Y" in your IRA. With this help, consumers can go and clean up their portfolios. It's kind of like diagnosis and prescription for your portfolio.
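
The fee question in particular becomes mechanical once holdings data is in hand: compare each fund's expense ratio against a cheap benchmark. A sketch with an assumed holdings shape and threshold (not Future Advisor's actual data model):

    # Sketch of the "am I overpaying in hidden fees?" diagnostic.
    # The holdings records and the 0.25% benchmark are assumptions.
    holdings = [
        {"fund": "Fund X", "value": 40_000, "expense_ratio": 0.0112},
        {"fund": "Fund Y", "value": 25_000, "expense_ratio": 0.0007},
    ]
    CHEAP_INDEX_ER = 0.0025   # assumed expense ratio of a low-cost index fund

    for h in holdings:
        excess = h["value"] * (h["expense_ratio"] - CHEAP_INDEX_ER)
        if excess > 0:
            print(f"{h['fund']}: ${excess:,.0f}/yr more in fees than a "
                  f"comparable index fund")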

There are three separate streams of data underlying the product. One is the Yodlee stream, which is detailed holdings data from hundreds of financial institutions. Two is data about what's in a fund. That comes from Morningstar. Morningstar, of course, gets it from the SEC because mutual funds are required to disclose this. So we can tell, for example, if a fund is an international fund or a domestic fund, what the fees are, and what it holds. The third is a dataset we have to bring in ourselves: 401K data from the Department of Labor.

On top of this triad of datasets sits our algorithm, which has undergone six to eight months of beta testing with customers. (We launched the product in March 2012.) That algorithm asks, "Okay, given these three datasets, what is the current state of your portfolio? What is the minimum number of moves to reduce both transaction costs and any capital gains that you might incur to get you from where you are to roughly where you need to be?" That's how the product works under the covers.
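
Generating that minimum set of moves is essentially a matching problem: pair the most overweight positions with the most underweight targets so each trade does double duty. A greedy simplification of the idea (the production algorithm, per Lu, also weighs transaction costs and capital gains, which this ignores):

    # Greedy sketch of minimal-move rebalancing. Taxes and transaction
    # costs, which the real algorithm optimizes for, are ignored here.
    def rebalance_moves(current, target, total):
        """current/target: dicts of asset class -> portfolio fraction."""
        drift = {k: (current.get(k, 0) - target.get(k, 0)) * total
                 for k in set(current) | set(target)}
        sells = sorted(((v, k) for k, v in drift.items() if v > 1e-9), reverse=True)
        buys = sorted(((-v, k) for k, v in drift.items() if v < -1e-9), reverse=True)
        moves = []
        while sells and buys:
            (sv, sk), (bv, bk) = sells[0], buys[0]
            amount = min(sv, bv)
            moves.append((sk, bk, amount))   # sell `amount` of sk to buy bk
            sells[0], buys[0] = (sv - amount, sk), (bv - amount, bk)
            if sells[0][0] < 1e-9: sells.pop(0)
            if buys[0][0] < 1e-9: buys.pop(0)
        return moves

    print(rebalance_moves({"US stocks": 0.8, "bonds": 0.2},
                          {"US stocks": 0.6, "intl": 0.2, "bonds": 0.2},
                          100_000))   # [('US stocks', 'intl', 20000.0)]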

What's the business model?

Bo Lu: You can think of it as similar to Redfin. Redfin allows individual realtors to do more work by using algorithms to help them do all of the repetitive parts. Our product and the web service is free and will always be free. Information wants to be free. That's how we work in software. It doesn't cost us anything for an additional person to come and use the website.

The way that Future Advisor makes money is that we charge for advisor time. A small percentage of customers will have individual questions about their specific situation or want to talk to a human being and have them answer some questions. This is actually good in two ways.

One, it helps the transition from a purely human service to what we think will eventually be an almost purely digital service. People who are somewhere along that continuum of wanting someone to talk to but don't need someone full-time to talk to can still do that.

Two, those conversations are a great way for us to find out, in aggregate, what the things are that the software doesn't yet do or doesn't do well. Overall, if we take a ton of calls that are all the same, then it means there's an opportunity for the software to step in, scale that process, and help people who don't want to call us or who can't afford to call us to get that information.

What's the next step?

Bo Lu: This is a problem that has a dramatic possible impact attached to it. Personal investing, what the industry calls "retail investing," is a closed-loop system. Money goes in, and it's your money, and it stays there for a while. Then it comes out, and it's still your money. There's very little additional value creation by the financial advisory industry.

It may sound like I'm going out on a limb to say this, but it's generally accepted that the value creation of you and I putting our hard-earned money into the market is actually done by companies. Companies deploy that capital, they grow, and they return that capital in the form of higher stock prices or dividends, fueling the engine of our economic growth.

There are companies across the country and across the world adding value to people's lives. There's little to no value to be added by financial advisors trying to pick stocks. It's actually academically proven that there's negative value to be added there because it turns out the only people who make money are financial advisors.

This is a $20 billion market. But really what that means is that it's a $20 billion tax on individual American investors. If we're successful, we're going to reduce that $20 billion tax to a much smaller number by orders of magnitude. The money that's saved is kept by individual investors, and they keep more of what's theirs.

Because of the size of this market and the size of the possible impact, we are venture-backed, because we can really change the world for the better if we're successful. There are a bunch of great folks in the Valley who have done a lot of work in money and the democratization of software and money tools.

What's the vision for the future of your startup?

Bo Lu: I was just reading your story about smart disclosure a little while ago. There's a great analogy in there that I think applies aptly to us. It's maps. The first maps were paper. Today if you look at the way a retail investor absorbs information, it's mostly paper. They get a prospectus in the mail. They have a bunch of disclosures they have to sign — and the paper is extremely hard to read. I don't know if you've ever tried to read a prospectus; it's something that very few of us enjoy. (I happen to be one of them, but I understand if not everyone's me.) They're extremely hard to parse.

Then we moved on to the digital age of folks taking the data embedded in those prospectuses and making them available. That was Morningstar, right? Now we're moving into the age of folks taking that data and mating it with other data, such as 401K data and your own personal financial holdings data, to make individual personalized recommendations. That's Future Advisor the way it is today.

But just as maps moved from paper maps to Google Maps, it didn't stop there. It has moved on to autonomous cars. There will be a day when you and I don't ever have to look at a map because, rather than the map being a tool to help me make the decision to get somewhere, the map will be a part of a service I use that just gets the job done. It gets me from point A to point B.

In finance, the job is to invest my money properly. Steward it so that it grows, so that it's there for me when I retire. That's our vision as well. We're going to move from being an information service to actually doing it for you. It's just a default way so that if you do nothing, your financial assets are well taken care of. That's what we think is the ultimate vision of this: Everything works beautifully and you no longer have to think about it.

We're now asked to make ridiculous decisions about spreading money between a checking account, an IRA, a savings account and a 401K, which really make no sense to most of us. The vision is to have one pot of money that invests itself correctly, that you put money into when you earn money. You take money out when you spend it. You don't have to make any decisions that you were never trained nor educated to make about your own personal finances because it just does the right thing. The self-driving car is our vision.

Connecting the future of personal finance with an autonomous car is an interesting perspective. Just as with outsourcing driving, however, there's the potential for negative outcomes. Do you have any concerns about the algorithm going awry?

Bo Lu: We are extremely cognizant of the weighty matters that we are working with here. We have a ton of testing that happens internally. You could even criticize us, as a software development firm, in that we're moving slower than other software development firms. We're not going to move as quickly as Twitter or Foursquare because, to be honest, if they mess up, it's not that big a deal. We're extremely careful about it.

At the same time, I think the Google self-driving car analogy is apt because people immediately say, "Well, what if the car gets into an accident?" Those kinds of fears exist in all fields that matter.


Analysis: Why this matters

"The analogy that comes to mind for me isn't the self-driving car," commented Mike Loukides, via email. "It's personalized medicine."

One of the big problems in health care is that to qualify treatments, we do testing over a very wide sample, and reject it if it doesn't work better than a placebo. But what about drugs that are 100% effective on 10% of the population, but 0% effective on 90%? They're almost certainly rejected. It strikes me that what Future Advisor is doing isn't so much helping you to go on autopilot, but getting beyond generic prescriptions and generating customized advice, just as a future MD might be able to do a DNA sequence in his office and generate a custom treatment.

The secret sauce for Future Advisor is the combination of personal data, open government data and proprietary algorithms. The key to realizing value, in this context, is combining multiple data streams with a user interface that's easy for a consumer to navigate. That combination has long been known by another name: It's a mashup. But the mashups of 2012 have something that those of 2002 didn't have, at least in volume or quality: data.

Future Advisor, Redfin (real estate) or Castlight (healthcare) are all interesting examples of entrepreneurs creating data products from democratized government data. Future Advisor uses data from consumers and the U.S. Department of Labor, Redfin synthesizes data from economists and government agencies, and Castlight uses health data from the U.S. Department of Health and Human Services. In each case, they provide a valuable service and/or product by making sense of that data deluge.


May 31 2012

Strata Week: MIT and Massachusetts bet on big data

Here are a few of the big data stories that caught my attention this week.

MIT makes a big data push

MIT unveiled its big data research plans this week with a new initiative: bigdata@csail. CSAIL is the university's Computer Science and Artificial Intelligence Laboratory. According to the initiative's website, the project will "identify and develop the technologies needed to solve the next generation data challenges which require the ability to scale well beyond what today's computing platforms, algorithms, and methods can provide."

The research will be funded in part by Intel, which will contribute $2.5 million per year for up to five years. As part of the announcement, Massachusetts Governor Deval Patrick added that his state was forming a Massachusetts Big Data initiative that would provide matching grants for big data research, something he hopes will make the state "well-known for big data research."

Cisco's predictions for the Internet

Cisco released its annual forecast for Internet networking. Not surprisingly, Cisco projects massive growth in networking, with annual global IP traffic reaching 1.3 zettabytes by 2016. "The projected increase of global IP traffic between 2015 and 2016 alone is more than 330 exabytes," according to the company's press release, "which is almost equal to the total amount of global IP traffic generated in 2011 (369 exabytes)."

Cisco points to a number of factors contributing to the explosion, including more Internet-connected devices, more users, faster Internet speeds, and more video.
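
The press-release figures are internally consistent, as a quick check shows (all numbers in exabytes per year, taken from the quote above):

    # Quick consistency check on Cisco's forecast figures.
    traffic_2016 = 1300       # 1.3 zettabytes, in exabytes
    growth_2015_2016 = 330    # projected 2015-to-2016 increase, exabytes
    traffic_2011 = 369        # total 2011 traffic, exabytes

    print(f"implied 2015 traffic: {traffic_2016 - growth_2015_2016} EB")  # 970 EB
    print(f"one year's growth vs all of 2011: "
          f"{growth_2015_2016 / traffic_2011:.0%}")                       # ~89%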

Open data startup Junar raises funding

The Chilean data startup Junar announced this week that it had raised a seed round of funding. The startup is an open data platform with the goal of making it easy for anyone to collect, analyze, and publish data. GigaOm's Barb Darrow writes:

"Junar's Open Data Platform promises to make it easier for users to find the right data (regardless of its underlying format); enhance it with analytics; publish it; enable interaction with comments and annotation; and generate reports. Throughout the process it also lets user manage the workflow and track who has accessed and downloaded what, determine which data sets are getting the most traction etc."

Junar joins a number of open data startups and marketplaces that offer similar or related services, including Socrata and DataMarket.

Have data news to share?

Feel free to email me.

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR

May 24 2012

Knight Foundation grants $2 million for data journalism research

Every day, the public hears more about technology and media entrepreneurs, from when they start in garages and dorm rooms all the way up to when they go public, get acquired, or go spectacularly bust. The way the world mourned the passing of Steve Jobs last year, and the way young people now look to Mark Zuckerberg as a model for what's possible, offer some insight into that dynamic.

For those who want to follow in their footsteps, the most interesting elements of those stories will be the muddy details of who came up with the idea, who wrote the first lines of code, who funded them, how they were mentored and then how the startup executed upon their ideas.

Today, foundations and institutions alike are getting involved in the startup ecosystem, but with a different hook than the venture capitalists on Sand Hill Road in California or Y Combinator: They're looking for smart, ambitious social entrepreneurs who want to start civic startups and increase the social capital of the world. From the Code for America Civic Accelerator to the Omidyar Foundation to Google.org to the Knight Foundation's News Challenge, there's more access to seed capital than ever before.

There are many reasons to watch what the Knight Foundation is doing, in particular, as it shifts how it funds digital journalism projects. The foundation's grants are going toward supporting many elements of the broader open government movement, from civic media to government transparency projects to data journalism platforms.

Many of these projects — or elements and code from them — have a chance at becoming part of the plumbing of digital democracy in the 21st century, although we're still on the first steps of the long road of that development.

This model for catalyzing civic innovation in the public interest is, in the broader sweep of history, still relatively new. (Then again, so is the medium you're reading this post on.) One barrier that the Internet has helped lower is the process of discovering and selecting good ideas to fund and letting bad ideas fall by the wayside. Another is changing how ideas are capitalized, whether through microfunding approaches or through crowdfunding platforms like Kickstarter that distribute opportunities to participate in helping products or services go to market.

When the Pebble smartwatch received $10 million through Kickstarter this year, it offered a notable data point into how this model could work. We'll see how others follow.

These models could contribute to the development of small pieces of civic architecture around the world, loosely joining networks in civil society with mobile technology, lightweight programming languages and open data.

After years of watching how the winners of the Knight News Challenges have — or have not — contributed to this potential future, its architects are looking at big questions: How should resources be allocated in newsrooms? What should be measured? Are governments more transparent and accountable due to the use of public data by journalists? What data is available? What isn't? What's useful and relevant to the lives of citizens? How can data visualization, news applications and interactive maps inform and engage readers?

In the context of these questions, the fact that the next Knight News Challenge will focus on data will create important new opportunities to augment the practice of journalism and accelerate the pace of open government. John Bracken (@jsb), the Knight Foundation's program director for journalism and media innovation, offered an explanation for this focus on the foundation's blog:

"Knight News Challenge: Data is a call for making sense of this onslaught of information. 'As data sits teetering between opportunity and crisis, we need people who can shift the scales and transform data into real assets,' wrote Roger Ehrenberg earlier this year.

"Or, as danah boyd has put it, 'Data is cheap, but making sense of it is not.'

"The CIA, the NBA's Houston Rockets, startups like BrightTag and Personal ('every detail of your life is data') — they're all trying to make sense out of data. We hope that this News Challenge will uncover similar innovators discovering ways for applying data towards informing citizens and communities."

Regardless of what happens with this News Challenge, some of those big data questions stand a much better chance of being answered because of the Knight Foundation's $2 million grant to Columbia University to research and distribute best practices for digital reporting, data visualizations and measuring impact.

Earlier this spring, I spoke with Emily Bell, the director of the Tow Center for Digital Journalism, about how this data journalism research at Columbia will close the data science "skills gap" in newsrooms. Bell is now entrusted with creating the architecture for learning that will teach the next generation of data journalists at Columbia University.

In search of the reasoning behind the grant, I talked to Michael Maness (@MichaelManess), vice president of journalism and media innovations at the Knight Foundation. Our interview, lightly edited for content and clarity, follows.

The last time I checked, you're in charge of funding ideas that will make the world better through journalism and technology. Is that about right?

Michael Maness: That's the hope. What we're trying to do is make sure that we're accelerating innovation in the journalism and media space that continues to help inform and engage communities. We think that's vital for democracy. What I do is work on those issues and fund ideas around that to not only make it easier for journalists to do their work, but citizens to engage in that same practice.

The Knight News Challenge has changed a bit over the last couple of years. How has the new process been going?

Michael Maness: I've been in the job a little bit more than a year. I came in at the tail end of 2011 and the News Challenge of 2011. We had some great winners, but we noticed that the time from when you applied to the News Challenge to when you were funded could be up to 10 months by the time everything was done, and certainly eight months in terms of the process. So we reduced that to about 10 weeks. It's intense for the judges to do that, but we wanted to move more quickly, recognizing the speed of disruption and the energy of innovation and how fast it's moving.

We've also switched to a thematic approach. We're going to do three [themes] this year. The point of it is to fund as fast as possible those ideas that we think are interesting and that we think will have a big impact.

This last round was around networks. The first reason we focused on networks is the apparent rise of network power. The second is that we get people who say, "This is the new Twitter for X" or "This is the new Facebook for journalists." Our point is that, actually, you should be using and leveraging existing networks for that.

We found when we looked back at the last five years of the News Challenge that people who came in with networks or built networks in accordance with what they're doing had a higher and faster scaling rate. We want to start targeting areas to do that, too.

We hear a lot about entrepreneurs, young people and the technology itself, but schools and libraries seem really important to me. How will existing institutions be part of the future that you're funding and building?

Michael Maness: One of the things that we're doing is moving into more "prototyping" types of grants and then finding ways of scaling those out, helping get ideas into a proof-of-concept phase so users kick the tires and look for scaling afterward.

In terms of the institutions, one of the things that we've seen that's been a bit of a frustration point is making sure that when we have innovations, [we're] finding the best ways to parlay those into absorption in these kinds of institutions.

A really good standout for that, from a couple years ago as a News Challenge winner, is DocumentCloud, which has been adopted by a lot of the larger legacy media institutions. From a university standpoint, we know one of the things that is key is getting involvement with students as practitioners. They're trying these things out and they're doing the two kinds of modeling that we're talking about. They're using the newest tools in the curriculum.

That's one of the reasons we made the grant [to Columbia.] They have a good track record. The other reason is that you have a real practitioner there with Emily Bell, doing all of her digital work from The Guardian and really knowing how to implement understandings and new ways of reporting. She's been vital. We see her as someone who has lived in an actual newsroom, pulling in those digital projects and finding new ways for journalists to implement them.

The other aspect is that there are just a lot of unknowns in this space. As we move forward, using these new tools for data visualization, for database reporting, what are the things that work? What are the things that are hard to do? What are the ideas that make the most impact? What efficiencies can we find to help newsrooms do it? We didn't really have a great body of knowledge around that, and that's one of the things that's really exciting about the project at Columbia.

How will you make sure the results of the research go beyond Columbia's ivy-covered walls?

Michael Maness: That was a big thing that we talked about, too, because it's not in us to do a lot of white papers around something like this. It doesn't really disseminate. A lot of this grant is around making sure that there are convocations.

We talk a lot about the creation of content objects. If you're studying data visualization, we should be making sure that we're producing that as well. This will be something that's ongoing and emerging. Definitely, a part of it is that some of these resources will go to hold gatherings, to send people out from Columbia to disseminate [research] and also to produce findings in a way that can be moved very easily around a digital ecosystem.

We want to make sure that you're running into this work a lot. This is something that we've baked into the grant, and we're going to be experimenting with, I think, as it moves forward. But I hear you, that if we did all of this — and it got captured behind ivy walls — it's not beneficial to the industry.


Four short links: 24 May 2012

  1. Last Saturday My Son Found His People at the Maker Faire -- aww to the power of INFINITY.
  2. Dictionaries Linking Words to Concepts (Google Research) -- Wikipedia entries for concepts, text strings from searches and the oppressed workers down the Text Mines, and a count indicating how often the two were related.
  3. Magic Wand (Kickstarter) -- I don't want the game, I want a Bluetooth magic wand. I don't want to click the OK button, I want to wave a wand and make it so! (via Pete Warden)
  4. E-Commerce Performance (Luke Wroblewski) -- If a page load takes more than two seconds, 40% are likely to abandon that site. This is why you should follow Steve Souders like a hawk: if your site is slower than it could be, you're leaving money on the table.

May 22 2012

Four short links: 22 May 2012

  1. New Zealand Government Budget App -- when the NZ budget is announced, it'll go live on iOS and Android apps. Tablet users get details, mobile users get talking points and speeches. Half-political, but an interesting approach to reaching out to voters with political actions.
  2. Health Care Data Dump (Washington Post) -- 5B health insurance claims (attempted anonymized) to be released. Researchers will be able to access that data, largely using it to probe a critical question: What makes health care so expensive?
  3. Perl 5.16.0 Out -- two epic things here: 590k lines of changes, and announcement quote from Auden. Auden is my favourite poet, Perl my favourite programming language.
  4. WYSIHTML5 (GitHub) -- wysihtml5 is an open source rich text editor based on HTML5 technology and the progressive-enhancement approach. It uses a sophisticated security concept and aims to generate fully valid HTML5 markup by preventing unmaintainable tag soups and inline styles.

May 17 2012

Four short links: 17 May 2012

  1. The Mythology of Big Data (PDF) -- slides from a Strata keynote by Mark R. Madsen. A lovely explanation of the social impediments to the rational use of data. (via Hamish MacEwan)
  2. Scamworld -- amazing deconstruction of the online "get rich quick" scam business. (via Andy Baio)
  3. Ceres: Solving Complex Problems with Computing Muscle -- Johnny Chung Lee explains the (computer vision) uses of the open source Ceres Non-Linear Least Squares Solver library from Google.
  4. How to Start a Think Tank (Guardian) -- The answer to the looming crisis of legitimacy we're facing is greater openness - not just regarding who met who at what Christmas party, but on the substance of policy. The best way to re-engage people in politics is to change how politics works - in the case of our project, to develop a more direct way for the people who use and provide public and voluntary services to create better social policy. Hear, hear. People seize on the little stuff because you haven't given them a way to focus something big with you.

May 16 2012

The chicken and egg of big data solutions

Before I came to O'Reilly I was building the "big data and disruptive analytics practice" at a major systems integrator. It was a blast to spend every week talking to customers in different industries who were waking up to the possibilities that technologies like Hadoop offered their businesses. Many of these businesses are going to fundamentally change as they embrace this stuff (or be replaced by those that do). But there's a catch.

Twenty or so years ago, large integrators did big business building applications on the then-new relational paradigm. They put in Oracle databases with custom code, wrote PowerBuilder apps on Sybase, and of course lots of businesses rolled their own with VB and SQL Server. It was an era of custom coding, when Oracle, Sybase, SQL Server, Informix, and the rest were thought of as platforms to build stuff on.

Then the market matured and shifted to package solution implementation: ERP, CRM, and the rest. The big guys focused on integrating again and told their clients there was no ROI in building custom stuff; ROI would come from integrating best-of-breed solutions. Databases became commodity back ends to the applications that were always the real focus.

Now along comes big data, NoSQL, data science, and all that stuff and it seems like we're starting the cycle over again. But this time clients, having been well trained over the last decade or so, aren't having any of that "build it from scratch" mentality. They know that Hadoop and other new technologies can be transformative to their business, but they want it packaged up and solution'ified like they are used to. I heard a lot of "let us know when you have a solution already built or available to buy that does X" in the last year.

Also, lots of the shops that do this stuff at scale are built and staffed around the package implementation model and have shed many of the skills they used to have for custom work. Everything from staffing models to methodologies is oriented toward package installation.

So, here we are with all of this disruptive technology, but in a lot of large companies we seem to have lost the institutional wherewithal to do anything with it. Of course, that fact was hard on my numbers. I had a great pipeline of companies with pain to solve, and great technologies to solve it, but too much of the time it was hard to close deals without ready-made solutions.

Every week I talked to the companies building these new platforms to share leads and talk about their direction. After a while I started cutting them off when they wanted to talk about the features of their next release. I just got to the point where I didn't really care; it wasn't all that relevant to my customers. I mean, it's important that they are making the platforms more manageable and building bridges to traditional BI, ETL, RDBMS, and the like. But the focus was too much on platforms and tools.

I wanted to know "What are you doing to encourage solution development? Are you staffing a support system for ISVs? What startups and/or established players are you aware of that are building solutions on this platform?" So when I saw this tweet I let out a little yelp. Awesome! The lack of ready-to-install solutions was getting attention, and from Mike Olson.



You can watch the rest of what Mike Olson said here, and you'll find he tells a similar story about the historical RDBMS parallel.

I talked to Mike a few weeks ago to find out what was behind his comment and explore what else they are doing to support solution development. It boils down to what he said — he will help connect you with money — plus a newly launched partner program designed to provide better support to ISVs among others. Also, the continued attention to APIs and tools like Pig and Hive should make it easier for the solution ecosystem to develop. It can only be good for his business to have lots of other companies directly solving business problems, and simply pulling in his platform.
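
Since the argument leans on Pig and Hive lowering the barrier for solution builders, a concrete contrast may help: with Hadoop Streaming, even a plain Python script can serve as mapper and reducer, and Hive collapses the same job into a one-line GROUP BY. The sketch below is illustrative only; the argument handling and job wiring are assumptions, not anyone's shipped code:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming word count: the same script acts as mapper
# or reducer depending on its first argument. Hadoop sorts the mapper
# output by key before the reducer sees it, so counts for each word
# arrive contiguously.
import sys

def mapper():
    # Emit "word<TAB>1" for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Accumulate the contiguous counts for each word and emit totals.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

None of this plumbing is a business solution, of course; it's the floor a solution vendor builds on, which is exactly the gap the post is describing.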

Hortonworks also started a partner program in the fall, and I think we'll see a lot more emphasis on this across the space this year. However, at the moment wherever I look (Hortonworks partners, Cloudera partners, Accel's big data portfolio) the focus remains firmly on platforms and tools or on partnering with integrators. Tresata, a startup focused on financial risk management, pops up in a lot of lists as the obvious odd one out: an actual domain-specific solution.

What about other people who could be building solutions? Is it the maturity level of the technology, the lack of penetration of Hadoop etc. into your customers' data centers, or some combination of other factors that is slowing things down?

Of course, during RDBMS adoption it took many years before the custom era was over and thoroughly replaced by the era of package implementation. The question I'm pondering is whether customer expectations and the pace of technology will make it happen faster this time. Or will the disruptive value of big data continue to accrue only to risk-taking early adopters for the foreseeable future?

If you are building a startup based on a solution or application that leverages big data technology, and you aren't being stealthy, I'd love to hear about it in the comments.

May 11 2012

Four short links: 11 May 2012

  1. Stanford Med School Contemplates Flipped Classroom -- the real challenge isn't sending kids home with videos to watch; it's using tools like OceanBrowser to keep on top of what they're doing. Few profs at universities have cared whether students learned or not.
  2. Inclusive Tech Companies Win The Talent War (Gina Trapani) -- she speaks the truth, and gently. The original CNN story flushed out an incredible number of vitriolic commenters apparently lacking the gene for irony.
  3. Buyers and Sellers Guide to Web Design and Development Firms (Lance Wiggs) -- great idea, particularly "how to be a good client". There are plenty of dodgy web shops, but more projects fail because of the clients than many would like to admit.
  4. What Does It Mean to Say That Something Causes 16% of Cancers? (Discover Magazine) -- hey, all you infographic jockeys with aspirations to add Data Scientist to your business card: read this and realize how hard it is to make sense of a lot of numbers and then communicate that sense. Data Science isn't about Hadoop any more than Accounting is about columns. Both try to tell a story (the original meaning of your company's "accounts"), and what counts is the informed, disciplined effort of knowing that your story is honest. (A worked sketch of the attributable-fraction arithmetic follows.)
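
For readers who want the arithmetic behind a claim like "X causes 16% of cancers," the usual quantity is the population attributable fraction. A minimal sketch, with the prevalence and relative-risk figures invented purely for illustration:

```python
def attributable_fraction(prevalence, relative_risk):
    # Levin's formula: the share of all cases that would not occur
    # if the exposure were removed, assuming the relative risk is
    # causal and the exposure prevalence is measured correctly.
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Invented illustration: if 40% of a population is exposed and the
# exposure doubles risk (RR = 2), it accounts for about 29% of cases.
# That is a population-level counterfactual, not "29% of exposed
# people get the disease", which is the usual misreading.
print(f"{attributable_fraction(0.40, 2.0):.0%}")  # -> 29%
```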

May 02 2012

Four short links: 2 May 2012

  1. Punting on SxSW (Brad Feld) -- I came across this old post and thought: if you can make money by being a dick, or make money by being a caring family person, why would you choose to be a dick? As far as I can tell, being a dick is optional. Brogrammers, take note. Be more like Brad Feld, who prioritises his family and acts accordingly.
  2. Probabilistic Structures for Data Mining -- readable introduction to useful algorithms and data structures, showing their performance, reliability, and resource trade-offs. (A minimal Bloom-filter sketch follows this list.) (via Hacker News)
  3. Dataset -- a JavaScript library for transforming, querying, and manipulating data from different sources.
  4. Many HTTPS Servers are Insecure -- 75% still vulnerable to the BEAST attack.
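
To give item 2 some flesh, here is a minimal Bloom filter, the canonical probabilistic structure that trades a tunable false-positive rate for constant memory; the bit count, hash count, and digest-slicing scheme are illustrative choices, not taken from the linked article:

```python
import hashlib

class BloomFilter:
    # Membership tests may return false positives, never false
    # negatives. The defaults below are illustrative, not tuned.
    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by slicing one SHA-256 digest
        # (32 bytes, enough for 7 four-byte slices).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("hadoop")
print("hadoop" in bf)  # True
print("hive" in bf)    # False, with high probability
```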

April 27 2012

Publishing News: Tor sets content free

Here are a few stories that caught my eye in the publishing space this week.

Tor breaks the stick

Joe Wikert, O'Reilly GM and publisher, asked this week, "What if DRM goes away?" As kismet would have it, publisher Tom Doherty Associates, which publishes the popular science fiction/fantasy imprint Tor under Macmillan, stepped up to drop DRM and find out. An announcement post on Tor.com stated that by July the company's "entire list of e-books will be available DRM-free." President and publisher Tom Doherty said in the announcement:

"Our authors and readers have been asking for this for a long time. They're a technically sophisticated bunch, and DRM is a constant annoyance to them. It prevents them from using legitimately purchased e-books in perfectly legal ways, like moving them from one kind of e-reader to another."

Author Cory Doctorow said the move "might be the watershed for ebook DRM, the turning point that marks the moment at which all ebooks end up DRM-free. It's a good day." Author Charlie Stross took a look at the big picture and what this might mean not only for the future of publishers, but also for book retailers, supply chains, and ebook reading technology. In part, he said the oligopoly may be in jeopardy:

"Longer term, removing the requirement for DRM will lower the barrier to entry in ebook retail, allowing smaller retailers (such as Powells) to compete effectively with the current major incumbents. This will encourage diversity in the retail sector, force the current incumbents to interoperate with other supply sources (or face an exodus of consumers), and undermine the tendency towards oligopoly. This will, in the long term, undermine the leverage the large vendors currently have in negotiating discount terms with publishers while improving the state of midlist sales."

Jeremy Trevathan, publisher at Tor UK's parent Pan Macmillan, told The Guardian that Macmillan has "no thought of extending [the drop of DRM] beyond science fiction and fantasy publishing. But it's in the air. We've not talked about this to other publishers, but I can't imagine they haven't been thinking about this, too."


Harvard offers up big data and open access research

Harvard University recently made a couple of notable moves to open up access to its data and research. Last week, Harvard's Faculty Advisory Council sent a memo to faculty members regarding periodical subscriptions. The memo opened: "We write to communicate an untenable situation facing the Harvard Library. Many large journal publishers have made the scholarly communication environment fiscally unsustainable and academically restrictive."

Ian Sample at The Guardian reported:

"According to the Harvard memo, journal subscriptions are now so high that to continue them 'would seriously erode collection efforts in many other areas, already compromised.' The memo asks faculty members to encourage their professional organisations to take control of scholarly publishing, and to consider submitting their work to open access journals and resigning from editorial boards of journals that are not open access."

This week, The New York Times (NYT) reported that "Harvard is making public the information on more than 12 million books, videos, audio recordings, images, manuscripts, maps, and more things inside its 73 libraries." Access to this volume of metadata is likely to fuel innovation for developers. The NYT report stated:

"At a one-day test run with 15 hackers working with information on 600,000 items, [David Weinberger, co-director of Harvard's Library Lab] said, people created things like visual timelines of when ideas became broadly published, maps showing locations of different items, and a 'virtual stack' of related volumes garnered from various locations."

The post noted the "metadata will be available for bulk download both from Harvard and from the Digital Public Library of America, which is an effort to create a national public library online."

News scoops for sale or rent

There was also a dustup in the news space this week. It began with Felix Salmon's post at Reuters suggesting The New York Times could rake in revenue by selling hedge funds advance access to its feature stories. (This was all brought on by the newspaper running its feature piece on a Wal-Mart bribery inquiry on a Saturday, and the market response the following Monday.)

Salmon argued:

"The main potential problem I see here is that if such an arrangement were in place, corporate whistleblowers might be risking prosecution as insider traders. But I'm sure the lawyers could work that one out. The church-lady types would I'm sure faint with horror. But if hedge funds are willing to pay the NYT large sums of money to be able to get a glimpse of stories before they're made fully public, what fiduciary could simply turn such hedge funds away?"

GigaOm's Mathew Ingram posted a response from a journalism ethics standpoint:

"One of the things that bothers me about this idea is that I think there is still some kind of public-service or public-policy value in journalism, and especially the news — I don't think it is just another commodity that should be designed to make as much money as possible. And if the New York Times were to take stories that are arguably of social significance and provide them to hedge funds in advance, I think that would make it a very different type of entity than it is now. What if it was a story about a dangerous drug or national security?"

Salmon posted a follow-up argument, in part responding to Ingram:

"The journalism-ethics angle to this hasn't really been fleshed out, though. Mathew Ingram, for instance, says that if news is being put out in the public service, then it shouldn't be 'just another commodity'; if the NYT were to go down this road, then 'that would make it a very different type of entity than it is now.' It's all very vague and hand-wavey."

All three posts in this back-and-forth exchange (here, here and here), as well as the debate on Twitter that Ingram storified here, are well worth the read.
