April 29 2013

Google Glass and the Future

I just read a Forbes article about Glass, talking about the split between those who are “sure that it is the future of technology, and others who think society will push back against the technology.”

I don’t see this as a dichotomy (and, to be fair, I’m not sure that the author does either). I expect to see both, and I’d like to think a bit more about what these two apparently opposing sides mean.

Push back is inevitable. I hope there’s a significant push back, and that it has some results. Not because I’m a Glass naysayer, but because we, as technology users, are abused so often, and push back so weakly, that it’s not funny. Facebook does something outrageous; a few technorati whine; they add option 1023 to their current highly intertwined 1022 privacy options that have been designed so they can’t be understood or used effectively; and sooner or later, it all dies down. A hundred fifty users have left Facebook, and half a million more have joined. When Apple puts another brick in their walled garden, a few dozen users (myself included) bitch and moan, but does anyone leave? Personally, I’m tired of getting warnings whenever I install software that doesn’t come from the Apple Store (I’ve used the Store exactly twice), and I absolutely expect that a not-too-distant version of OS X won’t allow me to install software from “untrusted” sources, including software I’ve written. Will there be push back? Probably. Will it be effective? I don’t know; if things go as they are now, I doubt it.

There will be push back against Glass; and that’s a good thing. I think Google, of all the companies out there, is most likely to listen and respond positively. I say that partly because of efforts like the Data Liberation Front, and partly because Eric Schmidt has acknowledged that he finds many aspects of Glass creepy. But going beyond Glass: As a community of users, we need to empower ourselves to push back. We need to be able to push back effectively against Google, but more so against Apple, Facebook, and many other abusers of our data, rather than passively accept the latest intrusion as an inevitability. If Glass does nothing more than teach users that they can push back, and teach large corporations how to respond constructively, it will have accomplished much.

Is Glass the future? Yes; at least, something like Glass is part of the future. As a species, we’re not very good at putting our inventions back into the box. About three years ago, there was a big uptick in interest in augmented reality. You probably remember: Wikitude, Layar, and the rest. You installed those apps on your phone. They’re still there. You never use them (at least, I don’t). The problem with consumer-grade AR up until now has been that it was sort of awkward walking around looking at things through your phone’s screen. (Commercial AR – heads-up displays and the like – is a completely different ball game.) Glass is the first attempt at a broadly useful platform for consumer AR; it’s a game changer.

Is it possible that Glass will fail? Sure; I know more failed startups than I can count where the engineers did something really cool, and when they released it, the public said “what is that, and why do you think we’d want it?” Google certainly isn’t immune from that disease, which is endemic to an engineering-driven culture; just think back to Wave. I won’t deny that Google might shelve Glass if they consider it unproductive, as they’ve shelved many popular applications. But I believe that Google is playing long-ball here, and thinking far beyond 2014 or 2015. In a conversation about Bitcoin last week, I said that I doubt it will be around in 20 years. But I’m certain we will have some kind of distributed digital currency, and that currency will probably look a lot like Bitcoin. Glass is the same. I have no doubt that something like Glass is part of our future. It’s a first, tentative, and very necessary step into a new generation of user interfaces, a new way of interacting with computing systems and integrating them into our world. We probably won’t wear devices on our glasses forever; the technology may well end up surgically implanted. But the future doesn’t happen if you only talk about hypothetical possibilities. Building the future requires concrete innovation, building inconvenient and “creepy” devices that nevertheless point to the next step. And it requires people pushing back against that innovation, to help developers figure out what they really need to build.

Glass will be part of our future, though probably not in its current form. And push back from users will play an essential role in defining the form it will eventually take.

October 11 2012

Culture transmission is bi-directional

I read this piece in the New York Times the other day and have read it two or three more times since then. It dives into the controversy around DARPA’s involvement in hacker space funding. But frankly, every time I come across this controversy, I’m baffled.

I usually associate this sort of government distrust with Tea Party-led Republicans. The left, and even many of us in the middle, generally have more faith in government institutions. We’re more likely to view government as a tool to implement the collective will of the people. Lots of us figure that government is necessary, or at least useful, to accomplish things that are too big or hairy for any other group of citizens to achieve (in fact, a careful reading of Hayek will show even he thought so – commence comment flame war in 3 ..2 ..1 …).

So, to summarize, the right dislikes big government and typically the left embraces it. At least, right up until the moment the military is involved. Then the right worships big government (largely at the temple of the History Channel) and the left despises it.

Of course, I don’t know anything about the politics of the people criticizing this DARPA funding, just that they are worried that defense money will be a corrupting influence on the maker movement. Which would imply that they think Defense Department values are corrupting. And they might be right to have some concerns. While the U.S. military services are probably the single most competent piece of our entire government, the defense industrial complex that equips them is pretty damned awful. It’s inefficient, spends more time on political engineering than actual engineering, and is where most of the world’s bad suits go to get rumpled. And there is no doubt that money is a vector along which culture and values will readily travel, so I suppose it’s reasonable to fear that the maker movement could be changed by it.

But what everyone seems to be missing is that this isn’t a one-way process and the military, via DARPA, is essentially saying “we want to absorb not just your technology but the culture of openness by which you create it.” That’s an amazing opportunity and shouldn’t be ignored. The money is one vector, but the interactions, magical projects, and collaboration are another, perhaps more powerful vector, along which the values of the maker movement can be swabbed directly into one of the most influential elements of our society. This is opportunity!

O’Reilly is participating in the DARPA MENTOR program and Dale has already discussed our involvement at length. So I need to disclose it, but this post isn’t about that. This post is about the idea that the military has been a change agent in our society many times before. This is an opportunity to do it again and for makers to influence how it happens.

For quite a few years, I worked in the defense space and, frankly, took a lot of crap for it from my friends on the left coast. But I always felt that the military was an important part of American society regardless of whether you agreed with its purpose or actual use, and that the best way to counter its less desirable tendencies was to engage with it. So while I worked my day job I also spent those years aggressively advocating open source software, emergent and incremental software processes, and “permissionless programming” web platforms for the DoD. I thought that the military could benefit from all of these things, but I also explicitly felt that they were a vector along which the cultural attributes of openness, transparency, and experimentation would readily travel. Those open and emergent ideas were a culture virus and I intended to shed them everywhere I could.

If you’re a technologist, you know that the military has always pushed the envelope. Silicon Valley itself began with Stanford’s government partnership during the Second World War. The world’s first interactive computer was Whirlwind, a component piece of the massive air defense program SAGE. So, if your vision is to unleash a democratized third industrial revolution based on the maker model, this is your opportunity. If you can insert open culture and values into the defense establishment at the same time, even better.

September 20 2012

Congress launches Congress.gov in beta, doesn’t open the data

The Library of Congress is now more responsive — at least when it comes to web design. Today, the nation’s repository for its laws launched a new beta website at Congress.gov and announced that it would eventually replace Thomas.gov, the 17-year-old website that represented one of the first significant forays online for Congress. The new website will help a public that increasingly looks for information on mobile devices learn about the lawmaking process, but it falls short of the full promise of embracing the power of the Internet. (More on that later.)

Tapping into a growing trend in government new media, the new Congress.gov features responsive design, adapting to desktop, tablet or smartphone screens. It’s also search-centric: it offers Boolean search and, in an acknowledgement that most of its visitors show up looking for information, puts a search field front and center in the interface. The site includes member profiles for U.S. Senators and Representatives, with associated legislative work. In a nod to a mainstay of social media and media websites, the new Congress.gov also has a “most viewed bills” list that lets visitors see at a glance what laws or proposals are gathering interest online. (You can download a fact sheet on all the changes as a PDF).

On the one hand, the new Congress.gov is a dramatic update to a site that desperately needed one, particularly in a historic moment where citizens are increasingly connecting to the Internet (and one another) through their mobile devices.

On the other hand, the new Congress.gov beta has yet to realize the potential of Congress publishing bulk open legislative data. There is no application programming interface (API) for open government developers to build upon. In many ways, the new Congress.gov replicates what was already available to the public at sites like Govtrack.us and OpenCongress.org.

In response to my tweets about the site, former law librarian Meg Lulofs Kuhagan (@librarylulu) noted on Twitter that there’s “no data whatsoever, just window dressing” in the new site — but that “it looks good on my phone. More #opengov if you have a smartphone.”

Aaron E. Myers, the director of new media for Senate Majority Leader Harry Reid, commented on Twitter that legislative data is a “tough nut to crack,” with the text of amendments, SCOTUS votes and treaties missing from the new Congress.gov. In reply, Chris Carlson, the creative director for the Library of Congress, tweeted that that information is coming soon and that all the data currently in Thomas.gov will be available on Congress.gov.

Emi Kolawole, who reviewed the new Congress.gov for the Washington Post, reported that more information, including the categories Myers cited, will be coming to the site during its beta, including the Congressional Record and Index. Here’s hoping that Congress decides to publish all of its valuable Congressional Research Service reports, too. Currently, the public has to turn to OpenCRS.com to access that research.

Carlson was justifiably proud of the beta of Congress.gov: “The new site has clean URLs, powerful search, member pages, clean design,” he tweeted. “This will provide access to so many more people who only have a phone for internet.”

While the new Congress.gov is well designed and has the potential to lead to more informed citizens, the choice to build a new website versus release the data disappointed some open government advocates.

“Another hilarious/clueless misallocation of resources,” commented David Moore, co-founder of OpenCongress. “First liberate bulk open gov data; then open API; then website.”

“What’s noticeable about this evolving beta website, besides the major improvements in how people can search and understand legislative developments, is what’s still missing: public comment on the design process and computer-friendly bulk access to the underlying data,” wrote Daniel Schuman, legislative counsel for the Sunlight Foundation. “We hope that Congress will now deeply engage with the public on the design and specifications process and make sure that legislative information is available in ways that most encourage analysis and reuse.”

Kolawole asked Congressional officials about bulk data access and an API and heard that the capacity is there but the approval is not. “They said the system could handle it, but they haven’t received congressional auth. to do it yet,” she tweeted.

Vision and bipartisan support for open government on this issue do exist among Congressional leadership. There has been progress on this front in the 112th Congress: the U.S. House started publishing machine-readable legislative data at docs.house.gov this past January.

“Making legislative data easily available in machine-readable formats is a big victory for open government, and another example of the new majority keeping its pledge to make Congress more open and accountable,” said Speaker of the House John Boehner.

Last December, House Minority Whip Steny Hoyer commented on how technology is affecting Congress, his caucus and open government in the executive branch:

For Congress, there is still a lot of work to be done, and we have a duty to make the legislative process as open and accessible as possible. One thing we could do is make THOMAS.gov — where people go to research legislation from current and previous Congresses — easier to use, and accessible by social media. Imagine if a bill in Congress could tweet its own status.

The data available on THOMAS.gov should be expanded and made easily accessible by third-party systems. Once this happens, developers, like many of you here today, could use legislative data in innovative ways. This will usher in new public-private partnerships that will empower new entrepreneurs who will, in turn, yield benefits to the public sector.

For any of that vision of civic engagement and entrepreneurship to happen around the Web, the Library of Congress will need to fully open up the data. Why hasn’t it happened yet, given bipartisan support and a letter from the Speaker of the House?

techPresident managing editor Nick Judd asked the Library of Congress about Congress.gov. The Library’s director of communications, Gayle Osterberg, suggested in an emailed response that Congress hasn’t been clear about the manner of data release.

“Congress has said what to do on bulk access,” commented Schuman. “See the joint explanatory statement. There is support for bulk access.”

In June 2012, the House’s leadership issued a bipartisan statement that adopted the goal of “provid[ing] bulk access to legislative information to the American people without further delay,” put the release of bulk data among its “top priorities in the 112th Congress” and directed a task force “to begin its important work immediately.”

The 112th Congress will come to a close soon. The Republicans swept into the House in 2010 promising a new era of innovation and transparency. If Speaker Boehner, Rep. Hoyer and their colleagues want to end these two divisive years on a high note, fully opening legislative data to the People would be an enduring legacy. Congressional leaders will need to work with the Library of Congress to make that happen.

All that being said, the new Congress.gov is in beta and looks dramatically improved. The digital infrastructure of the federal legislative system got a bit better today, moving towards a more adaptive government. Stay tuned, and give the Library of Congress (@LibraryCongress) some feedback: there’s a new button for it on every page.

This post has been updated with comments from Facebook, a link and reporting from techPresident, and a clarification from Daniel Schuman regarding the position of the House of Representatives.

August 29 2012

President Obama participates in first Presidential AMA on Reddit

Starting around 4:30 PM ET today, President Barack Obama made history by going onto Reddit to answer questions about anything for an hour. Reddit, one of the most popular social news sites on the Internet, has been hosting “Ask Me Anything” forums — or AMAs — for years, including sessions with prominent legislators like Representative Darrell Issa (R-CA), but hosting a sitting President of the United States will elevate Reddit’s prominence at the intersection of technology and politics. AllThingsD has the story of how Reddit got the President onto the site. Reddit co-founder Alexis Ohanian told Peter Kafka that “There are quite a few redditors at 1600 Pennsylvania Ave and at the campaign HQ — given the prominence of reddit, it’s an easy sell.”

President Obama made some news in the process, with respect to the Supreme Court decision that allowed super political action committees, or “Super PACs,” to become part of the campaign finance landscape.

“Over the longer term, I think we need to seriously consider mobilizing a constitutional amendment process to overturn Citizens United (assuming the Supreme Court doesn’t revisit it),” commented President Obama. “Even if the amendment process falls short, it can shine a spotlight of the super-PAC phenomenon and help apply pressure for change.”

President Obama announced that he’d be participating in the AMA in a tweet and provided photographic evidence that he was actually answering questions in an image posted to Reddit (above) and in a second tweet during the session.

The timing of the AMA was at least a little political, coming after a speech in Virginia and falling upon the third day of the Republican National Convention, but it is unequivocally a first, in terms of a president directly engaging with the vibrant Reddit community. Many people also tweeted that they were having trouble accessing the page during the AMA, as tens of thousands of users tried to access the forum. According to The Verge, President Obama’s AMA was the most popular post in Reddit’s history, with more than 200,000 visitors on the site concurrently. (Presidential Q&As apparently melt servers almost as much as being Biebered.)

Today’s AMA is only the latest example of presidents experimenting with online platforms, from President Clinton and President Bush posting text on WhiteHouse.gov to the Obama administration rebooting that platform on Drupal. More recently, President Obama has participated in a series of online ‘town halls’ using social media, including Twitter, Facebook, LinkedIn and the first presidential Hangout on Google+.

His use of all of them deserves to be analyzed critically, in terms of whether the platforms and events were being used to burnish the credentials of a tech-savvy chief executive in an election year or to genuinely answer the questions and concerns of the citizens he serves.

In analyzing the success of such an experiment in digital democracy, it’s worth looking at whether the questions answered were the ones most citizens wanted to see asked (on Reddit, counted by upvotes) and whether the answers given were rehashed talking points or specific to the intent of the questions asked. On the first part of that rubric, President Obama scored high: he answered each of the top-voted questions in the AMA, along with a few personal ones.


On the rest of those counts, you can judge for yourself. The president’s answers are below:

“Hey everybody – this is barack. Just finished a great rally in Charlottesville, and am looking forward to your questions. At the top, I do want to say that our thoughts and prayers are with folks who are dealing with Hurricane Isaac in the Gulf, and to let them know that we are going to be coordinating with state and local officials to make sure that we give families everything they need to recover.”

On Internet freedom: “Internet freedom is something I know you all care passionately about; I do too. We will fight hard to make sure that the internet remains the open forum for everybody – from those who are expressing an idea to those to want to start a business. And although their will be occasional disagreements on the details of various legislative proposals, I won’t stray from that principle – and it will be reflected in the platform.”

On space exploration: “Making sure we stay at the forefront of space exploration is a big priority for my administration. The passing of Neil Armstrong this week is a reminder of the inspiration and wonder that our space program has provided in the past; the curiosity probe on mars is a reminder of what remains to be discovered. The key is to make sure that we invest in cutting edge research that can take us to the next level – so even as we continue work with the international space station, we are focused on a potential mission to a asteroid as a prelude to a manned Mars flight.”

On helping small businesses and relevant bills: “We’ve really focused on this since I came into office – 18 tax cuts for small business, easier funding from the SBA. Going forward, I want to keep taxes low for the 98 percent of small businesses that have $250,000 or less in income, make it easier for small business to access financing, and expand their opportunities to export. And we will be implementing the Jobs Act bill that I signed that will make it easier for startups to access crowd-funding and reduce their tax burden at the start-up stage.”

Most difficult decision you had to make this term? “The decision to surge our forces in afghanistan. Any time you send our brave men and women into battle, you know that not everyone will come home safely, and that necessarily weighs heavily on you. The decision did help us blunt the taliban’s momentum, and is allowing us to transition to afghan lead – so we will have recovered that surge at the end of this month, and will end the war at the end of 2014. But knowing of the heroes that have fallen is something you never forget.”

On the influence of money in politics: “Money has always been a factor in politics, but we are seeing something new in the no-holds barred flow of seven and eight figure checks, most undisclosed, into super-PACs; they fundamentally threaten to overwhelm the political process over the long run and drown out the voices of ordinary citizens. We need to start with passing the Disclose Act that is already written and been sponsored in Congress – to at least force disclosure of who is giving to who. We should also pass legislation prohibiting the bundling of campaign contributions from lobbyists. Over the longer term, I think we need to seriously consider mobilizing a constitutional amendment process to overturn Citizens United (assuming the Supreme Court doesn’t revisit it). Even if the amendment process falls short, it can shine a spotlight of the super-PAC phenomenon and help apply pressure for change.”

On prospects for recent college grads – in this case, a law school grad: “I understand how tough it is out there for recent grads. You’re right – your long term prospects are great, but that doesn’t help in the short term. Obviously some of the steps we have taken already help young people at the start of their careers. Because of the health care bill, you can stay on your parent’s plan until you’re twenty six. Because of our student loan bill, we are lowering the debt burdens that young people have to carry. But the key for your future, and all our futures, is an economy that is growing and creating solid middle class jobs – and that’s why the choice in this election is so important. The other party has two ideas for growth – more taxs cuts for the wealthy (paid for by raising tax burdens on the middle class and gutting investments like education) and getting rid of regulations we’ve put in place to control the excesses on wall street and help consumers. These ideas have been tried, they didnt work, and will make the economy worse. I want to keep promoting advanced manufacturing that will bring jobs back to America, promote all-American energy sources (including wind and solar), keep investing in education and make college more affordable, rebuild our infrastructure, invest in science, and reduce our deficit in a balanced way with prudent spending cuts and higher taxes on folks making more than $250,000/year. I don’t promise that this will solve all our immediate economic challenges, but my plans will lay the foundation for long term growth for your generation, and for generations to follow. So don’t be discouraged – we didn’t get into this fix overnight, and we won’t get out overnight, but we are making progress and with your help will make more.”

First thing he’ll do on November 7th: “Win or lose, I’ll be thanking everybody who is working so hard – especially all the volunteers in field offices all across the country, and the amazing young people in our campaign offices.”

How do you balance family life and hobbies with being POTUS? “It’s hard – truthfully the main thing other than work is just making sure that I’m spending enough time with michelle and the girls. The big advantage I have is that I live above the store – so I have no commute! So we make sure that when I’m in DC I never miss dinner with them at 6:30 pm – even if I have to go back down to the Oval for work later in the evening. I do work out every morning as well, and try to get a basketball or golf game in on the weekends just to get out of the bubble. Speaking of balance, though, I need to get going so I’m back in DC in time for dinner. But I want to thank everybody at reddit for participating – this is an example of how technology and the internet can empower the sorts of conversations that strengthen our democracy over the long run. AND REMEMBER TO VOTE IN NOVEMBER – if you need to know how to register, go to Gottaregister.com. By the way, if you want to know what I think about this whole reddit experience – NOT BAD!”

On the White House homebrew recipe: “It will be out soon! I can tell from first hand experience, it is tasty.”

A step forward for digital democracy?

The most interesting aspect of that Presidential Hangout was that it introduced the possibility of unscripted moments, where a citizen could ask an unexpected question, and the opportunity for followups, if an answer wasn’t specific enough.

Reddit doesn’t provide quite the same mechanism for accountability as a live Hangout, in terms of putting an elected official on the spot to answer. The platform itself falls short here: there’s no way to force a politician to circle back and give a better answer, in the way, say, Mike Wallace might have on “60 Minutes.”

Alexis Madrigal, one of the sharpest observers of technology and society currently gracing the pages of the Atlantic, is clear about the issues with a Reddit AMA: “it’s a terrible format for extracting information from a politician.”

Much as many would like to believe that the medium determines the message, a modern politician is never unmediated. Not in a pie shop in Pennsylvania, not at a basketball game, not while having dinner, not on the phone with NASA, not on TV, not doing a Reddit AMA. Reddit is not a mic accidentally left on during a private moment. The kind of intimacy and honesty that Redditors crave does not scale up to national politics, where no one ever lets down his or her guard. Instead of using the stiffness and formality of the MSM to drive his message home, Obama simply used the looseness and casual banter of Reddit to drive his message home. Here more than in almost anything else: Tech is not the answer to the problems of modern politics.

Today’s exchange, however, does hint at the dynamic that makes the format so alluring: the Internet connecting you and your question directly to the most powerful man in the world, and your online community pushing for him to answer it.

President Obama ended today’s AMA by thanking everyone on Reddit for participating and wrote that “this is an example of how technology and the internet can empower the sorts of conversations that strengthen our democracy over the long run.”

Well, it’s a start. Thank you for logging on today, Mr. President. Please come back online and answer some more follow up questions.


August 17 2012

Wall Street’s robots are not out to get you

Technology is critical to today’s financial markets. It’s also surprisingly controversial. In most industries, increasing technological involvement is progress, not a problem. And yet, people who believe that computers should drive cars suddenly become Luddites when they talk about computers in trading.

There’s widespread public sentiment that technology in finance just screws the “little guy.” Some of that sentiment is due to concern about a few extremely high-profile errors. A lot of it is rooted in generalized mistrust of the entire financial industry. Part of the problem is that media coverage on the issue is depressingly simplistic. Hyperbolic articles about the “rogue robots of Wall Street” insinuate that high-frequency trading (HFT) is evil without saying much else. Very few of those articles explain that HFT is a catchall term that describes a host of different strategies, some of which are extremely beneficial to the public market.

I spent about six years as a trader, using automated systems to make markets and execute arbitrage strategies. From 2004-2011, as our algorithms and technology became more sophisticated, it was increasingly rare for a trader to have to enter a manual order. Even in 2004, “manual” meant instructing an assistant to type the order into a terminal; it was still routed to the exchange by a computer. Automating orders reduced the frequency of human “fat finger” errors. It meant that we could adjust our bids and offers in a stock immediately if the broader market moved, which enabled us to post tighter markets. It allowed us to manage risk more efficiently. More subtly, algorithms also reduced the impact of human biases — especially useful when liquidating a position that had turned out badly. Technology made trading firms like us more profitable, but it also benefited the people on the other sides of those trades. They got tighter spreads and deeper liquidity.

Many HFT strategies have been around for decades. A common one is exchange arbitrage, which Time magazine recently described in an article entitled “High Frequency Trading: Wall Street’s Doomsday Machine?”:

A high-frequency trader might try to take advantage of minuscule differences in prices between securities offered on different exchanges: ABC stock could be offered for one price in New York and for a slightly higher price in London. With a high-powered computer and an ‘algorithm,’ a trader could buy the cheap stock and sell the expensive one almost simultaneously, making an almost risk-free profit for himself.

It’s a little bit more difficult than that paragraph makes it sound, but the premise is true — computers are great for trades like that. As technology improved, exchange arb went from being largely manual to being run almost entirely via computer, and the market in the same stock across exchanges became substantially more efficient. (And as a result of competition, the strategy is now substantially less profitable for the firms that run it.)
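To make the mechanics concrete, here is a minimal Python sketch of that arbitrage check. The exchange names, quotes, and fee figure are invented for illustration; a real system works from live feeds and has to worry about latency, queue position, and the risk of one leg failing to fill.

```python
# Illustrative sketch of the exchange-arbitrage check described above.
# The quotes, fee, and exchange labels are invented for the example; a real
# system would consume live feeds and account for latency and execution risk.

from dataclasses import dataclass

@dataclass
class Quote:
    exchange: str
    bid: float   # best price someone will pay
    ask: float   # best price someone will sell at

def arbitrage_opportunity(quotes, fees_per_share=0.002):
    """Return (buy_venue, sell_venue, edge) if buying on one exchange and
    selling on another is profitable after fees, else None."""
    cheapest = min(quotes, key=lambda q: q.ask)
    richest = max(quotes, key=lambda q: q.bid)
    edge = richest.bid - cheapest.ask - 2 * fees_per_share
    if cheapest.exchange != richest.exchange and edge > 0:
        return cheapest.exchange, richest.exchange, edge
    return None

quotes = [Quote("NYSE", bid=10.01, ask=10.02),
          Quote("LSE",  bid=10.04, ask=10.05)]
print(arbitrage_opportunity(quotes))
# -> roughly ('NYSE', 'LSE', 0.016): buy at 10.02, sell at 10.04, pay fees
```

Competition compresses exactly that edge, which is why the strategy is now far less profitable for the firms that run it.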

Market making — posting both a bid and an offer in a security and profiting from the bid-ask spread — is presumably what Knight Capital was doing when it experienced “technical difficulties.” The strategy dates from the time when exchanges were organized around physical trading pits. Those were the bad old days, when there was little transparency and automation, and specialists and brokers could make money ripping off clients who didn’t have access to technology. Market makers act as liquidity providers, and they are an important part of a well-functioning market. Automated trading enables them to manage their orders efficiently and quickly, and helps to reduce risk.
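As a rough illustration of the quoting side of this, here is a toy Python sketch of how an automated market maker re-centers its bid and offer the moment its reference price moves. The prices, half-spread, and skew are made up; real market makers also manage inventory, adverse selection, and exchange rules.

```python
# Toy market-making quoter: keep a bid and an offer centered on a reference
# ("fair") price and re-center them whenever the broader market moves.
# All numbers here are invented for illustration.

def make_quotes(fair_price, half_spread=0.01, skew=0.0):
    """Return (bid, ask) around fair_price; `skew` shifts both quotes to
    lean out of an unwanted inventory position."""
    bid = round(fair_price - half_spread + skew, 2)
    ask = round(fair_price + half_spread + skew, 2)
    return bid, ask

fair = 25.00
print(make_quotes(fair))              # (24.99, 25.01)
fair += 0.05                          # broader market ticks up...
print(make_quotes(fair))              # ...quotes re-center at once: (25.04, 25.06)
print(make_quotes(fair, skew=-0.01))  # long inventory: quote lower to work out of it
```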

So how do those high-profile screw-ups happen? They begin with human error (or, at least, poor judgment). Computerized trading systems can amplify these errors; it would be difficult for a person sending manual orders to simultaneously botch their markets in 148 different companies, as Knight did. But it’s nonsense to make the leap from one brokerage experiencing severe technical difficulties to claiming that automated market-making creates some sort of systemic risk. The way the market handled the Knight fiasco is how markets are supposed to function — stupidly priced orders came in, the market absorbed them, the U.S. Securities and Exchange Commission (SEC) and the exchanges adhered to their rules regarding which trades could be busted (ultimately letting most of the trades stand and resulting in a $440 million loss for Knight).

There are some aspects of HFT that are cause for concern. Certain strategies have exacerbated unfortunate feedback loops. The Flash Crash illustrated that an increase in volume doesn’t necessarily mean an increase in real liquidity. Nanex recently put together a graph (or a “horrifying GIF”) showing the sharply increasing number of quotes transmitted via automated systems across various exchanges. What it shows isn’t actual trades, but it does call attention to a problem called “quote spam.” Algorithms that employ this strategy generate a large number of buy and sell orders that are placed in the market and then are canceled almost instantly. They aren’t real liquidity; the machine placing them has no intention of getting a fill — it’s flooding the market with orders that competitor systems have to process. This activity leads to an increase in short-term volatility and higher trading costs.
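One way to see why quote spam is at least measurable is to look at order lifetimes and cancel-to-fill ratios. The sketch below is a simplification with an invented event format and toy numbers, not how any exchange actually runs surveillance.

```python
# Sketch of one way to flag "quote spam": measure how long each order rests
# before it is canceled, and the ratio of cancels to fills. The event format
# and thresholds are invented for illustration.

from collections import defaultdict

def order_stats(events):
    """events: (timestamp_sec, order_id, action) with action in
    {'new', 'cancel', 'fill'}. Returns (median lifetime of canceled orders,
    cancel-to-fill ratio)."""
    placed, lifetimes = {}, []
    counts = defaultdict(int)
    for ts, oid, action in events:
        counts[action] += 1
        if action == "new":
            placed[oid] = ts
        elif action == "cancel" and oid in placed:
            lifetimes.append(ts - placed.pop(oid))
    lifetimes.sort()
    median = lifetimes[len(lifetimes) // 2] if lifetimes else None
    ratio = counts["cancel"] / max(counts["fill"], 1)
    return median, ratio

events = [(0.000, 1, "new"), (0.002, 1, "cancel"),
          (0.010, 2, "new"), (0.011, 2, "cancel"),
          (0.020, 3, "new"), (0.500, 3, "fill")]
median_life, cancel_ratio = order_stats(events)
print(median_life, cancel_ratio)   # 0.002 (2 ms) and 2.0 -- suspiciously spammy
```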

The New York Times just ran an interesting article on HFT that included data on the average cost of trading one share of stock. From 2000 to 2010, it dropped from $.076 to $.035. Then it appears to have leveled off, and even increased slightly, to $.038 in 2012. If (as that data suggests) we’ve arrived at the point where the “market efficiency” benefit of HFT is outweighed by the risk of increased volatility or occasional instability, then regulators need to step in. The challenge is determining how to disincentivize destabilizing behavior without negatively impacting genuine liquidity providers. One possibility is to impose a financial transaction tax, possibly based on how long the order remains in the market or on the number of orders sent per second.
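For a sense of how such a disincentive might be parameterized, here is a hedged sketch of the two variants mentioned above: a fee on orders canceled before a minimum resting time, and a fee on messages beyond a per-second cap. Every rate and threshold in it is arbitrary, not a proposal from any regulator.

```python
# Hedged sketch of the kind of fee the paragraph speculates about. All rates,
# thresholds, and fee levels are arbitrary illustrations.

def order_fee(resting_time_sec, min_rest=0.5, fee=0.0001):
    """Charge a small fee for orders canceled before `min_rest` seconds."""
    return fee if resting_time_sec < min_rest else 0.0

def messaging_fee(orders_sent, seconds, cap_per_sec=100, fee_per_excess=0.001):
    """Charge for order messages beyond a per-second cap."""
    excess = max(0, orders_sent - cap_per_sec * seconds)
    return excess * fee_per_excess

print(order_fee(0.002))                    # 0.0001 -- canceled after 2 ms, pays the fee
print(order_fee(3.0))                      # 0.0    -- rested long enough, no fee
print(messaging_fee(250_000, seconds=60))  # charged on the 244,000 excess messages
```

The design question the article raises is exactly the one the parameters expose: set them too aggressively and genuine liquidity providers pay; set them too loosely and the spam continues.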

Rethinking regulation and market safeguards in light of new technology is absolutely appropriate. But the state of discourse in the mainstream press — mostly comprised of scare articles about “Wall Street’s terrifying robot invasion” — is unfortunate. Maligning computerized strategies because they are computerized is the wrong way to think about the future of our financial markets.

Photo: ABOVE by Lyfetime, on Flickr


August 14 2012

The complexity of designing for everywhere

In her new book The Mobile Frontier, author Rachel Hinman (@Hinman) says the mobile design space is a wide-open frontier, much like space exploration or the Wild West, where people have room to “explore and invent new and more human ways for people to interact with information.”

In the following interview, Hinman talks about the changing landscape of computing — GUIs becoming NUIs — and delves into the future of mobile and how designers and users alike will make the journey.

What is mobile’s biggest strength? What about it is creating a new frontier?

Rachel Hinman: Humans have two legs, making us inherently mobile beings. Yet for the last 50 years, we’ve all settled into a computing landscape that assumes a static context of use. Mobile’s biggest strength is that it maps to this inherent human characteristic to be mobile.

The static, PC computing landscape is known and understood. Mobile is a frontier because there’s still much we don’t understand and much yet to be discovered. There are lots of breakthroughs in mobile yet to come, making it an exciting place for those who can stomach the uncertainty and ambiguity to be.

You talk about “mobile context” in your book. What does that involve and why is it important?

Rachel Hinman: Most designers have been steeped in a tradition of creating experiences with few context considerations, though they may not realize it. Books, websites, software programs, and even menus for interactive televisions share an implicit and often overlooked commonality: use occurs in relatively static and predictable environments. In contrast, most mobile experiences are situated in highly dynamic and unpredictable environments. Issues of context tend to blindside most designers new to mobile.

Compelling mobile experiences share a common characteristic — they are empathetic to the constraints of the mobile context. Underneath all the hoopla that mobile folks make about the importance of context is the recognition of a skill that everyone interested in this medium must develop: both empathy and curiosity for the complexity of designing for everywhere. It’s not a skill most people grow overnight, but rather something we learn through trial and error. And like any skill, the learning never stops.

How do you see the ethics and privacy issues surrounding sensors playing out?

Rachel Hinman: I think what’s interesting about the privacy issue is that it’s not a technical or UX “problem” as much as it is a cultural/social/ethical issue. Information has always had value. Before the widespread acceptance of the Internet, we lived in a world where information wasn’t as accessible, and there was a much higher level of information symmetry. The Internet has changed that and is continuing to change that.

Now, people recognize their information has value and some (not all) are willing to exchange that information if they’ll receive value in return. I think experiences that get called out for privacy violation problems are often built by companies that don’t have a handle on how this issue is evolving, companies that are still living in a time when users’ relationships to information were less transparent. That kind of thoughtlessness just doesn’t fly anymore.

I interviewed Alex Rainert, the head of product for Foursquare, and a quote I distinctly remember related to this very issue. He said:

“It seems weird to think that in our lifetimes, we had computers in our homes that were not connected to a network, but I can vividly remember that. But that’s something my daughter will never experience. I think a similar change will happen with some of the information sharing questions that we have today.”

I think he’s right. I think it’s easy when we talk about privacy and information sharing to believe and act as if people’s sensibilities will forever be the way they are now. This childlike instinct has its charms, but it’s usually wrong and particularly dangerous for designers and people creating user experiences. People who think deeply about the built world necessarily must view these issues as fungible and ever evolving, not fixed.

I think Alex said it best in the interview when he said:

“I think the important thing to remember is that some problems are human problems. They’re problems a computer can’t solve. I’m definitely not one of those people who says stuff like, ‘We think phones will know what you want to do before you want to do it.’ I think there’s a real danger to over rely on the algorithm to solve human problems. I think it’s finding the right balance of how could you leverage the technology to help improve someone’s experience, but not expect that you’re going to be able to wholeheartedly hand everything over to a computer to solve.”

In your book, you talk about a paradigm shift. What is the Mobile NUI Paradigm and what sort of paradigm shift is underway?

Rachel Hinman: Paradigm shifts happen when enough people recognize that the underlying values, beliefs, and ideas any given paradigm supports should be changed or are no longer valid. While GUI, WYSIWYG, files, hierarchical storage, and the very metaphor of a desktop were brilliant inventions, they were created before the PC was ubiquitous, email was essentially universal, and the World Wide Web became commonplace. For all its strengths, the desktop paradigm is a static one, and the world is longing for a mobile paradigm. We’ve reached the edges of what GUI can do.

Just like the Apple Macintosh released in 1984 ushered in the age of the graphical user interface, Apple’s iPhone was a hero product that served as an indicator of the natural evolution of the next wave of user interfaces. The fast and steady uptake of mobile touchscreen devices in all shapes and sizes since 2007 indicates a fundamental change is afoot. A natural UI (NUI) evolution has started. GUIs will be supplanted by NUIs in the not-so-distant future.

While “NUI domination” may seem inevitable, today we are situated in a strange GUI/NUI chasm. While there are similarities and overlap between graphical user interfaces and natural user interfaces, there are obvious differences in the characteristics and design principles of each. What makes a GUI experience successful is very different from the attributes that make a NUI experience successful. This is where much of the design confusion comes into play for both designers of NUIs and users of NUIs. We’re still stuck in a valley between the two paradigms. Just as we look back today on the first GUIs with nostalgia for their simplicity, the NUI interfaces we see today in mobile devices and tablets are larval examples of what NUIs will grow to become. NUIs are still new — the design details and conventions are still being figured out.

Who is doing mobile right at this point? Who should we watch going forward?

Rachel Hinman: There are a couple categories of people who I think are doing mobile right. The first are those who have figured out how to create experiences that can shapeshift across multiple devices and contexts. They’re the folks who have figured out how to separate content from form and allow content to be a fluid design material. Examples include Flipboard, the Windows 8 platform, as well as content providers like the Scripps network, which handles interactive content for TV networks like FoodNetwork and HGTV.

Another category is novel mobile apps that are pushing boundaries of mobile user experience in interesting ways. I’m a big fan of Foursquare because it combines social networks with a user’s sense of place to deliver a unique mobile experience. An app called Clear is pushing the boundaries of gesture-based UIs in an interesting way. It’s nice to see voice coming into its own with features like Siri. Then there are apps like DrawSomething; it’s an experience that is so simple, yet so compelling — addictive even.

There’s also tons of interesting mobile work being done in places like Africa — in some ways even more cutting-edge than the U.S. or Europe because of the cultural implications of the work. Frontline SMS, Ushahidi and M-Pesa are shining examples of mobile technology that’s making a huge impact.

What does mobile look like in 10 years?

Rachel Hinman: I think the biggest change that will happen in the next 10 years is that we likely won’t even differentiate a computing experience as being “mobile.” Instead, we will assume all computing experiences are mobile. I predict in the not-so-distant future, we will reflect on the desktop computing experience much in the same way my parents reflect on punch card computing systems or telephones with long cords that used to hang on kitchen walls. What seems novel and new now will be the standard sooner than we can probably imagine.

This interview was edited and condensed.

The Mobile Frontier — This book looks at how invention in the mobile space requires casting off anchors and conventions and jumping head first into a new and unfamiliar design space.


August 13 2012

With new maps and apps, the case for open transit gets stronger

Earlier this year, the news broke that Apple would be dropping default support for transit in iOS 6. For people (like me) who use the iPhone to check transit routes and times when they travel, that would mean losing a key feature. It also has the potential to decrease the demand for open transit data from cities, which has open government advocates like Clay Johnson concerned about public transportation and iOS 6.

This summer, New York City-based non-profit Open Plans launched a Kickstarter campaign to fund a new iPhone transit app to fill in the gap.

“From the public perspective, this campaign is about putting an important feature back on the iPhone,” wrote Kevin Webb, a principal at Open Plans, via email. “But for those of us in the open government community, this is about demonstrating why open data matters. There’s no reason why important civic infrastructure should get bound up in a fight between Apple and Google. And in communities with public GTFS, it won’t.”

Open Plans already had a head start in creating a patch for the problem: they’ve been working with transit agencies over the past few years to build OpenTripPlanner, an open source application that uses open transit data to help citizens make transit decisions.

“We were already working on the back-end to support this application but decided to pursue the app development when we heard about Apple’s plans with iOS,” explained Webb. “We were surprised by the public response around this issue (the tens of thousands who joined Walkscore’s petition and wanted to offer a constructive response).”

Crowdfunding digital city infrastructure?

That’s where Kickstarter and crowdfunding come into the picture. The Kickstarter campaign would help Open Plans make OpenTripPlanner a native iPhone app, followed by Android and HTML5 apps down the road. Open Plans’ developers have decided that given mobile browser limitations in iOS, particularly the speed of JavaScript apps, an HTML5 app isn’t a replacement for a native app.

Kickstarter has emerged as a platform for more than backing ideas for cool iPod watches or services. Increasingly, it’s looking like Kickstarter could be a new way for communities to collectively fund the creation of civic apps or services for their towns that government isn’t agile enough to deliver for them. While that’s sure to make some people in traditional positions of power uneasy, it also might be a way to do an end run around traditional procurement processes — contingent upon cities acting as platforms for civic startups to build upon.

“We get foundation and agency-based contract support for our work already,” wrote Webb. “However, we’ve discovered that foundations aren’t interested in these kinds of rider-facing tools, and most agencies don’t have the discretion or the budget to support the development of something universal. As a result, these kinds of projects require speculative investment. One of the awesome things about open data is that it lets folks respond directly and constructively by building something to solve a need, rather than waiting on others to fix it for them.

“Given our experience with transit and open data, we knew that this was a solvable problem; it just required someone to step up to the challenge. We were well positioned to take on that role. However, as a non-profit, we don’t have unlimited resources, so we’d ask for help. Kickstarter seems like the right fit, given the widespread public interest in the problem, and an interesting way to get the message out about our perspective. Not only do we get to raise a little money, but we’re also sharing the story about why open data and open source matter for public infrastructure with a new audience.”

Civic code in active re-use

Webb, who has previously staked out a position that iOS 6 will promote innovation in public transit, says that OpenTripPlanner is already a thriving open source project, with a recent open transit launch in New Orleans, a refresh in Portland and other betas soon to come.

In a welcome development for DC cyclists (including this writer), a version of OpenTripPlanner went live recently at BikePlanner.org. The web app, which notably uses OpenStreetMap as a base layer, lets users either plot a course for their own bike or tap into the Capital Bikeshare network in DC. BikePlanner is a responsive HTML5 app, which means that it looks good and works well on a laptop, iPad, iPhone or Android device.

Focusing on just open transit apps, however, would be to miss the larger picture of new opportunities to build improvements to digital city infrastructure.

There’s a lot more at stake than just rider-facing tools, in Webb’s view — from urban accessibility to extending the GTFS data ecosystem.

“There’s a real need to build a national (and eventually international) transit data infrastructure,” said Webb. “Right now, the USDOT has completely fallen down on the job. The GTFS support we see today is entirely organic, and there’s no clear guidance anywhere about making data public or even creating GTFS in the first place. That means building universal apps takes a lot of effort just wrangling data.”
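For readers who have not handled GTFS, the feeds themselves are just zipped CSV files (stops.txt, routes.txt, trips.txt, stop_times.txt, and so on), and the “wrangling” starts with something like the sketch below. The file path is a placeholder for whatever feed a given agency publishes.

```python
# Minimal look at what "wrangling" a GTFS feed involves. A GTFS feed is a zip
# archive of plain CSV tables. "gtfs.zip" is a placeholder path, not a real feed.

import csv
import io
import zipfile

def load_gtfs_table(feed_path, table):
    """Return the rows of one GTFS table (e.g. 'stops.txt') as dicts."""
    with zipfile.ZipFile(feed_path) as feed:
        with feed.open(table) as f:
            text = io.TextIOWrapper(f, encoding="utf-8-sig")  # some feeds ship a BOM
            return list(csv.DictReader(text))

stops = load_gtfs_table("gtfs.zip", "stops.txt")    # placeholder path
routes = load_gtfs_table("gtfs.zip", "routes.txt")
print(len(stops), "stops,", len(routes), "routes")
print(stops[0]["stop_name"], stops[0]["stop_lat"], stops[0]["stop_lon"])
```

Loading one agency’s tables is the easy part; the effort Webb describes is in reconciling hundreds of feeds that differ in quality, coverage, and update cadence.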

August 09 2012

The risks and rewards of a health data commons

As I wrote earlier this year in an ebook on data for the public good, while the idea of data as a currency is still in its infancy, it’s important to think about where the future is taking us and our personal data.

If the Obama administration’s smart disclosure initiatives gather steam, more citizens will be able to do more than think about personal data: they’ll be able to access their financial, health, education, or energy data. In the U.S. federal government, the Blue Button initiative, which initially enabled veterans to download personal health data, is now spreading to all federal employees, and it also earned adoption at private institutions like Aetna and Kaiser Permanente. Putting health data to work stands to benefit hundreds of millions of people. The Locker Project, which provides people with the ability to move and store personal data, is another approach to watch.

The promise of more access to personal data, however, is balanced by accompanying risks. Smartphones, tablets, and flash drives, after all, are lost or stolen every day. Given the potential of mhealth, big data, and health care information technology, researchers and policy makers alike are moving forward with their applications. As they do so, conversations and rulemaking about health care privacy will need to take into account not just data collection or retention but context and use.

Put simply, businesses must confront the ethical issues tied to massive aggregation and data analysis. Given that context, Fred Trotter’s post on who owns health data is a crucial read. As Fred highlights, the real issue is not ownership, per se, but “What rights do patients have regarding health care data that refers to them?”

Would, for instance, those rights include the ability to donate personal data to a data commons, much in the same way organs are donated now for research? That question isn’t exactly hypothetical, as the following interview with John Wilbanks highlights.

Wilbanks, a senior fellow at the Kauffman Foundation and director of the Consent to Research Project, has been an advocate for open data and open access for years, including a stint at Creative Commons; a fellowship at the World Wide Web Consortium; and experience in the academic, business, and legislative worlds. Wilbanks will be speaking at the Strata Rx Conference in October.

Our interview, lightly edited for content and clarity, follows.

Where did you start your career? Where has it taken you?

John Wilbanks: I got into all of this, in many ways, because I studied philosophy 20 years ago. What I studied inside of philosophy was semantics. In the ’90s, that was actually sort of pointless because there wasn’t much semantic stuff happening computationally.

In the late ’90s, I started playing around with biotech data, mainly because I was dating a biologist. I was sort of shocked at how the data was being represented. It wasn’t being represented in a way that was very semantic, in my opinion. I started a software company and we ran that for a while, [and then] sold it during the crash.

I went to the World Wide Web Consortium, where I spent a year helping start their Semantic Web for Life Sciences project. While I was there, Creative Commons (CC) asked me to come and start their science project because I had known a lot of those guys. When I started my company, I was at the Berkman Center at Harvard Law School, and that’s where Creative Commons emerged from, so I knew the people. I knew the policy and I had gone off and had this bioinformatics software adventure.

I spent most of the last eight years at CC working on trying to build different commons in science. We looked at open access to scientific literature, which is probably where we had the most success because that’s copyright-centric. We looked at patents. We looked at physical laboratory materials, like stem cells in mice. We looked at different legal regimes to share those things. And we looked at data. We looked at both the technology aspects and legal aspects of sharing data and making it useful.

A couple of times over those years, we almost pivoted from science to health because science is so institutional that it’s really hard for any of the individual players to create sharing systems. It’s not like software, where anyone with a PC and an Internet connection can contribute to free software, or Flickr, where anybody with a digital camera can license something under CC. Most scientists are actually restricted by their institutions. They can’t share, even if they want to.

Health kept being interesting because it was the individual patients who had a motivation to actually create something different than the system did. At the same time, we were watching and seeing the capacity of individuals to capture data about themselves exploding. So, at the same time that the capacity of the system to capture data about you exploded, your own capacity to capture data exploded.

That, to me, started taking on some of the interesting contours that make Creative Commons successful, which was that you didn’t need a large number of people. You didn’t need a very large percentage of Wikipedia users to create Wikipedia. You didn’t need a large percentage of free software users to create free software. If this capacity to generate data about your health was exploding, you didn’t need a very large percentage of those people to create an awesome data resource: you needed to create the legal and technical systems for the people who did choose to share to make that sharing useful.

Since Creative Commons is really a copyright-centric organization, I left because the power on which you’re going to build a commons of health data is going to be privacy power, not copyright power. What I do now is work on informed consent, which is the legal system you need to work with instead of copyright licenses, as well as the technologies that then store, clean, and forward user-generated data to computational health and computational disease research.

What are the major barriers to people being able to donate their data in the same way they might donate their organs?

John Wilbanks: Right now, it looks an awful lot like getting onto the Internet before there was the web. The big ISPs kind of dominated the early adopters of computer technologies. You had AOL. You had CompuServe. You had Prodigy. And they didn’t communicate with each other. You couldn’t send email from AOL to CompuServe.

What you have now depends on the kind of data. If the data that interests you is your genotype, you’re probably a 23andMe customer and you’ve got a bunch of your data at 23andMe. If you are the kind of person who has a chronic illness and likes to share information about that illness, you’re probably a customer at PatientsLikeMe. But those two systems don’t interoperate. You can’t send data from one to the other very effectively or really at all.

On top of that, the system has data about you. Your insurance company has your billing records. Your physician has your medical records. Your pharmacy has your pharmacy records. And if you do quantified self, you’ve got your own set of data streams. You’ve got your Fitbit, the data coming off of your smartphone, and your meal data.

Almost all of these are basically populating different silos. In some cases, you have the right to download certain pieces of the data. For the most part, you don’t. It’s really hard for you, as an individual, to build your own, multidimensional picture of your data, whereas it’s actually fairly easy for all of those companies to sell your data to one another. There’s not a lot of technology that lets you share.

What are some of the early signals we’re seeing about data usage moving into actual regulatory language?

John Wilbanks: The regulatory language actually makes it fairly hard to do contextual privacy waiving, in a Creative Commons sense. It’s hard to do granular permissions around privacy in the way you can do granular conditional copyright grants because you don’t have intellectual property. The only legal tool you have is a contract, and the contracts don’t have a lot of teeth.

It’s pretty hard to do anything beyond a gift. It’s more like organ donation, where you don’t get to decide where the organs go. What I’m working on is basically a donation, not a conditional gift. The regulatory environment makes it quite hard to do anything besides that.

There was a public comment period that just finished. It’s an announcement of proposed rulemaking on what’s called the Common Rule, which is the Department of Health and Human Services privacy language. It was looking to re-examine the rules around letting de-identified data or anonymized data out for widespread use. They got a bunch of comments.

There’s controversy as to how de-identified data can actually be and still be useful. There is going to be, probably, a three-to-five year process where they rewrite the Common Rule and it’ll be more modern. No one knows how modern, but it will be at least more modern when that finishes.

Then there’s another piece in the US — HIPAA — which creates a totally separate regime. In some ways, it is the same as the Common Rule, but not always. I don’t think that’s going to get opened up. The way HIPAA works is that they have 17 direct identifiers that are labeled as identifying information. If you strip those out, it’s considered de-identified.

There’s an 18th bucket, which is anything else that can reasonably identify people. It’s really hard to hit. Right now, your genome is not considered to fall under that. I would be willing to bet within a year or two, it will be.
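
The Safe Harbor approach Wilbanks describes (strip the enumerated identifiers and treat what’s left as de-identified) is mechanical enough to sketch in code. This is only an illustrative sketch: the field names and the safe_harbor_strip helper below are hypothetical, and a real pipeline would work from the regulation’s full list and still need review for that catch-all category, which is exactly where things like genomes get complicated.

```python
# Illustrative sketch of "strip the listed identifiers" de-identification.
# The field names here are hypothetical, not HIPAA's actual list.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email",
    "ssn", "medical_record_number", "date_of_birth",
}

def safe_harbor_strip(record: dict) -> dict:
    """Return a copy of the record with the listed identifier fields removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "date_of_birth": "1970-01-01",
    "diagnosis": "type 2 diabetes",
    "hba1c": 7.2,
}

print(safe_harbor_strip(patient))
# {'diagnosis': 'type 2 diabetes', 'hba1c': 7.2}
```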

From a regulatory perspective, you’ve got these overlapping regimes that don’t quite fit and both of them are moving targets. That creates a lot of uncertainty from an investment perspective or from an analytics perspective.

How are you thinking about a “health data commons,” in terms of weighing potential risks against potential social good?

John Wilbanks: I think that that’s a personal judgment as to the risk-benefit decision. Part of the difficulty is that the regulations are very syntactic — “This is what re-identification is” — whereas the concept of harm, benefit, or risk is actually something that’s deeply personal. If you are sick, if you have cancer or a rare disease, you have a very different idea of what risk is compared to somebody who thinks of him or herself as healthy.

What we see — and this is borne out in the Framingham Heart Study and all sorts of other longitudinal surveys — is that people’s attitudes toward risk and benefit change depending on their circumstances. Their own context really affects what they think is risky and what they think isn’t risky.

I believe that the early data donors are likely to be people for whom there isn’t a lot of risk perceived because the health system already knows that they’re sick. The health system is already denying them coverage, denying their requests for PET scans, denying their requests for access to care. That’s based on actuarial tables, not on their personal data. It’s based on their medical history.

If you’re in that group of people, then the perceived risk is actually pretty low weighed against the idea that your data might actually get used, or the idea that you’re no longer passive. Even if it’s just a donation, you’re doing something outside of the system that’s accelerating the odds of getting something discovered. I think that’s the natural group.

If you think back to the numbers of users who are required to create free software or Wikipedia, to create a cultural commons, a very low percentage is needed to create a useful resource.

Depending on who you talk to, somewhere between 5-10% of all Americans either have a rare disease, have it in their first order family, or have a friend with a rare disease. Each individual disease might not have very many people suffering from it, but if you net them all up, it’s a lot of people. Getting several hundred thousand to a few million people enrolled is not an outrageous idea.
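
As rough arithmetic (the population figure is approximate and the participation rates are assumptions, included only to show orders of magnitude), even the low end of that range supports those enrollment numbers:

```python
# Back-of-the-envelope check on the enrollment claim.
us_population = 310_000_000                      # approximate, circa 2012
touched_by_rare_disease = 0.05 * us_population   # low end of the 5-10% range

for participation in (0.02, 0.05, 0.10):         # assumed sign-up rates
    enrolled = int(touched_by_rare_disease * participation)
    print(f"{participation:.0%} participation -> {enrolled:,} people")
# 2% -> 310,000; 5% -> 775,000; 10% -> 1,550,000
```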

When you look at the existing examples of where such commons have come together, what have been the most important concrete positive outcomes for society?

John Wilbanks: I don’t think we have really even started to see them because most people don’t have computable data about themselves. Most people, if they have any data about themselves, have scans of their medical records.

What we really know is that there’s an opportunity cost to not trying, which is that the existing system is really inefficient, very bad at discovering drugs, and very bad at getting those drugs to market on a timely basis.

That’s one of the reasons we’re doing this as an experiment. We would like to see exactly how effective big computational approaches are on health data. The problem is that there are two ways to get there.

One is through a set of monopoly companies coming together and working together. That’s how semiconductors work. The other is through an open network approach. There’s not a lot of evidence that things besides these two approaches work. Government intervention is probably not going to work.

Obviously, I come down on the open network side. But there’s an implicit belief, I think, both in the people who are pushing the cooperating monopolies approach and the people who are pushing the open networks approach, that there’s enormous power in the big-data-driven approach. We’re just leaving that on the table right now by not having enough data aggregated.

The benefits to health that will come out will be the ability to increasingly, by looking at a multidimensional picture of a person, predict with some confidence whether or not a drug will work, or whether they’re going to get sick, or how sick they’re going to get, or what lifestyle changes they can make to mitigate an illness. Right now, basically, we really don’t know very much.

Pretty Simple Data Privacy

John Wilbanks discussed “Pretty Simple Data Privacy” during a Strata Online Conference in January 2012. His presentation begins at the 7:18 mark in the following video:

Photo: Science Commons

August 07 2012

New website “Terms of Service; Didn’t Read” promises an overview of the terms-of-service thicket

Anyone who uses web services has little choice but to agree to pages-long terms of use — contracts, in other words — which have come up repeatedly at iRights.info (music services, photo services, films and e-books). American researchers once calculated that the average (US) user would have to spend a full 76 working days a year just to read the terms of service of the web services they use.

Because nobody does that, there have already been several attempts to make such contracts more user-friendly, for example with short versions similar to those of Creative Commons licenses. The website “Terms of Service; Didn’t Read” from the Unhosted initiative now collects such clauses for various services and uses a color-coded system to flag how each service handles personal data, which usage rights it claims, how transparent it is about requests from authorities, and more.

So far only a few services have been given an overall rating; Twitpic already comes off badly, while Soundcloud, for example, fares well. The project is aimed primarily at American users and thus leaves aside the question of whether the rated clauses are even permissible under German law. And of course every rating also depends on the underlying criteria (and on who does the rating). Still, for a first overview the project is useful here in Germany as well. For now you can get involved via a mailing list and a chat; participation is to be opened up further later on.

[via]

August 02 2012

On co-creation, contests and crowdsourcing

I had decided to update the branding at one of my companies, and that meant re-thinking my logo.

Here’s the old logo:

Original Middleband Group logo

The creative exercise started with a logo design contest posting at 99designs, an online marketplace for crowdsourced graphic design.

When it was all done, I had been enveloped by an epic wave of 200 designs from 38 different designers.

It was a flash mob, a virtual meetup constructed for the express purpose of creating a new logo. The system itself was relatively lean, providing just enough “framing” to facilitate rapid iteration, where lots of derivative ideas could be presented, shaped and then re-shaped again.

The bottom line is that based on the primary goal of designing a new logo, I can say without hesitation that the model works.

Not only did the end product manifest as I hoped it would (see below), but the goodness of real-time engagement was intensely stimulating and richly illuminating. At one point, I was maintaining 10 separate conversations with designers spread across the Americas, Asia and Europe. Talk about parallelizing the creative process.

In the end, the project yielded eight worthy logo designs and not one but two contest winners! It was the creative equivalent of a Chakra experience: cathartic, artistic and outcome-driven at the same time.

Co-creation, crowdsourcing and the Maker movement

Part of what drew me to try out this crowdsourced model is that I consider myself a Maker and am a serious devotee of co-creation projects, where the line between creator, consumer, customer and service provider is inherently gray.

Why do I like this model? Because it facilitates a rich exchange of ideas and skill sets, and is highly collaborative. It’s part of the larger trend of melding online, offline, events and exchanges into new types of value chains.

It’s a bucket that includes Kickstarter (funding platform for creative projects), Foo Camp (the wiki of conferences), Maker Faire (festival and celebration of the Maker movement) and X PRIZE (radical breakthroughs through contests), to name a few.

Plus, there’s an authenticity to that which is grass roots — it opens a new economic domain for direct-to-consumer connections, a new modality for handcrafted and customized offerings, even more so in a world that is tuned for mass production.

One only has to scan the project listings at Kickstarter or the exhibitor lists at Maker Faire to see the catalytic role this wave is playing for robot makers, artisan bakers, knitted goods purveyors, sculptors, app makers, device builders and do-it-yourself kit creators. In times of stagnant economic growth, it is heartening to see how much leverage there is when you can integrate discovery, engagement, personalization and monetization, as this model does.

It’s the yin to the yang of homogenization, and as such, has promise to ignite real, durable growth across many different market segments in the years ahead.

The good, bad and ugly of crowdsourced design

With crowdsourced design, I experienced two primary pitfalls and one indirect one.

The two primary ones were:

  1. You run the risk that a designer is modifying someone else’s design. In fact, one of the designers of the 38 who submitted designs got kicked out of the competition for just that reason (i.e., non-original work).
  2. Since it’s an all-or-nothing outcome for the participants, some of the designers diss each other; in my case, that led one designer to pull a design that I actually liked.

The indirect pitfall was the cost dynamic. Namely, given the low cost, a lot of the designers are outside the U.S., which means you could be losing out on senior, higher-dollar U.S. designers, unless you materially up the award that you want to commit to (99designs gives you tools so you can guarantee winners, increase award levels, etc.).

That stated, it’s the 80/20 rule in action: 80% of the designs that captivated me the most came from 20% of the designers. Because of the competitive nature of the format, the back-and-forth process was highly iterative.

Choosing a logo (or two …)

Meanwhile, as we got to the last hours of my logo design project, I faced a dilemma.

When I got down to the final 4-5 candidates, there were two designs that really got under my skin, each from a different designer.

Plus, as Middleband is my “umbrella” company through which a bunch of my different ventures get seeded (before being spun off as separate entities), I could see a scenario where having a second logo path in hand would be a great option to have.

Now, the cool thing about a model like 99designs is that I could affordably acquire two designs (the cost was an incremental $245 to award a second contest winner), and it was push-button easy for all parties.

So that’s what I did. Here are the two winners:

Middleband Group winning logos


July 31 2012

On email privacy, Twitter’s ToS and owning your own platform

The existential challenge for the Internet and society remains that the technology platforms that constitute what many people regard as the new public square are owned by private companies. If you missed the news, Guy Adams, a journalist at the Independent newspaper in England, was suspended by Twitter after he tweeted the corporate email address of an NBC executive, Gary Zenkel. Zenkel is in charge of NBC’s Olympics coverage.

Like many other observers, I assumed that NBC had seen the tweet and filed an objection with Twitter about the email address being tweeted. The email address, after all, was shared with the exhortation to Adams’ followers to write to Zenkel about frustrations with NBC’s coverage of the Olympics, a number of which Jim Stogdill memorably expressed here at Radar and Heidi Moore compared to Wall Street’s hubris.

Today, Guy Adams published two more columns. The first shared his correspondence with Twitter, including a copy of a written statement from an NBC spokesman called Christopher McCloskey that indicated that NBC’s social media department was alerted to Adams’ tweet by Twitter. The second column, which followed the @GuyAdams account being reinstated, indicated that NBC had withdrawn their original complaint. Adams tweeted the statement: “we have just received an update from the complainant retracting their original request. Therefore your account has been unsuspended.”

Since the account is back up, is the case over? A tempest in a Twitter teapot? Well, not so much. I see at least three different important issues here related to electronic privacy, Twitter’s terms of service, censorship and how many people think about social media and the Web.

Is a corporate email address private?

Washington Post media critic Erik Wemple is at a loss to explain how tweeting this corporate email address rises to the level of disclosing private information.

Can a corporate email address based upon a known nomenclature used by tens of thousands of people be “private”? A 2010 Supreme Court ruling on privacy established that electronic messages sent on a corporate server are not private, at least from the employer. But a corporate email address itself? Hmm. Yes, the corporate email address Adams tweeted was available online prior to the tweet if you knew how to find it in a Web search. Danny Sullivan, however, made a strong case that the email address wasn’t widely available in Google, although Adams said he was able to find it in under a minute. There’s also an argument that because an address can be guessed, it is public. Jeff Jarvis and other journalists are saying it isn’t private, using the logic that because NBC’s email nomenclature is standardized, it can be easily deduced. I “co-signed” Reuters’ Jack Shafer’s tweet making that assertion.

The question to ask privacy experts, then, is whether a corporate email address is “private” or not.

Fred Cate, a law professor at the Indiana University Maurer School of Law, however, commented via email that “a corporate email address can be private, in the sense that a company protects it and has a legitimate interest in it not being disclosed.” Can it lose its private character due to unauthorized disclosure online? “The answer is probably and regrettably ‘it depends,’” he wrote. “It depends on the breadth of the unauthorized dissemination and the sensitivity of the information and the likely harm if more widely disclosed. An email address that has been disclosed in public blogs would seem fairly widely available, the information is hardly sensitive, and any harm can be avoided by changing the address, so the argument for privacy seems pretty weak to me.”

Danielle Citron, professor of law at the University of Maryland, argues that because Zenkel did not publish his corporate email address on NBC’s site, there’s an argument, though a weak one, that its corporate email addresses are private information only disclosed to a select audience.

“Under privacy tort common law, an unpublished home address has been deemed by courts to be private for purposes of public disclosure of private fact tort if the publication appeared online, even though many people know the address offline,” wrote Citron in an email. “This arose in a cyber harassment case involving privacy torts. Privacy is not a binary concept, that is, one can have privacy in public, at least according to Nader v. GM, the NY [Court of Appeals] found that GM’s zealous surveillance of Ralph Nader, including looking over his shoulder while he took out money from the bank, constituted intrusion of his seclusion, even though he was in public. Now, the court did not find surveillance itself a privacy violation. It was the fact that the surveillance yielded information Nader would have thought no one could see, that is, how much he took out of the bank machine.”

Email is, however, a different case than home addresses, as Citron allowed. “Far less people know one’s home address — neighbors and friends — if a home address is unlisted whereas email addresses are shared with countless people and there is no analogous means to keep it unpublished like home and phone addresses,” Citron wrote. “These qualities may indeed make it a tough sell to suggest that the email address is private.”

Perhaps ironically, the NBC executive’s email address has now been published by many major media outlets and blogs, making it one of the most public email addresses on the planet. Hello, Streisand effect.

Did Twitter break its own Terms of Service?

Specifically, was tweeting someone’s publicly available *work* email address (available online) a violation of Twitter’s rules? To a large extent, this hinges upon the answer to the first issue, of privacy.

If a given email address is already public — and this one had been available online for over a year — then, one line of thinking goes, it can’t be private. Twitter’s position is that it considers a corporate email address to be private and that sharing it therefore breaks the ToS. Alex McGillivray, Twitter’s general counsel, clarified the company’s approach to trust and safety in a post on Twitter’s blog:

We’ve seen a lot of commentary about whether we should have considered a corporate email address to be private information. There are many individuals who may use their work email address for a variety of personal reasons — and they may not. Our Trust and Safety team does not have insight into the use of every user’s email address, and we need a policy that we can implement across all of our users in every instance.

“I do not think privacy can be defined for third parties by terms of service,” wrote Cate, via email. “If Twitter wants to say that the company will treat its users’ email addresses as private it’s fine, but I don’t think it can convincingly say that  other email addresses available in public are suddenly private.”

“If the corporate email was published online previously by the company or by himself, it likely would not amount to public disclosure of private fact under tort law and likely would not meet the strict terms of the TOS, which says nonpublic. Twitter’s policy about email address stems from its judgment that people should not use its service to publicize non-public email addresses, even though such an address is not a secret and countless people in communication with the person know it,” wrote Citron. “Unless Twitter says explicitly, ‘we are adopting this rule for privacy reasons,’ there are reasons that have nothing to do with privacy that might animate that decision, such as preventing fraud.”

The bottom line is that Twitter is a private company with a Terms of Service. It’s not a public utility, as Dave Winer highlighted yesterday, following up today with another argument for a distributed, open system for microblogging. Simply put, there *are* principles for use of Twitter’s platform. They’re in the Rules, Terms of Service and strictures around its API, the evolution of which was recently walked through over at the real-time report.

Ultimately, private companies are bound by the regulations of the FTC or FCC or other relevant regulatory bodies, along with their own rules, not the wishes of users. If Twitter’s users don’t like them or lose trust, their option is to stop using the service or complain loudly. I certainly agree with Jillian C. York, who argues at the EFF that the Guy Adams case demonstrates that Twitter needs a more robust appeals process.

There’s also the question of how the ToS is applied to celebrities on Twitter, who are an attraction for millions of users. In the past, Justin Bieber tweeted someone else’s personal phone number. Spike Lee tweeted a home address, causing someone in Florida to receive death threats. According to personal accounts, neither celebrity was suspended. In one case, @QueenOfSpain had to get a court order to see any action taken on death threats on Twitter. Twitter’s Safety team has absolutely taken action in some cases, but it certainly might look like there’s a different standard here. The question to ask is whether tickets were filed for Lee or Bieber by the people who were personally affected. Without a ticket, there would be no suspension. Twitter has not commented on that count, in keeping with its policy of not commenting about individual users.

Own your own platform

In the wake of this move, there should be some careful consideration by journalists who use Twitter about where and how they do it. McGillivray did explain where Twitter went awry, confirming that someone on the media partnership side of the house flagged a tweet to NBC and reaffirming the principle that Twitter does not remove content on demand:

…we want to apologize for the part of this story that we did mess up. The team working closely with NBC around our Olympics partnership did proactively identify a Tweet that was in violation of the Twitter Rules and encouraged them to file a support ticket with our Trust and Safety team to report the violation, as has now been reported publicly.

Our Trust and Safety team did not know that part of the story and acted on the report as they would any other.

As I stated earlier, we do not proactively report or remove content on behalf of other users no matter who they are. This behavior is not acceptable and undermines the trust our users have in us. We should not and cannot be in the business of proactively monitoring and flagging content, no matter who the user is — whether a business partner, celebrity or friend. As of earlier today, the account has been unsuspended, and we will actively work to ensure this does not happen again.

As I’ve written elsewhere, looking at Twitter, censorship and Internet freedom, my sense is that, of all of the major social media players, Twitter has been one of the leaders in the technology community for sticking up for its users. It’s taken some notable stands, particularly in fighting to make public a subpoena from the U.S. Justice Department regarding user data.

“Twitter is so hands off, only stepping in to ban people in really narrow circumstances like impersonation and tweeting personal information like non-public email addresses. It also bans impersonation and harassment understood VERY NARROWLY, as credible threats of imminent physical harm,” wrote Citron. “That is Twitter’s choice. By my lights, and from conversations with their safety folks, they are very deferential to speech. Indeed, their whole policy is a ‘we are a speech platform,’ implying that what transpires there is public speech and hence subject to great latitude.”

Much of the good will Twitter had built up, however, may have evaporated after this week. My perspective is that this episode absolutely drives home (again) the need to own your own platform online, particularly for media entities and government. While there is clearly enormous utility in “going where the people are” online to participate in conversations, share news and listen to learn what’s happening, that activity doesn’t come without strings or terms of service.

To be clear, I don’t plan on leaving Twitter any time soon. I do think that McGillivray’s explanation highlights the need for the company to get its internal house in order, in terms of a church and state relationship between its policy and safety team, which makes suspension decisions, and its media partnerships team, which works with parties that might be aggrieved by what Twitter users are tweeting. If Twitter becomes a media company, a future that this NBC Olympics deal suggests, such distinctions could be just as important for it as the “church and state” relationship is for traditional newspaper companies and broadcasters.

While owning your own platform does mean that a media organization could be censored by a distributed denial of service (DDoS) attack (a tactic used in Russia) and that it must get a domain name, set up Web hosting and a content management system, the barrier to entry on all three counts has radically fallen.

The open Internet and World Wide Web, fragile and insecure as they may seem at times, remain the surest way to publish what you want and have it remain online, accessible to the networked world. When you own your own platform online, it’s much harder for a third party company nervous about the reaction of advertisers or media partners to take your voice away.

Discovering science

The discovery of the Higgs boson gave us a window into the way science works. We’re over the hype and the high expectations kindled by last year’s pre-announcement. We’ve seen the moving human interest story about Peter Higgs and how this discovery validates predictions he made almost 50 years ago, predictions that at the time weren’t thought “relevant.” Now we have an opportunity to do something more: to take a look at how science works and see what it is made of.

Discovery

First and foremost: Science is about discovery. While the Higgs boson was the last piece in the puzzle for the Standard Model, the search for the Higgs wasn’t ultimately about verifying the Standard Model. The Standard Model has predicted a lot of things successfully; it’s pointless to say that it hasn’t served us well. A couple of years ago, I asked some physicists what would happen if they didn’t find the Higgs, and the answer was uniformly: “That would be the coolest thing ever! We’d have to develop a new understanding of how particle physics works.” At the time, I pointed out that not finding the Higgs might be exciting to physicists, but it would certainly be disastrous for the funding of high-energy physics projects. (“What? We spent all this money to build you a machine to find this particle, and now you say that particle doesn’t even exist?”) But science must move forward, and the desire to rebuild quantum mechanics trumps funding.

Now that we have the Higgs (or something like it), physicists are hoping for a “strange” Higgs: a particle that differs from the Higgs predicted by the Standard Model in some ways, a particle that requires a new theory. Indeed, to Nobel laureate Steven Weinberg, a Higgs that is exactly the Higgs predicted by the Standard Model would be a “nightmare.” Discovering something that’s more or less exactly what was predicted isn’t fun, and it isn’t interesting. And furthermore, there are other hints that there’s a lot of work to be done: dark matter and dark energy certainly hint at a physics that doesn’t fit into our current understanding. One irony of the Higgs is that, even if it’s “strange,” it focused too much attention on big, expensive science, to the detriment of valuable, though less dramatic (and less expensive), work.

Science is never so wrong as when it thinks that almost all the questions have been answered. In the late 19th century, scientists thought that physics was just about finished: all that was left were nagging questions about why an oven doesn’t incinerate us with an infinite blast of energy and some weird behavior when you shine ultraviolet light onto electrodes. Solving the pressing problem of black body radiation and the photoelectric effect required the idea of energy quanta, which led to all of 20th century physics. (Planck’s first steps toward quantum mechanics and Einstein’s work on the photoelectric effect earned them Nobel Prizes.) Science is not about agreement on settled fact; it’s about pushing into the unknown and about the intellectual ferment and discussion that takes place when you’re exploring new territory.

Approximation, not law

Second: Science is about increasingly accurate approximations to the way nature works. Newton’s laws of motion, force equals mass times acceleration and all that, served us well for hundreds of years, until Einstein developed special relativity. Now, here’s the trick: Newtonian physics is perfectly adequate for anything you or I are likely to do in our lifetimes, unless SpaceX develops some form of interstellar space travel. However, relativistic effects are observable, even on Earth: clocks run slightly slower in airliners and slightly faster on the tops of mountains. These effects aren’t measurable with your wristwatch, but they are measurable (and have been measured, with precisely the results predicted by Einstein) with atomic clocks. So, do we say Newtonian physics is “wrong”? It’s good enough, and any physicist would be shocked at a science curriculum that didn’t include Newtonian physics. But neither can we say that Newtonian physics is “right,” if “right” means anything more than “good enough.” Relativity implies a significantly different conception of how the universe works. I’d argue that it’s not just a better approximation, it’s a different (and more accurate) world view. This shift in world view as we go from Newton to Einstein is, to me, much more important than the slightly more accurate answers we get from relativity.
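
To give a sense of scale, here is a back-of-the-envelope sketch, using assumed typical-cruise numbers rather than measured data, of the two competing relativistic corrections for an airliner. Both work out to parts in 10^12 to 10^13 of the clock rate, which is why an atomic clock, not a wristwatch, is needed to see them.

```python
# Rough magnitudes of relativistic clock-rate corrections for an airliner.
# Altitude and speed are assumed typical-cruise values.
g = 9.81          # m/s^2, surface gravity
c = 299_792_458   # m/s, speed of light
h = 11_000        # m, assumed cruise altitude
v = 250           # m/s, assumed cruise speed (~900 km/h)

# Special relativity: a moving clock runs slow by roughly v^2 / (2 c^2).
velocity_shift = v**2 / (2 * c**2)

# General relativity: a clock higher in Earth's gravitational potential
# runs fast by roughly g * h / c^2 relative to one at the surface.
gravity_shift = g * h / c**2

print(f"slowdown from speed:   {velocity_shift:.2e}")  # ~3.5e-13
print(f"speedup from altitude: {gravity_shift:.2e}")   # ~1.2e-12
```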

What do “right” and “wrong” mean in this context? Those terms are only marginally useful. And the notion of physical “law” is even less useful. “Laws” are really semantic constructs, and the appearance of the words “physical law” usually signifies that someone has missed the point. I cringe when I hear people talk about the “law of gravity,” because there’s no such thing: Newtonian gravity was replaced by Einsteinian general relativity (both a better approximation and a radically different view of how the universe works), and there are plenty of reasons to believe that general relativity isn’t the end of the story. The bottom line is that we don’t really know what gravity is or how it works, and all we really know about gravity is that our limited Einsteinian understanding probably doesn’t work for really small things and might not work for really big things. There are very good reasons to believe that gravitational waves exist (and we’re building the LIGO gravitational-wave interferometer to detect them), but right now, they’re in the same category as the Higgs boson was a decade ago. In theory, they should exist, and the universe will get a whole lot more interesting if we don’t find them. So, the only “law of gravity” we understand now is an approximation, and we have no idea what it approximates. And when we find a better approximation (one that explains dark energy, perhaps, or one that shows how gravity worked at the time of the Big Bang), that approximation will come with a significantly different world view.

Whatever we nominate as physical “law” is only law until we find a better law, a better approximation with its own story. Theories are replaced by better theories, which in turn are replaced by better theories. If we actually found a completely accurate “theory of everything” in any discipline, that might be the ultimate success, but it would also be tragic; it would be the end of science if it didn’t raise any further questions.

Simplicity and aesthetics

Aesthetics is a recurring principle both in the sciences (particularly physics) and in mathematics. It’s a particular kind of minimalist aesthetics: all things being equal, science prefers the explanation that makes the fewest assumptions. Gothic or rococo architecture doesn’t fit in. This principle has long been known as Occam’s Razor, and it’s worth being precise about what it means. We often hear about the “simplest explanation,” but merely being simple doesn’t make an explanation helpful. There are plenty of simple explanations. “The Earth sits on the back of four elephants, which stand on a turtle” is simple, but it makes lots of assumptions: the turtle must stand on something (“it’s turtles all the way down”), and the elephants and turtles need to eat. If we’re to accept the elephant theory, we have to assume that there are answers to these questions, otherwise they’re merely unexamined assumptions. Occam’s Razor is not about simplicity, but about assumptions. A theory that makes fewer assumptions may not be simpler, but almost always gives a better picture of reality.

One problem in physics is the number of variables that just have to have the right value to make the universe work. Each of these variables is an assumption, in a sense: they are what they are; we can’t say why any more than we can explain why we live in a three-dimensional universe. Physicists would like to reduce the number to 1 or, even better, 0: the universe would be as irreducible as π and derivable from pure mathematics. I admit that I find this drive a bit perplexing. I would expect the universe to have a large number of constants that just happen to have the right values and can’t be derived either from first principles or from other constants, especially since many modern cosmological theories suggest that universes are being created constantly and only a small number “work.”

However, the driving principle here is that we won’t get anywhere in understanding the universe by saying “what’s the matter with complexity?” In practice, the drive to provide simpler, more compelling descriptions has driven scientific progress. Copernicus’ heliocentric model for the solar system wasn’t more accurate than the geocentric Ptolemaic system. It took Kepler and elliptical orbits to make a heliocentric universe genuinely better. But the Ptolemaic model required lots of tinkering to make it work right, to make the cycles and epicycles fit their observational data about planetary motion.

There are many things about the universe that current theory can’t explain. The positive charge of a proton happens to equal the negative charge of an electron, but there’s no theoretical reason for them to be equal. If they weren’t equal, chemistry would be profoundly different, and life might not be possible. But the anthropic principle (physics is the way it is because we can observe it, and we can’t observe a universe in which we can’t exist) is ultimately unsatisfying; it’s only a clever way of leaving assumptions unchallenged.

Ultimately, the excitement of science has to do with challenging your assumptions about how things work. That challenge lies behind all successful scientific theories: can you call everything into question and see what lies behind the surface? What passes for “common sense” is usually nothing more than unexamined assumptions. To my mind, one of the most radical insights comes from relativity: since it doesn’t matter where you put the origin of your coordinate system, you can put the origin on the Earth if you want. In that sense, the Ptolemaic solar system isn’t “wrong.” The mathematics is more complex, but it all works. So, have we made progress? Counter-intuitive as relativity may seem, in relativity Einstein makes nowhere near as many assumptions as Ptolemy and his followers: very little is assumed besides the constancy of the speed of light and the force of gravity. The drive for such radical simplicity, as a way of forcing us to look behind our “common sense,” is at the heart of science.

Verification and five nines

In the search for the Higgs, we’ve often heard about “five nines,” or a chance of roughly 1 in 100,000 that the result is in error. Earlier results were inconclusive because the level of confidence was only “two nines,” or roughly one in 100. What’s the big difference? One in 100 seems like an acceptably small chance of error.

I asked O’Reilly author and astrophysicist Alasdair Allan (@aallan) about this, and he had an illuminating explanation. There is nothing magical about five nines, or two nines for that matter. The significance is that, if a physical phenomenon is real, if something is actually happening, then you ought to be able to collect enough data to get five nines confidence. There’s nothing “wrong” with an experiment that only gives you two nines, but if it’s actually telling you something real, you should be able to push it to five nines (or six, or seven, if you have enough time and data collecting ability). So, we know that the acceleration due to gravity on the surface of the Earth is 32.2 feet per second per second. In a high school physics lab, you can verify this to about two nines (maybe more if high schools have more sophisticated equipment than they did in my day). With more sophisticated equipment, pushing the confidence level to five nines is a trivial exercise. That’s exactly what happened with the Higgs: the initial results had a confidence level of about two nines, but in the past year, scientists were able to collect more data and get the confidence level up to five nines.
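
A toy calculation makes that point concrete. The effect size and noise level below are made up; what matters is the scaling. Because the standard error of the mean shrinks like 1/sqrt(N), a real, fixed effect can always be pushed from two nines to five nines of confidence by collecting more data.

```python
# How many measurements a fixed real effect needs to reach a given
# confidence level. Effect size and per-measurement noise are assumed.
from math import ceil
from statistics import NormalDist

effect = 1.0   # assumed true size of the signal, arbitrary units
noise = 50.0   # assumed standard deviation of a single measurement

def samples_needed(p_error: float) -> int:
    """Measurements needed for the sample mean to sit the required
    number of standard errors away from zero."""
    z = NormalDist().inv_cdf(1 - p_error)   # required z-score
    return ceil((z * noise / effect) ** 2)  # from z = effect / (noise / sqrt(N))

print("two nines  (p = 1e-2):", samples_needed(1e-2))  # ~13,500 measurements
print("five nines (p = 1e-5):", samples_needed(1e-5))  # ~45,500 measurements
```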

Does the Higgs become “real” at that point? Well, if it is real at all, it was real all along. But what this says is that there’s an experimental result that we can have confidence in and that we can use as the foundation for future results. Notice that this result doesn’t definitively say that CERN has found a Higgs boson, just that they’ve definitively found something that could be the Higgs (but that could prove to be something different).

Scientists are typically very careful about the claims they make for their results. Last year’s claims about “faster than light” neutrinos provide a great demonstration of how the scientific process works. The scientists who announced the result didn’t claim that they’d found neutrinos that traveled faster than light; they stated that they had a very strange result indicating that neutrinos traveled faster than light and wanted help from other scientists in understanding whether they had analyzed the data correctly. And even though many scientists were excited by the possibility that relativity would need to be re-thought, a serious effort was made to understand what the result could mean. Ultimately, of course, the researchers discovered that a cable had been attached incorrectly; when that problem was fixed, the anomalous results disappeared. So, we’re safe in a boring world: Neutrinos don’t travel faster than light, and theoretical physicists’ desire to rebuild relativity will have to wait.

While this looks like an embarrassment for science, it’s a great example of what happens when things go right. The scientific community went to work on several fronts: creating alternate theories (which have now all been discarded), exploring possible errors in the calculations (none were found), doing other experiments to measure the speed of neutrinos (no faster-than-light neutrinos were found), and looking for problems with the equipment itself (which they eventually found). Successful science is as much about mistakes and learning from them as it is about successes. And it’s not just neutrinos: Richard Muller, one of the most prominent skeptics on climate change, recently stated that examination of the evidence has convinced him that he was wrong, that “global warming was real … Human activity is almost entirely the cause.” It would be a mistake to view this merely as vindication for the scientists arguing for global warming. Good science needs skeptics; they force you to analyze the evidence carefully, and as in the neutrino case, prevent you from making serious errors. But good scientists also know how to change their minds when the evidence demands it.

If we’re going to understand how to take care of our world in the coming generations, we have to understand how science works. Science is being challenged at every turn: from evolution to climatology to health (Coke’s claim that there’s no connection between soft drinks and obesity, reminiscent of the tobacco industry’s attempts to discredit the link between lung cancer and smoking), we’re seeing a fairly fundamental attack on the basic tools of human understanding. You don’t have to look far to find claims that science is a big conspiracy, funded by whomever you choose to believe funds such conspiracies, or that something doesn’t need to be taken seriously because it’s just a “theory.”

Scientists are rarely in complete agreement, nor do they try to advance some secret agenda. They’re excited by the idea of tearing down their most cherished ideas, whether that’s relativity or the Standard Model. A Nobel Prize rarely awaits someone who confirms what everyone already suspects. But the absence of complete agreement doesn’t mean that there isn’t consensus, and that consensus needs to be taken seriously. Similarly, scientists are always questioning their data: both the data that supports their own conclusions and the data that doesn’t. I was disgusted by a Fox news clip implying that science was untrustworthy because scientists were questioning their theories. Of course they’re questioning their theories. That’s what scientists are supposed to do; that’s how science makes progress. But it doesn’t mean that those theories aren’t the most accurate models we have about how the world, and the universe itself, are put together. If we’re going to understand our world, and our impact on that world, we had better base our understanding on data and use the best models we have.

Higgs boson image via Wikimedia Commons.

July 26 2012

Esther Dyson on health data, “preemptive healthcare” and the next big thing

If we look ahead to the next decade, it’s worth wondering whether the way we think about health and healthcare will have shifted. Will healthcare technology be a panacea? Will it drive even higher costs, creating a broader divide between digital haves and have-nots? Will opening health data empower patients or empower companies?

As ever, there will be good outcomes and bad outcomes, and not just in the medical sense. There’s a great deal of ferment around the potential for mobile applications right now, from the FDA’s potential decision to regulate them to a reported high abandonment rate. There are also significant questions about privacy, patient empowerment and meaningful use of electronic healthcare records.

When I’ve talked to US CTO Todd Park or Dr. Farzad Mostashari, they’ve been excited about the prospect for health data to fuel better dashboards and algorithms that give frontline caregivers access to critical information about the people they’re looking after, providing insight at the point of contact.

Kathleen Sebelius, the U.S. Secretary for Health and Human Services, said at this year’s Health Datapalooza that venture capital investment in the Healthcare IT area is up 60 percent since 2009.

Given that context, I was more than a little curious to hear what Esther Dyson (@edyson) is thinking about when she looks at the intersection of healthcare, data and information technology.

"yes, but the sharks must love it!""yes, but the sharks must love it!"

[Photo Credit: Rick Smolan, via Esther Dyson]

Dyson, who started her career as a journalist, is now an angel investor and philanthropist. Dyson is a strong supporter of “preemptive healthcare” – and she’s putting her money where her interest lies, with her investments. She’ll be speaking at the StrataRX conference this October in San Francisco.

Our interview, which was lightly edited for content and clarity, follows.

How do you see healthcare changing?

Dyson: There are multiple perspectives. The one I’ve got does not invalidate others, nor is it intended to trump any of the others, but it’s the one that I focus on — and that’s really “health” as opposed to “healthcare.”

If you maintain good health, you can avoid healthcare. That’s one of those great and unrealizable goals, but it’s realizable in part. Any healthcare you can avoid because you’re healthy is valuable.

What I’m mostly focused on is trying to change people’s behavior. You’ll get agreement from almost everybody that eating right, not smoking, getting exercise, avoiding too much stress, and sleeping a lot are good for your health.

The challenge is what makes people do those things, and that’s where there’s a real lack of data. So a lot of what I’m doing is investing in that space. There’s evidence-based medicine. There’s also evidence-based prevention, and that’s even harder to validate.

Right now, a lot of people are doing a lot of different things. Many of them are collecting data, which over time, with luck, will prove that some of these things I’m going to talk about are valuable.

What does the landscape for healthcare products and services look like to you today?

Dyson: I see three markets.

There’s the traditional healthcare market, which is what people usually talk about. It’s drugs, clinics, hospitals, doctors, therapies, devices, insurance companies, data processors, or electronic health records.

Then there’s the market for bad health, which people don’t talk about a lot, at least not in those terms, but it’s huge. It’s the products and all of the advertising around everything from sugared soft drinks to cigarettes to recreational drugs to things that keep you from going to bed, going to sleep, keep you on the couch, and keep you immobile. I mentioned cigarettes and alcohol, I think. That’s a huge market. People are being encouraged to engage in unhealthy behaviors, whether it’s stuff that might be healthy in moderation or stuff that just isn’t healthy at all.

The new [third] market for health existed already as health clubs. What’s exciting is that there’s now an explicit market for things that are designed to change your behavior. Usually, they’re information and social-based. These are the quantified self – analytical tools, tools for sharing, tools for fostering collaboration or competition with people that behave in a healthy way. Most of those have very little data to back them up. It’s that people think they make sense. The business models are still not too clear, because if I’m healthy, who’s going to pay for that? The chances are that if I’ll pay for it, I’m already kind of a health nut and don’t need it as much as someone who isn’t.

Pharma companies will pay for some such things, especially if they think that they can sell people drugs in conjunction with them. I’ll sell you a cholesterol lowering drug through a service that encourages you to exercise, for example. That’s a nice market. You go to the pre-diabetics and you sell them your statin. Various vendors of sports clubs and so forth will fund this. But over time, I expect you’re going to see employers realize the value of this, then finally long-term insurance companies and perhaps government. But it’s a market that operates mostly on faith at this point.

Speaking of faith, Rock Health shared data that around 80 percent of mobile health apps are being abandoned by consumers after two weeks. Thoughts?

Dyson: To me, that’s infant mortality. The challenge is to take the 20 percent and then make those persist. But yeah, you’re right, people try a lot of stuff and it turns out to be confusing and not well-designed, et cetera.

If you look ahead a decade, what are the big barriers for health data and mobile technology playing a beneficial role, as opposed to a more dystopian one?

Dyson: Well, the benign version is we’ve done a lot of experimentation. We’ve discovered that most apps have an 80 percent abandon rate, but the 20 percent that are persisting get better and better and better. So the 80 percent that are abandoned vanish and the marketplace and the vendors focus on the 20 percent. And we get broad adoption. You get onto the subway in New York and everybody’s thin and healthy.

Yeah, that’s not going to happen. But there’s some impact. Employers understand the value of this. There’s a lot more to do than just these [mobile] apps. The employers start serving only healthy food in the cafeteria. Actually, one big sign is going to be what they serve for breakfast at Strata RX. I was at the Kauffman Life Sciences Entrepreneur Conference and they had muffins, bagels and cream cheese.

Carbohydrates and fat, in other words.

Dyson: And sugar-filled yogurts. That was the first day. They responded to somebody’s tweet [the second day] and it was better. But it’s not just the advertising. It’s the selection of stuff that you get when you go to these events or when you go to a hotel or you go to school or you go to your cafeteria at your office.

Defaults are tremendously important. That’s why I’m a big fan of what Bloomberg’s trying to do in New York. If you really want to buy two servings of soda, that’s fine, but the default serving should be one. I mean personally, I’d get rid of them entirely, but anyway. You know, make the defaults smaller dinner plates. All of this stuff really does have an impact.

Anyway, ten years from now, evidence has shown what works. What works is, in fact, working because people are doing it. A lot of this is that social norms have changed. The early adopters have adopted, the late adopters are being carried along in the wake — just like there are still people who smoke, but it’s no longer the norm.

Do you have concerns or hopes for the risks and rewards of open health data releases?

Dyson: If we have a sensible healthcare system, the data will be helpful. Hospitals will say, “Oh my God, this guy’s at-risk, let’s prevent him getting sick.” Hospitals and the payers will know, “Gee, if we let this guy get sick, it’s going to cost us a lot more in the long run. And we actually have a business model that operates long-term rather than simply tries to minimize cost in the short-term.”

And insurance companies will say, “Gee, I’m paying for this guy. I better keep him healthy.” So the most important thing is for us to have a system that works long-term like that.

What role will personal data ownership play in the healthcare system of the future?

Dyson: Well, first we have to define what it is. I mean, from my point-of-view, you own your own data. On the other hand, if you want care, you’ve got to share it.

I think people are way too paranoid about their data. There will, inevitably, be data spills. We should try to avoid them, but we should also not encourage paranoia. If you have a rational economic system, privacy will be an issue, but financial security will not. Those two have gotten kind of mingled in people’s minds.

Yes, I may just want to keep it quiet that I have a sexually transmitted disease, but it’s not going to affect my ability to get treatment or to get insurance if I’ve got it. On the other hand, if I have to pay a little more for my diet soda or my hamburger because it’s being taxed, I don’t think that’s such a bad idea. Not that I want somebody recording how many hamburgers I eat, just tax them — but you don’t need to tax me personally: tax the hamburger.

What about the potential for the quantified self movement to someday reveal that hamburger consumption to insurers?

Dyson: You know, people are paranoid about insurers. They’re too busy. They’re not tracking the hamburgers you eat. They’re insuring populations. I mean seriously, you know? I went to get insurance and I told Aetna, “You can have my genetic profile.” And they said, “We wouldn’t know what to do with it.” I mean seriously, I’m not saying that’s entirely impossible ever in some kind of dystopia, but I really think people obsess too much about this kind of stuff.

How should — or could — startups in healthcare be differentiating themselves? What are the big problems that they could be working on solving?

Dyson: The whole social aspect. How do you design a game, a social interaction, that encourages people to react the way you want them to react? I mean, it’s just like what’s the difference between Facebook and Friendster. They both had the same potential user base. One was successful; one wasn’t. It’s the quality of the analytics you show individuals about their behavior. It’s the narratives, the tools and the affordances that you give them for interacting with their friends. It’s like what makes one app different from another. They all use the same data in the end, but some of them are very, very different.

For what it’s worth, of the hundreds of companies that Rock Health or anybody else will tell you about, probably a third of them will disappear. One tenth will be highly successful and will acquire the remaining 57 percent.

What are the models that exist right now of the current landscape of healthcare startups that are really interesting to you? Why?

Dyson: I don’t think there’s a single one. There’s bunches of them occupying different places.

One area I really like is user-generated research and experiments. Obviously, 23andMe*. Deep analysis of your own data and the option to share it with other people and with researchers. User-generated data science research is really fascinating.

And then social affordance, like Kia’s Health Rally, where people interact with one another. Omada Health (which I’m an investor in) is a Rock Health company which says we can’t do it all ourselves — there’s a designated counselor for a group. It’s right now focused on pre-diabetics.

I love that, partly because I think it’s going to be effective, and partly because I really like it as an employment model. I think our country is too focused on manufacturing and there’s a way to turn more people into health counselors. I mean, I’d take all of the laid off auto workers and turn them into gym teachers, and all the laid off engineers and turn them into data scientists or people developing health apps. Or something like that.

[*Dyson is an investor in 23andMe.]

What’s the biggest myth in the health data world? What’s the thing that drives you up the wall, so to speak?

Dyson: The biggest myth is that any single thing is the solution. The biggest need is for long-term thinking, which is everything from an individual thinking long-term about the impact of behavior to a financial institution thinking long-term and having the incentive to think long-term.

Individuals need to be influenced by psychology. Institutions are made up of individuals, employees who can be motivated or not. As institutions, they need financial incentives that are aligned with the long term rather than the short term.

That, again, goes back to having a vested interest in the health of people rather than in the cost of care.

Employers, to some extent, have that already. Your employer wants you to be healthy. They want you to show up for work, be cheerful, motivated and well rested. They get a benefit from you being healthy, far beyond simply avoiding the cost of your care.

Whereas the insurance companies, at this point, simply pass it through. If the insurance company is too effective, they actually have to lower their premiums, which is crazy. It’s really not insurance: it’s a cost-sharing and administration role that the insurance companies play. That’s something a lot of people don’t get. That needs to be fixed, one way or another.

July 25 2012

Rethinking regulatory reform in the Internet age

As the cover story of a February issue of The Economist highlighted, concerns about an over-regulated America are cresting in this election year, with headlines from that same magazine decrying “excessive environmental regulation” and calling for more accurate measurement of the cost of regulations. Deleting regulations is far from easy, but there does appear to be a political tailwind behind doing so.

As a legislator and chairman of the House Oversight and Government Reform Committee, it’s fair to say that Representative Darrell Issa (R-CA) has been quite active in publicly discussing the issue of regulations and regulatory burdens upon business. As a former technology entrepreneur, and a successful one at that (he’s the wealthiest member of Congress), Rep. Issa does have first-hand knowledge of what it takes to run a business, to bring products to market, and to deal with the regulations involved.

In a wide-ranging interview earlier this summer, Rep. Issa commented on a number of issues related to open government and the work of the committee. When we talked about smart disclosure and reforming the Freedom of Information Act, I posed several questions about regulatory data, in the context of its role in the marketplace for products and services. Our interview on regulation is below, followed by a look at how his office and the White House are trying to use the Web to improve regulatory reform and involve citizens in the debate.

What role does the release of regulatory data from the various agencies, in the form of smart disclosure or other directions, have in creating market transparency, bringing products to market or enabling citizens to understand the quality of said products? What is the baseline for regulation? For instance, after flying a lot recently, I’ve felt grateful the FAA had regulations that meant my flights would be safe when I flew back and forth across the country or ocean. There’s some baseline for the appropriate amount of regulation, but it’s never entirely clear what that might be.

Rep. Issa: I’ll give you a good example of why regulations that you believe in, you don’t believe in. Do you believe it’s dangerous to have your cell phone on as you’re going across country?

My understanding is that many people’s cellphones have, in fact, been left on while they fly cross country or while they take off and land; the probability that everyone switched them off is low. To date, I have not heard of a documented case where a switched-on cellphone interfered with the avionics of the plane. [See Nick Bilton's reporting on the FAA and gadgets in the New York Times.] That logically suggests to me that it’s not as much of a risk as has been posited, but I haven’t seen the data.

Rep. Issa: So, on a regulatory basis, your country is lying to you. I’m making the statement as I’m asking the question. Of course your country’s lying to you about the risk. Of course there’s a valid reason to turn off your cell phone: it’s so you won’t be distracted while they’re telling you where the exit is. So rather than say, “Look, we have the right to have you shut off your cellphone and we believe that for safety purposes you should do it, but let’s not kid each other: if you’ve got it on final so you can get your emails 30 seconds earlier and you don’t mind your battery going dead a little faster, it probably has no real risk.”

The fact is your government has regulatory power to regulate an action for which they don’t actually have a good faith belief it’s causing damage. Just the opposite: they have the knowledge that these units are on all the time by accident, in people’s luggage, and our planes still don’t crash.

My problem with regulations is they need to have a cost benefit. And that cost benefit, the burden has to be against the regulator, not for the regulator. So when the EPA says, “You need to take the arsenic out of water,” as they did a number of years ago, it sounded great, but the number was arbitrary and they had no science. And what ended up happening in New Mexico was that people’s small water districts went out of business. In some cases, people went back to taking whatever was in their well, and you go, “Well, why didn’t they have a number that they could justify you absolutely had to have, otherwise it was hurting you?” Well, the answer is because they never did the science, they just did the regulations.

So where does the balance lie, in your opinion?

Rep. Issa: When it comes to individual liberty, I try to be as absolute as possible. When it comes to regulatory needs, I tend to be as limited as possible, both because of people’s liberty, but also because government has a tendency to want to grow itself. And if you let it grow itself, one day you wake up like the frogs that were slowly boiled: they were put in the water and didn’t notice it getting warm until they were cooked.

When I’ve traveled abroad, I’ve heard from citizens of other countries, particularly in the developing world, that one of the things they admire about the U.S. is that we have an FDA, an EPA, an FTC and other regulatory bodies which they see holding our quite powerful corporations to some level of account. What role do those institutions have in the 21st century in holding private interests, which have incredible amounts of power in our world, accountable to the people?

Rep. Issa: I gave you the EPA example because there was a debate that ultimately the EPA won on arsenic, to the detriment of whole communities who disagreed, who said, you haven’t made the case as to why you picked a particular level. They all supported the idea that water should be clean. The question is at what point of the cost-benefit was it the right level of clean. And I remember that one.

Let me give you one in closing that’s probably perfect. Today, the FDA is unable to ensure that generic cancer drugs and antibiotics are in sufficient supply, which was one of its mandates. And as a result, there’s a whole bootleg market developing — and the left and the right are both concerned about it — for both cancer drugs and antibiotics because there’s a shortage. But the FDA had a regulatory responsibility to ensure that the shortage didn’t occur, and it’s failing it. So the FDA has a job it’s not doing.

Additionally, people are traveling to Europe and other places to get drugs that are saving lives because they’re getting approved in those countries quicker. These are Western countries with the equivalent of the FDA, but drugs are getting approved quicker, and clinical trials are going better and moving over there.

So when we look at the FDA, you’re not attacking it because you think we shouldn’t have a Food and Drug Administration dealing with, in particular, the efficacy of medicines, but because the FDA is falling short: speed to market keeps getting longer and longer, meaning people are being denied innovative drugs.

Can the Web help with regulatory reform and e-rulemaking?

Representative Issa, whose committee heard testimony on regulatory impediments to job creation last week, is not alone in the U.S. House in his interest in streamlining regulations. This week, Speaker Boehner and his caucus have been pushing to “cut the red tape” by limiting or loosening regulations on small businesses until unemployment falls to 6%.
The administration has not been inactive on this front, although it’s fair to say that House Republicans have made clear that its progress towards regulatory reform to date has been unsatisfactory. One early case study can be found in FCC open Internet rules and net neutrality, where OpenInternet.gov was used to collect public feedback on proposed rules. Public comments on OpenInternet.gov were entered into the official record, which was something of a watershed in e-rulemaking. The full version of the final rules, however, was not shared with the public until days after the rules were voted upon.

In January 2011, President Barack Obama issued an executive order focused on reforming regulation and regulatory review. One element of the order was particularly notable for observers who watch to see whether citizen engagement is part of open government efforts by this administration: its focus upon public participation in the regulatory process.
As I’ve
written elsewhere, this order is part of a larger effort towards e-rulemaking by the administration. In February 2012, Regulations.gov relaunched with an API and some social media features, with an eye towards gaining more public participation. This electronic infrastructure will almost certainly be carried over into future administrations, regardless of the political persuasion of the incumbent of the Oval Office.

This summer, Cass Sunstein, the administrator of the Office for Information and Regulatory Affairs in the White House, asked the American people for more ideas on how the federal government could “streamline, simplify or eliminate federal regulations to help businesses and individuals.”

As the Wall Street Journal reported last year, the ongoing regulatory review by OIRA is a nod to serious, long-standing concerns in the business community about excessive regulation hampering investment and job creation as citizens struggle to recover from the effects of the Great Recession.

It’s not clear yet if an upgraded Regulations.gov will make any difference in the quality of regulatory outcomes. Rulemaking and regulatory review are, virtually by their nature, wonky and involve esoteric processes that rely upon knowledge of existing laws and regulations.

In the future, better outcomes might come from smart government approaches, through adopting what Tim O’Reilly has described as “algorithmic regulation”: applying the dynamic feedback loops that Web giants use to police their systems against malware and spam to the government agencies entrusted with protecting the public interest.

In the present, however, while the Internet could involve many more people in the process, improved outcomes will depend upon a digitally literate populace that’s willing to spend some of its civic surplus on public participation in identifying problematic regulations. That would mean legislators and staff, regulators and agency workers using the dynamic social Web of 2012 to listen as well as to broadcast.

To put it another way, getting to “Regulations 2.0” will require “Citizen 2.0” — and we’ll need the combined efforts of all our schools, universities, libraries, non-profits and open government advocates to have a hope of successfully making that upgrade.

July 23 2012

The dark side of data

Map of France in Google Earth by Steven La Roux

A few weeks ago, Tom Slee published “Seeing Like a Geek,” a thoughtful article on the dark side of open data. He starts with the story of a Dalit community in India, whose land was transferred to a group of higher-caste Mudaliars through bureaucratic manipulation under the guise of standardizing and digitizing property records. While that sounds like a good idea, it gave a wealthier, more powerful group a chance to erase older, traditional records that hadn’t been properly codified. One effect of passing laws requiring standardized, digital data is to marginalize all data that can’t be standardized or digitized, and to marginalize the people who don’t control the process of standardization.

That’s a serious problem. It’s sad to see oppression and property theft riding in under the guise of transparency and openness. But the issue isn’t open data itself; the issue is how data is used.

Jesus said “the poor are with you always” not because the poor aren’t a legitimate area of concern (only an American fundamentalist would say that), but because they’re an intractable problem that won’t go away. The poor are going to be the victims of any changes in technology; it isn’t surprising that the wealthy in India used data to marginalize the land holdings of the poor. In a similar vein, when Europeans came to North America, I imagine they told the natives “So, you got a deed to all this land?”, a narrative that’s still being played out with indigenous people around the world.

The issue is how data is used. If the wealthy can manipulate legislators to wipe out generations of records and folk knowledge as “inaccurate,” then there’s a problem. A group like DataKind could go in and figure out a way to codify that older generation of knowledge. Then at least, if that isn’t acceptable to the government, it would be clear that the problem lies in political manipulation, not in the data itself. And note that a government could wipe out generations of “inaccurate records” without any requirement that the new records be open. In years past the monied classes would have just taken what they wanted, with the government’s support. The availability of open data gives a plausible pretext, but it’s certainly not a prerequisite (nor should it be blamed) for manipulation by the 0.1%.

One can see the opposite happening, too: the recent legislation in North Carolina forbidding the use of data that shows sea level rise. Open data may be the only possible resource against forces that are interested in suppressing science. What we’re seeing here is a full-scale retreat from data and what it can teach us: an attempt to push the furniture against the door to prevent the data from getting in and changing the way we act.

The digital publishing landscape

Slee is on shakier ground when he claims that the digitization of books has allowed Amazon to undermine publishers and booksellers. Yes, there’s technological upheaval, and that necessarily drives changes in business models. Business models change; if they didn’t, we’d still have the Pony Express and stagecoaches. O’Reilly Media is thriving, in part because we have a viable digital publishing strategy; publishers without a viable digital strategy are failing.

But what about booksellers? The demise of the local bookstore has, in my observation, as much to do with Barnes & Noble superstores (and the now-defunct Borders), as with Amazon, and it played out long before the rise of ebooks.

I live in a town in southern Connecticut, roughly a half-hour’s drive from the two nearest B&N outlets. Guilford and Madison, the town immediately to the east, both have thriving independent bookstores. One has a coffeeshop, stages many, many author events (roughly one a day), and runs many other innovative programs (birthday parties, book-of-the-month services, even ebook sales). The other is just a small local bookstore with a good collection and knowledgeable staff. The town to the west lost its bookstore several years ago, possibly before Amazon even existed. Long before the Internet became a factor, it had reduced itself to cheap gift items and soft porn magazines. So: data may threaten middlemen, though it’s not at all clear to me that middlemen can’t respond competitively. Or that they are really threatened by “data”, as opposed to large centralized competitors.

There are also countervailing benefits. With ebooks, access is democratized. Anyone, anywhere has access to what used to be available only in limited, mostly privileged locations. At O’Reilly, we now sell ebooks in countries we were never able to reach in print. Our print sales overseas never exceeded 30% of our sales; for ebooks, overseas represents more than half the total, with customers as far away as Azerbaijan.

Slee also points to the music labels as an industry that has been marginalized by open data. I really refuse to listen to whining about all the money that the music labels are losing. We’ve had too many years of crap product generated by marketing people who only care about finding the next Justin Bieber to take the “creative industry” and its sycophants seriously.

Privacy by design

Data inevitably brings privacy issues into play. As Slee points out (and as Jeff Jonas has before him), apparently insignificant pieces of data can be put together to form a surprisingly accurate picture of who you are, a picture that can be sold. It’s useless to pretend that there won’t be increased surveillance in any foreseeable future, or that there won’t be an increase in targeted advertising (which is, technically, much the same thing).

We can bemoan that shift, celebrate it, or try to subvert it, but we can’t pretend that it isn’t happening. We shouldn’t even pretend that it’s new, or that it has anything to do with openness. What is a credit bureau if not an organization that buys and sells data about your financial history, with no pretense of openness?

Jonas’s concept of “privacy by design” is an important attempt to address privacy issues in big data. Jonas envisions a day when “I have more privacy features than you” is a marketing advantage. It’s certainly a claim I’d like to see Facebook make.

Absent a solution like Jonas’s, data is going to be collected, bought, sold, and used for marketing and other purposes, whether it is “open” or not. I do not think we can get to Jonas’s world, where privacy is something consumers demand, without going through a stage where data is open and public. It’s too easy to live with the illusion of privacy that thrives in a closed world.

I agree that the notion that “open data” is an unalloyed public good is mistaken, and Tom Slee has done a good job of pointing that out. It underscores the importance of a still-nascent ethical consensus about how to use data, along with the importance of data watchdogs, DataKind, and other organizations devoted to the public good. (I don’t understand why he argues that Apple and Amazon “undermine community activism”; that seems wrong, particularly in light of Apple’s re-joining the EPEAT green certification system for its products after a net-driven consumer protest.) Data collection is going to happen whether we like it or not, and whether it’s open or not. I am convinced that private data is a public bad, and I’m less afraid of data that’s open. That doesn’t make it necessarily a good; that depends on how the data is used, and on the people who are using it.

Image Credit: Steven La Roux


July 21 2012

Overfocus on tech skills could exclude the best candidates for jobs

At the second RailsConf, David Heinemeier Hansson told the audience about a recruiter trying to hire someone with “5 years of experience with Ruby on Rails.” DHH told him, “Sorry; I’ve only got 4 years.” We all laughed (I don’t think there’s anyone in the technical world who hasn’t dealt with a clueless recruiter), but little did we know this was the shape of things to come.

Last week, a startup in a relatively specialized area advertised a new engineering position for which they expected job candidates to have used their API. That raised a few eyebrows, not the least because it’s a sad commentary on the current jobs situation.

On one hand, we have high unemployment. But on the other hand, at least in the computing industry, there’s no shortage of jobs. I know many companies that are hiring, and all of them are saying they can’t find the people they want. I’m only familiar with the computer industry, which is often out of sync with the rest of the economy. Certainly, in Silicon Valley, where you can’t throw a stone without hitting a newly funded startup, we’d expect a chronic shortage of software developers. But a quick Google search will show you that the complaint is widespread: in trucking, nursing, manufacturing, and teaching, you’ll see the “lack of qualified applicants” refrain everywhere you look.

Is the problem that there are no qualified people? Or is the problem with the qualifications themselves?

There certainly have been structural changes in the economy, for better or for worse: many jobs have been shipped offshore, or eliminated through automation. And employers are trying to move some jobs back onshore for which the skills no longer exist in the US workforce. But I don’t believe that’s the whole story. A number of articles recently have suggested that the problem with jobs isn’t the workforce, it’s the employers: companies that are only willing to hire people who will drop in perfectly to the position that’s open. Hence, a startup requiring that applicants have developed code using their API.

It goes further: many employers are apparently using automated rejection services which (among other things) don’t give applicants the opportunity to make their case: there’s no human involved. There’s just a resume or an application form matched against a list of requirements that may be grossly out of touch with reality, generated by an HR department that probably doesn’t understand what they’re looking for, and that will never talk to the candidates they reject.

I suppose it’s a natural extension of data science to think that hiring can be automated. In the future, perhaps it will be. Even without automated application processing, it’s altogether too easy for an administrative assistant to match resumes against a checklist of “requirements” and turn everyone down: especially easy when the stack of resumes is deep. If there are lots of applications, and nobody fits the requirements, it must be the applicants’ fault, right? But at this point, rigidly matching candidates against inflexible job requirements isn’t a way to go forward.

Even for a senior position, if a startup is only willing to hire people who have already used its API, it is needlessly narrowing its applicant pool to a very small group. The candidates who survive may know the API already, but what else do they know? Are the best candidates in that group?

A senior position is likely to require a broad range of knowledge and experience, including software architecture, development methodologies, programming languages and frameworks. You don’t want to exclude most of the candidates by imposing extraneous requirements, even if those requirements make superficial sense. Does the requirement that candidates have worked with the API seem logical to an unseasoned executive or non-technical HR person? Yes, but it’s as wrong as you can get, even for a startup that expects new hires to hit the ground running.

The reports about dropping enrollments in computer science programs could give some justification to the claim that there’s a shortage of good software developers. But the ranks of software developers have never been filled by people with computer science degrees. In the early 80s, a friend of mine (a successful software developer) lamented that he was probably the last person to get a job in computing without a CS degree.

At the time, that seemed plausible, but in retrospect, it was completely wrong. I still see many people who build successful careers after dropping out of college, not completing high school, or majoring in something completely unrelated to computing. I don’t believe that they are the exceptions, nor should they be. The best way to become a top-notch software developer may well be to do a challenging programming-intensive degree program in some other discipline. But if the current trend towards overly specific job requirements and automated rejections continues, my friend will be proven correct, just about 30 years early.

A data science skills gap?

What about new areas like “data science”, where there’s a projected shortage of 1.5 million “managers and analysts”?

Well, there will most certainly be a shortage if you limit yourself to people who have some kind of degree in data science, or a data science certification. (There are some degree programs, and no certifications that I’m aware of, though the related fields of Statistics and Business Intelligence are lousy with certifications.) If you’re a pointy-haired boss who needs a degree or a certificate to tell you that a potential hire knows something in an area where you’re incompetent, you’re going to see a huge shortage of talent.

But as DJ Patil said in “Building Data Science Teams,” the best data scientists are not statisticians; they come from a wide range of scientific disciplines, including (but not limited to) physics, biology, medicine, and meteorology. Data science teams are full of physicists. The chief scientist of Kaggle, Jeremy Howard, has a degree in philosophy. The key job requirement in data science (as it is in many technical fields) isn’t demonstrated expertise in some narrow set of tools, but curiosity, flexibility, and willingness to learn. And the key obligation of the employer is to give its new hires the tools they need to succeed.

At this year’s Velocity conference, Jay Parikh talked about Facebook’s boot camp for bringing new engineers up to speed (this segment starts at about 3:30). New hires are expected to produce shippable code in the first week. There’s no question that they’re expected to come up to speed fast. But what struck me about boot camp is that it’s a six-week program (plus a couple of additional weeks if you’re hired into operations) designed to surround new hires with the help they need to be successful. That includes mentors to help them work with the code base, review their code, integrate them into Facebook culture, and more. They aren’t expected to “hit the ground running.” They’re expected to get up to speed fast, and they’re given a lot of help to do so successfully.

Facebook has high standards for whom they hire, but boot camp demonstrates that they understand that successful hiring isn’t about finding the perfect applicant: it’s about what happens after the new employee shows up.

Last Saturday, I had coffee with Nathan Milford, US Operations manager for Outbrain. We discussed these issues, along with synthetic biology, hardware hacking, and many other subjects. He said “when I’m hiring someone, I look for an applicant that fits the culture, who is bright, and who is excited and wants to learn. That’s it. I’m not going to require that they come with prior experience in every component of our stack. Anyone who wants to learn can pick that up on the job.”

That’s the attitude we clearly need if we’re going to make progress.

July 20 2012

Data Jujitsu: The art of turning data into product

Having worked in academia, government and industry, I’ve had a unique opportunity to build products in each sector. Much of this product development has been around building data products. Just as methods for general product development have steadily improved, so have the ideas for developing data products. Thanks to large investments in the general area of data science, many major innovations (e.g., Hadoop, Voldemort, Cassandra, HBase, Pig, Hive, etc.) have made data products easier to build. Nonetheless, data products are distinctive in that the problems behind them are often extremely difficult, and seemingly intractable for small teams with limited funds. Yet, they get solved every day.

How? Are the people who solve them superhuman data scientists who can come up with better ideas in five minutes than most people can in a lifetime? Are they magicians of applied math who can cobble together millions of lines of code for high-performance machine learning in a few hours? No. Many of them are incredibly smart, but meeting big problems head-on usually isn’t the winning approach. There’s a method to solving data problems that avoids the big, heavyweight solution and instead concentrates on building something quickly and iterating. Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small.

We call this Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable. It’s related to Wikipedia’s definition of the ancient martial art of jujitsu: “the art or technique of manipulating the opponent’s force against himself rather than confronting it with one’s own force.”

How do we apply this idea to data? What is a data problem’s “weight,” and how do we use that weight against itself? These are the questions that we’ll work through in the subsequent sections.

To start, for me, a good definition of a data product is a product that facilitates an end goal through the use of data. It’s tempting to think of a data product purely as a data problem. After all, there’s nothing more fun than throwing a lot of technical expertise and fancy algorithmic work at a difficult problem. That’s what we’ve been trained to do; it’s why we got into this game in the first place. But in my experience, meeting the problem head-on is a recipe for disaster. Building a great data product is extremely challenging, and the problem will always become more complex, perhaps intractable, as you try to solve it.

Before investing in a big effort, you need to answer one simple question: Does anyone want or need your product? If no one wants the product, all the analytical work you throw at it will be wasted. So, start with something simple that lets you determine whether there are any customers. To do that, you’ll have to take some clever shortcuts to get your product off the ground. Sometimes, these shortcuts will survive into the finished version because they represent some fundamentally good ideas that you might not have seen otherwise; sometimes, they’ll be replaced by more complex analytic techniques. In any case, the fundamental idea is that you shouldn’t solve the whole problem at once. Solve a simple piece that shows you whether there’s an interest. It doesn’t have to be a great solution; it just has to be good enough to let you know whether it’s worth going further (e.g., a minimum viable product).

Here’s a trivial example. What if you want to collect a user’s address? You might consider a free-form text box, but writing a parser that can identify a name, street number, apartment number, city, zip code, etc., is a challenging problem due to the complexity of the edge cases. Users don’t necessarily put in separators like commas, nor do they necessarily spell states and cities correctly. The problem becomes much simpler if you do what most web applications do: provide separate text areas for each field, and make states drop-down boxes. The problem becomes even simpler if you can populate the city and state from a zip code (or equivalent).
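To make the contrast concrete, here is a minimal sketch of the structured approach, assuming a hypothetical zip-code table and field names (none of this comes from the article): validating separate fields and pre-filling city and state from the zip code is a dictionary lookup, whereas parsing free-form addresses is an open-ended problem.

```python
# Minimal sketch: structured address fields instead of free-form parsing.
# The zip-code table is a tiny hard-coded stand-in for a real lookup service.

ZIP_TABLE = {
    "94103": ("San Francisco", "CA"),
    "10001": ("New York", "NY"),
}

def prefill_from_zip(zip_code):
    """Return (city, state) for a known zip code, or None if we can't help."""
    return ZIP_TABLE.get(zip_code)

def collect_address(street, zip_code, city=None, state=None):
    """Validate separate form fields; fill in city/state from the zip when possible."""
    if not street:
        raise ValueError("street is required")
    prefill = prefill_from_zip(zip_code)
    if prefill:
        city, state = city or prefill[0], state or prefill[1]
    if not (city and state):
        raise ValueError("unknown zip code; please supply city and state")
    return {"street": street, "city": city, "state": state, "zip": zip_code}

print(collect_address("123 Main St", "94103"))
```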

Now for a less trivial example. A LinkedIn profile includes a tremendous amount of information. Can we use a profile like this to build a recommendation system for conferences? The answer is “yes.” But before answering “how,” it’s important to step back and ask some fundamental questions:

A) Does the customer care? Is there a market fit? If there isn’t, there’s no sense in building an application.

B) How long do we have to learn the answer to Question A?

We could start by creating and testing a full-fledged recommendation engine. This would require an information extraction system, an information retrieval system, a model training layer, a front end with a well-designed user interface, and so on. It might take well over 1,000 hours of work before we find out whether the user even cares.

Instead, we could build a much simpler system. Among other things, the LinkedIn profile lists books.

Book recommendations from LinkedIn profile

Books have ISBN numbers, and ISBN numbers are tagged with keywords. Similarly, there are catalogs of events that are also cataloged with keywords (Lanyrd is one). We can do some quick and dirty matching between keywords, build a simple user interface, and deploy it in an ad slot to a limited group of highly engaged users. The result isn’t the best recommendation system imaginable, but it’s good enough to get a sense of whether the users care. Most importantly, it can be built quickly (e.g., in a few days, if not a few hours). At this point, the product is far from finished. But now you have something you can test to find out whether customers are interested. If so, you can then gear up for the bigger effort. You can build a more interactive user interface, add features, integrate new data in real time, and improve the quality of the recommendation engine. You can use other parts of the profile (skills, groups and associations, even recent tweets) as part of a complex AI or machine learning engine to generate recommendations.
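As a rough sketch of how small that first version can be, the snippet below matches hypothetical book keywords against hypothetical event keywords with a simple overlap score; the data, names, and scoring choice are illustrative assumptions, not the system LinkedIn actually built.

```python
# Quick-and-dirty conference recommender: match keywords from a member's books
# against keywords attached to events. All data here is made up for illustration.

def keyword_overlap(a, b):
    """Jaccard similarity between two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

profile_book_keywords = [
    {"machine learning", "statistics", "python"},
    {"data visualization", "design"},
]

events = {
    "Strata Conference": {"data science", "machine learning", "statistics"},
    "An Event Apart": {"design", "css", "web"},
    "Velocity": {"operations", "performance", "web"},
}

# Score each event by its best overlap with any of the member's books.
scores = {
    name: max(keyword_overlap(kw, book) for book in profile_book_keywords)
    for name, kw in events.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```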

The key is to start simple and stay simple for as long as possible. Ideas for data products tend to start simple and become complex; if they start complex, they become impossible. But starting simple isn’t always easy. How do you solve individual parts of a much larger problem? Over time, you’ll develop a repertoire of tools that work for you. Here are some ideas to get you started.

Use product design

One of the biggest challenges of working with data is getting the data in a useful form. It’s easy to overlook the task of cleaning the data and jump to trying to build the product, but you’ll fail if getting the data into a usable form isn’t the first priority. For example, let’s say you have a simple text field into which the user types a previous employer. How many ways are there to type “IBM”? A few dozen? In fact, thousands: everything from “IBM” and “I.B.M.” to “T.J. Watson Labs” and “Netezza.” Let’s assume that to build our data product it’s necessary to have all these names tied to a common ID. One common approach to disambiguate the results would be to build a relatively complex artificial intelligence engine, but this would take significant time. Another approach would be to have a drop-down list of all the companies, but this would be a horrible user experience due to the length of the list and limited flexibility in choices.

What about Data Jujitsu? Is there a much simpler and more reliable solution? Yes, but not in artificial intelligence. It’s not hard to build a user interface that helps the user arrive at a clean answer. For example, you can:

  • Support type-ahead, encouraging the user to select the most popular term.
  • Prompt the user with “did you mean … ?”
  • If at this point you still don’t have anything usable, ask the user for more help: Ask for a stock ticker symbol or the URL of the company’s home page.

The point is to have a conversation rather than just a form. Engage the user to help you, rather than relying on analysis. You’re not just getting the user more involved (which is good in itself), you’re getting clean data that will simplify the work for your back-end systems. As a matter of practice, I’ve found that trying to solve a problem on the back end is 100-1,000 times more expensive than on the front end.
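A minimal sketch of that front-end-first idea, assuming a hypothetical catalog of canonical company names: type-ahead suggestions ranked by popularity hand the back end a clean company ID instead of free text to disambiguate.

```python
# Type-ahead sketch: suggest canonical company names as the user types,
# most popular first, so the form captures a clean company ID.
# The catalog below is a hypothetical stand-in for a real company database.

COMPANIES = [
    # (canonical name, company_id, rough popularity)
    ("IBM", 1, 50000),
    ("Intel", 2, 30000),
    ("Intuit", 3, 12000),
    ("Infosys", 4, 9000),
]

def suggest(prefix, limit=3):
    """Return up to `limit` canonical names starting with the typed prefix."""
    p = prefix.strip().lower()
    hits = [c for c in COMPANIES if c[0].lower().startswith(p)]
    hits.sort(key=lambda c: c[2], reverse=True)  # most popular first
    return [(name, cid) for name, cid, _ in hits[:limit]]

print(suggest("i"))   # [('IBM', 1), ('Intel', 2), ('Intuit', 3)]
print(suggest("ib"))  # [('IBM', 1)]
```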

When in doubt, use humans

As technologists, we are predisposed to look for scalable technical solutions. We often jump to technical solutions before we know what solutions will work. Instead, see if you can break down the task into bite-size portions that humans can do, then figure out a technical solution that allows the process to scale. Amazon’s Mechanical Turk is a system for posting small problems online and paying people a small amount (typically a couple of cents) for solutions. It’s come to the rescue of many an entrepreneur who needed to get a product off the ground quickly but didn’t have months to spend on developing an analytical solution.

Here’s an example. A camera company wanted to test a product that would tell restaurant owners how many tables were occupied or empty during the day. If you treat this problem as an exercise in computer vision, it’s very complex. It can be solved, but it will take some PhDs, lots of time, and large amounts of computing power. But there’s a simpler solution. Humans can easily look at a picture and tell whether or not a table has anyone seated at it. So the company took images at regular intervals and used humans to count occupied tables. This gave them the opportunity to test their idea and determine whether the product was viable before investing in a solution to a very difficult problem. It also gave them the ability to find out what their customers really wanted to know: just the number of occupied tables? The average number of people at each table? How long customers stayed at the table? That way, when they start to build the real product, using computer vision techniques rather than humans, they know what problem to solve.

Humans are also useful for separating valid input from invalid. Imagine building a system to collect recipes for an online cookbook. You know you’ll get a fair amount of spam; how do you separate out the legitimate recipes? Again, this is a difficult problem for artificial intelligence without substantial investment, but a fairly simple problem for humans. When getting started, we can send each page to three people via Mechanical Turk. If all agree that the recipe is legitimate, we can use it. If all agree that the recipe is spam, we can reject it. And if the vote is split, we can escalate by trying another set of reviewers, or by giving those reviewers additional context that allows them to make a better assessment. The key thing is to watch for the signals the humans use to make their decisions. When we’ve identified those signals, we can start building more complex automated systems. By using humans to solve the problem initially, we can learn a great deal about the problem at a very low cost.
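The triage logic itself is tiny; the hard part is the workflow around it. In the sketch below, a list of booleans stands in for the three Mechanical Turk judgments per recipe, and split votes are flagged for escalation. This is purely illustrative, not the author’s actual pipeline.

```python
# Sketch of the three-reviewer triage described above. In a real system the
# judgments would come back from Mechanical Turk; here they are just lists.

def triage(judgments):
    """judgments: list of booleans, True = 'legitimate recipe'."""
    if all(judgments):
        return "accept"
    if not any(judgments):
        return "reject"
    return "escalate"   # split vote: send to more reviewers with extra context

print(triage([True, True, True]))     # accept
print(triage([False, False, False]))  # reject
print(triage([True, False, True]))    # escalate
```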

Aardvark (a promising startup that was acquired by Google) took a similar path. Their goal was to build a question and answer service that routed users’ questions to real people with “inside knowledge.” For example, if a user wanted to know a good restaurant for a first date in Palo Alto, Calif., Aardvark would route the question to people living in the broader Palo Alto area, then compile the answers. They started by building tools that would allow employees to route the questions by hand. They knew this wouldn’t scale, but it let them learn enough about the routing problem to start building a more automated solution. The human solution not only made it clear what they needed to build, it proved that the technical solution was worth the effort and bought them the time they needed to build it.

In both cases, if you were to graph the work expended versus time, it would look something like this:

Work vs Time graph

Ignore the fact that I’ve violated a fundamental law of data science and presented a graph without scales on the axes. The point is that technical solutions will always win in the long run; they’ll always be more efficient, and even a poor technical solution is likely to scale better than using humans to answer questions. But when you’re getting started, you don’t care about the long run. You just want to survive long enough to have a long run, to prove that your product has value. And in the short term, human solutions require much less work. Worry about scaling when you need to.

Be opportunistic for wins

I’ve stressed building the simplest possible thing, even if you need to take shortcuts that appear to be extreme. Once you’ve got something working and you’ve proven that users want it, the next step is to improve the product. Amazon provides a good example. Back when they started, Amazon pages contained product details, reviews, the price, and a button to buy the item. But what if the customer isn’t sure he’s found what he wants and would like to do some comparison shopping? That’s simple enough in the real world, but in the early days of Amazon, the only alternative was to go back to the search engine. This is a “dead-end flow”: once the user has gone back to the search box, or to Google, there’s a good chance that he’s lost. He might find the book he wants at a competitor, even if Amazon sells the same product at a better price.

Amazon needed to build pages that channeled users into other related products; they needed to direct users to similar pages so that they wouldn’t lose the customer who didn’t buy the first thing he saw. They could have built a complex recommendation system, but they opted for something far simpler: collaborative filters that add “People who viewed this product also viewed” to their pages. This addition had a profound effect: users can do product research without leaving the site. If you don’t see what you want at first, Amazon channels you into another page. It was so successful that Amazon has developed many variants, including “People who bought this also bought” (so you can load up on accessories), and so on.

The collaborative filter is a great example of starting with a simple product that becomes a more complex system later, once you know that it works. As you begin to scale the collaborative filter, you have to track the data for all purchases correctly, build the data stores to hold that data, build a processing layer, develop the processes to update the data, and deal with relevancy issues. Relevance can be tricky. When there’s little data, it’s easy for a collaborative filter to give strange results; with a few errant clicks in the database, it’s easy to get from fashion accessories to power tools. At the same time, there are still ways to make the problem simpler. It’s possible to do the data analysis in a batch mode, reducing the time pressure; rather than compute “People who viewed this also viewed” on the fly, you can compute it nightly (or even weekly or monthly). You can make do with the occasional irrelevant answer (“People who bought leather handbags also bought power screwdrivers”), or perhaps even use Mechanical Turk to filter your pre-computed recommendations. Or even better, ask the users for help.
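To illustrate how modest the first batch version can be, here is a sketch that counts co-views from a made-up view log and returns the top co-viewed products; relevancy filtering, data stores, and update processes are exactly the pieces it leaves out.

```python
# Minimal batch co-view counter: "People who viewed this product also viewed".
# The view log is a hypothetical stand-in for real clickstream data.
from collections import defaultdict
from itertools import combinations

view_log = [
    ("alice", "headphones"), ("alice", "headphone case"), ("alice", "amp"),
    ("bob", "headphones"), ("bob", "amp"),
    ("carol", "headphones"), ("carol", "headphone case"),
]

# Group views by user, then count each co-viewed pair once per user.
views_by_user = defaultdict(set)
for user, product in view_log:
    views_by_user[user].add(product)

co_views = defaultdict(lambda: defaultdict(int))
for products in views_by_user.values():
    for a, b in combinations(sorted(products), 2):
        co_views[a][b] += 1
        co_views[b][a] += 1

def also_viewed(product, k=2):
    """Top-k products most often co-viewed with `product`."""
    ranked = sorted(co_views[product].items(), key=lambda kv: kv[1], reverse=True)
    return [p for p, _ in ranked[:k]]

print(also_viewed("headphones"))  # ['amp', 'headphone case'] (ties kept in insertion order)
```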

Being opportunistic can be done with analysis of general products, too. The Wall Street Journal chronicles a case in which Zynga was able to rapidly build on a success in their game FishVille. You can earn credits to buy fish, but you can also purchase credits. The Zynga analytics team noticed that a particular set of fish was being purchased at six times the rate of all the other fish. Zynga took the opportunity to design several similar virtual fish, for which they charged $3 to $4 each. The data showed that they had clearly stumbled onto something: the common trait was translucence, and that was what customers wanted. Using this combination of quick observation and lightweight tests, they were able to add significantly to their profits.

Ground your product in the real world

We can learn more from Amazon’s collaborative filters. What happens when you go into a physical store to buy something, say, headphones? You might look for sale prices, you might look for reviews, but you almost certainly don’t just look at one product. You look at a few, most likely something located near whatever first caught your eye. By adding “People who viewed this product also viewed,” Amazon built a similar experience into the web page. In essence, they “grounded” their virtual experience to a similar one in the real world via data.

LinkedIn’s People You May Know embodies both Data Jujitsu and grounding the product in the real world. Think about what happens when you arrive at a conference reception. You walk around the outer edge until you find someone you recognize, then you latch on to that person until you see some more people you know. At that point, your interaction style changes: Once you know there are friendly faces around, you’re free to engage with people you don’t know. (It’s a great exercise to watch this happen the next time you attend a conference.)

The same kind of experience takes place when you join a new social network. The first data scientists at LinkedIn recognized this and realized that their online world had two big challenges. First, because it is a website, you can’t passively walk around the outer edges of the group. It’s like looking for friends in a darkened room. Second, LinkedIn is fighting for every second you stay on its site; it’s not like a conference where you’re likely to have a drink or two while looking for friends. There’s a short window, really only a few seconds, for you to become engaged. If you don’t see any point to the site, you click somewhere else and you’re gone.

Earlier attempts to solve this problem, such as address book importers or search facilities, imposed too much friction. They required too much work for the poor user, who still didn’t understand why the site was valuable. But our LinkedIn team realized that a few simple heuristics could be used to determine a set of “people you may know.” We didn’t have the resources to build a complete solution. But to get something started, we could run a series of simple queries on the database: “what do you do,” “where do you live,” “where did you go to school,” and other questions that you might ask someone you met for the first time. We also used triangle closing (if Jane is connected to Mark, and Mark is connected to Sally, Sally and Jane have a high likelihood of knowing each other). To test the idea, we built a customized ad that showed each user the three people they were most likely to know. Clicking on one of those people took you to the “add connection” page. (Of course, if you saw the ad again, the results would have been the same, but the point was to quickly test with minimal impact to the user.) The results were overwhelming; it was clear that this needed to become a full-blown product, and it was quickly replicated by Facebook and all other social networks. Only after realizing that we had a hit on our hands did we do the work required to build the sophisticated machinery necessary to scale the results.
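Triangle closing itself is easy to express. The sketch below ranks second-degree contacts by mutual-connection count over a hypothetical connection graph; the profile-similarity queries described above would be layered on top.

```python
# Triangle-closing sketch: rank people a member may know by the number of
# mutual connections. The graph below is hypothetical.
from collections import Counter

connections = {
    "jane":  {"mark", "ravi"},
    "mark":  {"jane", "sally", "ravi"},
    "sally": {"mark", "wei"},
    "ravi":  {"jane", "mark"},
    "wei":   {"sally"},
}

def people_you_may_know(member, k=3):
    """Second-degree contacts ranked by mutual-connection count."""
    direct = connections[member]
    counts = Counter()
    for friend in direct:
        for fof in connections[friend]:
            if fof != member and fof not in direct:
                counts[fof] += 1
    return counts.most_common(k)

print(people_you_may_know("jane"))  # [('sally', 1)] -- Jane and Sally share Mark
```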

After People You May Know, our LinkedIn team realized that we could use a similar approach to build Groups You May Like. We built it almost as an exercise, when we were familiarizing ourselves with some new database technologies. It took under a week to build the first version and get it on to the home page, again using an ad slot. In the process, we learned a lot about the limitations and power of a recommendation system. On one hand, the numbers showed that people really loved the product. But additional filter rules were needed: Users didn’t like it when the system recommended political or religious groups. In hindsight, this seems obvious, almost funny, but it would have been very hard to anticipate all the rules we needed in advance. This lightweight testing gave us the flexibility to add rules as we discovered we needed them. Since we needed to test our new databases anyway, we essentially got this product “for free.” It’s another great example of a group that did something successful, then immediately took advantage of the opportunities for further wins.

Give data back to the user to create additional value

By giving data back to the user, you can create both engagement and revenue. We’re far enough into the data game that most users have realized that they’re not the customer, they’re the product. Their role in the system is to generate data, either to assist in ad targeting or to be sold to the highest bidder, or both. They may accept that, but I don’t know anyone who’s happy about it. But giving data back to the user is a way of showing that you’re on their side, increasing their engagement with your product.

How do you give data back to the user? LinkedIn has a product called “Who’s Viewed Your Profile.” This product lists the people who have viewed your profile (respecting their privacy settings, of course), and provides statistics about the viewers. There’s a time series view, a list of search terms that have been used to find you, and the geographical areas in which the viewers are located. It’s timely and actionable data, and it’s addictive. It’s visible on everyone’s home page, and it shows the number of profile views, so it’s not static. Every time you look at your LinkedIn page, you’re tempted to click.

Who Viewed Profile box from LinkedIn

And people do click. Engagement is so high that LinkedIn has two versions: one free, and the other part of the subscription package. This product differentiation benefits the casual user, who can see some summary statistics without being overloaded with more sophisticated features, while providing an easy upgrade path for more serious users.

LinkedIn isn’t the only product that provides data back to the user. Xobni analyzes your email to provide better contact management and help you control your inbox. Mint (acquired by Intuit) studies your credit cards to help you understand your expenses and compare them to others in your demographic. Pacific Gas and Electric has a SmartMeter that allows you to analyze your energy usage. We’re even seeing health apps that take data from your phone and other sensors and turn it into a personal dashboard.

In short, everyone reading this has probably spent the last year or more of their professional life immersed in data. But it’s not just us. Everyone, including users, has awakened to the value of data. Don’t hoard it; give it back, and you’ll create an experience that is more engaging and more profitable for both you and your company.

No data vomit

As data scientists, we prefer to interact with the raw data. We know how to import it, transform it, mash it up with other data sources, and visualize it. Most of your customers can’t do that. One of the biggest challenges of developing a data product is figuring out how to give data back to the user. Giving back too much data in a way that’s overwhelming and paralyzing is “data vomit.” It’s natural to build the product that you would want, but it’s very easy to overestimate the abilities of your users. The product you want may not be the product they want.

When we were building the prototype for “Who’s Viewed My Profile,” we created an early version that showed all sorts of amazing data, with a fantastic ability to drill down into the detail. How many clicks did we get when we tested it? Zero. Why? An “inverse interaction law” applies to most users: The more data you present, the less interaction.

Cool interactions graph

The best way to avoid data vomit is to focus on the actionability of the data. That is, what action do you want the user to take? If you want them to be impressed with the number of things that you can do with the data, then you’re likely producing data vomit. If you’re able to lead them to a clear set of actions, then you’ve built a product with a clear focus.

Expect unforeseen side effects

Of course, it’s impossible to avoid unforeseen side effects completely, right? That’s what “unforeseen” means. However, unforeseen side effects aren’t a joke. One of the best examples of an unforeseen side effect is “My TiVo Thinks I’m Gay.” Most digital video recorders have a recommendation system for other shows you might want to watch; they’ve learned from Amazon. But there are cases in which a user has watched a particular show (say “Will & Grace”), and the recorder then recommends other shows with similar themes (“The Ellen DeGeneres Show,” “Queer as Folk,” etc.). Along similar lines, an Anglo friend of mine who lives in a neighborhood with many people from South Asia recently told me that his Netflix recommendations are overwhelmed with Bollywood films.

This sounds funny, and it’s even been used as the basis of a sitcom plot. But it’s a real pain point for users. Outsmarting the recommendation engine once it has “decided” what you want is difficult and frustrating, and you stand a good chance of losing the customer. What’s going wrong? In the case of the Bollywood recommendations, the algorithm is probably overemphasizing the movies that have been watched by the surrounding population. With the TiVo, there’s no easy way to tell the system that it’s wrong. Instead, you’re forced to try to outfox it, and users who have tried have discovered that it’s hard to outthink an intelligent agent that has gotten the wrong idea.

Improving precision and recall

What tools do we have to think about bad results — things like unfortunate recommendations and collaborative filtering gone wrong? Two concepts, precision and recall, let us describe the problem more precisely. Here’s what they mean:

Precision — The ability to provide a result that exactly matches what’s desired. If you’re building a recommendation engine, can you give a good recommendation every time? If you’re displaying advertisements, will every ad result in a click? That’s high precision.

Recall — The set of possible good recommendations. Recall is fundamentally about inventory: Good recall means that you have a lot of good recommendations, or a lot of advertisements that you can potentially show the user.

It’s obvious that you’d like to have both high precision and high recall. For example, if you’re showing a user advertisements, you’d be in heaven if you have a lot of ads to show, and every ad has a high probability of resulting in a click. Unfortunately, precision and recall often work against each other: As precision increases, recall drops, and vice versa. The number of ads that have a 95% chance of resulting in a click is likely to be small indeed, and the number of ads with a 1% chance is obviously much larger.
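For concreteness, in the standard information-retrieval formulation precision is the fraction of results you showed that were good, and recall is the fraction of all good items that you managed to show. A small sketch with made-up data:

```python
# Precision and recall over a set of recommendations (hypothetical data).

def precision_recall(shown, relevant):
    shown, relevant = set(shown), set(relevant)
    hits = shown & relevant
    precision = len(hits) / len(shown) if shown else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant_ads = {"ad1", "ad2", "ad3", "ad4"}  # everything the user would actually click
shown_ads = ["ad1", "ad2", "ad9"]            # what we displayed

p, r = precision_recall(shown_ads, relevant_ads)
print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.67 recall=0.50
```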

So, an important issue in product design is the tradeoff between precision versus recall. If you’re working on a search engine, precision is the key, and having a large inventory of plausible search results is irrelevant. Results that will satisfy the user need to get to the top of the page. Low-precision search results yield a poor experience.

On the other hand, low-precision ads are almost harmless (perhaps because they’re low precision, but that’s another matter). It’s hard to know what advertisement will elicit a click, and generally it’s better to show a user something than nothing at all. We’ve seen enough irrelevant ads that we’ve learned to tune them out effectively.

The difference between these two cases is how the data is presented to the user. Search data is presented directly: If you search Google for “data science,” you’ll get 1.16 billion results in 0.47 seconds (as of this writing). The results on the first few pages will all have the term “data science” in them. You’re getting results directly related to your search; this makes intuitive sense. But the rationale behind advertising content is obfuscated. You see ads, but you don’t know why you were shown those ads. Nothing says, “We showed you this ad because you searched for data science and we know you live in Virginia, so here’s the nearest warehouse for all your data needs.” Since the relationship between the ad and your interests is obfuscated, it’s hard to judge an ad harshly for being irrelevant, but it’s also not something you’re going to pay attention to.

Generalizing beyond advertising, when building any data product in which the data is obfuscated (where there isn’t a clear relationship between the user and the result), you can compromise on precision, but not on recall. But when the data is exposed, focus on high precision.

Subjectivity

Another issue to contend with is subjectivity: how does the user perceive the results? One product at LinkedIn delivers a set of up to 10 job recommendations. The problem is that users focus on the bad recommendations rather than the good ones. If nine results are spot on and one is off, the user will leave thinking that the entire product is terrible. One bad experience can spoil a consistently good experience. If, over five web sessions, we show you 49 perfect results in a row, but the 50th one doesn’t make sense, the damage is still done. It’s not quite as bad as if the bad result appeared in the first session, but damage is done nonetheless, and it’s hard to recover. The most common guideline is to strive for a distribution in which there are many good results, a few great ones, and no bad ones.

That’s only part of the story. You don’t really know what the user will consider a poor recommendation. Here are two sets of job recommendations:

[Image: "Jobs You May Be Interested In" example 1]

[Image: "Jobs You May Be Interested In" example 2]

What’s important: The job itself? Or the location? Or the title? Will the user consider a recommendation “bad” if it’s a perfect fit, but requires him to move to Minneapolis? What if the job itself is a great fit, but the user really wants “senior” in the title? You really don’t know. It’s very difficult for a recommendation engine to anticipate issues like these.

Enlisting other users

One jujitsu approach to solving this problem is to flip it around and use the social system to our advantage. Instead of sending these recommendations directly to the user, we can send the recommendations to their connections and ask them to pass along the relevant ones. Let’s suppose Mike sends me a job recommendation that, at first glance, I don’t like. One of these two things is likely to happen:

  • I’ll take a look at the job recommendation and realize it is a terrible recommendation and it’s Mike’s fault.
  • I’ll take a look at the job recommendation and try to figure out why Mike sent it. Mike may have seen something in it that I’m missing. Maybe he knows that the company is really great.

At no time is the system penalized for making a bad recommendation. Furthermore, the product is producing data that now allows us to better train the models and increase overall precision. Thus, a little twist in the product can make a hard relevance problem disappear. This kind of cleverness takes an extraordinarily challenging problem and gives you the edge you need to make the product work.
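
Here's a minimal sketch of that twist, with every name, structure, and the labeling scheme invented for illustration: the recommendation is delivered to a connection rather than to the candidate, and whatever the connection does with it becomes labeled training data.

```python
# Hypothetical sketch of routing a job recommendation through a connection
# instead of delivering it directly. A real system would persist the training
# log and pick the referrer with a model rather than taking the first connection.
training_log = []

def route_recommendation(candidate, job, connections):
    if not connections:
        return {"deliver_to": candidate, "job": job}      # fall back to direct delivery
    referrer = connections[0]                             # stand-in for a smarter choice
    return {"deliver_to": referrer, "forward_to": candidate, "job": job}

def record_referrer_action(job, candidate, forwarded):
    # A forwarded (or ignored) recommendation is a relevance label we could
    # not have collected by showing the job to the candidate directly.
    training_log.append({"job": job, "candidate": candidate, "label": int(forwarded)})

msg = route_recommendation("dj", "Senior Data Scientist, Acme Corp", ["mike", "monica"])
record_referrer_action(msg["job"], msg["forward_to"], forwarded=True)
print(msg)
print(training_log)
```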

[Image: Referral Center example from LinkedIn]

Ask and you shall receive

We often focus on getting a limited set of data from a user. But done correctly, you can engage the user to give you more useful, high-quality data. For example, if you're building a restaurant recommendation service, you might ask the user for his or her zip code. But if you also ask for the zip code where the user works, you have much more information. Not only can you make recommendations for both locations, but you can predict the user's typical commute patterns and make recommendations along the way. You increase your value to the user by offering a greater diversity of recommendations.

In keeping with Data Jujitsu, predicting commute patterns probably shouldn’t be part of your first release; you want the simplest thing that could possibly work. But asking for the data gives you the potential for a significantly more powerful and valuable product.
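
As a sketch of that simplest version, here's what the two-zip-code idea might look like; the lookup table, function names, and restaurant names are all invented, standing in for whatever ranking model you already have.

```python
# Hypothetical sketch: recommend restaurants around both the home and the work
# zip code the user volunteered. Commute-path recommendations can come later.
RESTAURANTS_BY_ZIP = {
    "22102": ["Bistro A", "Noodle Bar B", "Taqueria C"],
    "20001": ["Cafe D", "Pizzeria E"],
}

def top_restaurants(zip_code, limit=5):
    return RESTAURANTS_BY_ZIP.get(zip_code, [])[:limit]

def recommend(home_zip, work_zip=None, per_location=5):
    recs = {"home": top_restaurants(home_zip, per_location)}
    if work_zip:    # the one extra question, if answered, doubles your coverage
        recs["work"] = top_restaurants(work_zip, per_location)
    return recs

print(recommend("22102", work_zip="20001"))
```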

Take care not to simply demand data. You need to explain to the user why you're asking for it; you disarm the user's resistance to providing more information by telling him that you're going to provide value (in this case, more valuable recommendations), rather than abusing the data. It's essential to remember that you're having a conversation with the user, not handing him a long form to fill out.

Anticipate failure

As we’ve seen, data products can fail because of relevance problems arising from the tradeoff between precision and recall. Design your product with the assumption that it will fail. And in the process, design it so that you can preserve the user experience even if it fails.

Two data products that demonstrate extremes in user experience are Sony’s AIBO (a robotic pet), and interactive voice response systems (IVR), such as the ones that answer the phone when you call an airline to change a flight.

Let’s consider the AIBO first. It’s a sophisticated data product. It takes in data from different sensors and uses this data to train models so that it can respond to you. What do you do if it falls over or does something similarly silly, like getting stuck walking into a wall? Do you kick it? Curse at it? No. Instead, you’re likely to pick it up and help it along. You are effectively compensating for when it fails. Let’s suppose instead of being a robotic dog, it was a robot that brought hot coffee to you. If it spilled the coffee on you, what would your reaction be? You might both kick it and curse at it. Why the difference? The difference is in the product’s form and execution. By making the robot a dog, Sony limited your expectations; you’re predisposed to cut the robot slack if it doesn’t perform correctly.

Now, let’s consider the IVR system. This is also a sophisticated data product. It tries to understand your speech and route you to the right person, which is no simple task. When you call one these systems, what’s your first response? If it is voice activated, you might say, “operator.” If that doesn’t work, maybe you’ll say “agent” or “representative.” (I suspect you’ll be wanting to scream “human” into the receiver.) Maybe you’ll start pressing the button “0.” Have you ever gone through this process and felt good? More often than not, the result is frustration.

What’s the difference? The IVR product inserts friction into the process (at least from the customer’s perspective), and limits his ability to solve a problem. Furthermore, there isn’t an easy way to override the system. Users think they’re up against a machine that thinks it is smarter than they are, and that is keeping them from doing what they want. Some could argue that this is a design feature, that adding friction is a way of controlling the amount of interaction with customer service agents. But, the net result is frustration for the customer.

You can give your data product a better chance of success by carefully setting the users’ expectations. The AIBO sets expectations relatively low: A user doesn’t expect a robotic dog to be much other than cute. Let’s think back to the job recommendations. By using Data Jujitsu and sending the results to the recipient’s network, rather than directly to him, we create a product that doesn’t act like an overly intelligent machine that the user is going to hate. By enlisting a human to do the filtering, we put a human face behind the recommendation.

One under-appreciated facet of designing data products is how the user feels after using the product. Does he feel good? Empowered? Or disempowered and dejected? A product like the AIBO, or like job recommendations sent via a friend, is structured so that the user is predisposed toward feeling good after he’s finished.

In many applications, a design treatment that gives the user control over the outcome can go far toward creating interactions that leave the user feeling good. For example, if you're building a collaborative filter, you will inevitably generate incorrect recommendations. But you can let the user tell you about poor recommendations with a button that lets him "X" out the ones he doesn't like.
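
A minimal sketch of that feedback loop, with an in-memory store standing in for real persistence (all names here are invented):

```python
# Hypothetical sketch: record "X"-ed out recommendations and keep them out of
# future lists. In practice the dismissals would be persisted and could also
# feed back into model training as negative labels.
dismissed = {}   # user_id -> set of item_ids the user X-ed out

def record_dismissal(user_id, item_id):
    dismissed.setdefault(user_id, set()).add(item_id)

def filter_recommendations(user_id, ranked_items):
    blocked = dismissed.get(user_id, set())
    return [item for item in ranked_items if item not in blocked]

record_dismissal("u42", "job_123")
print(filter_recommendations("u42", ["job_123", "job_456", "job_789"]))
# -> ['job_456', 'job_789']
```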

Facebook uses this design technique when they show you an ad. They also give you control to hide the ad, as well as an opportunity to tell them why you don't think the ad is relevant. The choices range from the ad not being relevant to it being offensive. This provides an opportunity to engage users as well as give them control. It turns annoyance into empowerment: rather than being a victim of bad ad targeting, users get to feel that they can make their own choices about which ads they will see in the future.

[Image: Facebook ad targeting and customization]

Putting Data Jujitsu into practice

You’ve probably recognized some similarities between Data Jujitsu and some of the thought behind agile startups: Data Jujitsu embraces the notion of the minimum viable product and the simplest thing that could possibly work. While these ideas make intuitive sense, as engineers, many of us have to struggle against the drive to produce a beautiful, fully-featured, massively complex solution. There’s a reason that Rube Goldberg cartoons are so attractive. Data Jujitsu is all about saying “no” to our inner Rube Goldberg.

I talked at the start about getting clean data. It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data. If you can come up with strategies for data entry that are inherently clean (such as populating city and state fields from a zip code), you’re much better off. Work done up front in getting clean data will be amply repaid over the course of the project.
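
For instance, here's a minimal sketch of the zip-code idea; the tiny lookup table is a stand-in for a real zip-code database.

```python
# Hypothetical sketch: derive city and state from the zip code the user typed,
# instead of asking for (and later having to clean) free-text city/state fields.
ZIP_TABLE = {
    "10001": ("New York", "NY"),
    "94105": ("San Francisco", "CA"),
}

def normalize_address(form):
    city, state = ZIP_TABLE.get(form["zip"], (None, None))
    return {"zip": form["zip"], "city": city, "state": state}

print(normalize_address({"zip": "94105"}))
# -> {'zip': '94105', 'city': 'San Francisco', 'state': 'CA'}
```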

A surprising amount of Data Jujitsu is about product design and user experience. If you can design your product so that users are predisposed to cut it some slack when it’s wrong (like the AIBO or, for that matter, the LinkedIn job recommendation engine), you’re way ahead. If you can enlist your users to help, you’re ahead on several levels: You’ve made the product more engaging, and you’ve frequently taken a shortcut around a huge data problem.

The key aspect of making a data product is putting the "product" first and the "data" second. To put it another way, data is one mechanism by which you make the product user-focused. With any product, you should ask yourself the following three questions:

  1. What do you want the user to take away from this product?
  2. What action do you want the user to take because of the product?
  3. How should the user feel during and after using your product?

If your product is successful, you will have plenty of time to play with complex machine learning algorithms, large computing clusters running in the cloud, and whatever you’d like. Data Jujitsu isn’t the end of the road; it’s really just the beginning. But it’s the beginning that allows you to get to the next step.

Strata Conference + Hadoop World — The O’Reilly Strata Conference, being held Oct. 23-25 in New York City, explores the changes brought to technology and business by big data, data science, and pervasive computing. This year, Strata has joined forces with Hadoop World. Save 20% on registration with the code RADAR20


Economic impact of open source on small business

A few months back, Tim O'Reilly and Hari Ravichandran, founder and CEO of Endurance International Group (EIG), had a discussion about the web hosting business. They talked specifically about how much of Hari's success had been enabled by open source software. Hari wasn't just telling Tim his success story, though; he was interested in finding ways to give back to the communities that made his success possible. The two agreed that both companies would work together to produce a report making clear just how much of a role open source software plays in the hosting industry, and by extension, in enabling the web presence of millions of small businesses.

We hope you will read this free report while thinking about all the open source projects, teams, and communities that have contributed to the economic success of small businesses and local governments, even though their true economic impact is hard to measure. We combed through mountains of data, built economic models, surveyed customers, and talked with small and medium businesses (SMBs) to pull together a fairly broad-reaching dataset on which to base our study. The results are what you will find in this report.

Here are a few of the findings we derived from Bluehost data (an EIG company) and follow-on research:

  • 60% of web hosting usage is by SMBs, 71% if you include non-profits. Only 22% of hosted sites are for personal use.
  • WordPress is a far more important open source product than most people give it credit for. In the SMB hosting market, it is as widely used as MySQL and PHP, far ahead of Joomla and Drupal, the other leading content management systems.
  • Languages commonly used by high-tech startups, such as Ruby and Python, have little usage in the SMB hosting market, which is dominated by PHP for server-side scripting and JavaScript for client-side scripting.
  • Open source hosting alternatives have at least a 2:1 cost advantage relative to proprietary solutions.

Given that SMBs are widely thought to generate as much as 50% of GDP, the productivity gains to the economy as a whole that can be attributed to open source software are significant. The most important open source programs contributing to this expansion of opportunity for small businesses include Linux, Apache, MySQL, PHP, JavaScript, and WordPress. The developers of these open source projects and the communities that support them are truly unsung heroes of the economy!


Tim O’Reilly hosted a discussion at OSCON 2012 to examine the report’s findings. He was joined by Dan Handy, CEO of Bluehost; John Mone, EVP Technology at Endurance International Group; Roger Magoulas, Director of Market Research at O’Reilly; and Mike Hendrickson, VP of Content Strategy at O’Reilly. The following video contains the full discussion:


July 19 2012

“It’s impossible for me to die”

Julien Smith believes I won’t let him die.

The subject came up during our interview at Foo Camp 2012 — part of our ongoing foo interview series — in which Smith argued that our brains and innate responses don’t always map to the safety of our modern world:

“We’re in a place where it’s fundamentally almost impossible to die. I could literally — there’s a table in front of me made of glass — I could throw myself onto the table. I could attempt to even cut myself in the face or the throat, and before I did that, all these things would stop me. You would find a way to stop me. It’s impossible for me to die.”

[Discussed at the 5:16 mark in the associated video interview.]

Smith didn’t test his theory, but he makes a good point. The way we respond to the world often doesn’t correspond with the world’s true state. And he’s right about that not-letting-him-die thing; myself and the other people in the room would have jumped in had he crashed through a pane of glass. He would have then gone to an emergency room where the doctors and nurses would usher him through a life-saving process. The whole thing is set up to keep him among the living.

Acknowledging the safety of an environment isn’t something most people do by default. Perhaps we don’t want to tempt fate. Or maybe we’re wired to identify threats even when they’re not present. This disconnect between our ancient physical responses and our modern environments is one of the things Smith explores in his book The Flinch.

“Your body, all that it wants from you is to reproduce as often as possible and die,” Smith said during our interview. “It doesn’t care about anything else. It doesn’t want you to write a book. It doesn’t want you to change the world. It doesn’t even want you to live that long. It doesn’t care … Our brains are running on what a friend of mine would call ‘jungle surplus hardware.’ We want to do things that are totally counter and against what our jungle surplus hardware wants.” [Discussed at 2:00]

In his book, Smith says a flinch is an appropriate and important response to fights and car crashes and those sorts of things. But flinches also bubble up when we’re starting a new business, getting into a relationship and considering other risky non-life-threatening events. According to Smith, these are the flinches that hold people back.

“Your world has a safety net,” Smith writes in the book. “You aren’t in free fall, and you never will be. You treat mistakes as final, but they almost never are. Pain and scars are a part of the path, but so is getting back up, and getting up is easier than ever.”

There are many people in the world who face daily danger and the prospect of catastrophic outcomes. For them, flinches are essential survival tools. But there are also people who are surrounded by safety and opportunity. As hard as it is for a worrier like me to admit it (I’m writing this on an airplane, so fingers crossed), I’m one of them. A fight-or-flight response would be an overreaction to 99% of the things I encounter on a daily basis.

Now, I’m not about to start a local chapter of anti-flinchers, but I do think Smith has a legitimate point that deserves real consideration. Namely, gut reactions can be wrong.

Real danger and compromised thinking

To be clear, Smith isn’t suggesting we blithely ignore those little voices in the backs of our heads when a real threat is brewing.

“You can’t assume that you’re wrong, and you can’t assume that you’re right,” he said, relaying advice he received from a security expert. “You can just assume that you’re unable to process this decision properly, so step away from it and then decide from another vantage point. If you can do that, you’re fundamentally, every day, going to make better decisions.” [Discussed at 4:10]

I was surprised by this answer. I figured a guy who wrote a book about the detriments of flinches would compare threatening circumstances with other unlikely events, like lightning strikes and lottery wins. But Smith is doing something more thoughtful than rejecting fear outright. He’s working within a framework that challenges assumptions about our physical and mental processes. You can’t trust your brain or your body if you’re incapable of processing the threat. The success of your survival method, whatever it may be, depends on your capabilities. So, what you have to do is know when you’re compromised, get out of there, and then give yourself the opportunity to assess under better circumstances.

Other things from the interview

At the end of the interview I asked Smith about the people and projects he follows. He pointed to Peter Thiel because he admires people who see different versions of the future. Smith also tracks the audacious moves made by startups, and he looks for ways those same actions and perspectives can be applied in non-startup environments. The goal is to "see if we come up with a better society or a better individual as a result."

You can see the full interview from Foo Camp in the following video:

[Photo: "Broken Glass on Concrete" by shaire productions, on Flickr]


July 17 2012

Some sideways thinking about cyberwarfare

When we hear the term “cyberwarfare” we think of government-backed hackers stealing data, or releasing viruses or other software exploits to disrupt another country’s capabilities, communications, or operations. We imagine terrorists or foreign hackers planning to destroy America’s power grid, financial systems, or communications networks, or stealing our secrets.

I’ve been thinking, though, that it may be useful to frame the notion of cyberwarfare far more broadly.  What if we thought of JP Morgan’s recent trading losses not simply as a “bad bet” but as the outcome of a cyberwar between JP Morgan and hedge funds?  More importantly, what if we thought of the Euro’s current troubles in part as the result of a cyberwar between the financial industry and the EU?

When two nations with differing goals attack each other, we call it warfare. But when financial firms attack each other, or the financial industry attacks the economy of nations, we tell ourselves that it's "the efficient market" at work. In fact, the Eurozone crisis is a tooth-and-claw battle between central bankers and firms seeking profit for themselves despite damage to the livelihoods of millions.

When I see headlines like "Merkel says Euro Rescue Funds Needed Against Speculators" or "Speculators Attacking the Euro" or "Banksters Take Us to the Brink," it's pretty clear to me that we need to stop thinking of the self-interested choices made by financial firms as "just how it is," and to think of them instead as hostile activities. And these activities are largely carried out by software trading bots, making them, essentially, a cyberwar between profiteers and national economies (i.e., the rest of us).

I know that the reality is more complex than those statements might suggest, but reality always is. Financial firms have legitimate interests in profit seeking, and sometimes the discipline of the market is just what national economies need. But when does it go too far? The US and Israel had legitimate security interests in undermining the Iranian nuclear program too.  That doesn’t mean that we didn’t call Stuxnet an act of cyberwarfare.

One of the things we try to do on the Radar blog is to frame things in a way that helps people see the future more broadly. I do believe that one of the major long-term trends we need to include in our thinking is that foreign policy (including the possibility of cyberwarfare) is no longer just between nations, but between nations and individuals (whether the collective activists of popular revolutions or the terrorist as the oft-discussed "violent non-state actor"), between nations and big companies, and between companies and industries.

And just as we expect nations not to act out of untrammeled self-interest lest the world go to hell in a handbasket, I think it’s reasonable to ask financial firms to show self-restraint as well. Either that, or expect that at some point, nations may decide to fight back with more than their central banks.
