
June 27 2013

Four short links: 27 June 2013

  1. nitrous.io — IDE “in the cloud”, as “the kids” say.
  2. smartHeadlight — headlight that tracks raindrops and doesn’t send out light to reflect off them back into your eyes causing you to clutch your head and veer off the road into the parking lot of a Hooters to which your wife will NOT enjoy being called to tow your VERY SORRY HONEY ass home. Thank heavens science can save us from this awful hypothetical scenario. (via Greg Linden)
  3. Knight Funds outline.io — it’s a public policy simulator that helps people visualize the impact that public policies like health care reform and school budget changes might have on local economies and communities. Simulators are a hugely underused way to get the public to understand policy debates. (via Julie Starr)
  4. ZXX Font — designed to be hard to OCR, though a common trick makes it pervious to OCR. Secrecy is not an option on your font menu. (via Beta Knowledge)

April 10 2013

Four short links: 10 April 2013

  1. HyperLapse — this won the Internet for April. Everyone else can go home. Check out the unbelievable video; the source is available.
  2. Housing Simulator — NZ’s largest city is consulting on its growth plan, and includes a simulator so you can decide where the growth to house the hundreds of thousands of predicted residents will come from. Reminds me of NPR’s Budget Hero. Notice that none of the levers control immigration or city taxes to make different cities attractive or unattractive. Growth is a given and you’re left trying to figure out which green fields to pave.
  3. Converting To and From Google Map Tile Coordinates in PostGIS (Pete Warden) — Google Maps’ system of power-of-two tiles has become a de facto standard, widely used by all sorts of web mapping software. I’ve found it handy to use as a caching scheme for our data, but the PostGIS calls to use it were getting pretty messy, so I wrapped them up in a few functions. Code on github. (A Python sketch of the tile math follows this list.)
  4. So You Want to Build A Connected Sensor Device? (Google Doc) — The purpose of this document is to provide an overview of infrastructure, options, and tradeoffs for the parts of the data ecosystem that deal with generating, storing, transmitting, and sharing data. In addition to providing an overview, the goal is to learn what the pain points are, so we can address them. This is a collaborative document drafted for the purpose of discussion and contribution at Sensored Meetup #10. (via Rachel Kalmar)
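
The tile math Warden wraps in PostGIS is compact enough to sketch in a few lines. Here is a minimal Python version of the standard power-of-two tile scheme (my own illustration, not Warden's code): convert a WGS84 latitude/longitude to Google-style tile indices and back.

    import math

    def latlon_to_tile(lat, lon, zoom):
        """WGS84 lat/lon to Google/OSM tile indices at the given zoom level."""
        n = 2 ** zoom                       # tiles per axis at this zoom
        x = int((lon + 180.0) / 360.0 * n)
        y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
        return x, y

    def tile_to_latlon(x, y, zoom):
        """Lat/lon of the tile's north-west corner."""
        n = 2 ** zoom
        lon = x / n * 360.0 - 180.0
        lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
        return lat, lon

    # San Francisco at zoom 12 falls in tile (655, 1583).
    print(latlon_to_tile(37.77, -122.42, 12))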

March 19 2013

Four short links: 19 March 2013

  1. VizCities Dev Diary — step-by-step account of how they brought London’s data to life, SimCity-style.
  2. Google Fibre Isn’t That Impressive — For [gigabit broadband] to become truly useful and necessary, we’ll need to see a long-term feedback loop of utility and acceptance. First, super-fast lines must allow us to do things that we can’t do with the pedestrian internet. This will prompt more people to demand gigabit lines, which will in turn invite developers to create more apps that require high speed, and so on. What I discovered in Kansas City is that this cycle has not yet begun. Or, as Ars Technica put it recently, “The rest of the internet is too slow for Google Fibre.”
  3. gov.uk Recommendations on Open Source — Use open source software in preference to proprietary or closed source alternatives, in particular for operating systems, networking software, Web servers, databases and programming languages.
  4. Internet Bad Neighbourhoods (PDF) — bilingual PhD thesis. The idea behind the Internet Bad Neighborhood concept is that the probability of a host behaving badly increases if its neighboring hosts (i.e., hosts within the same subnetwork) also behave badly. This idea, in turn, can be exploited to improve current Internet security solutions, since it provides an indirect approach to predicting new sources of attacks (neighboring hosts of malicious ones).
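
The neighborhood idea is simple to prototype. Here is a toy Python sketch (my construction, not the thesis's code): score each /24 subnet by its count of known-bad hosts, so that an unseen host inherits suspicion from its neighbors.

    from collections import Counter

    def neighborhood(ip):
        """Map an IPv4 host to its /24 'neighborhood' (first three octets)."""
        return ".".join(ip.split(".")[:3]) + ".0/24"

    def score_neighborhoods(bad_hosts):
        """Count known-bad hosts per /24; higher means a worse neighborhood."""
        return Counter(neighborhood(ip) for ip in bad_hosts)

    blacklist = ["203.0.113.7", "203.0.113.99", "203.0.113.200", "198.51.100.4"]
    scores = score_neighborhoods(blacklist)

    # A host we've never seen before inherits suspicion from its subnet:
    print(scores[neighborhood("203.0.113.50")])   # 3 known-bad neighbors
    print(scores[neighborhood("192.0.2.1")])      # 0 known-bad neighbors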

October 26 2012

Four short links: 26 October 2012

  1. BootMetro (github) — website templates with a Metro (Windows 8) look. (via Hacker News)
  2. Kenya’s Treasury to tax M-Pesa — 10% tax on mobile money-transfer systems. M-Pesa is the largest mobile money transfer service provider in Kenya, with more than 14 million subscribers. [...] It is estimated that M-Pesa reports some 2 million transactions per day. [...] the value of money transferred through mobile platforms jumped by 41 per cent in the first six months of 2012. Never mind fighting you: you know you’re winning when they tax you! (via Evgeny Morozov)
  3. Digital Divide and Fibre Rollout — As the group of non-users gets smaller, they are likely to become more seriously disadvantaged. The NBN – and high-speed broadband more generally – will drive a wave of new applications across most areas of life, transforming Australia’s service economy in fundamental ways. Those who are not connected in 2015 may be fewer, but they will be missing out on far more – in education, health, government, commerce, communication and entertainment. The costs will also fall on service providers forced to keep supplying expensive physical and face-to-face services to this declining number of people. This will be particularly significant in remote communities, where health consultations and evacuations by flying doctors, nurses and allied health professionals could potentially be reduced through e-health diagnostics, and where Centrelink still regularly sends teams out to communities. As gov2 expands and services move online, connectivity disadvantages are compounded. (via Ellen Strickland)
  4. Smart Body Smart World (Forrester) — take note of these two consequences of Internet of Things and Quantified Self: Verticals fuse: “Health and wellness” is not its own silo, but is connected to our finances, our shopping habits, our relationships. As bodies get connected, everyone is in the body business. Retail disperses: All retailers become computing retailers, and computing-specific retailers like Best Buy go the way of Blockbuster. You wouldn’t buy a smart toothbrush at a specialty CE store; you’d be more likely to buy it in the channel that solves the rest of your hygiene needs. (via Internet of Things)

October 16 2012

Four short links: 16 October 2012

  1. cir.ca — news app for iPhone, which lets you track updates and further news on a given story. (via Andy Baio)
  2. DataWrangler (Stanford) — an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data. From the Stanford Visualization Group.
  3. Responsivator — see how websites look at different screen sizes.
  4. Accountable Algorithms (Ed Felten) — When we talk about making an algorithmic public process open, we mean two separate things. First, we want transparency: the public knows what the algorithm is. Second, we want the execution of the algorithm to be accountable: the public can check to make sure that the algorithm was executed correctly in a particular case. Transparency is addressed by traditional open government principles; but accountability is different.
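
One standard building block for the accountability half is a cryptographic commitment: commit to the random seed before running the algorithm, reveal it afterward, and anyone can re-run the computation to verify the result. A minimal Python sketch of that general technique (my illustration; Felten's writing is the authoritative treatment):

    import hashlib
    import random

    def commit(seed):
        """Digest published before the drawing; it binds the agency to the seed."""
        return hashlib.sha256(seed).hexdigest()

    def draw(seed, entrants, k):
        """Deterministic selection: the same seed and entrants give the same winners."""
        rng = random.Random(seed)
        return rng.sample(sorted(entrants), k)

    # Before: the agency publishes the commitment, not the seed.
    seed = b"agency-secret-seed-2012"
    published = commit(seed)

    # The drawing runs; winners are announced.
    winners = draw(seed, ["ann", "bob", "cho", "dee"], 2)

    # After: the seed is revealed. Anyone can now check both steps.
    assert commit(seed) == published                                # transparent input
    assert draw(seed, ["ann", "bob", "cho", "dee"], 2) == winners   # accountable execution
    print("verified:", winners)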

October 09 2012

Four short links: 9 October 2012

  1. Finland Crowdsourcing New Laws (GigaOm) — online referenda. The Finnish government enabled something called a “citizens’ initiative”, through which registered voters can come up with new laws – if they can get 50,000 of their fellow citizens to back them up within six months, then the Eduskunta (the Finnish parliament) is forced to vote on the proposal. Now this crowdsourced law-making system is about to go online through a platform called the Open Ministry. Petitions and online voting are notoriously prone to fraud, so it will be interesting to see how well the online identity system behind this holds up.
  2. WebPlatform — wiki of information about developing for the open web. Joint production of many of the $BIGCOs of the web and the W3C, so will be interesting to see, as it develops, whether it has the best aspects of each or the worst.
  3. Why Your Phone, Cable, Internet Bills Cost So Much (Yahoo) — “The companies essentially have a business model that is antithetical to economic growth,” he says. “Profits go up if they can provide slow Internet at super high prices.” Excellent piece!
  4. Probability and Statistics Cookbook (Matthias Vallentin) — The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations. CC-BY-NC-SA licensed, LaTeX source on github.

September 27 2012

Four short links: 27 September 2012

  1. Paying for Developers is a Bad Idea (Charlie Kindel) — The companies that make the most profit are those who build virtuous platform cycles. There are no proof points in history of virtuous platform cycles being created when the platform provider incents developers to target the platform by paying them. Paying developers to target your platform is a sign of desperation. Doing so means developers have no skin in the game. A platform where developers do not have skin in the game is artificially propped up and will not succeed in the long run. A thesis illustrated with his experience at Microsoft.
  2. Learnable Programming (Bret Victor) — deconstructs Khan Academy’s coding learning environment, and explains Victor’s take on learning to program. A good system is designed to encourage particular ways of thinking, with all features carefully and cohesively designed around that purpose. This essay will present many features! The trick is to see through them — to see the underlying design principles that they represent, and understand how these principles enable the programmer to think. (via Layton Duncan)
  3. Tablet as External Display for Android Smartphones — new app, in beta, letting you remote-control via a tablet. (via Tab Times)
  4. Clay Shirky: How The Internet Will (One Day) Transform Government (TED Talk) — There’s no democracy worth the name that doesn’t have a transparency move, but transparency is openness in only one direction, and being given a dashboard without a steering wheel has never been the core promise a democracy makes to its citizens.

September 21 2012

Four short links: 21 September 2012

  1. Business Intelligence on FarmsMachines keep track of all kinds of data about each cow, including the chemical properties of its milk, and flag when a particular cow is having problems or could be sick. The software can compare current data with historical patterns for the entire herd, and relate to weather conditions and other seasonal variations. Now a farmer can track his herd on his iPad without having to get out of bed, or even from another state. A toy sketch of this kind of herd comparison follows this list. (via Slashdot)
  2. USAxGITHUB — monitor activity on all the US Federal Government’s github repositories. (via Sarah Milstein)
  3. Rethink Robotics — $22k general purpose industrial robot. “‘It feels like a true Macintosh moment for the robot world,’ said Tony Fadell, the former Apple executive who oversaw the development of the iPod and the iPhone. Baxter will come equipped with a library of simple tasks, or behaviors — for example, a “common sense” capability to recognize it must have an object in its hand before it can move and release it.” (via David ten Have)
  4. Shift Labs — Shift Labs makes low-cost medical devices for resource-limited settings. [Crowd]Fund the manufacture and field testing of the Drip Clip [...] a replacement for expensive pumps that dose fluid from IV bags.
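
The herd comparison in item 1 is, at heart, simple anomaly detection. Here is a toy Python sketch with made-up numbers (my construction; the real farm software is surely more sophisticated): flag any cow whose daily milk yield strays more than two standard deviations from the herd's historical mean.

    from statistics import mean, stdev

    def flag_outliers(today, history, threshold=2.0):
        """Flag cows whose yield deviates > threshold std devs from herd history."""
        mu, sigma = mean(history), stdev(history)
        return [cow for cow, yield_kg in today.items()
                if abs(yield_kg - mu) > threshold * sigma]

    herd_history = [27.5, 28.1, 26.9, 27.8, 28.4, 27.2, 26.8, 28.0]  # kg/day
    today = {"bessie": 27.9, "clover": 21.3, "daisy": 28.2}

    print(flag_outliers(today, herd_history))   # ['clover'] -- may be sick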

June 25 2012

Four short links: 25 June 2012

  1. Stop Treating People Like Idiots (Tom Steinberg) -- governments miss the easy opportunities to link the tradeoffs they make to the point where the impacts are felt. My argument is this: key compromises or decisions should be linked to from the points where people obtain a service, or at the points where they learn about one. If my bins are only collected once a fortnight, the reason why should be one click away from the page that describes the collection times.
  2. UK Study Finds Mixed Telemedicine Benefits -- The results, in a paper published today in the British Medical Journal, found that telehealth can help patients with long-term conditions avoid emergency hospital care, and also reduce deaths. However, the estimated scale of hospital cost savings is modest and may not be sufficient to offset the cost of the technology, the report finds. Overall the evidence does not warrant full-scale roll-out but more careful exploration, it says. (via Mike Pearson)
  3. Pay Attention to What Nick Denton is Doing With Comments (Nieman Lab) -- Most news sites have come to treat comments as little more than a necessary evil, a kind of padded room where the third estate can vent, largely at will, and tolerated mainly as a way of generating pageviews. This exhausted consensus makes what Gawker is doing so important. Nick Denton, Gawker’s founder and publisher, Thomas Plunkett, head of technology, and the technical staff have re-designed Gawker to serve the people reading the comments, rather than the people writing them.
  4. Informed Consent Source of Confusion (Nature) -- fascinating look at the downstream uses of collected bio data and the difficulty of gaining informed consent: what you might learn about yourself (do I want to know I have an 8.3% greater chance of developing Alzheimer's? What would I do with that knowledge besides worry?), what others might learn about you (will my records be subpoenable?), and what others might make from the knowledge (will my data be used for someone else's financial benefit?). (via Ed Yong)

June 22 2012

The emerging political force of the network of networks

The shape and substance of our networked world is constantly emerging over time, stretching back over decades. Over the past year, the promise of the Internet as a platform for collective action moved from theory to practice, as networked movements of protesters and consumers have used connection technologies around the world in the service of their causes.

This month, more eyes and minds came alive to the potential of this historic moment during the ninth Personal Democracy Forum (PDF) in New York City, where for two intense days the nexus of technology, politics and campaigns came together on stage (and off) in a compelling, provocative mix of TED-style keynotes and lightning talks, longer panels, and the slipstream serendipity of hallway conversations and the backchannel on Twitter.


If you are interested in the intersection of politics, technology, social change and the Internet, PDF has long since become a must-attend event, as many of the most prominent members of the "Internet public" convene to talk about what's changing and why.

The first day began with a huge helping of technology policy, followed by a hint of triumphalism regarding the newfound power of the Internet in politics that was balanced by Jaron Lanier's concern about the impact of the digital economy on the middle class. The conference kicked off with a conversation between two members of Congress who were central to the historic online movement that halted the progression of the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA) in the U.S. House of Representatives and Senate: Representative Darrell Issa (R-CA) and Senator Ron Wyden (D-OR). You can watch a video of their conversation with Personal Democracy Media founder Andrew Rasiej below:

During this conversation, Rep. Issa and Sen. Ron Wyden introduced a proposal for a "Digital Bill of Rights." They published a draft set of principles on MADISON, the online legislation platform built last December during the first Congressional hackathon.

Both legislators pointed to different policy choices that stand to affect billions of people, ranging from proposed intellectual property legislation to the broader issues of online innovation and Internet freedom to international agreements like the Anti-Counterfeiting Trade Agreement (ACTA) and the Trans-Pacific Partnership (TPP). Such policy choices also include online and network security: Rep. Issa sponsored and voted for CISPA, whereas Sen. Wyden is opposed to a similar legislative approach in the Senate. SOPA, PIPA, ACTA and TPP have all been posted on MADISON for public comment.


On the second day of PDF, conversations and talks turned toward not only what is happening around the networked world but what could be in store for citizens in failed states in the developing world or those inhabiting huge cities in the West, with implications that can be simultaneously exhilarating and discomfiting. There was a strong current of discussion about the power of "adhocracy" and the force of the networked movements that are now forming, dissolving and reforming in new ways, eddying around the foundations of established societal institutions around the globe. Micah Sifry, co-founder of the Personal Democracy Forum, hailed five of these talks as exemplars of the "radical power of the Internet public."

These keynotes, by Chris Soghoian, Dave Parry, Peter Fein, Sascha Meinrath and Deanna Zandt, "could serve as a 50-minute primer on the radical power of the Internet public to change the world, why it's so important to nurture that public, where some of the threats to the Internet are coming from, and how people are routing around them to build a future 'intranet' that might well stand free from governmental and corporate control," wrote Sifry. (Three of them are embedded individually below; the rest you can watch in the complete video catalog at the bottom of this section.)

Given the historic changes in the Middle East and Africa over the past year during the Arab Spring, or the networked protests we've seen during the Occupy movement or over elections in Russia or austerity measures in Greece, it's no surprise that there was great interest in not just talking about what was happening, but why. This year, PDF attendees were also fortunate to hear about the experiences of netizens in China and Russia. The degree of change created by adding wireless Internet connectivity, social networking and online video to increasingly networked societies will vary from country to country. There are clearly powerful lessons that can be gleaned from the experiences of other humans around the globe. Learning where social change is happening (or not) and understanding how our world is changing due to the influence of networks is core to being a digitally literate citizen in the 21st century.

Declaring that we, as a nation or global polity, stand at a historic inflection point for the future of the Open Web or the role of the Internet in presidential politics or the balance of digital security and privacy feels, frankly, like a reiteration of past punditry, going well back to the .com boom in the 1990s.

That said, it doesn't make it less true. We've never been this connected to a network of networks, nor have the public, governments and corporations been so acutely aware of the risks and rewards that those connection technologies pose. It wasn't an accident that Muammar Gaddafi namechecked Facebook before his fall, nor that the current President of the United States (and his opponent in the upcoming election) are talking directly with the public over the Internet. One area that PDF might have dwelt more upon is the dark side of networks, from organized crime and crimesourcing to government-sponsored hacking to the consequences of poorly considered online videos or updates.

We live in a moment of breathtaking technological changes that stand to disrupt nearly every sector of society, for good or ill. Many thanks to the curators and conveners of this year's conference for amplifying the voices of those whose work focuses on documenting and understanding how our digital world is changing — and a special thanks to all of the inspiring people who are not only being the change they wish to see in the world but making it.

Below, I've embedded a selection of the PDF 12 talks that resonated with me. These videos should serve as a starting point, however, not an ending: every person on the program of this year's conference had something important to share, from Baratunde Thurston to Jan Hemme to Susan Crawford to Leslie Harris to Carne Ross to the RIAA's Cary Sherman — and the list goes on and on. You can watch all 45 talks from PDF 2012 (at least, the ones that have been uploaded to YouTube by the Personal Democracy Media team) in the player below:

Yochai Benkler | SOPA/PIPA: A Case Study in Networked Discourse and Activism

In this talk, Harvard law professor Yochai Benkler (@ybenkler) discussed using the Berkman Center's Media Cloud to trace how the Internet became a networked platform for collective action against SOPA and PIPA. Benkler applies a fascinating term — the "attention backbone" — to describe how influential nodes in a network direct traffic and awareness to research or data. If you're interested in the evolution of the blueprint for democratic participation online, you'll find this talk compelling.

Sascha Meinrath | Commotion and the Rise of the Intranet Era

Mesh networks have become an important — and growing — force for carrying connectivity to more citizens around the world. The work of Sascha Meinrath (@SashaMeinrath) at the Open Technology Institute in the New America Foundation is well worth following.

Mark Surman | Making Movements: What Punk Rock, Scouting, and the Royal Society Can Teach

Mark Surman (@msurman), the executive director of the Mozilla Foundation, shared a draft of his PDF talk prior to the conference. He offered his thoughts on "movement making," connecting lessons from punk rock, scouting and the Royal Society.

With the onrush of mobile apps and the swift rise of Facebook, what we think about as the Internet — the open platform that is the World Wide Web — is changing. Surman contrasted the Internet today, enabled by an end-to-end principle, built upon open-source technologies and on open protocols, with the one of permissions, walled gardens and controlled app stores that we're seeing grow around the world. "Tim Berners-Lee built the idea that the web should be LEGO into its very design," said Surman. We'll see if all of these pieces (loosely joined?) fit together as well in the future.

Juan Pardinas | OGP: Global Steroids for National Reformers

There are substantial responsibilities and challenges inherent in moving forward with the historic Open Government Partnership (OGP) that officially launched in New York City last September. Juan Pardinas (@jepardinas) took the position that OGP will have a positive impact on the world and that the seat civil society has at the partnership's table will matter. By the time the next annual OGP conference rolls around in 2013, history may well have rendered its own verdict on whether this effort will endure to lasting effect.

Given diplomatic challenges around South Africa's proposed secrecy law, all of the stakeholders in the Open Government Partnership will need to keep pressure on other stakeholders if significant progress is going to be made. If OGP is to be judged more than a PR opportunity for politicians and diplomats to make bold framing statements, government and civil society leaders will need to do more to hold countries accountable to the commitments required for participation: all participating countries must submit Action Plans after a bona fide public consultation. Moreover, they'll need to define the metrics by which progress should be judged and be clear with citizens about the timelines for change.

Michael Anti | Walking Along the Great Firewall

Michael Anti (@mranti) is a Chinese journalist and political blogger who has earned global attention for activism in the service of freedom of the press in China. When Anti was exiled from Facebook over its real names policy, his account deletion became an important example for other activists around the world. At PDF, he shared a frank perspective on where free speech stands in China, including how the Chinese government is responding to the challenges of their increasingly networked society. For perspective, there are now more Internet users in China (an estimated 350 million) than the total population of the United States. As you'll hear in Anti's talk, the Chinese government is learning and watching what happens elsewhere.





Masha Gessen | The Future of the Russian Protest Movement

Masha Gessen (@mashagessen), a Russian and American journalist, threw a bucket of ice water on any hopes that increasing Internet penetration or social media would in and of themselves lead to improvements in governance, reduce corruption, or improve the ability of Russia's people to petition their government for redress of grievances.





An Xiao Mina | Internet Street Art and Social Change in China

This beautiful and challenging talk by Mina (@anxiaostudio) offered a fascinating insight: memes are the street art of the censored web. If you want to learn more about how Chinese artists and citizens are communicating online, watch this creative, compelling presentation. (Note: there are naked people in this video, which will make it NSFW in some workplaces.)

Chris Soghoian | Lessons from the Bin Laden Raid and Cyberwar

Soghoian (@csoghoian), who has a well-earned reputation for finding privacy and security issues in the products and services of the world's biggest tech companies, offered up a talk that made three strong points:

  1. Automatic security updates are generally quite a good thing for users.
  2. It's highly problematic if governments create viruses that masquerade as such updates.
  3. The federal government could use an official who owns consumer IT security, not just "cybersecurity" at the corporate or national level.

Zac Moffatt | The Real Story of 2012: Using Digital for Persuasion

Moffatt (@zacmoffatt) is the digital director for the Mitt Romney presidential campaign. In his talk, Moffatt said 2012 will be the first election cycle where persuasion and mobilization will be core elements of the digital experience. Connecting with millions of voters who have moved to the Internet is clearly a strategic priority for his team — and it appears to be paying off. The Guardian reported recently that the Romney campaign is closing the digital data gap with the Obama campaign.


Nick Judd wrote up further analysis of Moffatt's talk on digital strategy over at TechPresident.

Alex Torpey | The Local Revolution

Alex Torpey (@AlexTorpey) attracted widespread attention when he was elected mayor of South Orange, New Jersey, last year at the age of 23. In the months since he was elected, Torpey has been trying to interest his peers in politics. His talk at PDF asked for more participation in local government and a rethinking of partisanship: Torpey ran as an independent. As Gov 2.0 goes local, Mayor Torpey looks likely to be one of its leaders.

Gilad Lotan | Networked Power: What We Learn From Data

If you're interested in a data-driven analysis of networked political power and media influence, Gilad Lotan's talk is a must-watch. Lotan, who tweets as @gilgul, crunched massive amounts of tweets to help the people formerly known as the audience better understand networked movements for change.






Cheryl Contee | The End of the Digital Divide

Jack and Jill Politics co-founder Cheryl Contee (@cheryl) took a profoundly personal approach when she talked about the death and rebirth of the digital divide. She posited that what underserved citizens in the United States now face isn't so much the classic concerns of the 1990s, where citizens weren't connected to the Internet, but rather a skills gap for open jobs and a lack of investment to address those issues in poor and minority communities. She also highlighted how important mentorship can be in bridging that divide. When Contee shared how Yale computer lab director Margaret Krebs helped her, she briefly teared up — and she called on technologists, innovators and leaders to give others a hand up.

Tracing the storify of PDF 12

I published a storify of Personal Democracy Forum 2012 after the event. Incomplete though it may be, it preserves some thoughtful commentary and context shared in the Twittersphere during the event.

June 21 2012

Four short links: 21 June 2012

  1. Test, Learn, Adapt (PDF) -- UK Cabinet Office paper on randomised trials for public policy. Ben Goldacre co-wrote it.
  2. UK EscapeTheCity Raises GBP600k in Crowd Equity -- took just eight days, using the Crowdcube platform for equity-based crowd investment.
  3. DIY Bio SOPs -- CC-licensed set of standard operating procedures for a bio lab. These are the SOPs that I provided to the Irish EPA as part of my "Consent Conditions" for "Contained Use of Class 1 Genetically Modified Microorganisms". (via Alison Marigold)
  4. Shuffling Cards -- shuffle a deck of cards until it's randomised. That order of cards probably hasn't ever been seen before in the history of mankind.
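
The arithmetic behind item 4 checks out in three lines of Python: a 52-card deck has 52! possible orderings, about 8 × 10^67, so any well-shuffled order has almost certainly never occurred before.

    import math

    orderings = math.factorial(52)
    print(orderings)            # a 68-digit number
    print(f"{orderings:.2e}")   # about 8.07e+67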

June 08 2012

mHealth apps are just the beginning of the disruption in healthcare from open health data

Two years ago, the potential of government making health information as useful as weather data felt like an abstraction. Healthcare data could give citizens a "blue dot" for navigating health and illness, akin to the one GPS data fuels on the glowing maps of geolocated mobile devices that are in more and more hands.

After all, profound changes in entire industries take years, even generations, to occur. In government, the pace of progress can feel even slower, measured in evolutionary time and epochs.

Sometimes, history works differently, particularly given the effect of rapid technological changes. It's only a little more than a decade since President Clinton announced he would unscramble global positioning system (GPS) data for civilian use. President Obama's second U.S. chief technology officer, Todd Park, has estimated that GPS data unlocked some $90 billion in value in the United States.

In that context, the arc of the Health Data Initiative (HDI) in the United States might leave some jaded observers with whiplash. From a small beginning, the initiative to put health data to work has now expanded around the United States and attracted great interest from abroad, including observers from England's National Health Service eager to understand what strategies have unlocked innovation around public data sets.

While the potential of government health data driving innovation may well have felt like an abstraction to many observers, in June 2012, real health apps and services are here -- and their potential to change how society accesses health information, delivers care, lowers costs, connects patients to one another, creates jobs, empowers caregivers and cuts fraud is profound. The venture capital community seems to have noticed the opportunity: according to HHS Secretary Sebelius, investment in healthcare startups is up 60% since 2009.

Headlines about rockstar Bon Jovi 'rocking Datapalooza' and the smorgasbord of health apps on display, however, while both understandable and largely warranted, don't convey the deeper undercurrent of change.

On March 10, 2010, the initiative started with 36 people brainstorming in a room. On June 2, 2010, approximately 325 in-person attendees saw 7 health apps demoed at an historic forum in the theater of the Institute of Medicine in Washington, D.C., with another 10 apps packed into an expo in the rotunda outside. All of the apps or services used open government data from the United States Department of Health and Human Services (HHS).

In 2012, 242 applications or services that were based upon or use open data were submitted for consideration to the third annual "Health Datapalooza." About 70 health app exhibitors made it to the expo. The conference itself had some 1,400 registered attendees, not counting press and staff, and was sold out in advance of the event at the cavernous Washington Convention Center in DC. On Wednesday, I asked Dr. Bob Kocher, now of Venrock Capital and the Brookings Institution, about how the Health Data Initiative has grown and evolved. Dr. Kocher was instrumental in its founding when he served in the Obama administration. Our interview is embedded below:

Revolutionizing the healthcare industry -- in HHS Secretary Sebelius's words, reformulating Wired executive editor Thomas Goetz's "latent data" to "lazy data" -- has meant years of work unlocking government data and actively engaging the developer, entrepreneurial and venture capital communities. While the process of making health data open and machine-readable is far from done, there has been incontrovertible progress in standing up new application programming interfaces (APIs) that enable entrepreneurs, academic institutions and government itself to retrieve it on demand. On Monday, in concert with the Health Datapalooza, a new version of HealthData.gov launched, including the release of new data sets that enable not just hospital quality comparisons but insurance fee comparisons as well.

Two years later, the blossoming of the HDI Forum into a massive conference attracting the interest of the media, venture capitalists and entrepreneurs from around the nation is a short-term development that few people would have predicted in 2010 -- but it is a welcome one for a nation starved for solutions to spiraling healthcare costs, and for some action from a federal government that all too frequently looks broken.

"The immense fiscal pressure driving 'innovation' in the health context actually means belated leveraging of data insights other industries take for granted from customer databases," said Chuck Curran, executive director and general counsel or the Network Advertising Initiative, when interviewed at this year's HDI Forum. For example, he suggested, look at "the dashboarding of latent/lazy data on community health, combined with geographic visualizations, to enable “hotspot”-focused interventions, or info about service plan information like the new HHS interface for insurance plan data (including the API).

Curran also highlighted the role that fiscal pressure is having on making both individual payers and employers a natural source of business funding and adoption for entrepreneurs innovating with health data, with apps like My Drugs Costs holding the potential to help citizens and businesses alike cut down on an estimated $95 billion dollars in annual unnecessary spending on pharmaceuticals.

Curran said that health app providers have fully internalized smart disclosure: "it’s not enough to have open data available for specialist analysis -- there must be simplified interfaces for actionable insights and patient ownership of the care plan."

For entrepreneurs eying the healthcare industry and established players within it, the 2012 Health Datapalooza offers an excellent opportunity to "take the pulse of mHealth," as Jody Ranck wrote at GigaOm this week:

Roughly 95 percent of the potential entrepreneur pool doesn’t know that these vast stores of data exist, so the HHS is working to increase awareness through the Health Data Initiative. The results have been astounding. Numerous companies, including Google and Microsoft, have held health-data code-a-thons and Health 2.0 developer challenges. These have produced applications in a fraction of the time it has historically taken. Applications for understanding and managing chronic diseases, finding the best healthcare provider, locating clinical trials and helping doctors find the best specialist for a given condition have been built based on the open data available through the initiative.

In addition to the Health Datapalooza, the Health Data Initiative hosts other events which have spawned more health innovators. RockHealth, a Health 2.0 incubator, launched at the SXSW 2011 White House Startup America Roundtable. In the wake of these successful events, StartUp Health, a network of health startup incubators, entrepreneurs and investors, was created. The organization is focused on building a robust ecosystem that can support entrepreneurs in the health and wellness space.

This health data ecosystem has now spread around the United States, from Silicon Valley to New York to Louisiana. During this year's Health Datapalooza, I spoke with Ramesh Kolluru, a technologist who works at the University of Louisiana, about his work on a hackathon in Louisiana, the "Cajun Codefest," and his impressions of the forum in Washington:

One story that stood out from this year's crop of health data apps was Symcat, an mHealth app that enables people to look up their symptoms and find nearby hospitals and clinics. The application was developed by two medical students at Johns Hopkins University who happened to share a passion for tinkering, engineering and healthcare. They put their passion to work -- and somehow found the time (remember, they're in medical school) to build a beautiful, usable health app. The pair landed a $100,000 prize from the Robert Wood Johnson Foundation for their efforts. In the video embedded below, I interview Craig Munsen, one of the medical students, about his application. (Notably, the pair intends to use their prize to invest in the business, not pay off medical school debt.)

There are more notable applications and services to profile from this year's expo -- and in the weeks ahead, expect to see some of them here on Radar. For now, it's important to recognize the work of all of the men and women who have worked so hard over the past two years to create public good from public data.

Releasing and making open health data useful, however, is about far more than these mHealth apps: It's about saving lives, improving the quality of care, adding more transparency to a system that needs it, and creating jobs. Park spoke with me this spring about how open data relates to much more than consumer-facing mHealth apps:

As the US CTO seeks to scale open data across federal government by applying the lessons learned in the health data initiative, look for more industries to receive digital fuel for innovation, from energy to education to transit and finance. The White House digital government strategy explicitly embraces releasing open data in APIs to enable more accountability, civic utility and economic value creation.

While major challenges lie ahead, from data quality to security or privacy, the opportunity to extend the data revolution in healthcare to other industries looks more tangible now than it has in years past.

Business publications, including the Wall Street Journal, have woken up to the disruptive potential of open government data. As Michael Hickins wrote this week, "The potential applications for data from agencies as disparate as the Department of Transportation and Department of Labor are endless, and will affect businesses in every industry imaginable. Including yours. But if you can think of how that data could let someone disrupt your business, you can stop that from happening by getting there first."

This growing health data movement is not housed within any single city, state, agency or company. It's beautifully chaotic, decentralized, and self-propelled, said Park this past week.

"The Health Data Initiative is no longer a government initiative," he said. "It's an American one. "

May 29 2012

US CTO seeks to scale agile thinking and open data across federal government

In the 21st century, federal government must go mobile, putting government services and information at the fingertips of citizens, said United States Chief Technology Officer Todd Park in a recent wide-ranging interview. "That's the first digital government result, outcome, and objective that's desired."

To achieve that vision, Park and U.S. chief information officer Steven VanRoekel are working together to improve how government shares data, architects new digital services and collaborates across agencies to reduce costs and increase productivity through smarter use of information technology.

Park, who was chosen by President Obama to be the second CTO of the United States in March, has been (relatively) quiet over the course of his first two months on the job.

Last Wednesday, that changed. Park launched a new Presidential Innovation Fellows program, in concert with VanRoekel's new digital government strategy, at TechCrunch's Disrupt conference in New York City. This was followed by another event for a government audience at the Interior Department headquarters in Washington, D.C. Last Friday, he presented his team's agenda to the President's Council of Advisors on Science and Technology.

"The way I think about the strategy is that you're really talking about three elements," said Park, in our interview. "First, it's going mobile, putting government services at the literal fingertips of the people in the same way that basically every other industry and sector has done. Second, it's being smarter about how we procure technology as we move government in this direction. Finally, it's liberating data. In the end, it's the idea of 'government as a platform.'"

"We're looking for a few good men and women"

In the context of the nation's new digital government strategy, Park announced the launch of five projects that this new class of Innovation Fellows will be entrusted with implementing: a broad Open Data Initiative, Blue Button for America, RFP-EZ, The 20% Campaign, and MyGov.

The idea of the Presidential Innovation Fellows Program, said Park, is to bring in people from outside government to work with innovators inside the government. These agile teams will work together within a six-month time frame to deliver results.

The fellowships are basically scaling up the idea of "entrepreneurs in residence," said Park. "It's a portfolio of five projects that, on top of the digital government strategy, will advance the implementation of it in a variety of ways."

The biggest challenge to bringing the five programs that the US CTO has proposed to successful completion is getting 15 talented men and women to join his team and implement them. There's reason for optimism. Park shared via email that:

"... within 24 hours of TechCrunch Disrupt, 600 people had already registered via Whitehouse.gov to apply to be a Presidential Innovation Fellow, and another several hundred people had expressed interest in following and engaging in the five projects in some other capacity."

To put that in context, Code for America received 550 applications for 24 fellowships last year (an acceptance rate of about 4.4%). That makes both of these fellowships more competitive than getting into Harvard, which received 34,285 applications for its next freshman class in 2012. There appears to be considerable appetite for a different kind of public service that applies technology and data for the public good.

Park is enthusiastic about putting open government data to work on behalf of the American people, amplifying the vision that his predecessor, Aneesh Chopra, championed around the country for the past three years.

"The fellows are going to have an extraordinary opportunity to make government work better for their fellow citizens," said Park in our interview. "These projects leverage, substantiate and push forward the whole principle of liberating data. Liberate data."

"To me, one of the aspects of the strategy about which I am most excited, that sends my heart into overdrive, is the idea that going forward, the default state of government data shall be open and machine-readable," said Park. "I think that's just fantastic. You'll want to, of course, evolve the legacy data as fast as you can in that same direction. Setting that as 'this is how we are rolling going forward' — and this is where we expect data to ultimately go — is just terrific."

In the videos and interview that follow, Park talks more about his vision for each of the programs.

A federal government-wide Open Data Initiative

In the video below, Park discusses the Presidential Innovation Fellows program and introduces the first program, which focuses on open data:

Park: The Open Data Initiative is a program to seed and expand the work that we're doing to liberate government data as a platform; to encourage, on a voluntary basis, the liberation of data by corporations as part of the national data platform; and to actively stimulate the development of new tools and services, and enhance existing tools and services, leveraging the data to help improve Americans' lives in very tangible ways and create jobs for the future.

This leverages the Open Government Directive to say "look, the default going forward is open data." It also leverages the directive to "API-ize" two high-priority datasets and, in targeted ways, to go beyond that, and really push to get more data out there in, critically, machine-readable form, in APIs, and to educate the entrepreneurs and innovators of the world that it's there through meetups, and hackathons, and challenges, and "Datapaloozas."

We're doubling down on the Health Data Initiative. We are also launching a much more high-profile Safety Data Initiative, which we kicked off last week; an Energy Data Initiative, which kicked off this week; an Education Data Initiative, which we're kicking off soon; and an Impact Data Initiative, which is about liberating data with respect to inputs and outputs in the non-profit space.

We're also going to be exploring an initiative in the realm of personal finance, enabling Americans to access copies of their financial data from public sector agencies and private sector institutions. So, the format that we're going to be leveraging to execute these initiatives is cloned from the Health Data Initiative.

This will make new data available. It will also take existing public data that is unusable to developers, i.e. in the form of PDFs, books or static websites, and turn it into liquid, machine-readable, downloadable data accessible via API. Then — because we're consistently hearing that 95% of the innovators and entrepreneurs who could turn our data into magic don't even know the data exists, let alone that it's available to them — engage the developer community and the entrepreneurial community with the data from the beginning. Let them know it's there, get their feedback, make it better.
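
To make the "API-ize" step concrete, here is a hedged sketch of the pattern Park describes: a flat CSV file served as machine-readable JSON over HTTP. The file name, fields, and routes are hypothetical, invented for illustration; this is not any agency's actual service.

    # A toy open-data API: pip install flask
    import csv
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical flat file: hospitals.csv with columns name,state,quality_score
    with open("hospitals.csv", newline="") as f:
        HOSPITALS = list(csv.DictReader(f))

    @app.route("/hospitals")
    def all_hospitals():
        """The whole dataset, as JSON instead of a PDF or static page."""
        return jsonify(HOSPITALS)

    @app.route("/hospitals/<state>")
    def by_state(state):
        """A filtered slice, retrievable on demand by any developer."""
        return jsonify([h for h in HOSPITALS if h["state"] == state.upper()])

    if __name__ == "__main__":
        app.run()   # GET /hospitals/MD now returns machine-readable data

The point is the shape of the thing: once data sits behind an endpoint like this, developers can retrieve it on demand instead of scraping PDFs.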

Blue Button for America

Park: The idea is to develop an open source patient portal capability that will replace My HealtheVet, which is the Veterans Administration's current patient portal. This will actually allow the Blue Button itself to iterate and evolve more rapidly, so that every time you add more data to it, it won't require heart surgery. It will be a lot easier, and of course will be open source, so that anyone else who wants to use it can use it as well. On top of that, we're going to do a lot of "biz dev" in America to get the word out about Blue Button and encourage more and more holders of data in the private sector to adopt Blue Button. We're also going to work to stimulate more tool development by entrepreneurs that can upload Blue Button data and make it useful in all kinds of ways for patients. That's Blue Button for America.

What is RFP-EZ?

Park: The objective is "buying smarter." The project that we're working on with the Small Business Administration is called "RFP-EZ."

Basically, it's the idea of setting up a streamlined process for the government to procure solutions from innovative, high-growth tech companies. As you know, most high-growth companies regard the government as way too difficult to sell to.

That A) deprives startups and high-growth companies of the government as a marketplace and, B) perhaps even more problematically, actually deprives the government of their solutions.

The hope here is, through the actions of the RFP-EZ team, to create a process and a prototype through which the government can much more easily procure solutions from innovative private firms.

It A) opens up this emerging market called "the government" to high-tech startups and B) infects the government with more of their solutions, which are radically more, pound for pound, effective and cost efficient than a lot of the stuff that the government is currently procuring through conventional channels. That's RFP-EZ.

The 20% Campaign

Park: The 20% Campaign is a project that's being championed by USAID. It's an effort at USAID, working with other government agencies, NGOs and companies, to catalyze the movement of foreign assistance payments from cash to electronic payment. So, just for example, USAID pays its contractors electronically, obviously, but the contractor who, say, pays highway workers in Afghanistan, or the way that police officers get paid in Afghanistan, is actually principally via cash. Or has been. And that creates all kinds of waste, fraud, and abuse issues.

The idea is actually to move to electronic payment, including mobile payment — and this has the potential to significantly cut waste, fraud and abuse, to improve financial inclusion, and to let people use their phones to access bank accounts set up for them. That leads to all kinds of good things, including safety: it's not ideal to be carrying around large amounts of cash in highly kinetic environments.

The Afghan National Police started paying certain contingents of police officers via mobile phones and mobile payments, as opposed to cash, and what happened is that the police officers started reporting up to a 30% raise. Of course, their pay hadn't changed; basically, when it was in cash, a bunch of it got lost. This is obviously a good thing, but it's even more important when you realize that the cash they ultimately physically received was less than what the Taliban in that province was paying people to join the Taliban — but the mobile payment, at that same level of salary, was greater than what the Taliban was paying. That's a critical difference.

It's basically taking foreign assistance payments through the last mile to mobile.

MyGov is the U.S. version of Gov.uk

Park: MyGov is an effort to rapidly prototype a citizen-centric system that gives Americans the information and resources of government that are right for them. Think of it as a personalized channel for Americans to access information and resources across government, and for government to get feedback from citizens about that information and those resources.

How do you plan to scale what you learned as HHS CTO to all of the federal government?

Park: Specifically, we're doing exactly the same thing we did with the Health Data Initiative, kicking off the initiatives with a "data jam" — an ideation workshop where we invite, just as with health data, 40 amazing minds — tech and energy innovators, or tech and safety innovators — to a room at the White House, in the case of the Safety Data Initiative, or at Stanford University, in the case of the Energy Data Initiative.

We walk into the room for several hours and say, "Here's a big pile of data. What would you do with this data?" And they invent 15 or 20 new classes of products or services of the future that we could build with the data. And then we challenge them to, at the end of the session, build prototypes or actual working products that instantiate their ideas in 90 days, to be highlighted at a White House-hosted Safety Datapalooza, Energy Datapalooza, Education Datapalooza, Impact Datapalooza, etc.

We also take the intellectual capital from the workshops, publish it on the White House website, and publicize the opportunity around the country: Discover the data, come up with your own ideas, build prototypes, and throw your hat in the ring to showcase at a Datapalooza.

What happens at the Datapaloozas — our experience in health guides us — is that, first of all, the prototypes and working products inspire many more innovators to actually build new services, products and features, because the data suddenly becomes really concrete to them, in terms of how it could be used.

Secondly, it helps persuade additional folks in the government to liberate more data, making it available, making it machine-readable, as opposed to saying, "Look, I don't know what the upside is. I can only imagine downsides." What happened in health is, when they went to a Datapalooza, they actually saw that, if data is made available, then at no cost to you and no cost to taxpayers, other people who are very smart will build incredible things that actually enhance your mission. And so you should do the same.

As more data gets liberated, that then leads to more products and services getting built, which then inspires more data liberation, which then leads to more products and services getting built — so you have a virtuous spiral, like what's happened in health.

The objective of each of these initiatives is not just to liberate data. Data by itself isn't helpful. You can't eat data. You can't pour data on a wound and heal it. You can't pour data on your house and make it more energy efficient. Data is only useful if it's applied to deliver benefit. The whole point of this exercise, the whole point of these kickoff efforts, is to catalyze the development of an ecosystem of data supply and data use to improve the lives of Americans in very tangible ways — and create jobs.

We have the developers and the suppliers of data actually talk to each other, create value for the American people, and then rinse, wash, repeat.

We're recruiting entrepreneurs and developers from the outside to join the team of Presidential Innovation Fellows, to come in and help with this effort to liberate data, make it machine-readable, get it out there to entrepreneurs, and help catalyze the development of this ecosystem.

We went to TechCrunch Disrupt for a reason: it's right smack dab in the middle of the people we want to recruit. We invite people to check out the projects on WhiteHouse.gov and, if they're interested in applying to be a fellow, indicate their interest. Even if they can't come to DC for six-plus months to be a fellow, but they want to follow one of the projects or contribute or help in some way, we are inviting them to express interest in that as well. For example, if you're an entrepreneur, and you're really interested in the education space, and learning about what data is available in education, you can check out the project, look at the data, and perhaps you can build something really good to show at the Education Datapalooza.

Is open data just about government data? What about smart disclosure?

Park: In the context of the Open Data Initiatives projects, it's not just about liberation of government health data: it's also about government catalyzing the release, on a voluntary basis, of private sector data.

Obviously, scaling Blue Button will extend the open data ecosystem. We're also doubling down on Green Button. I was just in California to host discussions around Green Button. Utilities representing 31 million households and businesses have now committed to make Green Button happen. Close to 10 million households and businesses already have access to Green Button data.

There's also a whole bunch of conversation happening about, at some point later this year, having the first utilities add the option of what we're calling "Green Button Connect." Right now, the Green Button is a download, where you go to a website, hit a green button and bam, you download your data. Green Button Connect is the ability for you to say as a consumer, "I authorize this third party to receive a continuous feed of my electricity usage data."

That creates massive additional opportunity for new products and services. That could go live later this year.
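
To illustrate the difference Park is drawing: the download model is a person clicking a button on a website, while Green Button Connect amounts to a consumer-authorized feed that a third party can poll. A hypothetical Python sketch follows; the endpoint, token, and field names are all invented for illustration and are not the real Green Button (ESPI) interface.

    # pip install requests
    import requests

    API = "https://utility.example.com/greenbutton"   # hypothetical endpoint
    TOKEN = "consumer-authorized-token"               # granted by the customer

    def fetch_usage(meter_id):
        """Poll the consumer-authorized feed for recent interval readings."""
        resp = requests.get(
            f"{API}/meters/{meter_id}/intervals",
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()    # e.g. [{"start": "...", "kwh": 1.2}, ...]

    for reading in fetch_usage("meter-123"):
        print(reading["start"], reading["kwh"])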

As part of the education data initiative, we are pursuing the launch and scale-up of something called "My Data," which will have a red-colored button. (It will probably, ultimately, be called "Red Button.") This is the ability for students and their families to download an electronic copy of their student loan data, their transcript data, and their academic assessment data.

That notion of people getting their own data, whether it's your health data, your education data, your finance data, your energy use data, that's an important part of these open data initiatives as well, with government helping to catalyze the release of that data to then feed the ecosystem.

How does open data specifically relate to the things that Americans care about: access to healthcare, reducing energy bills, giving their kids more educational opportunities, and job creation? Is this just about apps?

Park: In healthcare, for example, you'll see a growing array of examples that leverage data to create tangible benefit in many, many ways for Americans. Everything from helping me find the right doctor or hospital for my family, to being notified of a clinical trial that could fit my profile and save my life, to the ability to get the latest and greatest information about how to manage my asthma and diabetes via government knowledge in the National Library of Medicine.

There is a whole shift in healthcare systems away from pay-for-volume of services to basically paying to get people healthy. It goes by lots of different names — accountable care organizations or episodic payment — but the fundamental common theme is that doctors and hospitals increasingly will be paid to keep people healthy, coordinate their care, and keep them out of the hospital and the ER.

There's a whole fleet of companies and services that utilize data to help doctors and hospitals do that work, like utilizing Medicare claims data to identify segments of a patient population that are at real risk and need attention before they end up in the ER or the hospital. There are tools that help journalists easily identify public health issues, like disparities in healthcare outcomes by race, gender and ethnicity. There are tools that help county commissioners and mayors understand what's going on in a community from a health standpoint and make better policy decisions, like showing them food deserts. There's just a whole fleet of rapidly growing services for consumers, doctors, nurses, journalists, employers and public policy makers that help them make decisions, deliver improved health and healthcare, and create jobs, all at the same time.

That's very exciting. Look at all of those products and services; a subset of them self-identify to us to be exhibited at the Health Datapaloozas. Look at the 20 healthcare apps at the first Datapalooza, or the 50 at the second. This year, 230 companies are being narrowed down to about 100 that will be at the Datapalooza. They collectively serve millions of people today, either through brand new products and services or through new features on existing platforms. They help people in ways that we would never have thought of, let alone built.

The taxpayer dollars expended here were zero. We basically just took our data, made it available in machine-readable format, educated entrepreneurs that it was there, and they did the rest. Think about these other sectors, and think about what's possible in those sectors.

In education, with the data we've made available, you can imagine much better tools to help you shop for the college that delivers the biggest bang for your buck and is the best fit for your situation.

We've actually made available a bunch of data about college outcomes and are making more data available in machine-readable form so it can feed college search tools much better. We are going to be enabling students to download machine-readable copies of their own financial aid application, student loan data and school records. That will really turbo charge "smart scholarship" and school search capabilities for those students. You can actually mash that up with college outcomes in a really powerful, personalized college and scholarship search engine that is enabled by your personal data plus machine-readable data. Tools that help kids and their parents pick the right college for their education and get the right financial aid, that's something government is going to facilitate.

In the energy space, there are apps and services that help you leverage your Green Button data and other data to really assess your electricity usage compared to that of others and get concrete tips on how you can actually save yourself money. We're already seeing very clever, very cool efforts to integrate gamification and social networking into that kind of app, to make it a lot more fun and engaging — and make yourself money.

One particularly spectacular dataset that we're making a lot more usable is the ENERGY STAR database. It covers 40,000 different appliances that consumers and businesses use, everything from washing machines to servers. We are creating a much, much easier to use, public, downloadable version of the ENERGY STAR database, with really detailed information on the energy use profile and performance of each of those 40,000 appliances and devices. Imagine that integrated into much smarter services.
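
To give a feel for what "integrated into much smarter services" could look like, here's a minimal sketch that ranks appliances by annual energy use from a downloadable CSV. The file name and column names are assumptions for illustration; the real ENERGY STAR data defines its own schema.

```python
# Sketch: ranking appliances by annual energy use from a downloadable
# ENERGY STAR-style CSV. File name and column names are hypothetical.
import csv

with open("energystar_appliances.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Filter to one category and sort by annual consumption, lowest first.
washers = [r for r in rows if r["category"] == "washing machine"]
washers.sort(key=lambda r: float(r["annual_kwh"]))

for r in washers[:5]:  # five most efficient models
    print(r["model"], r["annual_kwh"], "kWh/year")
```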

On safety, the kinds of ideas that people are bringing together are awesome. They're everything from using publicly available safety data to plot the optimal route for your kid to walk home or for a first responder to travel through a city and get to a place most expeditiously.

There's this super awesome resource on Data.gov called the "Safer Products API," published by the Consumer Product Safety Commission (CPSC). Consumers send safety reports to the CPSC, but until March of last year, you had to FOIA [Freedom of Information Act] the CPSC to get them. What they've now done is publish the entire database of these reports, without anyone having to FOIA it, and make it available through an API.

One of the ideas that came up: people buy products on eBay, Craigslist and the like all the time, and some huge percentage of Americans never hear about a recall, whether it's a recall of a crib or a recall of a toy. Even when a company recalls new products, old products remain in circulation. What if someone built the ability to integrate the recall data and attach it to all the stuff in the eBays and Craigslists of the world?
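
A marketplace integration along those lines could be quite small. The sketch below queries the CPSC recall web service for matches against a listing title; the endpoint and parameter names here are best-effort assumptions about the public SaferProducts/recall services, so verify them against current CPSC documentation before relying on them.

```python
# Sketch: checking a secondhand listing's title against CPSC recall data.
# Endpoint and parameter names are assumptions; confirm against CPSC docs.
import requests

def possible_recalls(listing_title):
    resp = requests.get(
        "https://www.saferproducts.gov/RestWebServices/Recall",
        params={"format": "json", "RecallTitle": listing_title},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

for recall in possible_recalls("drop-side crib"):
    print(recall.get("Title"), recall.get("RecallDate"))
```

A marketplace would run something like this when a listing is created, and flag the listing if any plausible recall matches come back.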

Former CIO Vivek Kundra often touted government recall apps based upon government data during his tenure. Is this API the same thing, shared again, or something new?

Park: I think the smartest thing the government can do with data like product recall data is not to build our own shopping sites or our own product information sites. It's to get the information out there in machine-readable form, into the hands of the many platforms that already have audiences of millions and are really good at creating shopping or product comparison experiences, so they can integrate it seamlessly into what they do. I feel that's really the core play the government should be engaged in.

I don't know if the Safer Products API was included in the recall app. What I do know is that before 2011, you had to FOIA to get the data. I think that even if the government included it in some app the government built, it's important for it to get used by lots and lots of other apps that have a collective audience massively greater than any app the government could build itself.

Another example of this is the Hospital Compare website. It has been around for a long time, but nobody knows about it. A survey found that 94% of Americans didn't know hospital quality data was available, let alone that there was a Hospital Compare website. So we A) made the hospital quality data downloadable and B) deployed it, a year and a half ago, in API form at Medicare.gov.

That makes the data much easier for lots of other platforms to incorporate, and those platforms are far more likely than HospitalCompare.gov to present the information in actionable form for citizens. Even if we build our own apps, we have to get this data out to lots of other people who can help people with it. To do that, we have to make it machine-readable, put it into RESTful APIs, or at least make it downloadable, and get the word out to entrepreneurs that it's something they can use.
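
As an illustration of that "get it to other platforms" play, here's a sketch of a third-party app pulling hospital quality measures from a RESTful endpoint. The URL and field names are hypothetical placeholders, not the actual Medicare.gov API.

```python
# Sketch: a third-party app consuming hospital quality data from a
# hypothetical REST endpoint. URL and field names are placeholders.
import requests

API_URL = "https://data.example.gov/hospital-compare/measures"  # hypothetical

def hospitals_near(zip_code):
    resp = requests.get(API_URL, params={"zip": zip_code}, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Rank nearby hospitals by a quality score so a parent can act on the data.
for h in sorted(hospitals_near("10013"),
                key=lambda h: h["quality_score"], reverse=True):
    print(h["name"], h["quality_score"])
```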

This is a stunning arbitrage opportunity. Even if you take all this data and you "API-ize" it, it's not automatic that entrepreneurs are going to know it's there.

Let's assume that the hospital quality data is good, which it is, and that you build it and put it into an API. If nobody knows about it, you've delivered no value to the American people. People don't care whether you API a bunch of data. What they care about is that when they need to find a hospital, as I did for my baby, they can get that information.

The private sector, in the places where we have pushed the pedal to the metal on this, has demonstrated an incredible ability to make this data a lot more relevant and help a lot more people with it than we could have by ourselves.

White House photo used on associated home and category pages: white house by dcJohn, on Flickr

May 22 2012

Four short links: 22 May 2012

  1. New Zealand Government Budget App -- when the NZ budget is announced, it'll go live on iOS and Android apps. Tablet users get details, mobile users get talking points and speeches. Half-political, but an interesting approach to reaching out to voters with political actions.
  2. Health Care Data Dump (Washington Post) -- 5B health insurance claims (with attempted anonymization) to be released. Researchers will be able to access that data, largely using it to probe a critical question: What makes health care so expensive?
  3. Perl 5.16.0 Out -- two epic things here: 590k lines of changes, and announcement quote from Auden. Auden is my favourite poet, Perl my favourite programming language.
  4. WYSIHTML5 (GitHub) -- wysihtml5 is an open source rich text editor based on HTML5 technology and the progressive-enhancement approach. It uses a sophisticated security concept and aims to generate fully valid HTML5 markup by preventing unmaintainable tag soups and inline styles.

May 17 2012

Four short links: 17 May 2012

  1. The Mythology of Big Data (PDF) -- slides from a Strata keynote by Mark R. Madsen. A lovely explanation of the social impediments to the rational use of data. (via Hamish MacEwan)
  2. Scamworld -- amazing deconstruction of the online "get rich quick" scam business. (via Andy Baio)
  3. Ceres: Solving Complex Problems with Computing Muscle -- Johnny Chung Lee explains the (computer vision) uses of the open source Ceres Non-Linear Least Squares Solver library from Google.
  4. How to Start a Think Tank (Guardian) -- The answer to the looming crisis of legitimacy we're facing is greater openness - not just regarding who met who at what Christmas party, but on the substance of policy. The best way to re-engage people in politics is to change how politics works - in the case of our project, to develop a more direct way for the people who use and provide public and voluntary services to create better social policy. Hear, hear. People seize on the little stuff because you haven't given them a way to focus something big with you.

May 15 2012

Profile of the Data Journalist: The Data News Editor

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society. (You can learn more about this world and the emerging leaders of this discipline in the newly released "Data Journalism Handbook.")

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted in-person and email interviews during the 2012 NICAR Conference and published a series of data journalist profiles here at Radar.

John Keefe (@jkeefe) is a senior editor for data news and journalism technology at WNYC public radio, based in New York City, NY. He attracted widespread attention when an online map he built using available data beat the Associated Press with Iowa caucus results earlier this year. He's posted numerous tutorials and resources for budding data journalists, including how to map data onto county districts, use APIs, create news apps without a backend content management system and make election results maps. As you'll read below, Keefe is a great example of a journalist who picked up these skills from the data journalism community and the Hacks/Hackers group.

Our interview follows, lightly edited for content and clarity. (I've also added a Twitter list of data journalists from the New York Times' Jacob Harris.)

Where do you work now? What is a day in your life like?

I work in the middle of the WNYC newsroom -- quite literally. So throughout the day, I have dozens of impromptu conversations with reporters and editors about their ideas for maps and data projects, or field questions about how to find or download data.

Our team works almost entirely on "news time," which means our creations hit the Web in hours and days more often than weeks and months. So I'm often at my laptop creating or tweaking maps and charts to go with online stories. That said, Wednesday mornings it's breakfast at a Chelsea cafe with collaborators at Balance Media to update each other on longer-range projects and tools we make for the newsroom and then open source, like Tabletop.js and our new vertical timeline.

Then there are key meetings, such as the newsroom's daily and weekly editorial discussions, where I look for ways to contribute and help. And because there's a lot of interest and support for data news at the station, I'm also invited to larger strategy and planning meetings.

How did you get started in data journalism? Did you get any special degrees or certificates?

I've been fascinated with the intersection of information, design and technology since I was a kid. In the last couple of years, I've marveled at what journalists at the New York Times, ProPublica and the Chicago Tribune were doing online. I thought the public radio audience, which includes a lot of educated, curious people, would appreciate such data projects at WNYC, where I was news director.

Then I saw that Aron Pilhofer of the New York Times would be teaching a programming workshop at the 2009 Online News Association annual meeting. I signed up. In preparation, I installed Django on my laptop and started following the beginner's tutorial on my subway commute. I made my first "Hello World!" web app on the A Train.

I also started hanging out at Hacks/Hackers meetups and hackathons, where I'd watch people code and ask questions along the way.

Some of my experimentation made it onto WNYC's website -- including our 2010 Census maps and the NYC Hurricane Evacuation map ahead of Hurricane Irene. Shortly thereafter, WNYC management asked me to focus on it full-time.

Did you have any mentors? Who? What were the most important resources they shared with you?

I could not have done so much so fast without kindness, encouragement and inspiration from Pilhofer at the Times; Scott Klein, Al Shaw, Jennifer LaFleur and Jeff Larson at ProPublica; Chris Groskopf, Joe Germuska and Brian Boyer at the Chicago Tribune; and Jenny 8. Lee of, well, everywhere.

Each has unstuck me at various key moments and all have demonstrated in their own work what amazing things were possible. And they have put a premium on sharing what they know -- something I try to carry forward.

The moment I may remember most was at an afternoon geek talk aimed mainly at programmers. After seeing a demo of a phone app called Twilio, I turned to Al Shaw, sitting next to me, and lamented that I had no idea how to play with such things.

"You absolutely can do this," he said.

He encouraged me to pick up Sinatra, a surprisingly easy way to use the Ruby programming language. And I was off.

What does your personal data journalism "stack" look like? What tools could you not live without?

Google Maps - Much of what I can turn around quickly is possible because of Google Maps. I'm also experimenting with MapBox and Geocommons for more data-intensive mapping projects, like our NYC diversity map.

Google Fusion Tables - Essential for my wrangling, merging and mapping of data sets on the fly.

Google Spreadsheets - These have become the "backend" to many of our data projects, giving reporters and editors direct access to the data driving an application, chart or map. We wire them to our apps using Tabletop.js, an open-source program we helped to develop. (There's a sketch of the pattern just after this list.)

TextMate - A programmer's text editor for Mac. There are several out there, and some are free. TextMate is my fave.

The JavaScript Tools Bundle for TextMate - It checks my JavaScript code every time I save, flagging near-invisible, infuriating errors such as a stray comma or a missing parenthesis. I'm certain this one piece of software has given me more days with my kids.

Firebug for Firefox - Lets you see what your code is doing in the browser. Essential for troubleshooting CSS and JavaScript, and great for learning how the heck other people make cool stuff.

Amazon S3 - Most of what we build are static pages of HTML and JavaScript, which we host in the Amazon cloud and embed into article pages on our CMS.

census.ire.org - A fabulous, easy-to-navigate presentation of US Census data made by a bunch of journo-programmers for Investigative Reporters and Editors. I send someone there probably once a week.
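
To make the spreadsheet-as-backend pattern concrete, here's a minimal sketch in Python of what Tabletop.js does client-side in JavaScript: pull a published Google Spreadsheet and treat its rows as records. The sheet key is a placeholder and the CSV export URL shape is an assumption, so check it against a sheet you've actually published.

```python
# Sketch: treating a published Google Spreadsheet as a lightweight backend,
# the pattern Tabletop.js implements client-side in JavaScript.
# SHEET_KEY is a placeholder; the export URL shape is an assumption.
import csv
import io
import requests

SHEET_KEY = "your-published-sheet-key"
url = "https://docs.google.com/spreadsheets/d/" + SHEET_KEY + "/export?format=csv"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

rows = list(csv.DictReader(io.StringIO(resp.text)))
if rows:
    print(len(rows), "rows; columns:", ", ".join(rows[0].keys()))
```

The appeal of the pattern is that reporters edit the spreadsheet directly and the app picks up the changes, with no CMS or database in between.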

What data journalism project are you the most proud of working on or creating?

I'd have to say our GOP Iowa Caucuses feature. It has several qualities I like:

  • Mashed-up data -- It mixes live county vote results with Patchwork Nation community types.
  • A new take -- We knew other news sites would shade Iowa's counties by the winner; we shaded them by community type and showed who won which categories.
  • Complete sharability -- We made it super-easy for anyone to embed the map into their own site, which was possible because the results came license-free from the state GOP via Google.
  • Key code from another journalist -- The map-rollover coolness comes from code built by Albert Sun, then of the Wall Street Journal and now at the New York Times.
  • Rapid learning -- I taught myself a LOT of JavaScript quickly.
  • Reusability -- We reused the map for each state's contest until Santorum bowed out.


Bonus: I love that I made most of it sitting at my mom's kitchen table over winter break.

Where do you turn to keep your skills updated or learn new things?

WNYC's editors and reporters. They have the bug, and they keep coming up with new and interesting projects. And I find project-driven learning is the most effective way to discover new things. New York Public Radio -- which runs WNYC along with classical radio station WQXR, New Jersey Public Radio and a street-level performance space -- also has a growing stable of programmers and designers, who help me build things, teach me amazing tricks and spot my frequent mistakes.

The IRE/NICAR annual conference. It's a meetup of the best journo-programmers in the country, and it truly seems each person is committed to helping others learn. They're also excellent at celebrating the successes of others.

Twitter. I follow a bunch of folks who seem to tweet the best stuff, and try to keep a close eye on 'em.

Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?

Candidates, companies, municipalities, agencies and non-profit organizations all are using data. And a lot of that data is about you, me and the people we cover.

So first off, journalism needs an understanding of the data available and what it can do. It's just part of covering the story now. To skip that part of the world would shortchange our audience, and our democracy. Really.

And the better we can present data to the general public and tell data-driven (or data-supported) stories with impact, the better our journalism will be.

May 07 2012

Four short links: 7 May 2012

  1. Liquid Feedback -- MIT-licensed voting software from the Pirate Party. See this Spiegel Online piece about how it is used for more details. (via Tim O'Reilly)
  2. Putting Gestures Into Objects (Ars Technica) -- Disney and CMU have a system called Touché, where objects can tell whether they're being clasped, swiped, pinched, etc. and by how many fingers. (via BoingBoing)
  3. Real-time Facebook 'likes' Displayed On Brazilian Fashion Retailer's Clothes Racks (The Verge) -- each hanger has a digital counter reflecting the number of likes.
  4. Foldit Games Next Play: Crowdsourcing Better Drug Design (Nature Blogs) -- “We’ve moved beyond just determining structures in nature,” Cooper, who is based at the University of Washington’s Center for Game Science in Seattle, told Nature Medicine. “We’re able to use the game to design brand new therapeutic enzymes.” He says players are now working on the ground-up design of a protein that would act as an inhibitor of the influenza A virus, and he expects to expand the drug development uses of the game to small molecule design within the next year.

April 17 2012

Four short links: 17 April 2012

  1. Penguins Counted From Space (Reuters) -- I love the unintended flow-on effects of technological progress. Nobody funded satellites because they'd help us get an accurate picture of wildlife in the Antarctic, yet here we are. The street finds a use ...
  2. What Makes a Super-Spreader? -- A super-spreader is a person who transmits an infection to a significantly greater number of other people than the average infected person. The occurrence of a super spreader early in an outbreak can be the difference between a local outbreak that fizzles out and a regional epidemic. Cory, Waxy, Gruber, Ms BrainPickings Popova: I'm looking at you. (via BoingBoing)
  3. The Internet Did Not Kill Reading Books (The Atlantic) -- reading probably hasn't declined to the horrific levels of the 1950s.
  4. Data Transparency Hacks -- projects that came from the WSJ Data Transparency Codeathon.

April 10 2012

Open source is interoperable with smarter government at the CFPB

When you look at the government IT landscape of 2012, federal CIOs are being asked to address a lot of needs. They have to accomplish their mission. They need to be able to scale initiatives to tens of thousands of agency workers. They're under pressure to address not just network security but web security and mobile device security. They also need to be innovative, because all of this is supported by the same or less funding. These are common requirements in every agency.

As the first federal "start-up agency" in a generation, some of those needs at the Consumer Financial Protection Bureau (CFPB) are even more pressing. On the other hand, the opportunity for the agency to be smarter, leaner and "open from the beginning" is also immense.

Progress establishing the agency's infrastructure and culture over the first 16 months has been promising, save for the larger context of getting a director at the helm. Enabling open government by design isn't just a catchphrase at the CFPB. There has been a bold vision behind the CFPB from the outset: a 21st century regulator that would leverage new technologies to find problems in the economy before they escalate into the next great financial crisis.

In the private sector, there's great interest right now in finding actionable insight in large volumes of data. Making sense of big data is increasingly being viewed as a strategic imperative in the public sector as well. Recently, the White House put its stamp on that reality with a $200 million big data research and development initiative, including a focus on improving the available tools. There's now an entire ecosystem of software around Hadoop, which is itself open source code. The problem that now exists in many organizations, across the public and private sector, is not so much that the technology to manipulate big data isn't available: it's that the expertise to apply big data doesn't exist in-house. The data science talent shortage is real.

People who work and play in the open source community understand the importance of sharing code, especially when that action leads to improving the code base. That's not necessarily an ethic or a perspective that has been pervasive across the federal government. That does seem to be slowly changing, with leadership from the top: the White House used Drupal for its site and has since contributed modules back into the open source community, including one that helps with 508 compliance.

In an in-person interview last week, CFPB CIO Chris Willey (@ChrisWilleyDC) and acting deputy CIO Matthew Burton (@MatthewBurton) sat down to talk about the agency's new open source policy, government IT, security, programming in-house, the myths around code-sharing, and big data.

The fact that this government IT leadership team is strongly supportive of sharing code back to the open source community is probably the most interesting part of this policy, as Scott Merrill picked up in his post on the CFPB and Github.

Our interview follows.

In addition to leading the CFPB's development team over the past year and a half, Burton was just named acting deputy chief information officer. What will that mean?

Willey: He hasn't been leading the software development team the whole time. In fact, we only really had an org chart as of October. In the time that he's been here, Matt has led his team to some amazing things. We're going to talk about one of them today, but we've also got a great intranet. We've got some great internal apps, both built and being built. We've unleashed one version of the supervision system that helps bank examiners do their work in the field. We've got a lot of faith he's going to do great things.

What it actually means is that he's going to be backing me up as CIO. Even though we're a fairly small organization, we have an awful lot going on. We have 76 active IT projects, for example. We're just building a team. We're actually doubling in size this fiscal year, from about 35 staff to 70, as well as adding lots of contractors. We're just growing the whole pie. We've got 800 people on board now. We're going to have 1,100 on board in the whole bureau by the end of the fiscal year. There's a lot happening, and I recognize we need to have some additional hands and brain cells helping me out.

With respect to building an internal IT team, what's the thinking behind having technical talent inside of an agency like this one? What does that change, in terms of your relationship with technology and your capacity to work?

Burton: I think it's all about experimentation. Having technical people on staff allows an organization to do new things. The way most agencies work is that when they have a technical need, they don't have the technical people on staff to make it happen, so instead that need grows larger and larger until it justifies a contract. And by then, the problem is very difficult to solve.

By having developers and designers in-house, we can constantly be addressing things as they come up. In some cases, before the businesses even know it's a problem. By doing that, we're constantly staying ahead of the curve instead of always reacting to problems that we're facing.

How do you use open source technology to accomplish your mission? What are the tools you're using now?

Willey: We're actually trying to use open source in every aspect of what we do. It's not just in software development, although that's been a big focus for us. We're trying to do it on the infrastructure side as well.

As we look at network and system monitoring, we look at the tools that help us manage the infrastructure. As I've mentioned in the past, we are 100% in the cloud today. Open source has been a big help for us in giving us the ability to manipulate those infrastructures that we have out there.

At the end of the day, we want to bring in the tools that make the most sense for the business needs. It's not about only selecting open source or having necessarily a preference for open source.

What we've seen is that over time, the open source marketplace has matured. A lot of tools that might not have been ready for prime time a year or two ago are today. By bringing them into the fold, we potentially save money. We potentially get systems that we can extend. We can more easily integrate with the other things we have inside the shop, whether we built them or acquired them through other means. Open source gives us a lot of flexibility because there are a lot of opportunities to do things that we might not be able to do with some proprietary software.

Can you share a couple of specific examples of open source tools that you're using and what you actually use them for within mission?

Willey: On network monitoring, for example, we're using ZFS, which is an open source monitoring tool. We've been working with Nagios as well. Nagios, we actually inherited from Treasury — and while Treasury's not necessarily known for its use of open source technologies, it uses that internally for network monitoring. Splunk is another one that we have been using for web analysis. [After the interview, Burton and Willey also shared that they built the CFPB's intranet on MediaWiki, the software that drives Wikipedia.]

Burton: On the development side, we've invested a lot in Django and WordPress. Our site is a hybrid of them. It's WordPress at its core, with Django on top of that.

In November of 2010, it was actually a few weeks before I started here, Merici [Vinton] called me and said, "Matt, what should we use for our website?"

And I said, "Well, what's it going to do?"

And she said, "At first, it's going to be a blog with a few pages."

And this website needed to be up and running by February. And there was no hosting; there was nothing. There were no developers.

So I said, "Use WordPress."

And by early February, we had our website up. I'm not sure that would have been possible if we had to go through a lengthy procurement process for something not open source.

We use a lot of jQuery. We use Linux servers. For development ops, we use Selenium and Jenkins and Git to manage our releases and source code. We actually have GitHub Enterprise, which although not open source, is very sharing-focused. It encourages sharing internally. And we're using GitHub on the public side to share our code. It's great to have the same interface internally as we're using externally.

Developers and citizens alike can go to github.com/cfpb and see code that you've released back to the public and for other federal agencies. What projects are there?

Burton: These range from basic building blocks, code that may not strike an outside developer as that interesting but that's really useful for the government, all the way to things we created from scratch that are very developer-focused and will be useful for any developer.

On the first side of that spectrum, there's an app that we made for transit subsidy enrollment. Treasury used to manage our transit subsidy balances. That involved going to a webpage that you would print out, fill in with a pen and then fax to someone.

Willey: Or scan and email it.

Burton: Right. And then once you'd had your supervisor sign it and faxed it over, eventually, several weeks later, you would get your benefits. When we started to take over that process, the human resources office came to us and asked, "How can we do this better?"

Obviously, that should just be a web form that you type into, one that auto-fills any detail it knows about you. You press submit, it goes into the database, and from there directly to the DOT [Department of Transportation]. So that's what we made. We demoed it for DOT, and they really liked it. USAID is also into it. It's encouraging to see that something really simple could prove really useful for other agencies.
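
That pre-filled form flow is easy to picture in Django, which the CFPB team uses elsewhere. Here's a minimal sketch; the form fields and persistence hook are assumptions for illustration, not the actual transit subsidy app.

```python
# Sketch: a minimal Django form replacing the print-sign-fax workflow.
# Field names and the persistence stub are assumptions for illustration.
from django import forms
from django.http import HttpResponse

def save_enrollment(data):
    """Stub: persist to the benefits database that feeds DOT."""
    print("would save:", data)

class TransitSubsidyForm(forms.Form):
    name = forms.CharField()
    email = forms.EmailField()
    monthly_commute_cost = forms.DecimalField(min_value=0)

def enroll(request):
    # Pre-fill what we already know about the logged-in employee.
    form = TransitSubsidyForm(request.POST or None,
                              initial={"name": request.user.get_full_name(),
                                       "email": request.user.email})
    if request.method == "POST" and form.is_valid():
        save_enrollment(form.cleaned_data)
        return HttpResponse("Enrollment submitted.")
    return HttpResponse(form.as_p())  # a real app would render a template
```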

On the other side of the spectrum, we use a lot of Django tools. As an example, we have a tool we just released through our website called "Ask CFPB." It's a Django-based question-and-answer tool.

The content is managed in Django, on our staging server behind the firewall. When that content changes, we need to get the update from staging over to production.

Before, what we had to do was pick up the entire database, copy it and then move it over to production, which was kind of a nightmare. And there was no Django tool for selectively moving data modifications.

So we sat there and thought, "Oh, we really need something to do that, because we're going to be doing a lot of it. We can't be copying the database over every time we need to correct a copy." So two of our developers built a Django app called "Nudge." If you've ever seen a Django admin, you just go into it and see, "Hey, here's everything that's changed. What do you want to move over?"

You can pick and choose what you want to move over and, with the click of a button, it goes to production. I think that's something that every Django developer will have a use for if they have a staging server.

In a way, we were sort of surprised it didn't exist. So, we needed it. We built it. Now we're giving it back and anybody in the world can use it.
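
Nudge itself lives on the CFPB's GitHub. To show the underlying mechanics, here's a minimal sketch of selective staging-to-production promotion using Django's built-in serializers; the model name and transport are assumptions, and this is not Nudge's actual implementation.

```python
# Sketch: selectively pushing changed objects from staging to production
# using Django's serialization framework. "Article" and the batch logic
# are assumptions for illustration; this is not Nudge's actual code.
from django.core import serializers

def export_selected(queryset):
    """On the staging server: serialize the editor-chosen objects."""
    return serializers.serialize("json", queryset)

def import_batch(payload):
    """On the production server: apply the batch."""
    for deserialized in serializers.deserialize("json", payload):
        deserialized.save()  # inserts or updates by primary key

# On staging: batch = export_selected(Article.objects.filter(pk__in=chosen_ids))
# Ship `batch` to production (e.g. over an authenticated HTTP call), then:
# import_batch(batch)
```

The interesting design problem Nudge solves on top of this is tracking *what* changed since the last push, so editors pick from a diff rather than hand-assembling querysets.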

You mentioned the cloud. I know that CFPB is very associated with Treasury. Are you using Treasury's FISMA moderate cloud?

Willey: We have a mix of what I would call private and public clouds. On the public side, we're using our own cloud environments that we have established. On the private side, we are using Treasury for some of our apps. We're slowly migrating off of Treasury systems onto our own cloud infrastructure.

In the case of email, for example, we're looking at email as a service. So we'll be looking at Google, Microsoft and others just to see what's out there and what we might be able to use.

Why is it important for the CFPB to share code back to the public? And who else in the federal government has done something like this, aside from the folks at the White House?

Burton: We see it the same way that we believe the rest of the open source community sees it: the only way this stuff is going to get better and become more viable is if people share. Without that, it'll only be hobbyists, only people who build their own little personal thing. Maybe it's great. Maybe it's not. Open source gets better by the community actually contributing to it. So it's self-interest in a lot of ways. If the tools get better, then what we have available to us gets better. We can do our mission better.

Using the transit subsidy enrollment application example, it's also an opportunity for government to help itself, for one agency to help another. We've created this thing. Every federal agency has a transit subsidy program. They all need to allow people to enroll in it. Therefore, it's immediately useful to any other agency in the federal government. That's just a matter of government improving its own processes.

If one group does it, why should another group have to figure it out or have to pay lots of money to have it figured out? Why not just share it internally and then everybody benefits?

Why do you think it's taken until 2012 to have that insight actually be made into reality in terms of a policy?

Burton: I think to some degree, the tools have changed. The ability to actually do this easily is a lot better now than it was even a year or two ago. Government also traditionally lags behind the private sector in a lot of ways. I think that's changing, too. With this administration in particular, what we've seen is that government has started to reach something like parity with the private sector, including some of the thinking around how to use technology to improve business processes. That's really exciting. And I think as a result, there are a lot of great people coming in as developers and designers who want to work in the federal government because they see that change.

Willey: It's also because we're new. There are two things behind that. First, we're able to sort of craft a technology philosophy with a modern perspective. So we can, from our founding, ask "What is the right way to do this?" Other agencies, if they want to do this, have to turn around decades of culture. We don't have that burden. I think that's a big reason why we're able to do this.

The second thing is a lot of agencies don't have the intense need that we do. We have 76 projects to do. We have to use every means available to us.

We can't say, "We're not going to use a large share of the software that's available to us." That's just not an option. We have to say, "Yes, we will consider this as a commercial good, just like any other piece of proprietary software."

In terms of the broader context for technology and policy, how does open source relate to open government?

Willey: When I was working for the District, Apps for Democracy was a big contest that we did around opening data and then asking developers to write applications using that data that could then be used by anybody. We said that the next logical step was to sort of create more participatory government. And in my mind, open sourcing the projects that we do is a way of asking the citizenry to participate in the active government.

So by putting something in the public space, somebody could pick that up. Maybe not the transit subsidy enrollment project — but maybe some other project that we've put out there that's useful outside of government as well as inside of government. Somebody can pick that code up, contribute to it and then we benefit. In that way, the public is helping us make government better.

When you have conversations around open source in government, what do you say about what it means to put your code online and to have people look at it or work on it? Can you take changes that people make to the code base to improve it and then use it yourself?

Willey: Everything that we put out there will be reviewed by our security team. The goal is for the code not to have any security vulnerabilities by the time it's out there. If someone does discover a security vulnerability, however, we'll be sharing the code in a way that makes it much more likely that someone will point it out to us, and maybe even provide a fix, than exploit it. They wouldn't be exploiting our instance of the code; they would be working with the code on GitHub.com.

I've seen people in government with a misperception of what open source means. They hear that it's code that anyone can contribute to. I think that they don't understand that you're controlling your own instance of it. They think that anyone can come along and just write anything into your code that they like. And, of course, it's not like that.

I think as we talk more and more about this to other agencies, we might run into that, but I think it'll be good to have strong advocates in government, especially on the security side, who can say, "No, that's not the case; it doesn't work that way."

Burton: We have a firewall between our public and private Git instances as well. So even if somebody contributes code, that's also reviewed on the way in. We wouldn't implement it unless we made sure that, from a security perspective, the code was not malicious. We're taking those precautions as well.

I can't point to one specifically, but I know that there have been articles and studies done on the relative security of open source. I think the consensus in the industry is that the peer review process of open source actually helps from a security perspective. It's not that you have a chaos of people contributing code whenever they want to. It improves the process. It's like the thinking behind academic papers. You do peer review because it enhances the quality of the work. I think that's true for open source as well.

We actually want to create a community of peer reviewers of code within the federal government. As we talk to agencies, we want people to actually use the stuff we build. We want them to contribute to it. We actually want them to be a community. As each agency contributes things, the other agencies can actually review that code and help each other from that perspective as well.

It's actually fairly hard. As we build more projects, it's going to put a little bit of a strain on our IT security team, doing an extra level of scrutiny to make sure that the code going out is safe. But the only way to get there is to grow that pie. And I think by talking with other agencies, we'll be able to do that.

A classic open source koan is that "with many eyes, all bugs become shallow." In IT security, is it that with many eyes, all worms become shallow?

Burton: What the Department of Defense said was if someone has malicious intent and the code isn't available, they'll have some way of getting the code. But if it is available and everyone has access to it, then any vulnerabilities that are there are much more likely to be corrected than before they're exploited.

How do you see open source contributing to your ability to get insights from large amounts of data? If you're recruiting developers, can they actually make a difference in helping their fellow citizens?

Burton: It's all about recruiting. As we go out and bring on data people and software developers, we're looking for that kind of expertise. We're looking for people who have worked with PostgreSQL. We're looking for people who have worked with Solr. We're looking for people who have worked with Hadoop, because then we can start to build that expertise in-house. Those tools are out there.

R is an interesting example. What we're finding is that as more people come out of academia into the professional world, they're used to using R from school. Then they have to learn a different tool once they're working in the marketplace.

It's similar with the Mac versus the PC. You get people using the Mac in college — and suddenly they have to go to a Windows interface. Why impose that on them? If they're going to be extremely productive with a tool like R, why not allow that to be used?

We're starting to see, in some pockets of the bureau, push from the business side to actually use some of these tools, which is great. That's another change I think that's happened in the last couple of years.

Before, there would've been big resistance on that kind of thing. Now that we're getting pushed a little bit, we have to respond to that. We also think it's worth it that we do.

