
August 05 2010

Four short links: 5 August 2010

  1. Delicious Links Clustered and Stacked (Matt Biddulph) -- six years of his delicious links, k-means clustered by tag and graphed. The clusters are interesting, but I wonder whether Matt can identify significant life/work events by the spikes in the graph.
  2. Open Data and the Voluntary Sector (OKFN) -- Open data will give charities new ways to find and share information on the need of their beneficiaries - who needs their services most and where they are located. The sharing of information will be key to this - it’s not just about using data that the government has opened up, but also opening your own data.
  3. Cognitive and Behavioral Challenges in Responding to Climate Change -- At the deepest level, large scale environmental problems such as global warming threaten people's sense of the continuity of life - what sociologist Anthony Giddens calls ontological security. Ignoring the obvious can, however, be a lot of work. Both the reasons for and process of denial are socially organized; that is to say, both cognition and denial are socially structured. Denial is socially organized because societies develop and reinforce a whole repertoire of techniques or "tools" for ignoring disturbing problems. Fascinating paper. (via Jez)
  4. Blueprints -- provides a collection of interfaces and implementations to common, complex data structures. Blueprints contains a property graph model and its implementations for TinkerGraph, Neo4j, and SAIL. It also contains an object document model and implementations for TinkerDoc, CouchDB, and MongoDB. In short, Blueprints provides a one-stop shop for implemented interfaces to help developers create software without being tied to particular underlying data management systems.
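To make the "interfaces plus interchangeable implementations" idea concrete, here is a minimal sketch in Python. The names (PropertyGraph, InMemoryGraph, add_vertex, add_edge, out_neighbors) are hypothetical and are not the Blueprints API, which is Java. It only shows how code written against an abstract property-graph interface stays independent of whichever backend implements it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    out_v: str  # id of the vertex the edge leaves
    in_v: str   # id of the vertex the edge enters
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph(ABC):
    """Hypothetical backend-neutral interface: vertices and labeled edges,
    each carrying an arbitrary key/value property map."""

    @abstractmethod
    def add_vertex(self, vid: str, **props: Any) -> Vertex: ...

    @abstractmethod
    def add_edge(self, eid: str, out_v: str, in_v: str, label: str, **props: Any) -> Edge: ...

    @abstractmethod
    def out_neighbors(self, vid: str, label: str) -> List[Vertex]: ...


class InMemoryGraph(PropertyGraph):
    """One possible implementation; a database-backed class could replace it
    without changing any caller."""

    def __init__(self) -> None:
        self._vertices: Dict[str, Vertex] = {}
        self._edges: Dict[str, Edge] = {}

    def add_vertex(self, vid: str, **props: Any) -> Vertex:
        v = Vertex(vid, dict(props))
        self._vertices[vid] = v
        return v

    def add_edge(self, eid: str, out_v: str, in_v: str, label: str, **props: Any) -> Edge:
        if out_v not in self._vertices or in_v not in self._vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        e = Edge(eid, out_v, in_v, label, dict(props))
        self._edges[eid] = e
        return e

    def out_neighbors(self, vid: str, label: str) -> List[Vertex]:
        return [self._vertices[e.in_v] for e in self._edges.values()
                if e.out_v == vid and e.label == label]


def friends_of(graph: PropertyGraph, person: str) -> List[str]:
    # Written only against the interface, so it works with any backend.
    return [v.properties.get("name", v.id) for v in graph.out_neighbors(person, "knows")]


g = InMemoryGraph()
g.add_vertex("1", name="marko")
g.add_vertex("2", name="vadas")
g.add_edge("7", "1", "2", "knows", weight=0.5)
print(friends_of(g, "1"))  # ['vadas']
```

Swapping InMemoryGraph for another subclass of PropertyGraph would leave friends_of untouched, which is the decoupling the project description is pointing at.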

June 30 2010

Data is not binary

Guest blogger Gavin Starks is founder and CEO of AMEE, a neutral aggregation platform designed to measure and track all the energy data in the world.

The World Bank has stated that "data in document format is effectively useless".

However, "open data" is only the beginning of a journey. Simply applying the rules of open source as applied to software may help us take the first steps, but there are new categories of challenges to face.

Data needs to be computable (i.e., acted upon in context)

"Data" is a much broader term than "code." The term embodies a range of dimensions: there are more than just the numbers at play, especially with scientific data.

  • How was the data collected?
  • How should the data be used?
  • Are the models for processing the data valid?
  • What assumptions exist, in words and equations?
  • What is the significance of the assumptions?

In an age when peer review is an anachronism, we are searching for new solutions for "scientific content management". When Pascal's Wager is invoked, it is equally important to remember Gödel's incompleteness theorems (any consistent formal system rich enough to express arithmetic contains true statements that cannot be proved within that system).

Only eight percent of members of the Scientific Research Society agreed that "peer review works well as it is" (Chubin and Hackett, 1990, p. 192). Peer review has also been claimed to be "a non-validated charade whose processes generate results little better than does chance." But in the same context: "Peer review is central to the organization of modern science ... why not apply scientific [and engineering] methods to the peer review process" (Horrobin, 2001). The absence of URLs for those two pieces of research is indicative of one of the problems we are trying to solve.

Peer review persists in its current form largely for historical reasons, but it now serves only a niche, because technology has opened up the use of research and data to a mass audience.

We must build tools that enable credible engagement

To illustrate our story: we are engaged with the very pressing and complex issue of climate change. At AMEE we codify international, government, and proprietary data, models and methodologies that represent, at the most fundamental level, the algorithms that enable the energy, carbon and environmental cost of consumption and activities to be calculated. AMEE doesn't just store and re-broadcast data, it performs the calculations based on inputs to the models.

One of our challenges is getting at the raw data in a useful, repeatable, and traceable form. As a result, one of the core services we offer to data and standards managers is a set of tools that make this possible.

Releasing raw data is vital. There can be no excuse not to. Releasing source code is optional. It's truly great for open source review, but it's also dangerous if everyone just re-runs the same code with the same baked-in implicit and explicit assumptions and errors.

This is where data and code deviate substantially. The logic cascade for the interpretation of data is not unary (there is no single interpretation); it is based on assumptions that may vary and that are subject to many quantitative and qualitative inputs. The interpretation of the data is not even binary.

We believe it's much better to publish the following five components to provide transparent and auditable disclosure:

  1. The raw data
  2. The circumstances of its collection
  3. The method and assumptions used to process the data (in words and equations)
  4. The results of the processing
  5. The known limitations on the method and significance of the assumptions
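One way to keep the five components together is to publish them as a single record. The sketch below is a hypothetical structure (the field names are illustrative, not an AMEE format); its missing_components helper simply reports which of the five components are still empty, so an incomplete disclosure can be flagged before release.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class Disclosure:
    raw_data: List[Dict[str, Any]] = field(default_factory=list)  # 1. the raw data
    collection_circumstances: str = ""                           # 2. how, when, and by whom it was collected
    method_and_assumptions: str = ""                             # 3. processing method, in words and equations
    results: Dict[str, Any] = field(default_factory=dict)        # 4. the results of the processing
    known_limitations: List[str] = field(default_factory=list)   # 5. limits of the method, significance of assumptions

    def missing_components(self) -> List[str]:
        """Names of the components that are still empty, in the order listed above."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


d = Disclosure(raw_data=[{"site": "A", "kWh": 120}],
               method_and_assumptions="kgCO2 = kWh * factor")
print(d.missing_components())
# ['collection_circumstances', 'results', 'known_limitations']
```

A check like this cannot judge whether a stated assumption is sensible; it only ensures that every component is at least present for reviewers to examine.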

The processing code should be written independently, from scratch, as many times as possible, to reduce the chance that a bug or hidden assumption in any single implementation affects the results.
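The payoff of independent rewrites is that disagreements expose implementation bugs. A minimal sketch, assuming a hypothetical calculation (total energy use multiplied by an emission factor) written two different ways: every version runs on the same inputs, and any version whose result diverges from the first beyond a tolerance is reported rather than silently accepted.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical calculation, written twice by "different authors":
# total kgCO2 = sum(kWh) * factor, with factor in kgCO2 per kWh.

def total_emissions_v1(readings_kwh: Sequence[float], factor: float) -> float:
    return sum(readings_kwh) * factor


def total_emissions_v2(readings_kwh: Sequence[float], factor: float) -> float:
    total = 0.0
    for kwh in readings_kwh:
        total += kwh * factor  # applies the factor per reading instead of once
    return total


def cross_check(impls: Dict[str, Callable[[Sequence[float], float], float]],
                readings_kwh: Sequence[float], factor: float,
                rel_tol: float = 1e-9) -> List[str]:
    """Run every implementation on the same inputs and return the names of
    those whose result disagrees with the first implementation's."""
    names = list(impls)
    reference = impls[names[0]](readings_kwh, factor)
    return [name for name in names[1:]
            if not math.isclose(impls[name](readings_kwh, factor), reference, rel_tol=rel_tol)]


print(cross_check({"v1": total_emissions_v1, "v2": total_emissions_v2},
                  [100.0, 250.5, 80.25], 0.5))  # []
```

Comparing against the first version is the simplest choice; with three or more independent versions, a majority vote would also indicate which one is faulty.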

Once the data is "published," the challenge is how to build out a credible and usable set of services that encourage correct usage.

Building the solution stack

At AMEE we have developed a six-tier solution to address some of these issues. Specifically, we address the gap between content creators and managers (e.g. standards bodies) and content users (e.g. software apps, consultants, auditors), with a solution that is both human- and machine-readable.

1. Aggregation -- We aggregate the raw data, and track and log the sources. We have a standards spider that checks for changes, not unlike a search engine spider.

2. Content Enhancement -- In the process of aggregation, we document the data, and embed provenance, linking back to the source. We also add authority, a measure of the reliability and credibility of the source. We're beginning to add other taxonomies and semantic links that enable the data to be joined, and are building tools for engagement with the platform to stimulate discussion.

3. Discoverability -- AMEE Explorer is the human-readable version of the data, and the only search engine on carbon calculation models (N.B.: we are focused on the industrial and human impacts at the moment, not modeling the climate itself).

4. Repeatable Quality -- We have a quality-control process around the underlying data that is similar to a Six Sigma process. Our systems self-test the data every 30 minutes, and human checks are carried out at random intervals to ensure systemic errors have not been introduced. Our target accuracy metric is 100 percent, not five-nines.

5. Computable Engine -- We believe we are taking the notion of a master database service to an entirely new level by ensuring not only that the data is robust, but also that AMEE performs the actual calculations. AMEE retains an audit history behind both the inputs and the calculations themselves.

6. Interoperability and auditability -- The AMEE API is the machine-readable version of the data (in fact, of all of the content, including metadata and documentation), which enables the calculations to be done. AMEE also stores the audit history of both the inputs and the calculation mechanics. For example, PUT (a flight in an F-15 from London to New York at combat thrust) and GET the kgCO2 for that journey, or PUT (1000 kWh reported by my Whirlpool fridge for this month, in Washington, using my preferred energy supplier and my solar panels) and GET the kgCO2.
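The "submit inputs, get kgCO2 back, keep an audit trail" pattern described in tiers 5 and 6 can be sketched in a few lines. This is not the AMEE API: the class name, the activity keys, and especially the emission factors are hypothetical placeholders, not real data. The point is the shape: each calculation records its inputs, the factor and method used, and the result, so any output can be traced later.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, List

# HYPOTHETICAL placeholder factors in kgCO2 per unit; NOT real emission data.
PLACEHOLDER_FACTORS: Dict[str, float] = {
    "electricity_kwh": 0.5,
    "flight_km": 0.2,
}


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    activity: str
    amount: float
    factor: float
    method: str
    kg_co2: float


class CalculationEngine:
    """Hypothetical engine: callers submit activity data and get kgCO2 back;
    every calculation is appended to an immutable-entry audit log."""

    def __init__(self, factors: Dict[str, float]) -> None:
        self._factors = dict(factors)
        self.audit_log: List[AuditEntry] = []

    def calculate(self, activity: str, amount: float) -> float:
        if activity not in self._factors:
            raise KeyError(f"no model for activity {activity!r}")
        factor = self._factors[activity]
        kg_co2 = amount * factor  # simplest possible model: amount x factor
        self.audit_log.append(AuditEntry(
            timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
            activity=activity, amount=amount, factor=factor,
            method="amount * factor", kg_co2=kg_co2))
        return kg_co2


engine = CalculationEngine(PLACEHOLDER_FACTORS)
print(engine.calculate("electricity_kwh", 1000))  # 500.0
print(engine.audit_log[-1].factor)                # 0.5
```

Real models, like the F-15 or fridge examples above, take many more inputs (location, supplier, on-site generation) than a single amount-times-factor step; the audit entry would grow to record each of them.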


AMEE is positioned right at the junction between cloud, code, API, content, data, and the usage of the data, and as carbon becomes priced, we believe the consequences of getting it wrong are extremely high.

From an "open" standpoint, one of the big challenges we face is defining where the boundaries of "open" lie. Our value, of course, is in the ongoing maintenance and reliability of the system, and in connecting the data.

Commercially, we are treading very carefully through the platform and use-case stack (core platform, API, data, algorithms, code, structure, etc.), and increasing transparency at the most relevant points for the end user (who needs to feel confident about their own inputs and outputs). It's a complex stack, and no open source or Creative Commons license wholly covers the kinds of issues we face.

Our field, carbon footprinting, is what we call a "non-trivial" example of where open data meets the markets: billions of dollars are flowing through or around these data on the carbon markets. For example, thousands of businesses in the UK have to start reporting their carbon footprint to the government this year, and paying for it next year. Very, very few people understand how to use this data, how it all joins together, where the trap doors are, and why it's important to build an industry-stack to solve the problem.

If we don't build a credible industry stack from the ground up, the outcome could be no industry at all (or a tiny one). That would have dire consequences not only for the vendors and businesses in the space (such as SAP, SAS, CA, Microsoft, Google, and others); it would also remove our ability to accelerate solving the underlying issue of carbon and climate change itself. The root cause of this credibility gap has been a lack of transparency: no one has comprehensively joined the dots to see what is real and what is not.

We also believe this kind of approach has huge value in many areas beyond the ones AMEE is addressing.

Open data isn't just about re-broadcasting data, but combining it, re-using it and building upon it. It's about creating new uses, creating new markets and building credibility into the data as it flows.


January 05 2010

Four short links: 5 January 2010

  1. Introduction to Computational Advertising -- slides to a Stanford class on a new "scientific discipline" whose central challenge is to find the best ad to present to a user engaged in a given context, such as querying a search engine ("sponsored search"), reading a web page ("content match"), watching a movie, and IM-ing. "Scientific discipline" makes me gag. You could devise algorithms, measure performance, and write papers about the best way to put carrots up your bottom or the best way to pick pockets, but those still aren't complex enough activities to be trumpeted as "new scientific disciplines". (Although I do look forward to reading Stanford's CBUM126, "Introduction to Carrot Stuffing" lecture notes online). (via Greg Linden)
  2. Timing Attack in Google KeyCzar Library -- if you compare strings in the naive way, attackers can figure out whether the first bytes they gave you are correct based on the time the comparison takes. When they get the first bytes correct, then they can work on the next, and so on. This is a common mode of information leakage, and reminds me of my revelation when I began to edit security books: "this stuff is hard". New programmers are not taught to think like attackers, and the only trope of secure programming that they're taught is "avoid buffer overflows". (via Simon Willison)
  3. Climate Wizard -- explore historical temperature data as well as the various climate models and see what their predictions look like across the United States. (via Sciblogs)
  4. Contextual Clothing for Naked Transparency (Jon Udell) -- notable for this: The Net can be an engine for context assembly, a wonderful phrase I picked up years ago from Jack Ozzie. We used to think that the challenge of social software was to amass as many users as quickly as possible, but the far harder problem to solve is how to help those people contribute to something positive. YouTube comments show that simply having a lot of users doesn't make something virtuous.
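The timing leak in item 2 comes from comparisons that return as soon as a byte differs. The sketch below contrasts that early-exit comparison with the standard library's hmac.compare_digest, which Python documents as designed to prevent timing analysis. It illustrates the general class of bug and is not the Keyczar code; the hand-written constant-time loop is shown only to explain the idea, since an interpreter gives no hard timing guarantees.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: runtime grows with the length of the matching
    prefix, which leaks how many leading bytes an attacker guessed correctly."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner when the mismatch is early
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Standard-library comparison whose running time does not depend on
    where the inputs differ (lengths can still leak)."""
    return hmac.compare_digest(a, b)


def constant_time_equal_manual(a: bytes, b: bytes) -> bool:
    """The same idea spelled out: fold all byte differences together with
    XOR/OR and inspect the result only at the end, so every byte is examined."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0
```

In practice, prefer hmac.compare_digest (or your platform's equivalent) over any hand-rolled loop when checking MACs, tokens, or password hashes.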

December 31 2009

Commerce and the Wealth of Nations

I was struck the other day by an article in the New York Times that describes the different approaches of the US and China to Afghanistan, in which the US shoulders the burden of war, while China reaps the benefits of commerce. Quoting from the article, I tweeted: "American troops help make Afghanistan safe for Chinese commerce."

In response, @kamalram wrote: "During WW1 and the early days of WW2, the United States focused on commerce when much of Europe was at war. History gets repeated"

Pundits have long proclaimed the 21st century "the Chinese century", and @kamalram may well be right that America's wars against terrorism are a turning point. But the lesson is broader than that China is securing rights to rare-earth minerals in Afghanistan while the US gets mired in a messy war. The question is who creates the industries of the 21st century, which system of government is best at encouraging innovation, and which citizens have the drive to tackle hard problems and turn them into great opportunities.

This line of thought in turn put me in mind of Thomas Friedman's recent column, Off to the races, in which he argued:

I’ve long believed there are two basic strategies for dealing with climate change — the “Earth Day” strategy and the “Earth Race” strategy. This Copenhagen climate summit was based on the Earth Day strategy. It was not very impressive. This conference produced a series of limited, conditional, messy compromises, which it is not at all clear will get us any closer to mitigating climate change at the speed and scale we need....

I am an Earth Race guy. I believe that averting catastrophic climate change is a huge scale issue. The only engine big enough to impact Mother Nature is Father Greed: the Market. Only a market, shaped by regulations and incentives to stimulate massive innovation in clean, emission-free power sources can make a dent in global warming. And no market can do that better than America’s....

In the cold war, we had the space race: who could be the first to put a man on the moon. Only two countries competed, and there could be only one winner. Today, we need the Earth Race....

Whether you're a "warmist" or a "denier," you should have no doubt that green technology is going to be one of the biggest business opportunities of the 21st century. As Friedman continues:

Even if the world never warms another degree, population is projected to rise from 6.7 billion to 9 billion between now and 2050, and more and more of those people will want to live like Americans. In this world, demand for clean power and energy efficient cars and buildings will go through the roof.

Harnessing the market is also key to my thinking about "government as a platform" (aka "Government 2.0"). As I wrote in an as-yet-unpublished chapter for the upcoming O'Reilly book, Open Government: Collaboration, Transparency, and Participation in Practice:

If you look at the history of the computer industry, the innovations that define each era are frameworks that enabled a whole ecosystem of participation from companies large and small. The personal computer was such a platform. So was the World Wide Web. This same platform dynamic is playing out right now in the recent success of the Apple iPhone. Where other phones had a limited menu of applications developed by the phone vendor and a few carefully chosen partners, Apple built a framework that allowed virtually anyone to build applications for the phone, leading to an explosion of creativity, with more than 100,000 applications appearing for the phone in little more than eighteen months, and more than 3000 new ones now appearing every week.

This is the right way to frame the question of "Government 2.0." How does government become an open platform that allows people inside and outside government to innovate? How do you design a system in which all of the outcomes aren't specified beforehand, but instead evolve through interactions between government and its citizens, as a service provider enabling its user community?

It's worth noting that the idea of government as a platform applies to every aspect of the government's role in society. For example, the Federal-Aid Highway Act of 1956, which committed the United States to building an interstate highway system, was a triumph of platform thinking, a key investment in facilities that had a huge economic and social multiplier effect. Though government builds the network of roads that tie our cities together, it does not operate the factories and farms and businesses that use that network: that opportunity is afforded to "we the people." Government does set policies for the use of those roads, however, regulating interstate commerce, levying gasoline taxes as well as fees on heavy vehicles that damage the roads, setting and policing speed limits, specifying criteria for the safety of bridges and tunnels, and even for vehicles that travel on the roads, and performing many other responsibilities appropriate to a "platform provider."

While it has become common to ridicule the 1990s description of the Internet as the "information superhighway," the analogy is actually quite apt. Like the internet, the road system is a "network of networks," in which national, state, local, and private roads all interconnect, for the most part without restrictive fees. We have the same rules of the road everywhere in the country, yet anyone, down to a local landowner adding a driveway to an unimproved lot, can connect to the nation's system of roads.

The launch of weather, communications, and positioning satellites is a similar exercise of platform strategy. When you use a car navigation system to guide you to your destination, you are using an application built on the government platform, extended and enriched by massive private sector investment. When you check the weather (in the newspaper, on TV, or on the internet), you are using applications built using the National Weather Service (or equivalent services in other countries) as a platform. Until recently, the private sector had neither the resources nor the incentives to create space-based infrastructure. Government as a platform provider created capabilities that enrich the possibilities for subsequent private sector investment.

There are other areas where the appropriate role of the platform provider and the marketplace of application providers is less clear. Health care is a contentious example. Should the government be providing health care, or leaving it to the private sector? The answer is in the outcomes. If the private sector is doing a good job of providing necessary services that lead to the overall increase in the vitality of the country, government should stay out. But just as the interstate highway system increased the vitality of our transportation infrastructure, it is certainly possible that greater government involvement in health care could do the same. But it should do so, if the lesson is correctly learned, not by competing with the private sector to deliver health services, but by investing in infrastructure (and "rules of the road") that will lead to a more robust private sector ecosystem.

...platforms always require choices, and those choices must periodically be revisited. Platforms lose their power when they fail to adapt. The US investment in the highway system helped to vitiate our railroads, shaping a society of automobiles and suburbs. Today, we need to rethink the culture of sprawl and fossil fuel use that platform choice encouraged. A platform that once seemed so generative of positive outcomes can become a dead weight over time.

As we head into the second decade of the 21st century, we as a nation, we as a world need to make good choices about where we invest our time, our resources, and our ingenuity. It's the job of our leaders to make choices that give us leverage, that is, that create multiplier effects on our efforts.

The choice isn't between climate change alarmism and climate change denial, or between big government and small government. The choice is between dynamism and stagnation, between leadership that creates opportunity and leadership that protects the status quo, and, at bottom, between effective and ineffective strategies for increasing the total wealth of our society.

And of course, that wealth is more than material. Quality of life means more than quantity of stuff, and a single well designed device (or immaterial service delivered through said device) can deliver more value than a mountain of schlock. We all want to consume less and enjoy more, and it's certainly possible that there will be revolutions in which the next great innovation is itself a technology platform, a substrate of possibility on which immaterial economies grow and prosper.

I'd love to see, in this New Year, this new decade, deeper thinking about the society we want to build, and what kind of policies will encourage the market to make the right choices.

And I'd love to hear your thoughts about policy choices that might encourage 21st century industries here in America and around the world.


April 22 2009

February 05 2009
