
February 06 2014

Self-directed learning, and O’Reilly’s role in the ConnectED program

I wanted to provide a bit of perspective on the donation, announced on Wednesday by the White House, of a Safari Books Online subscription providing access to O’Reilly Media books, videos, and other educational content to every high school in the country.

First off, this came up very suddenly, with a request from the White House that reached me only on Monday, as the White House and Department of Education were gearing up for Wednesday’s announcement about broadband and iPads in schools. I had a followup conversation with David Edelman, a young staffer who taught himself programming by reading O’Reilly books when in middle school, and launched a web development firm while in high school. He made the case that connectivity alone, without content, wasn’t all it could be. And he thought of his own experience, and he thought of us.

So we began brainstorming whether there was any way we could donate a library of O’Reilly ebooks to every high school in the country. Fortunately, there may be a relatively easy way for us to do that, via Safari Books Online, the subscription service we launched in 2000 in partnership with the Pearson Technology Group. Safari already offers access to corporations and colleges in addition to individuals, so we should be able to work out some kind of special library as part of this offering.

Andrew Savikas, the CEO of Safari, was game. We still haven’t figured out all the details on how we’ll be implementing the program, but in essence, we’ll be providing a custom Safari subscription containing a rich library of content from O’Reilly (and potentially other publishers, if they want to join us) to all high schools in the US.

What’s interesting here is that when we think about education, we often think about investing in teachers. And yes, teachers are incredibly important. But they are only one of the resources we provide to motivated students.

I can’t tell you how often people come up to me and say, “I taught myself everything I know about programming from your books.” In fast-moving fields like software development, people learn from their peers, by looking at source code, and reading books or watching videos to learn more about how things work. They teach themselves.

And if this is true of our adult customers, it is also true of high schoolers and even middle schoolers. I still laugh to remember when it came time to sign the contract for Adam Goldstein’s first book with us, AppleScript: The Missing Manual, and he sheepishly confessed that his mother would have to sign for him, because he was only sixteen. His proposal had been flawless – over email, how were we to know how young he was? Adam went on to be an Internet entrepreneur, founder and CEO of the Hipmunk travel search engine.

Other people from O’Reilly’s extended circle of friends who may be well known to you who began their software careers in high school or younger include Eric Ries of Lean Startup fame, Dylan Field of Figma, Alex Rampell of TrialPay, and, sadly, Aaron Swartz.

As David explained the goals of the ConnectED program, he made the point that if even one or two kids in every school get fired up to build and learn on their own, that could make a huge difference to the future of our country.

It’s easy to see how kids get exposed to programming when they live in Silicon Valley or another high-tech hub. It’s a lot harder in many other parts of the country. So we’re glad to be part of the ConnectED program, and hope that one day we’ll all be using powerful new services that got built because some kid, somewhere, got his start programming as a result of our participation in this initiative.

August 15 2012

Mining the astronomical literature

There is a huge debate right now about making academic literature freely accessible and moving toward open access. But what would be possible if people stopped talking about it and just dug in and got on with it?

NASA’s Astrophysics Data System (ADS), hosted by the Smithsonian Astrophysical Observatory (SAO), has quietly been working away since the mid-’90s. Without much, if any, fanfare amongst the other disciplines, it has moved astronomers into a world where access to the literature is just a given. It’s something they don’t have to think about all that much.

The ADS service provides access to abstracts for virtually all of the astronomical literature. But it also provides access to the full text of more than half a million papers, going right back to the start of peer-reviewed journals in the 1800s. The service has links to online data archives, along with reference and citation information for each of the papers, and it’s all searchable and downloadable.

Number of papers published in the three main astronomy journals each year
Number of papers published in the three main astronomy journals each year. CREDIT: Robert Simpson

The existence of the ADS, along with the arXiv pre-print server, has meant that most astronomers haven’t seen the inside of a brick-built library since the late 1990s.

It also makes astronomy almost uniquely well placed for interesting data mining experiments, experiments that hint at what the rest of academia could do if they followed astronomy’s lead. The fact that the discipline’s literature has been scanned, archived, indexed and catalogued, and placed behind a RESTful API makes it a treasure trove, both for hypothesis generation and sociological research.

For example, the .Astronomy series of conferences is a small workshop that brings together the best and the brightest of the technical community: researchers, developers, educators and communicators. Billed as “20% time for astronomers,” it gives these people space to think about how new technologies affect both how research is done and how it is communicated, to their peers and to the public.

[Disclosure: I'm a member of the advisory board to the .Astronomy conference, and I previously served as a member of the programme organising committee for the conference series.]

It should perhaps come as little surprise that one of the more interesting projects to come out of a hack day held as part of this year’s .Astronomy meeting in Heidelberg was work by Robert Simpson, Karen Masters and Sarah Kendrew that focused on data mining the astronomical literature.

The team grabbed and processed the titles and abstracts of all the papers from the Astrophysical Journal (ApJ), Astronomy & Astrophysics (A&A), and the Monthly Notices of the Royal Astronomical Society (MNRAS) since each of those journals started publication — and that’s 1827 in the case of MNRAS.

By the end of the day, they’d found some interesting results showing how various terms have trended over time. The results were similar to what’s found in Google Books’ Ngram Viewer.
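The mechanics of that Ngram-style counting are simple. Here is a minimal sketch, using an invented toy corpus of (year, abstract) pairs in place of the real ADS data the team actually processed:

```python
from collections import Counter

# Toy stand-in for the real corpus: (year, abstract) pairs.
corpus = [
    (1995, "hubble observations of a distant galaxy"),
    (1995, "spectroscopy of stellar atmospheres"),
    (2005, "chandra x-ray survey of the galaxy cluster"),
    (2005, "hubble and chandra joint observations"),
    (2005, "dark matter halo simulations"),
]

def term_fraction_by_year(corpus, term):
    """Fraction of abstracts per year that mention `term`."""
    totals, hits = Counter(), Counter()
    for year, text in corpus:
        totals[year] += 1
        if term in text.lower().split():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in totals}

print(term_fraction_by_year(corpus, "hubble"))
```

The real project did this across nearly two centuries of titles and abstracts, then plotted the per-year fractions.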

The relative popularity of the names of telescopes in the literature
The relative popularity of the names of telescopes in the literature. Hubble, Chandra and Spitzer seem to have taken turns in hogging the limelight, much as COBE, WMAP and Planck have each contributed to our knowledge of the cosmic microwave background in successive decades. References to Planck are still on the rise. CREDIT: Robert Simpson.

After the meeting, however, Robert took his initial results further and continued to explore his new corpus of data on the astronomical literature. He’s explored various visualisations of the data, including word matrices for related terms and for various astro-chemistry topics.

Correlation between terms related to Active Galactic Nuclei
Correlation between terms related to Active Galactic Nuclei (AGN). The opacity of each square represents the strength of the correlation between the terms. CREDIT: Robert Simpson.
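A word matrix like this boils down to counting how often pairs of terms appear in the same paper. A toy sketch, with invented abstracts and an arbitrary term list standing in for the real corpus:

```python
from itertools import combinations

# Toy abstracts; the real corpus would be every ApJ/A&A/MNRAS abstract.
abstracts = [
    "agn accretion disk variability",
    "black hole accretion in agn",
    "stellar populations in dwarf galaxies",
    "agn feedback and black hole growth",
]
terms = ["agn", "accretion", "black", "stellar"]

def cooccurrence(abstracts, terms):
    """Fraction of abstracts in which each pair of terms appears together."""
    n = len(abstracts)
    words = [set(a.split()) for a in abstracts]
    return {
        (a, b): sum(1 for w in words if a in w and b in w) / n
        for a, b in combinations(terms, 2)
    }

matrix = cooccurrence(abstracts, terms)
print(matrix[("agn", "accretion")])
```

Normalising these raw co-occurrence fractions by each term's individual frequency would give something closer to a true correlation, which is presumably what sets the opacity in the figure above.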

He’s also taken a look at authorship in astronomy and is starting to find some interesting trends.

Fraction of astronomical papers published with one, two, three, four or more authors
Fraction of astronomical papers published with one, two, three, four or more authors. CREDIT: Robert Simpson

You can see that single-author papers dominated for most of the 20th century. Around 1960, we see the decline begin, as two- and three-author papers begin to become a significant chunk of the whole. In 1978, multi-author papers become more prevalent than single-author papers.
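The tally behind a plot like this is straightforward. A minimal sketch, with invented (year, author count) records in place of the real ADS metadata:

```python
from collections import Counter, defaultdict

# Toy records of (year, number_of_authors); the real data came from ADS.
papers = [(1950, 1), (1950, 1), (1950, 2),
          (1980, 1), (1980, 2), (1980, 3), (1980, 3)]

def author_count_fractions(papers):
    """For each year, the fraction of papers with 1, 2, or 3+ authors."""
    by_year = defaultdict(Counter)
    for year, n_authors in papers:
        bucket = n_authors if n_authors < 3 else 3  # 3 means "three or more"
        by_year[year][bucket] += 1
    return {
        year: {b: c / sum(counts.values()) for b, c in counts.items()}
        for year, counts in by_year.items()
    }

print(author_count_fractions(papers))
```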

Compare the number of active research astronomers to the number of papers published each year
Compare the number of “active” research astronomers to the number of papers published each year (across all the major journals). CREDIT: Robert Simpson.

Here we see that people begin to outpace papers in the 1960s. This may reflect the fact that as the field gets more technical and more specialised, it takes more people to write the same number of papers, which is an interesting result all by itself.

Interview with Robert Simpson: Behind the project and what lies ahead

I recently talked with Rob about the work he, Karen Masters, and Sarah Kendrew did at the meeting, and the work he’s been doing since with the newly gathered data.

What made you think about data mining the ADS?

Robert Simpson: At the .Astronomy 4 Hack Day in July, Sarah Kendrew had the idea to try to do an astronomy version of BrainSCANr, a project that generates new hypotheses in the neuroscience literature. I’ve had a go at mining ADS and arXiv before, so it seemed like a great excuse to dive back in.

Do you think there might be actual science that could be done here?

Robert Simpson: Yes, in the form of finding questions that were unexpected. With such large volumes of peer-reviewed papers being produced daily in astronomy, there is a lot being said. Most researchers can only try to keep up with it all — my daily RSS feed from arXiv is next to useless, it’s so bloated. In amongst all that text, there must be connections and relationships that are being missed by the community at large, hidden in the chatter. Maybe we can develop simple techniques to highlight potential missed links, i.e. generate new hypotheses from the mass of words and data.

Are the results coming out of the work useful for auditing academics?

Robert Simpson: Well, perhaps, but that would be tricky territory in my opinion. I’ve only just begun to explore the data around authorship in astronomy. One thing that is clear is that we can see a big trend toward collaborative work. In 2012, only 6% of papers were single-author efforts, compared with 70+% in the 1950s.

The average number of authors per paper since 1827
The above plot shows the average number of authors per paper since 1827. CREDIT: Robert Simpson.

We can measure how large groups are becoming, and who is part of which groups. In that sense, we can audit research groups, and maybe individual people. The big issue is keeping track of people through variations in their names and affiliations. Identifying authors is probably a solved problem if we look at ORCID.

What about citations? Can you draw any comparisons with h-index data?

Robert Simpson: I haven’t looked at h-index stuff specifically, at least not yet, but citations are fun. I looked at the trends surrounding the term “dark matter” and saw something interesting. Mentions of dark matter rise steadily after it first appears in the late ’70s.

Compare the term dark matter with related terms
Compare the term “dark matter” with a few other related terms: “cosmology,” “big bang,” “dark energy,” and “wmap.” You can see cosmology has been getting more popular since the 1990s, and dark energy is a recent addition. CREDIT: Robert Simpson.

In the data, astronomy becomes more and more obsessed with dark matter — the term appears in 1% of all papers by the end of the ’80s and 6% today.

Looking at citations changes the picture. The community is writing papers about dark matter more and more each year, but they are getting fewer citations than they used to (the peak for this was in the late ’90s). These trends are normalised, so the only recency effect I can think of is that dark matter papers take more than 10 years to become citable. Either that or dark matter studies are currently in a trough for impact.

Can you see where work is dropped by parts of the community and picked up again?

Robert Simpson: Not yet, but I see what you mean. I need to build a better picture of the community and its components.

Can you build a social graph of astronomers out of this data? What about (academic) family trees?

Robert Simpson: Identifying unique authors is my next step, followed by creating fingerprints of individuals at a given point in time. When do people create their first-author papers, when do they have the most impact in their careers, stuff like that.

What tools did you use? In hindsight, would you do it differently?

Robert Simpson: I’m using Ruby and Perl to grab the data, MySQL to store and query it, JavaScript to display it (Google Charts and D3.js). I may still move the database part to MongoDB because it was designed to store documents. Similarly, I may switch from ADS to arXiv as the data source. Using arXiv would allow me to grab the full text in many cases, even if it does introduce a peer-review issue.

What’s next?

Robert Simpson: My aim is still to attempt real hypothesis generation. I’ve begun the process by investigating correlations between terms in the literature, but I think the power will be in being able to compare all terms with all terms and looking for the unexpected. Terms may correlate indirectly (via a third term, for example), so the entire corpus needs to be processed and optimised to make it work comprehensively.
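One way to sketch that indirect-correlation idea: flag pairs of terms that rarely co-occur directly but are both strongly linked to some third term. The correlation scores below are invented for illustration, and this is only one simple interpretation of the approach, not Robert's actual method:

```python
def missing_links(corr, threshold=0.5):
    """Find term pairs that rarely co-occur directly but are both
    strongly linked to a common third term -- candidate hypotheses."""
    terms = sorted({t for pair in corr for t in pair})
    get = lambda a, b: corr.get((a, b), corr.get((b, a), 0.0))
    candidates = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if get(a, b) >= threshold:
                continue  # already a known direct link
            shared = [c for c in terms if c not in (a, b)
                      and get(a, c) >= threshold and get(b, c) >= threshold]
            if shared:
                candidates.append((a, b, shared))
    return candidates

# Invented correlation scores between term pairs.
corr = {("agn", "accretion"): 0.8,
        ("accretion", "jets"): 0.7,
        ("agn", "jets"): 0.1}
print(missing_links(corr))
```

Here "agn" and "jets" would be flagged via their shared link to "accretion"; on a real corpus, the interesting output is the pairs a human would not have thought to connect.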

Science between the cracks

I’m really looking forward to seeing more results coming out of Robert’s work. This sort of analysis hasn’t really been possible before. It’s showing a lot of promise, both from a sociological angle, with the ability to do research into how science is done and how that has changed, and ultimately as a hypothesis engine — something that can generate new science in and of itself. This is just a hack day experiment. Imagine what could be done if the literature were more open and this sort of analysis could be done across fields.

Right now, a lot of the most interesting science is being done in the cracks between disciplines, but the hardest part of that sort of work is often trying to understand the literature of the discipline that isn’t your own. Robert’s project offers a lot of hope that this may soon become easier.


June 08 2012

In defense of frivolities and open-ended experiments

My first child was born just about nine months ago. From the hospital window on that memorable day, I could see that it was surprisingly sunny for a Berkeley autumn afternoon. At the time, I'd only slept about three of the last 38 hours. My mind was making up for the missing haze that usually fills the Berkeley sky. Despite my cloudy state, I can easily recall those moments following my first afternoon lying with my newborn son. In those minutes, he cleared my mind better than the sun had cleared the Berkeley skies.

While my wife slept and recovered, I talked to my boy, welcoming him into this strange world and his newfound existence. I told him how excited I was for him to learn about it all: the sky, planets, stars, galaxies, animals, happiness, sadness, laughter. As I talked, I came to realize how many concepts I understand that he lacked. For every new thing I mentioned, I realized there were 10 more that he would need to learn just to understand that one.

Of course, he need not know specific facts to appreciate the sun's warmth, but to understand what the sun is, he must first learn the pyramid of knowledge that encapsulates our understanding of it: He must learn to distinguish self from other; he must learn about time, scale and distance and proportion, light and energy, motion, vision, sensation, and so on.

Anatomy of a sunset

I mentioned time. Ultimately, I regressed to talking about language, mathematics, history, ancient Egypt, and the Pyramids. It was the verbal equivalent of "wiki walking," wherein I go to Wikipedia to look up an innocuous fact, such as the density of gold, and find myself reading about Mesopotamian religious practices an hour later.

It struck me then how incredible human culture, science, and technology truly are. For billions of years, life was restricted to a nearly memoryless existence, at most relying upon brief changes in chemical gradients to move closer to nutrient sources or farther from toxins.

With time, these basic chemo- and photo-sensory apparatuses evolved; creatures with longer memories — perhaps long enough to remember where food sources were richest — possessed an evolutionary advantage. Eventually, the time scales on which memory operates extended longer; short-term memory became long-term memory, and brains evolved the ability to maintain a memory across an entire biological lifetime. (In fact, how the brain coordinates such memories is a core question of my neuroscientific research.)


However, memory did not stop there. Language permitted interpersonal communication, and primates finally overcame the memory limitations of a single lifespan. Writing and culture imbued an increased permanence to memory, impervious to the requirement for knowledge to pass verbally, thus improving the fidelity of memory and minimizing the costs of the "telephone game effect."

We are now in the digital age, where we are freed from the confines of needing to remember a phone number or other arbitrary facts. While I'd like to think that we're using this "extra storage" for useful purposes, sadly I can tell you more about minutiae of the Marvel Universe and "Star Wars" canon than will ever be useful (short of an alien invasion in which our survival as a species is predicated on my ability to tell you that Nightcrawler doesn't, strictly speaking, teleport, but rather he travels through another dimension, and when he reappears in our dimension the "BAMF" sound results from some sulfuric gasses entering our dimension upon his return).

But I wiki-walk digress.

So what does all of this extra memory gain us?

Accelerated innovation.

As a scientist, my (hopefully) novel research is built upon the unfathomable number of failures and successes of those who came before me. The common refrain is that we scientists stand on the shoulders of giants. It is for this reason that I've previously argued that research funding is so critical, even for apparently "frivolous" projects. I've been keeping a Google Doc noting impressive breakthroughs that emerged from research that, on the surface, has no "practical" value.

Although you can't legislate innovation or democratize a breakthrough, you can encourage a system that maximizes the probability that a breakthrough can occur. This is what science should be doing and this is, to a certain extent, what Silicon Valley is already doing.

The more data, information, software, tools, and knowledge available, the more we as a society can build upon previous work. (That said, even though I'm a huge proponent of more data, the most transformational theory in biology came about from solid critical thinking, logic, and sparse data collection.)

Of course, I'm biased, but I'm going to talk about two projects in which I'm involved: one business and one scientific. The first is Uber, an on-demand car service that allows users to request a private car via their smartphone or SMS. Uber is built using a variety of open software and tools such as Python, MySQL, node.js, and others. These systems helped make Uber possible.

Uber screenshot

As a non-engineer, it's staggering to think of the complexity of the systems that make Uber work: GPS, accurate mapping tools, a reliable cellular/SMS system, automated dispatching system, and so on. But we as a culture become so quickly accustomed to certain advances that, should our system ever experience a service disruption, Louis C.K. would almost certainly be prophetic about the response:

The other project in which I'm involved is brainSCANr. My wife and I recently published a paper on this, but the basic idea is that we mined the text of more than three million peer-reviewed neuroscience research articles to find associations between topics and search for potentially missing links (which we called "semi-automated hypothesis generation").

We built the first version of the site in a week, using nothing but open data and tools. The National Library of Medicine, part of the National Institutes of Health, provides an API to search all of these manuscripts in their massive, 20-million-paper-plus database. We used Python to process the associations, the JavaScript InfoVis Toolkit to plot the data, and Google App Engine to host it all. I'm positive when the NIH funded the creation of PubMed and its API, they didn't have this kind of project in mind.
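At its core, this kind of text mining scores how strongly two topics are associated by how often they appear in the same papers. The sketch below uses invented document-ID sets and a plain Jaccard similarity; it is not brainSCANr's actual algorithm, just one simple way to score such an association:

```python
# Invented document-ID sets per term; the real project derived its counts
# from millions of PubMed records via the NLM API.
papers_mentioning = {
    "amygdala":  {1, 2, 3, 5},
    "fear":      {2, 3, 5, 8},
    "serotonin": {4, 8},
}

def association(term_a, term_b, index):
    """Jaccard similarity between the sets of papers mentioning each term."""
    a, b = index[term_a], index[term_b]
    return len(a & b) / len(a | b)

print(association("amygdala", "fear", papers_mentioning))
```

Pairs of terms with many strong shared neighbours but a weak direct score are the "potentially missing links" the project goes looking for.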

That's the great thing about making more tools available; it's arrogant to think that we can anticipate the best ways to make use of our own creations. My hope is that brainSCANr is the weakest incarnation of this kind of scientific text mining, and that bigger and better things will come of it.

Twenty years ago, these projects would have been practically impossible, meaning that the amount of labor involved to make them would have been impractical. Now they can be built by a handful of people (or a guy and his pregnant wife) in a week.

Just as research into black holes can lead to a breakthrough in wireless communication, so too can seemingly benign software technologies open amazing and unpredictable frontiers. Who would have guessed that what began with a simple online bookstore would grow into Amazon Web Services, a tool that is playing an ever-important role in innovation and scientific computing such as genetic sequencing?

So, before you scoff at the "pointlessness" of social networks or the wastefulness of "another web service," remember that we don't always do the research that will lead to the best immediate applications or build the company that is immediately useful or profitable. Nor can we always anticipate how our products will be used. It's easy to mock Twitter because you don't care to hear about who ate what for lunch, but I guarantee that the people whose lives were saved after the Haiti earthquake or who coordinated the spark of the Arab Spring are happy Twitter exists.

While we might have to justify ourselves to granting agencies, or venture capitalists, or our shareholders in order to do the work we want to do, sometimes the "real" reason we spend so much of our time working is the same reason people climb mountains: because it's awesome that we can. That said, it's nice to know that what we're building now will be improved upon by our children in ways we can't even conceive.

I can't wait to have this conversation with my son when — after learning how to talk, of course — he's had a chance to build on the frivolities of my generation.


April 01 2012

What is smart disclosure?

Citizens generate an enormous amount of economically valuable data through interactions with companies and government. Earlier this year, a report from the World Economic Forum and McKinsey Consulting described the emergence of personal data as "a new asset class." The value created from such data does not, however, always go to the benefit of consumers, particularly when third parties collect it, separating people from their personal data.

The emergence of new technologies and government policies has provided an opportunity to both empower consumers and create new markets from "smarter disclosure" of this personal data. Smart disclosure is when a private company or government agency provides people with periodic access to their own data in open formats that enable them to easily put the data to use. Specifically, smart disclosure refers to the timely release of data in standardized, machine-readable formats in ways that enable consumers to make better decisions about finance, healthcare, energy or other contexts.

Smart disclosure is "a new tool that helps provide consumers with greater access to the information they need to make informed choices," wrote Cass Sunstein, the U.S. administrator of the White House Office of Information and Regulatory Affairs (OIRA), in a post on smart disclosure on the White House blog. Sunstein delivered a keynote address at the White House Summit on smart disclosure at the U.S. National Archives on Friday. He authored a memorandum from OIRA providing guidance on smart disclosure in September 2011.

Smart disclosure is part of the final United States National Action Plan for its participation in the Open Government Partnership. Speaking at the launch of the Open Government Partnership in New York City last September, the president specifically referred to the role of smart disclosure in the United States:

"We’ve developed new tools -- called 'smart disclosures' -- so that the data we make public can help people make health care choices, help small businesses innovate, and help scientists achieve new breakthroughs," said President Obama. "We’ve been promoting greater disclosure of government information, empowering citizens with new ways to participate in their democracy. We are releasing more data in usable forms on health and safety and the environment, because information is power, and helping people make informed decisions and entrepreneurs turn data into new products creates new jobs."

In the months since the announcement, the U.S. National Science and Technology Council established a smart disclosure task force dedicated to promoting better policies and implementation across government.

"In many contexts, the federal government uses disclosure as a way to ensure that consumers know what they are purchasing and are able to compare alternatives," wrote Sunstein at the White House blog. "Consider nutrition facts labels and the newly designed automobile fuel economy labels. Modern technologies are giving rise to a series of new possibilities for promoting informed decisions."

Smart disclosure is a "case of the Administration asking agencies to focus on making available high value data (as distinct from traditional transparency and accountability data) for purposes other than decreasing corruption in government," wrote New York Law School professor Beth Noveck, the former U.S. deputy chief technology officer for open government, in an email. "It starts from the premise that consumers, when given access to information and useful decision tools built by third parties using that information, can self-regulate and stand on a more level playing field with companies who otherwise seek to obfuscate." The choice of Todd Park as United States CTO also sends a message about the importance of smart disclosure to the administration, she said.

The United Kingdom's “midata” initiative is an important smart disclosure case study outside of the United States. Progress there has come in large part because the UK has a privacy law that gives citizens the right to access their personal data held by private companies, unlike the United States. In the UK, however, companies have been complying with the law in a way that did not realize the real potential value of that right to data, which is to say that a citizen could request personal data and it would arrive in the mail weeks later at a cost of a few dozen pounds. The UK government has launched a voluntary public-private partnership to enable companies to comply with the law by making the data available online in open formats. The recent introduction of the Consumer Privacy Bill of Rights from the White House and the Privacy Report from the FTC suggests that such rights to personal data ownership might be negotiated, in principle, much as a right to credit reports has been in the past.

Four categories of smart disclosure

One of the most powerful versions of smart disclosure is when data on products or services (including pricing algorithms, quality, and features) is combined with personal data (like customer usage history, credit score, health, energy and education data) into "choice engines" (like search engines, interactive maps or mobile applications) that enable consumers to make better decisions in context, at the point of a buying or contractual decision. There are four broad categories where smart disclosure applies:

  1. When government releases data about products or services. For instance, when the Department of Health and Human Services releases hospital quality ratings, the Securities and Exchange Commission releases public company financial filings in machine-readable formats, or the Department of Education puts data about more than 7,000 institutions online in a College Navigator for prospective students.
  2. When government releases personal data about a citizen. For instance, when the Department of Veterans Affairs gives veterans access to health records using the "Blue Button" or the IRS provides citizens with online access to their electronic tax transcript. The work of BrightScope liberating financial advisor data and 401(k) data has been an early signal of how data drives the innovation economy.
  3. When a private company releases information about products or services in machine-readable formats. Entrepreneurs can then use that data to empower consumers. For instance, services such as Hello Wallet may enhance consumer finance decisions.
  4. When a private company releases personal data about usage to a citizen. For instance, when a power utility company provides a household access to its energy usage data through the Green Button or when banks allow customers to download their transaction histories in a machine-readable format to use with personal finance services. As with the Blue Button for healthcare data and consumer finance, the White House asserts that providing energy consumers with secure access to information about energy usage will increase innovation in the sector and empower citizens with more information.
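The common thread in all four categories is the machine-readable format. The real Green Button standard is XML-based, but a simplified CSV stand-in is enough to show why machine-readable beats paper: the data can be totalled, compared and fed to other tools in a few lines.

```python
import csv
import io

# Simplified, invented stand-in for machine-readable usage data; the real
# Green Button standard is an XML format, but the principle is the same.
raw = """date,kwh
2012-03-01,12.4
2012-03-02,10.1
2012-03-03,15.0
"""

def total_usage(csv_text):
    """Sum energy usage from a machine-readable export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["kwh"]) for row in reader)

print(total_usage(raw))
```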

An expanding colorwheel of buttons

Should smart disclosure initiatives continue to gather steam, citizens could see “Blue Button”-like and "Green Button"-like solutions for every kind of data government or industry collects about citizens. For example, the Department of Defense has military training and experience records. Social Security and the Internal Revenue Service have citizens' financial histories, such as earnings and income. The Department of Veterans Affairs and Centers for Medicare and Medicaid Services have personal health records.

More "Green Button"-like mechanisms could enable secure, private access to the data private industry collects about citizens. That could include mobile phone bills, credit card fees, mortgage disclosures, mutual fund fees and more, except where there are legal restrictions, as for national security reasons.

Earlier this year, influential venture capitalist Fred Wilson encouraged entrepreneurs and VCs to get behind open data. Writing on his widely read blog, Wilson urged developers to adopt the Green Button.

"This is the kind of innovation that gets me excited," Wilson wrote. "The Green Button is like OAuth for energy data. It is a simple standard that the utilities can implement on one side and web/mobile developers can implement on the other side. And the result is a ton of information sharing about energy consumption and in all likelihood energy savings that result from more informed consumers."

When citizens gain access to data and put it to work, they can tap it to make better choices about everything from finance to healthcare to real estate, much in the same way that Web applications like Hipmunk and Zillow let consumers make more informed decisions.

"I'm a big fan of simplicity and open standards to unleash a lot of innovation," wrote Wilson. "APIs and open data aren't always simple concepts for end users. Green Buttons and Blue Buttons are pretty simple concepts that most consumers will understand. I'm hoping we soon see Yellow Buttons, Red Buttons, Purple Buttons, and Orange Buttons too. Let's get behind these open data initiatives. Let's build them into our apps. And let's pressure our hospitals, utilities, and other institutions to support them."

The next generation of open data is personal data, wrote open government analyst David Eaves this month:

I would love to see the blue button and green button initiative spread to companies and jurisdictions outside the United States. There is no reason why for example there cannot be Blue Buttons on the Provincial Health Care website in Canada, or the UK. Nor is there any reason why provincial energy corporations like BC Hydro or Bullfrog Energy (there's a progressive company that would get this) couldn't implement the Green Button. Doing so would enable Canadian software developers to create applications that could use this data and help citizens and tap into the US market. Conversely, Canadian citizens could tap into applications created in the US.

The opportunity here is huge. Not only could this revolutionize citizens' access to their own health and energy consumption data, it could also reduce the costs of sharing health care records, which in turn could create savings for the industry at large.

Data drives consumer finance innovation

Despite recent headlines about the Green Button and the household energy data market, the biggest US smart disclosure story of this type is currently consumer finance, where there is already significant private-sector activity.

For instance, some services offer a consumer personalized recommendations for a cheaper cell phone plan based on his or her calling history; others make specific recommendations on how to save (and alternative products to use) based on an analysis of the accounts they pull data from. Hello Wallet is enabled by smart disclosure from banks and government data. The sector's success hints at the innovation that's possible when people get open, portable access to their personal data in a consumer market of sufficient size and value to attract entrepreneurial activity.

Such innovation is enabled in part because entrepreneurs and developers can go directly to data aggregation intermediaries like Yodlee or CashEdge and license the data, meaning that they do not have to strike deals directly with each of the private companies or build their own screen scraping technology, although some do go it alone.

"How do people actually make decisions?  How can data help improve those decisions in complex markets?  Research questions like these in behavioral economics are priorities for both the Russell Sage Foundation and the Alfred P. Sloan Foundation," said Daniel Goroff, a Sloan Program Director, in an interview yesterday.  "That's why we are launching a 'Smart Disclosure Research and Demonstration Design Competition.'  If you have ideas and want to win a prize,  please send a short essay.  Even if you are not in a position to carry out the work, we are especially interested in finding and funding projects that can help measure the costs and benefits of existing or novel 'choice engines.'" 

What is the future of smart disclosure?

This kind of vibrant innovation could spread to many other sectors, like energy, health, education, telecommunication, food and nutrition, if relevant data were liberated. The Green Button is an early signal in this area, with the potential to spread to 27 million households around the United States. The Blue Button, with over 800,000 current users, is spreading to private health plans like Aetna and Walgreens, with the potential to spread to 21 million users.

Despite an increasing number of powerful tools that enable data journalists and scientists to interrogate data, even the most literate consumers rarely look at data themselves, particularly if it is in machine-readable, as opposed to human-readable, formats. Instead, they digest it through ratings agencies, consumer reports and guides to the best services or products in a given area. Increasingly, entrepreneurs are combining data with applications, algorithms and improved user interfaces to provide consumers with "choice engines."

As Tim O'Reilly outlined in his keynote speech yesterday, the future of smart disclosure includes more than quarterly data disclosure from the SEC or banks. If you're really lining up with the future, you have to think about real-time data and real-time data systems, he said. Tim outlined 10 key lessons in his presentation, an annotated version of which is embedded below.

The Future of Smart Disclosure (pdf)

When released through smart disclosure, data resembles a classic "public good" in a broader economic sense. Disclosures of such open data in a useful format are currently under-produced by the marketplace, suggesting a potential role for government in the facilitation of its release. Generally, consumers do not have access to it today.

Well over a century ago, President Lincoln said that "the legitimate object of government is to do for the people what needs to be done, but which they cannot by individual effort do at all, or do so well, for themselves." The thesis behind smart disclosure in the 21st century is that when consumers have access to their personal data and the market creates new tools to put it to work, citizens will be empowered to make economic, education and lifestyle choices that enable them to live healthier, wealthier, and -- in the most aspirational sense -- happier lives.

"Moving the government into the 21st century should be applauded," wrote Richard Thaler, an economics professor at the University of Chicago, in the New York Times last year. In a time when so many citizens are struggling with economic woes, unemployment and the high costs of energy, education and healthcare, better tools that help them invest and benefit from personal data are sorely needed..

March 07 2012

The dilemma of authentic learning: Do you destroy what you measure?

John Seely Brown tells us the half-life of any skill is about five years. This astounding metric is presented as part of the ongoing discussion of how education needs to change radically in order to prepare students for a world which is very different than the one their parents graduated into, and in which change is accelerating.

It's pretty straightforward to recognize that new job categories, such as data science, will require new skills. The first-order solution is to add data science as a college curriculum and work the prerequisites backward to kindergarten. But if JSB is right about the half-life of skills, even if this process were instantaneous, the learning path begun in kindergarten might be obsolete by middle school.

The second-order solution is to include meta-skills into the curriculum — ensuring young people learn how to learn, for instance, so that they can adapt as new skills are required with increasing frequency. This is essential, but raises the question of how to stay ahead of the skills curve — what are the next critical things to learn, how do you know, and how do you find them?

John Seely Brown and co-author Douglas Thomas propose in their book "A New Culture of Learning: Cultivating the Imagination for a World of Constant Change" a third-order solution, which is to inculcate the mindsets and dispositions that will lead us, as independent agents, to the things that matter. These include curiosity, questing, and connecting.

A similar theme emerged at the Design, Make, Play workshop at the New York Hall of Science in January. Focused on the question of how the maker movement can catalyze innovation in science, technology, engineering, and mathematics (STEM) education, participants included technologists, makers, learning science researchers, educators, and more, all wrestling with how to translate the authentic, integrated experiences that designing, making, and playing provide into something that can be measured, understood, and incorporated into education.

The primary outcomes of making, designing, and playing look much more like JSB's dispositions than the skills demonstrated on standardized tests of reading, writing, and arithmetic. At the same time, though, practical skills are developed — the kinds of projects exhibited at Maker Faire require the same skills as many high tech professions.

This highlights the most pernicious, devilish, intransigent challenge to bringing critical learning into school. Through the lens of standardized tests, higher order skills, meta-skills, and dispositions are literally invisible. Yet, these tests are the gold standard of educational efficacy for judging schools, educational innovations, and now even teachers themselves. School boards are held accountable by property owners for such test results due to their direct correlation to property values. Innovators, researchers, and even the philanthropic institutions that fund them are beholden to education investors for meaningful results that prove innovations work — with test scores as the default.

This conundrum is well understood by the very stakeholders who are trapped by it, and there are efforts at many levels to combat it — from incorporating critical thinking skills into the core standards being adopted by most states to alternative measures of effectiveness being adopted by grant makers. At the DMP workshop, participants struggled with the very real challenge of authentically articulating the benefits of design, make, and play at different levels and the measures that would make these benefits visible. It's a tricky balancing act to reduce something to metrics without losing its essence.

One fascinating approach was presented by Kevin Crowley about how to recognize the impact of science experiences such as those found in museum exhibits on young people. Crowley and his colleagues researched the forces and events that influenced scientists and science enthusiasts in their career/hobby choices. They identified the notion of experiences that caused "science learning activation," which they defined as a "composite of dispositions, skills, and knowledge that enables success in science learning experiences." The idea is that perhaps we can measure the degree to which a specific informal learning experience creates such activation and that this becomes one of the measures that shines a light on the outcomes of making.

As the gathered experts brainstormed to articulate the genuine outcomes of making for students and how to capture those, it became clear that this is a task that is both crucial and emergent. If authentic learning is to become available to all students regardless of means or zip code, the iterative and ongoing process of articulating the educational values of a world of rapidly changing expectations must become a priority for experts and lay folk alike. What are your thoughts? How do we capture and share the soul of making without turning it into something that can be tested using the No. 2 pencil?


November 30 2011

Developer Week in Review: Siri is the talk of the town

After a one-week hiatus, during which research was undertaken in waistline enhancement via the consumption of starch and protein materials, we're back to see what's been happening in the non-turkey-related fields.

Imitation is the sincerest form of flattery

It's an interesting time for the voice-enabled smartphone field. On the one hand, some industry pundits with vested interests are claiming that people don't want to talk to their phones and don't want them to be assistants. Perhaps they have forgotten that the original smartphones were offshoots of the PDA market, and that PDA doesn't stand for "public display of affection" in this case.

At the other extreme, we have Microsoft stating that Apple's Siri is just a knock-off of Windows Tellme, a claim that has been placed into question by several head-to-head comparisons of features.

Of most interest to the developer community are reports that the latest iOS beta release contains additional hooks to allow applications to integrate into Siri's voice recognition functionality. I talked about the possibility that Apple would be expanding the use of Siri into third-party apps a few weeks ago, and the new features in the beta seem to confirm that voice is going to be made available as a feature throughout applications. This would be a real game changer, in everything from games to GPS applications on the iOS platform.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Computer science for the masses

Two interesting pieces of news this time around on the educational front. In the higher learning arena, Stanford is expanding its free online computer science courseware with several new classes, including one on machine learning. Although you can't earn a free degree this way, you can get computer-graded test results to go along with the recorded lectures. This material will be very useful, even to grizzled old veterans such as myself, who may have a hole or two in their theoretical underpinnings. For a bright high school student who has exhausted his or her school's CS offerings, it could also serve as a next step.

Meanwhile, in the UK, the government seems to be moving toward having all students learn the basics of programming. I worry about this, on two fronts. First, it is unclear if the majority of students really need to learn software engineering or would benefit from it. Force-feeding coding skills into students who may not have the aptitude or proclivity to want to learn them seems unwise to me and is likely to slow down the students who might actually have a desire to learn the subject. Second, I have my doubts that a government-designed software engineering curriculum would actually be any good.

Is there anything JavaScript can't do?

JavaScript is often derided by "serious" computer professionals as a poorly designed toy language unfit for "real" software engineering. Yet, those who spend time using it know that you can produce some impressive results with it.

For example, there is now a JavaScript implementation of the OpenPGP message specification, which would allow JavaScript code to send and receive encrypted messages. And if you really want to go out on a limb, you could always develop a Java Virtual Machine byte code interpreter written entirely in JavaScript (somewhere, James Gosling is crying ...).

There's no question that JavaScript has its weak points, but its near-ubiquity makes it an incredibly useful spanner to carry around in your tool belt. Developers, sneer at your own risk. Like cockroaches, JavaScript may be around well after some more traditional languages have turned to dust.

Got news?

Please send tips and leads here.



November 15 2011

Helping educators find the right stuff

Education innovation will require scalable, national, open, interoperable systems that support data feedback loops. At the recent State Educational Technology Directors Association's (SETDA) Leadership Summit, the United States Department of Education launched the Learning Registry, a powerful step toward creating the ecosystem infrastructure that will enable such systems.

The Learning Registry addresses the problem of discoverability of education resources. There are countless repositories of fantastic educational content, from user-generated and curated sites to Open Educational Resources to private sector publisher sites. Yet, with all this high-quality content available to teachers, it is still nearly impossible to find content to use with a particular lesson plan for a particular grade aligned to particular standards. Regrettably, it is often easier for a teacher to develop his own content than to find just the right thing on the Internet.

Schools, states, individuals, and professional communities have historically addressed this challenge by curating lists of content; rating and reviewing sites; and sharing their finds via websites, Twitter and other social media platforms. With aggregated sites to peruse, a teacher might increase his odds of finding that "just right" content, but it is still often a losing proposition. As an alternative, most educators will resort to Google, but as Secretary of Education Arne Duncan told the SETDA members, "Today's search engines do many things well, but they aren't designed to directly support teaching and learning. The Learning Registry aims to fix this problem." Aneesh Chopra, United States CTO, called the project the flagship open-government initiative for the Department of Education.

The Department of Education and the Department of Defense set out to solve the problem of discoverability, each contributing $1.3 million to the registry project. Steve Midgley, Deputy Director for the Office of Educational Technology, pointed out, "We didn't build another portal — that would not be the proper role of the federal government." Instead, the proper role as Midgley envisioned it was to create infrastructure that would enable all stakeholders to share valuable information and resources in a non-centralized, open way.

In short, the Learning Registry has created open application programming interfaces (APIs) that allow publishers and others to quickly publish metadata and paradata about their content. For instance, the Smithsonian could assert digitally that a certain piece of video is intended for ages 5-7 in natural science, aligned with specific state standards. Software developers could include algorithms in lesson-planning software systems that extract, sign, and send information, such as: "A third grade teacher used this video in a lesson plan on the bridges of Portland." Browser developers could write code to include this data in search results and to increase result relevance based on ratings and reputations from trusted sources. In fact, Midgley showed the SETDA audience a prototype browser plug-in that did just that.
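To make the paradata idea concrete, here is a rough sketch of what an assertion like the "third grade teacher" example might look like as structured data. The field names and shape are illustrative only, invented for this example; they are not the actual Learning Registry envelope format.

```python
import json

def make_paradata_assertion(actor, verb, resource_url, context):
    """Build a hypothetical paradata assertion of the form
    'actor verb resource in context'. Field names are illustrative,
    not the official Learning Registry schema."""
    return {
        "activity": {
            "actor": {"objectType": "educator", "description": actor},
            "verb": verb,
            "object": resource_url,
            "context": context,
        }
    }

assertion = make_paradata_assertion(
    actor="third grade teacher",
    verb="used in a lesson plan",
    resource_url="http://example.org/videos/bridges-of-portland",
    context="lesson on the bridges of Portland",
)
print(json.dumps(assertion, indent=2))
```

The appeal of this style of record is that it accumulates as a side effect of normal work: each lesson-planning tool or browser plug-in that emits one adds a little more signal about how a resource is actually used.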

The virtue of this system comes from the platform thinking behind its design — an open communication system versus a portal — and from the value it provides to users from the very beginning. In the early days, improved discoverability of relevant content is a boon to both the teacher who discovers it and the content owner who publishes it. The APIs are structured in such a way that well-implemented code will collect valuable information about how the content is used as a side effect of educators, parents, and others simply doing their daily work. Over time, a body of metadata and paradata will emerge that identifies educational content; detailed data about how it has been used and interacted with; as well as rating, reputation and other information that can feed interesting new analytics, visualizations, and meaningful presentation of information to teachers, parents, researchers, administrators and developers.

Midgley called for innovative developers and entrepreneurs to take advantage of this enabling system for data collection in the education market. As the simple uses begin to drive use cases that shed increasingly rich data, there will be new opportunities to build businesses based on analytics and the meaningful presentation of rich new data to teachers, parents, students, and others who have an interest in teaching and learning.

I am delighted and intrigued to see the Department of Education leading with infrastructure over point solutions. As Richard Culatta, Education Fellow in Senator Patty Murray's office, said to the audience, "When common frameworks are put in place, it allows smart people to do really creative things."



November 10 2011

Access or ownership: Which will be the default?

In a recent article, The Atlantic takes a look at the threads that connect Steve Case's investments:

A luxury-home network. A car-sharing company. An explosive deal site. Maybe you see three random ideas. [Steve] Case and his team saw three bets that paid off thanks to a new Web economy that promotes power in numbers and access over ownership. [Emphasis added.]

From time to time at Radar we've been checking in on this "access vs. ownership" trend.

For example, Lisa Gansky, author of "The Mesh," explained why businesses need to embrace sharing and open systems.

Corey Pressman, founder of Exprima Media, discussed the role customization will play in an access-dominant media world:

... music access versus ownership is very compelling. I could see a possible near future in which "accessible music" (streaming unlimited cloud access) trumps "owned music" (purchased CDs or downloads). In this scenario, customization — creating customized playlists — is external to the media; customization is handled by the conduit, not the content.

More from Pressman here.

In "What if a book is just a URL?", Radar contributor Jenn Webb pointed out ebook companies that ignore downloads and instead provide access to material.

And in an interview with Audrey Watters, education theorist George Siemens noted that in the education data/analytics world, "Data access and ownership are equally important issues: who should be able to see the analysis that schools perform on learners?"

Business, media, publishing, data, education — these are all areas where access vs. ownership has organically popped up in our coverage. And it's easy to see how the same trend applies to the technical side: access requires storage and ubiquity, which generally leads to a cloud solution (and then you get into issues like public cloud vs private cloud, who's responsible for uptime, what happens when there's a breach, who actually owns that data, how do you maximize performance, and on and on ...)

What's your take? Will access become the default? Or is ownership a hardwired trait?

Please weigh in through the comments or join the conversation at Google+.



November 07 2011

Three game characteristics that can be applied to education

In a related post, I talked about what the notion of gamification as applied to education might mean on three levels. In particular, I described the lessons that might be learned by the field of education from the different types of gaming encountered in World of Warcraft and Minecraft — two very different online multiplayer games. In this post, I look at the technology roadmap that can support these three levels of application in real schools.

Level 1: Leveling up and questing

The first level is one where leveling, questing, and leaderboards can help motivate students to engage more with their schoolwork. Like a gamer who chooses his or her own path and pace to "level up," a student will choose his or her own path and pace to learn a standard curriculum and be able to prove advancement to that next level through performance on tests.

The technology to be successful at this level exists today — the obstacle is cost, and the payoff is more students demonstrating success on state tests, closing the achievement gap. To work, this model calls for a mobile device with plenty of bandwidth for every student and software that lets the student level up at his or her own pace. The software can be an online course or something more sophisticated and engaging. The idea is that with software support to allow personalization for each student, teachers will have more time to spend with individual students and small groups to help them succeed with whatever unique challenges they are working on that day.

Despite the numerous challenges to achieve this level in reality, this is actually the easiest of the three levels.

First, this level is easy because objective standards of "better" exist — higher scores on standardized state tests. A school can try various online classes or drills, or adaptive software with its particular students, and standardized test scores will provide the data regarding what worked best for them.

Second, this level is easy because the technology infrastructure degrades gracefully — it still works even if students don't have a device of their own. The first gains will come from just allowing students to work at their own pace on shared school computers. Since real schools are likely to have an uneven and years-long transition from the shared computer labs that most schools have today to ubiquitous computing environments, schools can make every penny count by creating an IT roadmap that supports self-paced leveling. In short, this will involve transitioning to cloud-based services as quickly as possible and increasing computer-to-student ratios and bandwidth as budgets allow.

Third, this level is easy because there are already processes in place for evolving the definition of "better." For more than 40 states, current standards are being replaced by the Common Core Standards developed through an initiative by the Council of Chief State School Officers and the National Governors' Association. The Common Core Assessments that are being developed to support these standards not only raise the bar for existing basic skills, but create assessments for higher-order thinking skills. By following their IT roadmaps, schools will be able to swap out current online tests for more sophisticated online tests over time, with no new technology architecture needed to participate in that continual improvement. If they have chosen cloud-based software that is easy to opt into and out of, they can experiment with new applications at will to see which ones best help their students perform on these increasingly sophisticated tests.

Level 2: Group collaboration

The second level is more like the World of Warcraft gameplay called "raiding" — group collaboration to achieve a shared goal. In Warcraft, that could mean downing a boss; in school, it could be collaborating on a book about local ecology. To the degree that work (or play) happens digitally, leaders (or teachers) can get rich insight into everyone's contributions and participation.

This level is hard. First of all, there is no agreement on what these collaboration and communication skills should look like. Second, there are, consequently, no assessments for these skills available. Third, there is no software developed to interpret collaboration based on the digital tracks left by students working together online. Fourth, there are no standards for how to balance student privacy with such data collection.

For all these reasons, the full burden falls on the teacher to create shared goals for students; create collaboration environments; and observe, analyze, and measure their skills. Fortunately, the same technology architecture that supported the comparatively easy first level of personalizing learning (above) can support the teacher in these tasks. By using project management tools and shared authoring tools, such as Google Docs and wikis that generate histories as students edit their shared work, a teacher can get pretty good first-order information on the timeline, magnitude, and quality of each student's digital contributions. That's a big improvement over trying to be everywhere at once to observe each group's work.
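As a rough illustration of the first-order information such revision histories can yield, the sketch below aggregates a hypothetical edit log into per-student contribution metrics. The record format is invented for this example, not the export format of any particular tool.

```python
from collections import defaultdict

# Hypothetical revision-history records, as a wiki or shared-document
# tool might export them: (student, timestamp, characters added).
edits = [
    ("ana",  "2011-11-01T09:15", 240),
    ("ben",  "2011-11-01T10:02",  80),
    ("ana",  "2011-11-02T14:30", 120),
    ("cruz", "2011-11-03T11:45", 310),
]

def contribution_summary(edit_log):
    """First-order metrics per student: edit count and total characters added."""
    summary = defaultdict(lambda: {"edits": 0, "chars": 0})
    for student, _timestamp, chars in edit_log:
        summary[student]["edits"] += 1
        summary[student]["chars"] += chars
    return dict(summary)

print(contribution_summary(edits))
```

Counts like these say nothing about quality on their own, which is exactly the point of the paragraph above: they give a teacher a starting map of who contributed what and when, freeing observation time for judging the substance.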

Also, the same assessment groups that are working toward improved digital assessments for basic skills and higher-order thinking are also targeting 21st-century skills. If structured carefully, these digital assessments will also flow seamlessly into an IT roadmap for schools that is moving toward a ubiquitous computing environment.

Level 3: Play

The third level is less like traditional gamification and more about play. Rather than using Warcraft dynamics, it focuses on open-ended exploration — more like the game Minecraft. It already shows up in education through inquiry and the arts, and is more focused on developing questions than finding answers.

This is the expert level. This level confounds traditional approaches of measuring success — how do you measure the value of a question, or a journey, or artistic expression? If there are no outcomes that we know how to measure, then is the activity even a valid one for schools?

Still, teachers, critics and experts evaluate art all the time. Perhaps the artistic tradition of portfolios will serve the role of capturing open-ended student work that isn't readily reduced to performance on a test. The student work itself, including student reflections on the journey of creating that work, may in its entirety be interpreted and understood by an audience of teachers, college admissions arbiters, employers, friends, family, experts and critics.

I've written previously about the notion of a student digital backpack wherein students and families own their data and which can include everything from test scores to rich digital portfolios. Although the need for standard privacy and data-sharing policies is as yet unmet, and the structure of such backpacks may not yet be fully conceived, the good news, once again, is that the technology degrades gracefully. An IT roadmap that includes cloud-based, student-controlled portfolios today will support a migration to systems that provide privacy management and evolving mechanisms for demonstrating achievements, performance, and student work in the future.

It is a fairly small technical shift, though a potentially significant conceptual leap, for schools to change from the current kinds of planning, which tend to include lots of locally maintained servers and fixed computer labs, to planning for mobile devices and cloud computing provided as a service to schools. Regardless of the hardware, software, and bandwidth a school currently has available, planning for this emergent infrastructure will provide critically needed flexibility over the next decade.

There are many examples that highlight this need, but the lens of gaming and gamification makes a point that can be overlooked when discussing the use of technology in education: we learn best by doing, we learn best in authentic situations, we learn best socially, and we learn best playfully. These elements can be seen in the best classrooms, regardless of whether technology is involved — from gold stars for recognizing achievements, to students collaborating on a meaningful community project, to young people engaging in open-ended inquiry. The risk is that as we move to more digitally supported and mediated teaching and learning, these best traditions and practices might be lost. Thoughtful roadmapping of technology that supports both Warcraft-like and Minecraft-like student work can help keep these practices central.


November 04 2011

The maker movement's potential for education, jobs and innovation is growing

Dale Dougherty (@dalepd), one of the co-founders of O'Reilly Media, was honored at the White House yesterday as a "Champion of Change." This White House initiative profiles Americans who are helping their fellow citizens "meet the challenges of the 21st century." The recognition came as part of what the White House is calling "Make it in America," which convenes people from around the country to discuss American manufacturing and jobs.

"This is so completely deserved," wrote Tim O'Reilly on Google+. "When you see kids at Maker Faire suddenly turned on to science and math because they want to make things, when you see them dragging their parents around with eyes shining, you realize just how dull our education system has made some of the most exciting and interesting stuff in the world. Dale has taken a huge step towards changing that. I'm honored to have worked with Dale now for more than 25 years, making big ideas happen. He's a genius."

The event was streamed online, and video of the event is up on YouTube, where you can watch Dougherty's comments, beginning at 58:18. Most of the other speakers focused on energy, transportation or other economic issues. Dougherty went in a different direction. "You're sort of the anti-Washington message, in that you guys just hang out and do great stuff," said U.S. CTO Aneesh Chopra when introducing Dougherty.

"I started this magazine called 'MAKE'," Dougherty said. "It's sort of a 21st-century 'Popular Mechanics,' and it really meant to describe how to make things for fun and play. [We] started an event called MakerFaire, just bringing people together to see what they make in their basements, their garages, and what they're doing with technology. It really kind of came from the technology side into what you might call manufacturing, but people are building robots, people are building new forms of lighting, people are building … new forms of things that are just in their heads," he said.

"You mentioned tinkering," said Dougherty, responding to an earlier comment by Chopra. "Tinkering was once a solid middle-class skill. It was how you made your life better. You got a better home, you fixed your car, you did a lot of things. We've kind of lost some of that, and tinkering is on the fringe instead of in the middle today."

The software community is influencing manufacturing today, said Dougherty, including new ways of thinking about it. "It's a culture. I think when you look at 'MAKE' and MakerFaire, this is a new culture, and it is a way to kind of redefine what this means." It's about seeing manufacturing as a "creative enterprise," not something "where you're told to do something but where you're invited to solve a problem or figure things out."

This emergent culture is one in which makers create because of passion and personal interest. "People are building robots because they want to," Dougherty said. "It's an expression of who they are and what they love to do. When you get these people together, they really turn each other on, and they turn on other people."

I caught up with Dougherty and talked with him about the White House event and what's happening more broadly in the maker space. Our interview follows.

What does this recognition mean to you?

Dale Dougherty: I see it as a recognition for the maker movement and the can-do spirit of makers. I'm proud of what makers are doing, so I appreciated the opportunity to tell this story to business and government leaders. Makers are the champions of change.

How fast is the maker community growing?

Dale Dougherty: It's hard to put a number on the spread of an idea. The key thing is that it continues to spread and more people are getting connected. I know that the maker audience is getting younger every year, which is a good sign. That means we've involved more families and young people.

What's particularly exciting to you in the maker movement right now?

Dale Dougherty: Kits. We just wrapped up a special issue of "MAKE" on kits. Kits are a very interesting alternative to packaged consumer products. They provide parts and instructions for you to make something yourself. There's such a broad range of kits available that I wanted to bring them together in one issue. We have a great lead article by MIT researcher and economist, Michael Schrage, on how kits drive innovation. I didn't know, for example, that the first steam engine was sold as a kit. So were the first personal computers. Today we're looking at 3-D printers such as the Makerbot. We're also looking at the RallyFighter, a kit car from Local Motors, which you can build in their new microfactory in Arizona. Also, Jose Gomez-Marquez of MIT writes about DIY medical devices and how they can be hacked by medical practitioners in third-world countries to produce custom solutions.

What does making mean for education?

Dale Dougherty: Making is learning. Remember John Dewey's phrase "learn by doing." It's a hundred-year-old educational philosophy based on experiential learning that seems forgotten, if not forbidden, today. I see a huge opportunity to change the nature of our educational system.

How is the maker movement currently influencing government?

Dale Dougherty: The DIY mindset seems essential for a democratic society, especially one that is undergoing constant change. Think of Ralph Waldo Emerson's famous essay, "Self-Reliance." Taking responsibility for yourself and your community is critical. You can't have a democracy without participation. Everything we can do for ourselves we should do and not wait or expect others to do it for us. If you want things to change, step up and make it happen.

The theme of the Washington meeting was "Make It in America." America is the leading manufacturing economy, but that lead is shrinking. As one speaker said, we have to refute the idea that manufacturing is "dirty, dangerous and disappearing."

Do we want to remain a country that makes things? There are obvious reasons many would like that answer to be 'yes,' but the biggest reason is that manufacturing has historically been a source of middle class jobs.

Some folks asked how to influence people so that they value manufacturing in America and how to get young kids interested in careers in manufacturing. One answer I have is that you have to get more people participating, to think of manufacturing as something that we all do, not just a few. We want to get people to see themselves as makers. This is the broad democratic invitation of the maker movement.

Flipping this a bit, how should the maker movement influence government?

Dale Dougherty: I see four things that the maker movement can bring:

  1. Openness — Once you get started doing something, you find others doing similar things. This creates opportunities for sharing and learning together. Collaboration just seems baked into the maker movement. Let's work together.
  2. Willingness to take risks — Let's not avoid risks. Let's not fear failure. Let's move ahead and learn from what experiences we have. The most important thing is iterating, making things better, learning new ways of doing things.
  3. Creativity — What excites many people is the opportunity to do creative work. If we can't define work as creative, maybe it won't get done.
  4. Personal — Technology has become personal. It's something we can use and shape to our own goals. Making is personal; what you make is an expression of who you are. It means something and that meaning can be shared in public.

What lies ahead in the space? DIY solar, bioreactors, hacking cars?

Dale Dougherty: That's what we'd all like to know. I don't spend too much time thinking about the future. There's so much going on right now.

World of Warcraft and Minecraft: Models for our educational system?

What is wrong with schools that there is so much discussion about how to fix them through gamification? One perspective is that students are unmotivated by school but obsessed with gaming — perhaps a game-like structure for school would make students as passionate about solving quadratic equations as killing monsters. Another perspective is that students are not being prepared for a 21st-century workforce — perhaps the collaborative requirements of online guilds and group challenges would help them gain the skills needed to work in a global environment. A third perspective is that school has lost any authentic connection with real life — perhaps introducing playfulness will create more relevance and authenticity.

Numerous game-like technology approaches for learning have been known to improve test scores among low-performing students. Computer-based learning that allows students to proceed at their own pace, to slow down and repeat subjects when they get stuck, to skip material they have already mastered, and to have a digital dashboard that lets them know how far they've come seem to help students stay more engaged — at least when combined with guidance and support from an excellent teacher. As these elements parallel many of the mechanics of games like World of Warcraft (WoW) it is not implausible to think of including them in both digital and brick-and-mortar learning in the hopes of creating significantly increased engagement and achievement on the part of students.

World of Warcraft and education

There are some powerful ideas in this approach, including the most common gamification mechanism: leveling up. Traditionally, students learn one day at a time. "What are the new example problems, and can I reproduce the process of solving them? What will be on the test?" In this model, the goal is the grade, not understanding, and the game is school. If done well, implementing a leveling-up metaphor can help shift a student's mindset to one in which the game is learning and the grade is a side effect of getting better. "What do I need to understand in order to reach the next level?" Generally, this requires that the levels are awarded as indications of genuine accomplishment as opposed to being expected to have intrinsic motivational value.

Another common mechanism is "unlocking" new content: one can imagine a math curriculum being broken up into smaller modules that allow a student to choose what skills to "unlock" next, increasing ownership and autonomy in a way that is associated with increased motivation. "Achieves" (acknowledgement of having accomplished something significant) can motivate students to explore more widely. Leaderboards can stimulate competition and peer pressure to succeed. With careful design, the structure can create an environment that supports both intrinsic and extrinsic motivators for personal achievement using traditional gamification tools that parallel the leveling aspects of games like WoW.
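These mechanics are easy to make concrete. The sketch below is purely illustrative — the module names, XP values, and prerequisites are invented — but it shows the two ideas in the paragraphs above: levels fall out of accumulated mastery rather than being the goal, and new content unlocks once its prerequisites have been demonstrated.

```python
# Illustrative sketch of mastery-based leveling and unlocking.
# Module names, XP values, and prerequisites are hypothetical.

MODULES = {
    "fractions":  {"xp": 100, "requires": []},
    "ratios":     {"xp": 150, "requires": ["fractions"]},
    "linear_eqs": {"xp": 200, "requires": ["ratios"]},
}

def level(xp, per_level=100):
    """The level is a side effect of accumulated mastery, not the goal."""
    return xp // per_level + 1

def unlocked(mastered):
    """Modules a student may choose next: all prerequisites mastered."""
    return [name for name, spec in MODULES.items()
            if name not in mastered
            and all(req in mastered for req in spec["requires"])]

mastered = ["fractions"]
xp = sum(MODULES[m]["xp"] for m in mastered)
print(level(xp))           # 2
print(unlocked(mastered))  # ['ratios']
```

Note that `unlocked` gives the student a choice among eligible modules, which is where the ownership and autonomy mentioned above come from, rather than prescribing a single fixed sequence.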

Screenshot from "World of Warcraft Cataclysm."

The challenge with the gamification approach as described so far is that it doesn't address the whole story. What education has found over the past decade by incentivizing improved test scores is that those come at the cost of other forms of student achievement. For instance, a student who can achieve proficiency on a state math test may be able to solve rote problems and perform computation, but not know how to apply those skills to challenges in the real world that require higher-order thinking. More importantly, competency in these basic skills may not be enough to prepare a student for work in a global economy. There is a growing emphasis in education on 21st-century skills such as collaboration and communication — skills that advanced players of WoW must master in order to succeed in dungeons, battle grounds, and raids; those aspects of the game that require groups or teams.

Leveling up in WoW means solving problems (quests) and grinding (tedious monster-killing that gains experience points), and requires only basic skills. You measure success by your level, and you gain levels faster by becoming faster at questing and defeating monsters. The key statistic in how quickly a monster goes down is the "damage per second" (DPS) that your character can deal. Optimizing DPS is challenging and takes both practice and analysis, but in the end, great DPS only gets you so far. WoW is not just an online game. Like the real world, it is massively multiplayer, and much of the game, including all of the advanced gaming, involves working with teams to achieve challenging objectives. While statistics like DPS and others provide the minimum requirements for entry into advanced team gaming, you will only be able to participate if the rest of the group accepts you as a team member. This requires a more advanced knowledge of the challenges, collaboration and teamwork, communication, and other 21st-century skills. (The relationship between massively multiplayer online role-playing games (MMORPG) and 21st-century skills has been described for years by Marc Prensky and John Seely Brown.)

There is no point system in WoW to grade you as a team player — there is only your reputation. Other players include and invite you based on your value as they see it — a combination of your performance and their biases. Similarly, in school, there are currently no digital assessments that can predict the ability of a student to perform effectively on self-managed, collaborative teams once they enter the workforce, yet preparation for work or college is one of the top goals of K-12 education.

That's not to say that there is no performance data available — it just requires human interpretation. In WoW, raid leaders download spreadsheets with data on every action of every character and its effect — data that is available because the game is digital. This data is used to determine the performance of the players and the effectiveness of their strategies. Combined with first-hand experience of collaborating with each player, this data can provide a well-rounded picture to an experienced raid leader. Analogously, in schools, one could imagine that digitally mediated group projects might yield data that would help an educator understand how a student was performing as a collaborator and a communicator. Of course, teachers do this without technology all the time; but with large classes and little time, they are exposed to only a fraction of the work and interactions that are actually happening in teams.
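The kind of reduction a raid leader performs on exported combat data could look something like the classroom analogue sketched here. The event names and fields are invented for illustration; the point is that a digitally mediated group project yields per-student activity data that still requires human interpretation, just as raid statistics do.

```python
# Hypothetical sketch: reducing a per-action log from a digitally
# mediated group project to rough collaboration signals. All event
# names and fields here are invented for illustration.

from collections import defaultdict

events = [
    {"student": "ana", "action": "edit",    "chars": 220},
    {"student": "ana", "action": "comment", "chars": 40},
    {"student": "ben", "action": "edit",    "chars": 900},
    {"student": "ben", "action": "comment", "chars": 15},
    {"student": "ana", "action": "comment", "chars": 60},
]

def summarize(events):
    """Aggregate per-student counts; interpretation is left to a human."""
    stats = defaultdict(lambda: {"edits": 0, "comments": 0, "chars": 0})
    for e in events:
        s = stats[e["student"]]
        s["chars"] += e["chars"]
        if e["action"] == "edit":
            s["edits"] += 1
        else:
            s["comments"] += 1
    return dict(stats)

summary = summarize(events)
# ana commented more; ben produced more raw text. Which one is the
# better collaborator is a judgment call -- exactly the raid leader's
# combination of data and first-hand experience described above.
```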

A personalized education can parallel WoW on two levels — in the first, a shared standard for success lets the student "level up" based on mastery rather than moving through the system based on seat time. In the second, shared common goals give the student the opportunity to demonstrate 21st-century skills such as collaboration and communication. But this interpretation of gamification still falls short of the big picture. Life doesn't have the pre-defined goals at which these structures are designed to help us succeed; whether work does or not depends largely on the work environment, far more than on the nature of the work.

Minecraft and education

If part of college or work preparation also involves gaining experience and confidence with open exploration, curiosity, creativity, and following a hunch or an interest without knowing where it will lead, let's shift our metaphor from Warcraft to Minecraft.

Screenshot from "This is Minecraft" video.

Like Warcraft, Minecraft is a virtual world with a few simple rules. In a nutshell, the world is littered with materials that can be used for building things, a "craft table" for making things from raw materials, and optional monsters to battle. Unlike Warcraft, there are no pre-defined goals. Players may create adventure maps with all kinds of goals and challenges for other players, and these are wildly popular, but conquering a map doesn't get you points in a bigger game-wide contest.

Minecraft is about making stuff. Virtual stuff, but stuff nonetheless. It is also about exploration. In Minecraft, you can get lost and never find your way back, in which case your best option may be to cut your losses and move forward. In Minecraft, players make elaborate buildings, works of art, performance art (see the TNT videos on YouTube), and mini worlds and challenges for other players. Games like Minecraft can offer us a perspective on balancing the goal-based solving of problems with the open-ended finding of valuable questions — a skill education will need to provide to every new global citizen.

If there are things to learn from the notion of gamification, let's apply them at multiple levels, not superficially. We can learn from levels and leaderboards to add intrinsic and extrinsic motivators to help motivate students to succeed at traditional state standards and tests. We can learn from the structure of the human dynamics in massively multiplayer games to value and capture collaboration, communication, and other higher-order skills needed to achieve collective pre-defined goals. We can learn from simple rule-based (as opposed to goal-based) games to value and preserve the artifacts of exploration as well as its end products.



November 03 2011

Strata Week: Cloudera founder has a new data product

Here are a few of the data stories that caught my attention this week:

Odiago: Cloudera founder Christophe Bisciglia's next big data project

Cloudera founder Christophe Bisciglia unveiled his new data startup this week: Odiago. The company's product, WibiData (say it out loud), uses Apache Hadoop and HBase to analyze consumer web data. Database industry analyst Curt Monash describes WibiData on his DBMS2 blog:

WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop, is a data management and analytic execution layer. That's where the secret sauce resides.

GigaOm's Derrick Harris posits that Odiago points to "the future of Hadoop-based products." Rather than having to "roll your own" Hadoop solutions, future Hadoop users will be able to build their apps to tap into other products that do the "heavy lifting."

Hortonworks launches its data platform

Hadoop company Hortonworks, which spun out of Yahoo earlier this year, officially announced its products and services this week. The Hortonworks Data Platform is an open source distribution powered by Apache Hadoop. It includes the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase and ZooKeeper, as well as HCatalog and open APIs for integration. The Hortonworks Data Platform also includes Ambari, another Apache project, which will serve as the Hadoop installation and management system.

It's possible Hortonworks' efforts will pick up the pace of the Hadoop release cycle and address what ReadWriteWeb's Scott Fulton sees as the "degree of fragmentation and confusion." But as GigaOm's Derrick Harris points out, there is still "so much Hadoop in so many places," with multiple companies offering their own Hadoop solutions.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Big education content meets big education data

A couple of weeks ago, the adaptive learning startup Knewton announced that it had raised an additional $33 million. This latest round was led by Pearson, the largest education company in the world. As such, the announcement this week that Knewton and Pearson are partnering is hardly surprising.

But this partnership does mark an important development for big data, textbook publishing, and higher education.

Knewton's adaptive learning platform will be integrated with Pearson's digital courseware, giving students individualized content as they move through the materials. To begin with, Knewton will work with just a few of the subjects within Pearson's MyLab and Mastering catalog. There are more than 750 courses in that catalog, and the adaptive learning platform will be integrated with more of them soon. The companies also say they plan to "jointly develop a line of custom, next-generation digital course solutions, and will explore new products in the K12 and international markets."

The data from Pearson's vast student customer base — some 9 million higher ed students use Pearson materials — will certainly help Knewton refine its learning algorithms. In turn, the promise of adaptive learning systems means that students and teachers will be able to glean insights from the learning process — what students understand, what they don't — in real time. It also means that teachers can provide remediation aimed at students' unique strengths and weaknesses.

Got data news?

Feel free to email me.


October 16 2011

BioCurious opens its lab in Sunnyvale, CA

When I got to the BioCurious lab yesterday evening, they were just cleaning up some old coffee makers. These, I learned, had been turned into sous vide cookers in that day's class.

New lab at BioCurious

Sous vide cookers are sort of the gourmet rage at the moment. One normally costs several hundred dollars, but BioCurious offered a class for $117 where seventeen participants learned to build their own cookers and took them home at the end. They actually cooked steak during the class--and I'm told that it came out very well--but of course, sous vide cookers are also useful for biological experiments because they hold temperatures very steady.

The class used Arduinos to provide the temperature control for the coffee pots and other basic hardware, so the lesson was more about electronics than biology. But it's a great illustration of several aspects of what BioCurious is doing: a mission of involving ordinary people off the street in biological experiments, using hands-on learning, and promoting open source hardware and software.
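The control logic at the heart of such a build is simple enough to sketch. The following is a minimal, hypothetical illustration of the bang-bang (hysteresis) thermostat loop an Arduino sous vide cooker typically implements, written here in Python rather than Arduino C for readability; in the real device, the temperature reading and heater switching would be sensor and relay I/O.

```python
# A minimal sketch of hysteresis (bang-bang) temperature control,
# the kind of loop an Arduino sous vide build typically runs.
# Sensor reads and relay writes are abstracted away; this function
# decides only the next heater state.

def thermostat_step(current_c, target_c, heater_on, band=0.5):
    """Return the new heater state for one control-loop iteration."""
    if current_c < target_c - band:
        return True        # too cold: switch the heater on
    if current_c > target_c + band:
        return False       # too hot: switch the heater off
    return heater_on       # inside the band: keep the current state

# e.g. holding a 60.0 C water bath steady for cooking steak
state = thermostat_step(58.0, 60.0, heater_on=False)  # water too cold
```

The dead band keeps the relay from chattering on and off around the setpoint; a more polished build might use PID control instead, but hysteresis is the simplest scheme that holds a bath "very steady" in the sense the class needed.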

Other classes have taught people to insert dyes into cells (in order to teach basic skills such as pipetting), to run tests on food for genetically modified ingredients, and to run computer analyses on people's personal DNA sequences. The latter class involved interesting philosophical discussions about how much to trust their amateur analyses and how to handle potentially disturbing revelations about their genetic make-up. All the participants in that class got their sequencing done at 23andme first, so they had sequences to work with and could compare their own work with what the professionals turned up.

Experiments at BioCurious are not just about health. Synthetic biologists, for instance, are trying a lot of different ways to create eco-friendly synthetic fuels.

BioCurious is not a substitute for formal training in biochemistry, biology, and genetics. But it is a place for people to get a feel for what biologists do, and for real biologists without access to expensive equipment to do the research of their dreams.

In a back room (where I was allowed to go after being strenuously warned not to touch anything--BioCurious is an official BSL 1 facility, and they're lucky the city of Sunnyvale allowed them to open), one of the staff showed a traditional polymerase chain reaction (PCR) machine, which costs several thousand dollars and is critical for sequencing DNA.

Traditional commercial PCR

A couple of BioCurious founders analyzed the functions of a PCR machine and, out of plywood and off-the-shelf parts, built an OpenPCR with open hardware specs. At $599, OpenPCR opens up genetic research to a far greater audience.

BioCurious staffer with OpenPCR

How low-budget is BioCurious? After meeting for a year in somebody's garage, they finally opened this space three weeks ago with funds raised through Kickstarter. All the staff and instructors are volunteers. They keep such a tight rein on spending that a staffer told me they could keep the place open by teaching one class per week. Of the $117 students spent today for their five-hour class, $80 went to hardware.

BioCurious isn't unique (a similar space has been set up in New York City, and some movements such as synthetic biology promote open information), but it's got a rare knack for making people comfortable with processes and ideas that normally put them off. When executive director Eri Gentry introduces the idea to many people, they react with alarm and put up their hands, as if they're afraid of being overwhelmed by technobabble. (I interviewed Gentry (MP3) before a talk she gave at this year's O'Reilly Open Source Convention.)

Founder and executive director Eri Gentry

BioCurious attacks that fear and miscomprehension. Like Hacker Dojo, another Silicon Valley stalwart whose happy hour I attended Friday night, it wants to be an open space for open-minded people. Hacker Dojo and BioCurious will banish forever the stereotype of the scientist or engineer as a socially maladroit loner. The attendees are unfailingly welcoming and interested in talking about what they do in ways that make it understandable.

I thought of my two children, both of whom pursued musical careers. I wondered how they would have felt about music if kids weren't exposed to music until junior high school, whereupon they were sat down and forced to learn the circle of fifths and first species counterpoint. That's sort of how we present biology to the public--and then, even those who do show an interest are denied access to affordable equipment. BioCurious is on the cusp of a new scientific revolution.

Eri Gentry with Andy Oram in lab

October 05 2011

Giving kids access to almost any book in the world

The United Nations Educational, Scientific, and Cultural Organization (UNESCO) reports that one in five adults worldwide is still not literate. In this interview, Elizabeth Wood (@lizzywood), director of digital publishing for Worldreader and a speaker at TOC Frankfurt, talks about the social and infrastructure issues affecting literacy and how Worldreader is making a difference. She says Worldreader's goal is to reach 1 million children by 2015.

Our interview follows.

What is Worldreader?

Elizabeth Wood: Worldreader is an innovative non-profit organization that uses ereaders, like the Kindle, to cultivate a culture of reading in the developing world. Since printed materials and books are nearly impossible to come by in many areas, Worldreader uses the GSM network to give kids and teachers access to a library of electronic books.

What countries are involved at this point, and how is the project organized?

Elizabeth Wood: We have projects up and running in Ghana and Kenya, and have plans to expand soon into other African countries. Through our partnerships with technology companies, publishers, and public and private organizations, we're able to deliver thousands of local and international ebooks to students and teachers. Our key funding partners include a mix of governmental support (USAID), private investors (Jeffrey Bezos and John McCall MacBain), and foundations (The Spotlight Foundation and Social Endeavors). From publishing, we're working with big players such as Random House and Penguin, and local African publishers like EPP in Ghana and Longhorn in Kenya. On the technology side, we've teamed up with Amazon, which has provided us with discounted pricing for the Kindle 3G device and delivery support for the ebooks.

To date, we have delivered more than 56,000 ebooks to kids and teachers participating in our programs.

Once kids have the hardware and the connectivity, what needs to happen next?

Elizabeth Wood: Given that the developing world has leapfrogged the Western world in mobile phone technology, from a tech perspective, using ereaders with built-in 3G connectivity makes more sense than relying on the Internet, which is sporadic at best in places like Africa. Prior to delivering the ereaders to the kids, Worldreader registers the Kindles to our internal account database and uploads — or "pushes" — a starter collection of books into the ereader.

Once the ereaders are in kids' hands, we continue to push new content on a weekly basis. Additionally, kids and teachers can choose from more than 28,000 free books in the Kindle store and download as many as they like. They also have access to free samples (the first chapter of almost any book in the world) and some periodical subscriptions. Getting a new book is as easy as receiving a text message.

TOC Frankfurt 2011 — Being held on Tuesday, Oct. 11, 2011, TOC Frankfurt will feature a full day of cutting-edge keynotes and panel discussions by key figures in the worlds of publishing and technology.

Save 100€ off the regular admission price with code TOC2011OR

Can you put the global literacy situation into perspective?

Elizabeth Wood: According to UNESCO, some 793 million adults lack minimum literacy skills. That means that about one in five adults is still not literate. Additionally, 67.4 million children do not attend school, and many more attend irregularly or drop out.

In many parts of Africa and other emerging countries, there simply is no access to any sort of books or printed materials. There are no libraries, school book shelves are completely bare, and the paper books that do occasionally arrive (via expensive shipping methods) often fail to meet current educational needs. For many kids, Worldreader provides the only opportunity they may have to access any kind of book.

Worldreader believes improved reading skills empower people to change their personal economic and social situations. Over time, increased literacy helps individuals rise above poverty and creates new opportunities for families and communities.

It's early, but are ereaders making a difference at this point?

Elizabeth Wood: We are just beginning to understand the effects ereaders can have on reading, but early signs are very positive. Primary students in Ghana are showing improvements of up to 13% on reading comprehension in just five months. Anecdotally, a majority of students expressed that they never grew bored with the ereader, and teachers have said they noticed increased student enthusiasm toward reading.

For additional information and stats, check out the "E-readers inspire future writers" post on the Worldreader blog and the videos on our YouTube channel.

How is the program going so far and what's next?

Elizabeth Wood: Our iRead 1 pilot will move into its second year, really allowing us to understand the deeper effects of ereaders on literacy rates. We will be expanding to more children with the launch of iRead 2, which will affect more than 1,000 students. Worldreader will move into more markets soon, and our goal is to reach 1 million children by 2015.

This interview was edited and condensed.


September 08 2011

Master a new skill? Here's your badge

Earning badges for learning new things is an entrenched idea. Legions of Boy Scouts and Girl Scouts have decorated their sashes with badges, demonstrating their mastery of various skills. A badge is a symbol of personal achievement that's acknowledged by others.

The Mozilla Foundation and Peer-to-Peer University (P2PU), among others, are working to create an alternative — and recognized — form of certification that combines merit-earned badges with an open framework. The Open Badges Project will allow skills and competencies to be tracked, assessed, and showcased.

In the interview below, I talk with the project director, Mozilla's Erin Knight (@eknight), about the genesis and goals of the Open Badges initiative.

How did the Open Badges project come about?

Erin Knight: At the core, it's really just a general acknowledgement that learning looks very different today than traditionally imagined. Legitimate and interest-driven learning is occurring through a multitude of channels outside of formal education, and yet much of that learning does not "count" in today's world. There is no real way to demonstrate that learning and transfer it across contexts or use it for real results.

We feel this is where badges can come in — they can provide evidence of learning, regardless of where it occurs or what it involves, and give learners tangible recognition for their skills, achievements, interests and affiliations that they can carry with them and share with key stakeholders, such as potential employers, formal institutions or peer communities.

This problem space is particularly interesting and important to Mozilla for a couple of reasons:

  1. It is our mission to promote the open web, get more people involved in making it and help people capitalize on the benefits and affordances of it. There is so much learning that is occurring, or could occur, through the web — through open education opportunities like P2PU, information hubs like Wikipedia, and even social media. We want to help people capitalize on these opportunities and make this learning count and get them real results.
  2. We also care about supporting and encouraging more people to become open web developers, and much of this learning is typically based on social, informal and personal experiences and work. For example, you may look at someone else's code on GitHub to figure out how to solve a specific problem, or tinker on your own to develop a deeper mastery. None of this is taught through a formal curriculum, and in fact, the space moves so quickly that formal curricula are often outdated by the time a syllabus can be put together. We want a way to acknowledge the work and skills of web developers at all stages of their careers, both to motivate them to learn new skills and become better as well as to connect them with jobs and opportunities.

Web 2.0 Expo New York 2011, being held Oct. 10-13, showcases the latest Web 2.0 business models, development tools and design strategies for the builders of the next-generation web.

Save 20% on registration with code WEBNY11RAD

Tell me about the technology infrastructure behind the Open Badges system. How do you validate a badge?

Erin Knight: One piece of the Open Badges initiative is the Open Badge Infrastructure (OBI). This came out of early conversations. We spent a lot of time talking about core aspects of an individual badge system: What are the badges? What does assessment look like? How do we ensure validity? We realized quite quickly that to truly solve the problems we are trying to solve and to support learners wherever they are learning, we were not just talking about a badge system, but a badge ecosystem.

In this ecosystem, there would be many badge issuers offering different types of badges for different learning experiences, and each learner could earn badges across issuers and experiences. This requires that badge systems work together and are interoperable for the learner.

The big missing piece was a core infrastructure that could support a multitude of issuers, allow a learner to collect badges into a single collection tied to his or her identity, and then connect to many display sites or consumers to extend the value of the badges. This middle "plumbing" needs to be open and decentralized because if this is as successful as we all think it can be, we are talking about critical identity information here. It's important that the user remain in complete control.

We're building this to be as open and decentralized as possible. All elements are being built open source and extensible so that anyone can create their own instance. That includes the Hub, the main badge manifest repository, and the Backpacks, the user interface on the Hub: each user will have a Backpack showing all of their badges and letting them manage, control, and share them out. Mozilla will build and host the reference implementations, but we want to support decentralization as much as possible.

We're also working with a large advisory group with representation that spans informal education providers, academia, federal agencies, and development communities to make sure that all of our assumptions and approaches are fully vetted and thought through from multiple perspectives and interests. And finally, we're building this to be as lightweight as possible, especially at this point so early in the game, and pushing the innovation to the edge. This means that issuers completely control and decide what their badges are, how they are earned, and so forth. And on the other end, displayers control how badges are displayed, such as with filters or visualizations, etc. We want the OBI to support innovation, not constrain it in any way.
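The architecture Knight describes, with many issuers pushing badges into a learner-controlled Backpack and displayers pulling subsets back out, can be sketched in miniature. Everything below, from the class names to the fields and methods, is illustrative rather than the actual OBI metadata spec or API:

```python
# Illustrative sketch of the issue -> collect -> share flow described above.
# Class names and fields are hypothetical, not the actual OBI spec.

from dataclasses import dataclass, field

@dataclass
class Assertion:
    recipient: str      # identity the badge is tied to
    badge_name: str
    issuer: str
    evidence_url: str   # link back to the work that earned the badge

@dataclass
class Backpack:
    """Learner-controlled collection of badges from many issuers."""
    owner: str
    badges: list = field(default_factory=list)

    def push(self, assertion: Assertion):
        # An issuer may only push badges addressed to this Backpack's owner:
        # the user stays in control of their own identity information.
        if assertion.recipient != self.owner:
            raise ValueError("assertion not addressed to this learner")
        self.badges.append(assertion)

    def share(self, audience_filter):
        # Learners remix subsets of badges for specific audiences.
        return [b for b in self.badges if audience_filter(b)]

pack = Backpack(owner="learner@example.org")
pack.push(Assertion("learner@example.org", "JS Basics",
                    "School of Webcraft", "https://example.org/work/1"))
webcraft_only = pack.share(lambda b: b.issuer == "School of Webcraft")
```

The key design point survives even in this toy: issuers and displayers never talk to each other directly, so the learner's collection is the single point of control.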

How do badges benefit learners and badge issuers?

Erin Knight: The OBI supports an open and decentralized badge ecosystem where the value of learning experiences can be extended to very real results very easily. It gives the learners the ability to earn lots of different badges across lots of different experiences and not only combine them into one big collection, but remix them into subgroups to share with specific audiences. This allows learners to tell complete stories about themselves, backed by the badges and the evidence they are linked to.

For issuers, the platform lets them support learners further, extend the value of the opportunities they provide, and promote themselves through their badges. Displayers, in turn, can pull richer, evidence-backed information into profiles, job opportunities, and the like, as well as discover people based on their badges.

Is there a connection between the Open Badges project and gamification?

Erin Knight: There is an element of gamification in all of this in that we've all experienced badges or levels in games, and we know that they can be motivating. That's important. Badges will range from smaller motivational badges to larger certification-type badges, but as people are designing badge systems, many of the principles of game design do and should apply. Badges from game providers will be important for the ecosystem because they represent reputation, identity and achievement that will be valuable for some users in various contexts.

Where does the Open Badges project go from here?

Erin Knight: We're working on developing a number of badge systems for Mozilla projects, including the School of Webcraft (a partnership with P2PU offering free, open opportunities for web developer training) and Hackasaurus (a program to get youth involved in hacking and building the open web).

On the Open Badge Infrastructure front, the goal is for this to be completely open and accessible to anyone who wants to be an issuer (push badges in) or a displayer/consumer (pull badges out). We are developing and releasing a set of APIs and a badge metadata spec, and we're launching the beta version of the OBI by mid-September. The beta will be feature-complete for the critical pieces and will launch with a number of initial issuers.

Anyone interested in participating in that beta can contact me via Twitter @eknight. We plan to publicly release the OBI, the metadata spec and APIs in early January 2012. At that point, all the documentation and code samples will be there so anyone can plug in. For more information, people can check out MozillaWiki and "An Open Badge System Framework."

This interview was edited and condensed.


August 23 2011

The nexus of data, art and science is where the interesting stuff happens

Jer Thorp (@blprnt), data artist in residence at The New York Times, was tasked a few years ago with designing an algorithm for the placement of the names on the 9/11 memorial. If an algorithm sounds unnecessarily complex for what seems like a basic bit of organization, consider this: Designer Michael Arad envisioned names being arranged according to "meaningful adjacencies," rather than by age or alphabetical order.
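"Meaningful adjacencies" turn name placement into a constraint problem: requested pairs of names should end up next to one another. As a toy illustration only (the memorial's real algorithm handled a large number of adjacency requests plus physical panel constraints; the names and pairs below are placeholders), a greedy version might look like this:

```python
# Toy sketch of adjacency-constrained ordering: chain names so that
# requested pairs sit next to each other where possible. Illustrative
# only; the actual 9/11 memorial algorithm was far more involved.

def order_by_adjacency(names, requested_pairs):
    neighbors = {n: set() for n in names}
    for a, b in requested_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    placed, order = set(), []
    for start in names:
        if start in placed:
            continue
        # Walk a chain of requested neighbors from this starting name,
        # so each placed name sits beside one of its requested partners.
        current = start
        while current is not None and current not in placed:
            order.append(current)
            placed.add(current)
            unplaced = [n for n in neighbors[current] if n not in placed]
            current = unplaced[0] if unplaced else None
    return order

names = ["A", "B", "C", "D"]
pairs = [("A", "C"), ("C", "D")]
print(order_by_adjacency(names, pairs))  # ['A', 'C', 'D', 'B']
```

Even this sketch shows why the problem resists hand layout: every satisfied pair constrains the neighbors available to the next one.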

The project, says Thorp, is a reminder that data is connected to people, to real lives, and to the real world. I recently spoke with Thorp about the challenges that come with this type of work and the relationship between data, art and science. Thorp will expand on many of these ideas in his session at next month's Strata Conference in New York City.

Our interview follows.

How do aesthetics change our understanding of data?

Jer Thorp: I'm certainly interested in the aesthetic of data, but I rarely think when I start a project "let's make something beautiful." What we see as beauty in a data visualization is typically pattern and symmetry — something that often emerges when you find the "right" way, or one of the right ways, to represent a particular dataset. I don't really set out for beauty, but if the result is beautiful, I've probably done something right.

My work ranges from practical to conceptual. In the utilitarian projects I try not to add aesthetic elements unless they are necessary for communication. In the more conceptual projects, I'll often push the acceptable limits of complexity and disorder to make the piece more effective. Of course, these more abstract pieces often get mistaken for infographics, and I've had my fair share of Internet comment bashing as a result. Which I kind of like, in some sort of masochistic way.

What's it like working as a data artist at the New York Times? What are the biggest challenges you face?

Jer Thorp: I work in the R&D Group at the New York Times, which is tasked with thinking about what media production and consumption will look like in the next three years or so. So we're kind of a near-futurist department. I've spent the last year working on Project Cascade, a really novel system for visualizing large-scale sharing systems in real time. We're using it to analyze how New York Times content gets shared through Twitter, but it could be used to look at any sharing system — meme dispersal, STD spread, etc. The system runs live on a five-screen video wall outside the lab, and it gives us a dynamic, exploratory look at the vast conversation that is occurring at any time around New York Times articles, blog posts, etc.
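Cascade-style analysis rests on a simple structure: each share records which earlier share triggered it, so the conversation around an article forms a tree. A minimal sketch with made-up data (Project Cascade's actual model is far richer) shows the core computation:

```python
# Minimal sketch of the tree structure behind cascade-style analysis:
# each share points back at the share that triggered it. The data here
# is invented for illustration, not Project Cascade's actual model.

from collections import defaultdict

# (share_id, parent_share_id); parent None means a root share of the article
shares = [("t1", None), ("t2", "t1"), ("t3", "t1"), ("t4", "t3"), ("t5", "t4")]

children = defaultdict(list)
roots = []
for sid, parent in shares:
    if parent is None:
        roots.append(sid)
    else:
        children[parent].append(sid)

def depth(node):
    """Longest chain of reshares hanging off this share."""
    return 1 + max((depth(k) for k in children[node]), default=0)

print({r: depth(r) for r in roots})  # {'t1': 4}
```

Depth (how long a chain of reshares runs) and breadth (how many children each share spawns) are the two numbers a visualization like this makes visible at a glance.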

It's frankly amazing to be able to work in a group where we're encouraged to take the novel path. Too many "R&D" departments, particularly in advertising agencies, are really production departments that happen to do work with augmented reality, or big data, or whatever else is trendy at the moment. There's an "R" in R&D for a reason, and I'm lucky to be in a place where we're given a lot of room to roam. Most of the credit for this goes to Michael Zimbalist, who is a great thinker and has an uncanny sense of the future. Add to that a brilliant design and development team and you get a perfect creative storm.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science -- from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code STN11RAD

I try to straddle the border between design, art and science, and one of my biggest challenges is to not get pulled too far in one direction. I'm always conscious when I'm starting new projects to try to face in a different direction from where I was headed last. This keeps me at that boundary where I think the most interesting things are happening. Right now I'm working on two projects that concern memory and history, which is relatively uncharted territory for me and is getting me into a mix of neurobiology and psychology research alongside a lot of art and design history. So far, it's been tremendously satisfying.

In addition to your position at the Times, you're also a visiting professor at New York University. I'm curious how you see data visualization changing the way art and technology are taught and learned.

Jer Thorp: The class I'm currently teaching is called "Data Representation." Although it does include a fair amount of visualization, we talk a lot about how data can be used in a creative practice in different ways — sculpture, performance, participatory practice, etc. I'm really excited about artists who are representing information in novel media, such as Adrien Segal and Nathalie Miebach, and I try to encourage my students to push into areas that haven't been well explored. It's an exciting time for students because there are a million new niches just waiting to be found.

This interview was edited and condensed.


August 16 2011

Data science is a pipeline between academic disciplines

We talk a lot about the ways in which data science affects various businesses, organizations, and professions, but how are we actually preparing future data scientists? What training, if any, do university students get in this area? The answer may be obvious if students focus on math, statistics or hard science majors, but what about other disciplines?

I recently spoke with Drew Conway (@drewconway) about data science and academia, particularly in regards to social sciences. Conway, a PhD candidate in political science at New York University, will expand on some of these topics during a session at next month's Strata Conference in New York.

Our interview follows.

How has the work of academia — particularly political science — been affected by technology, open data, and open source?

Drew Conway: There are fundamentally two separate questions here, so I will try to address both of them. First is the question of how academic research has changed as a result of these technologies. And for my part, I can only really speak to how they have affected social science research. The open data movement has impacted research most notably by compressing the amount of time it takes a researcher to go from the moment of inception ("hmm, that would be interesting to look at!") to actually looking at data and searching for interesting patterns. This is especially true of the open data movement happening at the local, state and federal government levels.

Only a few years ago, the task of identifying, collecting, and normalizing these data would have taken months, if not years. This meant that a researcher could have spent all of that time and effort only to find out that their hypothesis was wrong and that — in fact — there was nothing to be found in a given dataset. The richness of data made available through open data allows for a much more rapid research cycle, and hopefully a greater breadth of topics being researched.

Open source has also had a tremendous impact on how academics do research. First, open source tools for performing statistical analysis, such as R and Python, have robust communities around them. Academics can develop and share code within their niche research area, and as a result the entire community benefits from their effort. Moreover, the philosophy of open source has started to enter into the framework of research. That is, academics are becoming much more open to the idea of sharing data and code at early stages of a research project. Also, many journals in the social sciences are now requiring that authors provide replication code and data.
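The replication requirement Conway mentions usually boils down to a self-contained script that anyone can rerun and get identical numbers. A minimal Python sketch of that pattern, with an invented dataset and a fixed random seed:

```python
# Sketch of the replication-friendly pattern journals increasingly expect:
# a single self-contained script with a fixed random seed, so reviewers
# rerunning it get identical results. The "dataset" here is invented.

import random
import statistics

random.seed(42)  # fixed seed: every rerun produces the same draws

# stand-in for the study's data: 1,000 noisy observations around 0.5
sample = [random.gauss(mu=0.5, sigma=1.0) for _ in range(1000)]

estimate = statistics.mean(sample)
print(f"point estimate: {estimate:.3f}")
```

The same habit carries over to real analyses: pin the seed, ship the raw data alongside the script, and let the code itself document every transformation.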

The second piece of the question is how these technologies affect the dissemination of research. Here, blogs have become the de facto source for early access to new research and scientific debate. In my own discipline, The Monkey Cage is most political scientists' first source for new research. What is fantastic about the Monkey Cage, and other academic blogs, is that they are not only read by other academics. Journalists, policy makers, and engaged citizens can also interact with academics in this way — something that was not possible before these academic blogs became mainstream.


Let's sidestep the history of the discipline and debates about what constitutes a hard or soft science. But as its name suggests, "political science" has long been interested in models, statistics, quantifiable data and so on. Has the discipline been affected by the rise of data science and big data?

Drew Conway: The impact of big data has been slow, but there are a few champions who are doing really interesting work. Political science, at its core, is most interested in understanding how people collectively make decisions, and as researchers we attempt to build models and collect data to that end. As such, the massive data on social interactions being generated by social media services like Facebook and Twitter present unprecedented opportunities for research.

While some academics have been able to leverage this data for interesting work, there seems to be a clash between these services' terms of service and scientists' desire to collect data and generate reproducible findings from it. I wrote about my own experience using Twitter data for research, but many other researchers from all disciplines have run into similar problems.

With respect to how academics have been impacted by data science, I think the impact has mostly flowed in the other direction. One major component of data science is the ability to extract insight from data using tools from math, statistics and computer science. Most of this is informed by the work of academics, and not the other way around. That said, as more academic researchers become interested in examining large-scale datasets (on the order of Twitter or Facebook), many of the technical skills of data science will have to be acquired by academics.

How does data science change the work of the grad student — in terms of necessary skills but also in terms of access to information/informants?

Drew Conway: Unfortunately, sophisticated technical skills, i.e., those of a data scientist, are still undervalued in academia. Being involved in open-source projects or producing statistical software is not something that will help a graduate student land a high-profile academic job, or help a young faculty member get tenure. Publications are still the currency of success, and that — as I mentioned — clashes with the data-sharing policies of many large social media services.

Graduate students and faculty do themselves a disservice by not actively staying technically relevant. As so much more data gets pushed into the open, I believe basic data hacking skills — scraping, cleaning, and visualization — will be prerequisites to any academic research project. But, then again, I've always been a weird academic, double majoring in computer science and political science as an undergrad.
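The scraping-cleaning-visualization loop Conway lists can start very small. A standard-library-only sketch, with a made-up HTML snippet, that parses, normalizes, and summarizes a handful of values:

```python
# Stdlib-only sketch of the basic data-hacking loop Conway describes:
# scrape (parse HTML), clean (normalize messy values), summarize.
# The HTML snippet and values are made up for illustration.

from html.parser import HTMLParser

RAW = "<ul><li> 12 </li><li>n/a</li><li>7</li><li> 30</li></ul>"

class CellScraper(HTMLParser):
    """Collects the text content of every <li> element."""
    def __init__(self):
        super().__init__()
        self.in_cell, self.cells = False, []
    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_cell = True
    def handle_endtag(self, tag):
        if tag == "li":
            self.in_cell = False
    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data)

scraper = CellScraper()
scraper.feed(RAW)

# Cleaning: strip whitespace, drop non-numeric placeholders like "n/a".
values = [int(c.strip()) for c in scraper.cells if c.strip().isdigit()]
print(values)  # [12, 7, 30]

# The simplest "visualization" is a summary statistic of the cleaned data.
print(sum(values) / len(values))
```

Real projects swap in an HTTP fetch and a charting library, but the shape of the work, parse then normalize then summarize, stays the same.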

How does the rise of data science and its spread beyond the realm of math and statistics change the world of technology, either from an academic or entrepreneurial perspective?

Drew Conway: From an entrepreneurial perspective I think it has dramatically changed the way new businesses think about building a team. Whether it is at Strata, or any of the other conferences in the same vein, you will see a glut of job openings or panels on how to "build a data team." At present, people who have the blend of skills I associate with data science — hacking, math/stats, and substantive expertise — are a rare commodity. This dearth of talent, however, will be short-lived.

I see in my undergrads many more students who grew up with data and computing as ubiquitous parts of their lives. They're interested in pursuing routes of study that provide them with data science skills, both in terms of technical competence and in creative outlets such as interactive design.

How does "human subjects compliance" work when you're talking about "data" versus "people" — that's an odd distinction, of course, and an inaccurate one at that. But I'm curious if some of the rules and regulations that govern research on humans account for research on humans' data.

Drew Conway: I think it is an excellent question, and one that academe is still struggling to deal with. In some sense, mining social data that is freely available on the Internet provides researchers a way to sidestep traditional IRB regulation. I don't think there's anything ethically questionable about recording observations that are freely made public. That's akin to observing the meanderings of people in a park.

Where things get interesting is when researchers use crowdsourcing technology, like Mechanical Turk, as a survey mechanism. This is much more of a gray area. I suppose, technically, the Amazon terms of service cover researchers, but ethically this seems to me to fall within the scope of an IRB. Unfortunately, the likely outcome is that institutions won't attempt to understand the difference until some problem arises.

This interview was edited and condensed.


August 01 2011

Science hacks chip away at the old barriers to entry

Stereotypes of what scientists do, how they act, and what they look like persist, as Eri Gentry noted in her keynote at last week's OSCON. To illustrate, Gentry pointed to the "before" and "after" pictures drawn by seventh-grade visitors to Fermilab. Hopefully, the notion of the quiet (or mad) scientist, isolated in the lab, will soon be uncommon as citizen and DIY science projects like BioCurious take shape.

In her keynote, Gentry described what it was like to be a non-scientist doing science. She also addressed the struggles that many scientists, regardless of background, face: a lack of access to the tools they need for research. Lab space is often restricted to universities or big corporations, and lab rentals, when available, can be exorbitant. But following a successful Kickstarter campaign, BioCurious will provide a collaborative lab space for biotech at a much lower rate.

The availability of an affordable and accessible space is one thing; the availability of tools is another. As Gentry highlighted, with a combination of open source software and off-the-shelf parts, it's also possible to build lab equipment for far less: a $125 DIY clean bench (versus a $12,000 commercial one), or a $55 Dremel-based centrifuge (versus a $500 commercial centrifuge).

But hacking science isn't just about "making things," as Spacehack's Ariel Waldman argued in her OSCON keynote. It's about making "disruptively accessible things." In her talk, Waldman talked about some of the ways in which that accessibility is occurring in space exploration. It isn't as simple as opening up the massive datasets we have from satellites and space missions — although that's part of it. It's about making sure that data isn't "buried deep within a government website" or locked in an unintelligible interface or format. And it's about making sure that people can actively contribute and that when they do, they receive credit for their work.

As Gentry described it in her talk at OSCON, these open science efforts are done "out of necessity and out of passion." Opening access to data, equipment and lab space this way puts science in the hands of makers, creators, developers, scientists, citizens — anyone. And this in turn will hopefully spur more innovation and discovery.



July 29 2011

Maker Faire Detroit this weekend

Maker FaireThis weekend, Maker Faire Detroit opens at The Henry Ford in Dearborn, MI. Charlie Wollberg framed it perfectly on his blog:

What if Albert Einstein, Willy Wonka, Curious George, R2D2 and MacGyver threw a really big party? They’d invite all of their really cool friends: the artists, the inventors, the crafters, the mad scientists, the happy scientists, the curious, the creators, the hackers, the tinkerers.

Sure, Leonardo da Vinci would be there showing off his new helicopter prototype and Rube Goldberg would be making people laugh with his convoluted contraptions and Grace Hopper would be taking apart all the clocks while writing new computer languages. It would be the kind of place where everyone who’s ever been called weird, crazy or geeky would feel right at home.

Good news: That party is happening this weekend in Detroit.

Now in our second year, we're seeing all kinds of examples of how makers have become resources for the community, contributing in Detroit and the region. Jeff Sturges is one inspiring example. He's working in the community to reach kids and share the joy of making. We shot a video of Jeff this week, which starts in Eastern Market in Detroit, where he brought kids to teach them soldering. These kids learned to solder at the year-old Mt. Elliott Makerspace, located in the basement of a church and at the center of a supportive community. Seeing 8-year-old Raven teaching teenage boys and adults to solder makes quite an impression.

Follow the Show Daily for Maker Faire Detroit for news and featured attractions.
