June 01 2012

Visualization of the Week: 56 years of tornadoes

This week's visualization comes from John Nelson of IDV Solutions, who has taken data from the National Oceanic and Atmospheric Administration (NOAA) to map tornado paths and F-Scale frequencies.

Nelson describes the visualization:

"It tracks 56 years of tornado paths along with a host of attribute information. Here, the tracks are categorized by their F-Scale (which isn't the latest and greatest means, but good enough for a hack like me), where brighter strokes represent more violent storms."

Tornado tracks visualization
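For readers curious how a map like this might encode severity, here is a minimal Python sketch of one possible mapping from F-Scale rating to stroke brightness and width. The function name `stroke_style` and the specific grey levels and widths are hypothetical choices for illustration, not Nelson's actual implementation.

```python
def stroke_style(f_scale: int) -> dict:
    """Map an F-Scale rating (F0-F5) to a stroke colour and width.

    Hypothetical encoding: brighter, wider strokes for more violent storms.
    """
    if not 0 <= f_scale <= 5:
        raise ValueError("F-Scale ratings run from F0 to F5")
    t = f_scale / 5                    # 0.0 for F0, 1.0 for F5
    level = round(60 + t * 195)        # grey level from 60 (dim) to 255 (white)
    colour = f"#{level:02x}{level:02x}{level:02x}"
    width = 0.5 + t * 1.5              # stroke width from 0.5 to 2.0 px
    return {"color": colour, "width": width}


for rating in range(6):
    print(f"F{rating}", stroke_style(rating))
```

A linear ramp keeps the ordering obvious; a real map might instead use a perceptual colour scale so that the few F4 and F5 tracks stand out against thousands of weak ones.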

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop a line if there's a visualization you think we should know about.

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR

May 10 2012

Strata Week: Big data boom and big data gaps

Here are a few of the data stories that caught my attention this week.

Big data booming

The call for speakers for Strata New York has closed, but as Edd Dumbill notes, the number of proposals is a solid indication of the booming interest in big data. The first Strata conference, held in California in 2011, elicited 255 proposals. The following event in New York elicited 230. The most recent Strata, held in California again, had 415 proposals. And the number received for Strata's fall event in New York? That came in at 635.

Edd writes:

"That's some pretty amazing growth. I can thus expect two things from Strata New York. My job in putting the schedule together is going to be hard. And we're going to have the very best content around."

The increased popularity of the Strata conference is just one data point from the week that highlights a big data boom. Here's another: According to a recent report by IDC, the "worldwide ecosystem for Hadoop-MapReduce software is expected to grow at a compound annual rate of 60.2 percent, from $77 million in revenue in 2011 to $812.8 million in 2016."
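The IDC figures can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. A minimal Python sketch, assuming the five-year span from 2011 to 2016 given in the quote:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above, in millions of US dollars (2011 -> 2016).
rate = cagr(77.0, 812.8, 2016 - 2011)
print(f"{rate:.1%}")  # 60.2%
```

The result matches IDC's 60.2 percent, so the quoted start and end revenues are consistent with the stated growth rate.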

"Hadoop and MapReduce are taking the software world by storm," says IDC's Carl Olofson. Or as GigaOm's Derrick Harris puts it: "All aboard the Hadoop money train."

A big data gap?

Another report released this week reins in some of the exuberance about big data. This report comes from the government IT network MeriTalk, and it points to a "big data gap" in the government: a gap between the promise of big data and the federal government's ability to make use of it. That's noteworthy given the Obama administration's recent $200 million commitment to a federal big data initiative.

Among the MeriTalk report's findings: 60% of government IT professionals say their agency is analyzing the data it collects, but less than half (40%) say it is using data to make strategic decisions. Respondents estimated that it would take, on average, three years before their agencies were ready to take full advantage of big data.

Prismatic and data-mining the news

The largest-ever healthcare fraud scheme was uncovered this past week. Arrests were made in seven cities — some 107 doctors, nurses and social workers were charged, with fraudulent Medicare claims totaling about $452 million. The discoveries about the fraudulent behavior were made thanks in part to data-mining — looking for anomalies in the Medicare filings made by various health care providers.
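The reports don't describe the investigators' actual methods, but as a rough illustration of what "looking for anomalies" in billing data can mean, here is a minimal Python sketch that flags providers whose claim totals sit far from the median, using the modified z-score (based on the median absolute deviation). The provider IDs and dollar amounts are made up, and the 3.5 cutoff is a common statistical rule of thumb, not a Medicare standard.

```python
from statistics import median


def flag_outliers(claims_by_provider: dict, threshold: float = 3.5) -> list:
    """Return providers whose claim totals have a modified z-score above threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. It resists being skewed by the outliers it is hunting.
    """
    values = list(claims_by_provider.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all totals (nearly) identical: nothing stands out
    return [
        provider
        for provider, total in claims_by_provider.items()
        if abs(0.6745 * (total - med) / mad) > threshold
    ]


# Hypothetical annual claim totals (US dollars) for providers in one specialty.
claims = {"A": 120_000, "B": 135_000, "C": 110_000, "D": 128_000, "E": 1_900_000}
print(flag_outliers(claims))  # ['E']
```

A flag like this only tells investigators where to look; as the Prismatic post below argues, people still have to work the case.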

Prismatic penned a post in which it makes the case for more open data so that there's "less friction" in accessing the sort of information that led to this sting operation.

"Both the recent sting and the Prime case show that you need real journalists and investigators working with technology and data to achieve good results. The challenge now is to scale this recipe and force transparency on a larger scale.

"We need to get more technically sophisticated and start analysing the data sets up front to discover the right questions to ask, not just the answer the questions we already know to ask based on up-front human investigation. If we have to discover each fraud ring or singleton abuse as a one-off case, we'll never be able to wipe out fraud on a large enough scale to matter."

Indeed, despite this being the largest bust ever, it's really just a fraction of the estimated $20 to $100 billion a year in Medicare fraud.
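To put "a fraction" in numbers, a quick Python check of the roughly $452 million in fraudulent claims from this bust against the low and high ends of the estimated annual Medicare fraud:

```python
charged = 452e6              # fraudulent claims in this bust, US dollars
estimates = (20e9, 100e9)    # low and high estimates of annual Medicare fraud

shares = [charged / est for est in estimates]
for est, share in zip(estimates, shares):
    print(f"${est / 1e9:.0f} billion a year: {share:.2%}")
```

That puts the record bust at roughly 0.5% to 2.3% of a single year's estimated fraud.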

Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.

Save 20% on registration with the code RADAR20

Got data news?

Feel free to email me.



April 05 2012

Strata Week: New life for an old census

Here are a few of the data stories that caught my attention this week.

Now available in digital form: The 1940 census

The National Archives released the 1940 U.S. Census records on Monday, after a mandatory 72-year waiting period. The release marks the single largest collection of digital information ever made available online by the agency.

Screenshot from the digital version of the 1940 Census.

The 1940 Census, conducted as a door-to-door survey, included questions about age, race, occupation, employment status, income, and participation in New Deal programs — all important (and intriguing) following the previous decade's Great Depression. One data point: in 1940, there were 5.1 million farmers. According to the 2010 American Community Survey (not the census, mind you), there were just 613,000.

The ability to glean these sorts of insights proved to be far more compelling than the National Archives anticipated, and the website hosting the data was temporarily brought down by the traffic load. The site is now up, so anyone can investigate the records of approximately 132 million Americans. The records are searchable by map (or rather, by "the appropriate enumeration district"), but not by name.

A federal plan for big data

The Obama administration unveiled its "Big Data Research and Development Initiative" late last week, with more than $200 million in financial commitments. Among the White House's goals: to "advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data."

The new big data initiative was announced with a number of departments and agencies already on board with specific plans. These include grant opportunities from the Department of Defense and the National Science Foundation; new spending by DARPA on an XDATA program to build new computational tools; and open data initiatives such as the 1000 Genomes Project.

"In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use big data for scientific discovery, environmental and biomedical research, education, and national security," said Dr. John P. Holdren, assistant to the President and director of the White House Office of Science and Technology Policy in the official press release (PDF).

Personal data and social context

When the Girls Around Me app was released, using data from Foursquare and Facebook to notify users when there were females nearby, many commentators called it creepy. "Girls Around Me is the perfect complement to any pick-up strategy," the app's website once touted. "And with millions of chicks checking in daily, there's never been a better time to be on the hunt."

"Hunt" is an interesting choice of words here, and the Cult of Mac, among other blogs, asked if the app was encouraging stalking. Outcry about the app prompted Foursquare to yank the app's API access, and the app's developers later pulled the app voluntarily from the App Store.

Many of the responses to the app raised issues about privacy and user data, and questioned whether women in particular should be extra cautious about sharing their information with social networks. But as Amit Runchal writes in TechCrunch, this response blames the victims:

"You may argue, the women signed up to be a part of this when they signed up to be on Facebook. No. What they signed up for was to be on Facebook. Our identities change depending on our context, no matter what permissions we have given to the Big Blue Eye. Denying us the right to this creates victims who then get blamed for it. 'Well,' they say, 'you shouldn't have been on Facebook if you didn't want to ...' No. Please recognize them as a person. Please recognize what that means."

Writing here at Radar, Mike Loukides expands on some of these issues, noting that the questions are always about data and social context:

"It's useful to imagine the same software with a slightly different configuration. Girls Around Me has undeniably crossed a line. But what if, instead of finding women, the app was Hackers Around Me? That might be borderline creepy, but most people could live with it, and it might even lead to some wonderful impromptu hackathons. EMTs Around Me could save lives. I doubt that you'd need to change a single line of code to implement either of these apps, just some search strings. The problem isn't the software itself, nor is it the victims, but what happens when you move data from one context into another. Moving data about EMTs into context where EMTs are needed is socially acceptable; moving data into a context that facilitates stalking isn't acceptable, and shouldn't be."

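Loukides's point that these apps would differ only in "search strings" can be made concrete. Below is a minimal Python sketch, assuming a hypothetical check-in data model (`CheckIn`, with a set of `tags` and coordinates) and a hypothetical `people_around_me` function; it is not the code of Girls Around Me or of any real app. The same nearby-search routine serves every variant; only the tag argument changes.

```python
import math
from dataclasses import dataclass


@dataclass
class CheckIn:
    name: str
    tags: set       # hypothetical profile tags, e.g. {"hacker"} or {"emt"}
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def people_around_me(checkins, tag, lat, lon, radius_km=1.0):
    """Identical logic for every variant; only the `tag` search string changes."""
    return [
        c.name
        for c in checkins
        if tag in c.tags and haversine_km(lat, lon, c.lat, c.lon) <= radius_km
    ]


checkins = [
    CheckIn("alice", {"hacker"}, 37.7750, -122.4195),
    CheckIn("bob", {"emt"}, 37.7760, -122.4180),
    CheckIn("carol", {"hacker"}, 37.8044, -122.2712),  # about 13 km away
]
me = (37.7749, -122.4194)
print(people_around_me(checkins, "hacker", *me))  # ['alice']
print(people_around_me(checkins, "emt", *me))     # ['bob']
```

Nothing in the function knows or cares what the tag means, which is exactly why the ethical question lives in the context the data is moved into rather than in the code.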
Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20



September 09 2011

Top Stories: September 5-9, 2011

Here's a look at the top stories published across O'Reilly sites this week.

The new guy wants to hack the city's data
Instead of quietly settling in like most new residents, Tyler, Texas, transplant Christopher Groskopf is on a mission to find and unlock his new city's datasets.

RIP Michael S. Hart
Michael Hart was the founder of Project Gutenberg, an incredible visionary for online books, and someone who played an important role in Nat Torkington's life.

Look at Cook sets a high bar for open government data visualizations
One of the best recent efforts at visualizing open government data is Look at Cook, which tracks government budgets and expenditures from 1993-2011 in Cook County, Illinois.

Master a new skill? Here's your badge
The Mozilla Foundation's Erin Knight talks about how the badges and open framework of the Open Badge Project could change what "counts" as learning.

The boffins and the luvvies
Whether we're discussing ancients versus moderns, scientists versus poets, or the latest variant, computer science versus humanities, the debate between science and art is persistent and quite old.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively. Save 30% on registration with the code ORM30.

August 30 2011

How to create sustainable open data projects with purpose

There has been much hand-wringing of late about whether the explosion of government-run app contests over the last couple of years has generated any real value for the public. With only one of the Apps for Democracy projects still running, it's easy to see the entire movement being written off as an overly optimistic fad.

The organisation that I'm lucky enough to lead, mySociety, didn't come from the world of app contests, but it does build the kind of open-source, open-data-grounded civic apps that such contests are supposed to produce. I believe that mySociety's story shows that it's possible to build meaningful, impactful civic and democratic web apps, to grow them to a scale where they're unambiguously a good use of time and money, then sustain them for years at a time. Right now we're launching a new site, FixMyTransport, that is trying to raise the bar for the ambition and scale of civic apps, so this seems a good moment to share some thoughts about what it takes to build good services and get them to last more than a few months.

You have to be just as focused on user needs as any company (and perhaps more so)

People have needs. Sometimes they need to eat, sometimes they need to sleep. And sometimes they need to send an urgent message to a local politician, or get a dangerous hanging branch cleared off of a road.

What people never, ever do is wake up thinking, "Today I need to do something civic," or, "Today I will explore some interesting data via an attractive visualisation." MySociety has always been unashamed about packaging civic services in a way that appeals directly to real people with real, everyday needs. I gleefully delete the two or three emails a year that land in our inbox suggesting that FixMyStreet should be renamed to FixOurStreet. No, dude, when I'm pissed it's definitely my street, which is why people have borrowed the name around the world.

We learned this lesson most vividly from Pledgebank, a sputtering site with occasional amazing successes and lots and lots of "meh." The reason it never took off was that, unlike the later (and brilliant) Kickstarter, we didn't make it specific enough. We didn't say "use this site to raise money for your first album," or "use this site to organise a march." We said it was a platform for "getting things done," and the users walked away in confusion. That's why our new site is called FixMyTransport, even though it's actually the first instance of a general civic-problem-fixing platform that could handle nearly any kind of local campaigning.

Being focused on user needs means not starting things you think you probably can't finish

In mySociety's history we have run four calls for proposals, asking the whole world what we should build next. As with most idea-gathering processes, there are about 100 bad ideas for every good one, but the bad ideas have value in that they reveal a habitual digital-era trait: being insanely optimistic about the effort required to build things to a high standard.

Now, clearly, I'm not saying it is impossible to hack brilliant things without piles of VC gold. But if you are going to hack something really, genuinely valuable in just a couple of weeks, and you want it to thrive and survive in the real Internet, you need to have an idea that is as simple as it is brilliant. Matthew Somerville's accessible Traintimes fits into this category, as does, and But ideas like this are super rare — they're so simple and powerful that really polished sites can be built and sustained on volunteer-level time contributions. I salute the geniuses who gave us the four sites I just mentioned. They make me feel small and stupid.

If your civic hack idea is more complicated than this, then you should really go hunting for funding before you set about coding. Because the Internet is a savagely competitive place, and if your site isn't pretty spanking, nobody is going to come except the robots and spammers.

To be clear — FixMyTransport is not an example of a super-simple genius idea. I wish it were. Rather it's our response to the questions "What's missing in the civic web?" and "What's still too hard to get done online?" But we didn't start building it until we knew we had the money, and we didn't try to fit it into evenings and weekends. It was painful to wait and not rush with it, but it was the right thing to do to build something up to the expectations of an Internet-using public habituated to websites with billion-dollar budgets. And we are emotionally and financially prepared for the six months of rapid iteration that will follow once the public arrives.

Data is your servant, not your master

I love open data. I love structured data. I love data, full stop. But my love of data is not the same as respecting our users' needs. There are more than 300,000 bus stops, train stations, ferry routes and so on in the FixMyTransport back end, munged together over months of hard work from dirty, dirty public data sources. Can you see any sign of this on the homepage? No sir, because users want to fix transport problems, not revel in our mastery of databases.

Demand fewer, larger grants from government and funders

MySociety got lucky. It was born into a period of high public spending, 2003/4, and its second ever grant was for 0.02% of a government funding pot worth more than a billion dollars — about a quarter of a million dollars. It was amazing luck for a small organisation with no track record, possible only because so much money was being thrown around. Those days are gone on both sides of the Pond, but governments everywhere should note that that funding of this scale got us right through our first couple of years, until sites like WriteToThem were mature and had proved their public value (and picked up an award or two).

In the subsequent few years, we saw the "thousand flowers bloom" mentality really take over the world of public-good digital funding, and we saw it go way beyond what was sensible. Time and again, we'd see two good ideas get funding and eight bad ones at the same time because of the sense that it was necessary to spread the money around. It would be great if someone could make the case to public grant funders that good tech ideas, and the teams that can implement them, are vanishingly rare. There is nothing to be ashamed of in dividing the pot two or three ways if there are only a few ideas or proposals or hacks that justify the money. The larger amounts this would produce wouldn't mean champagne parties for grantees; they would mean the best ideas surviving long enough to grow meaningful traffic and learn how to make money in other ways.

After a long road supported by public grant funding, mySociety is now 50% commercially funded and 50% private-grant funded, but we'd never have arrived there without being 100% public-grant funded for the first couple of years. Now our key donors are philanthropic, with Indigo Trust in particular covering most of the core development cost for FixMyTransport.

Respect the geeks

All great technology projects have one or more über geeks at the heart of them. If you find the right über geeks, they'll understand politics, society and users just as much as they understand their code. If you find someone as ferociously multi-talented as, say, Louise Crow, who built FixMyTransport almost single-handedly, listen to them and change your plans when they say "no." Luckily, she said "yes" to building this project, and I hope those of you who care about civic tech give her the props appropriate to building something on this scale. Respect her, and respect the geeks like her, and you'll be one step closer to civic app success.


November 12 2010

Open health data: Spurring better decisions and new businesses

As Network World reported this week, iPhone apps that could save your life have come to an App Store near you.

"A growing number of developers are tapping into a treasure trove of U.S. government healthcare data and coming up with innovative iPhone apps that help consumers make better medical decisions," wrote Carolyn Duffy Marsan. She was reporting on a trend that started at the National Institute of Medicine in May when the U.S. Department of Health and Human Services launched its Community Health Data Initiative.

Network World covered Medwatcher, Asthmapolis, and iTriage; the latter two also showed up here on Radar back in May. iTriage, a free app for iPhone, Android, BlackBerry and other web-enabled devices, has enjoyed continued growth over the summer and fall, with nearly 1 million users to date and a new iPad app.

Peter Hudson, one of the physicians who founded Healthagen, the company that created iTriage, spoke with me at this week's mHealth Summit. In the following video, Hudson discusses his app and the kinds of data that would help him and other mobile health entrepreneurs grow their businesses.

iTriage is free and genuinely useful. It also looks like a viable business, as more healthcare providers pay to add their data to its database. If that vision for open government at HHS continues to gain traction, the innovation released in the private sector could meet or exceed the billions of dollars unlocked by GPS and NOAA data. To see the first steps in that direction, look no further than the healthcare apps that have already gone online. When goes live later this year, entrepreneurs will have even more indicators to build into their applications.

