
May 17 2012

Strata Week: Google unveils its Knowledge Graph

Here's what caught my attention in the data space this week.

Google's Knowledge Graph

"Google does the semantic Web," says O'Reilly's Edd Dumbill, "except they call it the Knowledge Graph." That Knowledge Graph is part of an update to search that Google unveiled this week.

"We've always believed that the perfect search engine should understand exactly what you mean and give you back exactly what you want," writes Amit Singhal, Senior VP of Engineering, in the company's official blog post.

That post makes no mention of the semantic web, although as ReadWriteWeb's Jon Mitchell notes, the Knowledge Graph certainly relies on it, building on Google's 2010 acquisition of the semantic database Freebase.

Mitchell describes the enhanced search features:

"Most of Google users' queries are ambiguous. In the old Google, when you searched for "kings," Google didn't know whether you meant actual monarchs, the hockey team, the basketball team or the TV series, so it did its best to show you web results for all of them.

"In the new Google, with the Knowledge Graph online, a new box will come up. You'll still get the Google results you're used to, including the box scores for the team Google thinks you're looking for, but on the right side, a box called "See results about" will show brief descriptions for the Los Angeles Kings, the Sacramento Kings, and the TV series, Kings. If you need to clarify, click the one you're looking for, and Google will refine your search query for you."

Yahoo's fumbles

The news from Yahoo hasn't been good for a long time now, with the most recent troubles involving the departure of newly appointed CEO Scott Thompson over the weekend and a scathing blog post this week by Gizmodo's Mathew Honan titled "How Yahoo Killed Flickr and Lost the Internet." Ouch.

Over on GigaOm, Derrick Harris wonders if Yahoo "sowed the seeds of its own demise with Hadoop." While Hadoop has long been pointed to as a shining innovation from Yahoo, Harris argues that:

"The big problem for Yahoo is that, increasingly, users and advertisers want to be everywhere on the web but at Yahoo. Maybe that's because everyone else that's benefiting from Hadoop, either directly or indirectly, is able to provide a better experience for consumers and advertisers alike."

De-funding data gathering

The appropriations bill that recently passed the U.S. House of Representatives axes funding for the Economic Census and the American Community Survey. The former gathers data about 25 million businesses and 1,100 industries in the U.S., while the latter collects data from three million American households every year.

Census Bureau director Robert Groves writes that the bill "devastates the nation's statistical information about the status of the economy and the larger society." BusinessWeek chimes in that the end to these surveys "blinds business," noting that businesses rely "heavily on it to do such things as decide where to build new stores, hire new employees, and get valuable insights on consumer spending habits."

Got data news to share?

Feel free to email me.

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR


April 05 2012

Strata Week: New life for an old census

Here are a few of the data stories that caught my attention this week.

Now available in digital form: The 1940 census

The National Archives released the 1940 U.S. Census records on Monday, after a mandatory 72-year waiting period. The release marks the single largest collection of digital information ever made available online by the agency.

Screenshot from the digital version of the 1940 Census, available through Archives.com.

The 1940 Census, conducted as a door-to-door survey, included questions about age, race, occupation, employment status, income, and participation in New Deal programs — all important (and intriguing) following the previous decade's Great Depression. One data point: in 1940, there were 5.1 million farmers. According to the 2010 American Community Survey (not the census, mind you), there were just 613,000.

The ability to glean these sorts of insights proved to be far more compelling than the National Archives anticipated, and the website hosting the data, Archives.com, was temporarily brought down by the traffic load. The site is now up, so anyone can investigate the records of approximately 132 million Americans. The records are searchable by map — or rather, "the appropriate enumeration district" — but not by name.

A federal plan for big data

The Obama administration unveiled its "Big Data Research and Development Initiative" late last week, with more than $200 million in financial commitments. Among the White House's goals: to "advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data."

The new big data initiative was announced with a number of departments and agencies already on board with specific plans, including grant opportunities from the Department of Defense and the National Science Foundation, new DARPA spending on an XDATA program to build new computational tools, and open data initiatives such as the 1000 Genomes Project.

"In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use big data for scientific discovery, environmental and biomedical research, education, and national security," said Dr. John P. Holdren, assistant to the President and director of the White House Office of Science and Technology Policy in the official press release (PDF).

Personal data and social context

When the Girls Around Me app was released, using data from Foursquare and Facebook to notify users when there were females nearby, many commentators called it creepy. "Girls Around Me is the perfect complement to any pick-up strategy," the app's website once touted. "And with millions of chicks checking in daily, there's never been a better time to be on the hunt."

"Hunt" is an interesting choice of words here, and the Cult of Mac, among other blogs, asked if the app was encouraging stalking. The outcry prompted Foursquare to revoke the app's API access, and its developers later pulled it from the App Store voluntarily.

Many of the responses to the app raised issues about privacy and user data, and questioned whether women in particular should be extra cautious about sharing their information with social networks. But as Amit Runchal writes in TechCrunch, this response blames the victims:

"You may argue, the women signed up to be a part of this when they signed up to be on Facebook. No. What they signed up for was to be on Facebook. Our identities change depending on our context, no matter what permissions we have given to the Big Blue Eye. Denying us the right to this creates victims who then get blamed for it. 'Well,' they say, 'you shouldn't have been on Facebook if you didn't want to ...' No. Please recognize them as a person. Please recognize what that means."

Writing here at Radar, Mike Loukides expands on some of these issues, noting that the questions are always about data and social context:

"It's useful to imagine the same software with a slightly different configuration. Girls Around Me has undeniably crossed a line. But what if, instead of finding women, the app was Hackers Around Me? That might be borderline creepy, but most people could live with it, and it might even lead to some wonderful impromptu hackathons. EMTs Around Me could save lives. I doubt that you'd need to change a single line of code to implement either of these apps, just some search strings. The problem isn't the software itself, nor is it the victims, but what happens when you move data from one context into another. Moving data about EMTs into context where EMTs are needed is socially acceptable; moving data into a context that facilitates stalking isn't acceptable, and shouldn't be."

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

Got data news?

Feel free to email me.

Related:
