
February 14 2014

Four short links: 14 February 2014

  1. Bitcoin: Understanding and Assessing Potential Opportunities (Slideshare) — VC deck on Bitcoin market and opportunities, long-term and short-term. Interesting lens on the development and gaps.
  2. Queensland Police Map Crime Scenes with 3D Scanner (ComputerWorld) — can’t wait for the 3D printed merchandise from famous trials.
  3. Atheer Labs -- An immersive 3D display, over a million apps, sub-mm 3D hand interaction, all in 75 grams.
  4. libcloud -- Python library for interacting with many of the popular cloud service providers using a unified API.
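For readers who haven't used it, here is a minimal sketch of that unified API, assuming the apache-libcloud package is installed; the credentials and region are placeholders:

```python
# Minimal libcloud sketch (assumes `pip install apache-libcloud`).
# The credentials below are placeholders, not real keys.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

# The same calls work against other providers by swapping the Provider constant.
Driver = get_driver(Provider.EC2)
conn = Driver("ACCESS_KEY_ID", "SECRET_KEY", region="us-east-1")

for node in conn.list_nodes():
    print(node.name, node.state, node.public_ips)
```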

December 12 2013

Four short links: 12 December 2013

  1. iBeacons — Bluetooth LE enabling tighter coupling of physical world with digital. I’m enamoured with the interaction possibilities: The latest Apple TV software brought a fantastically clever workaround. You just tap your iPhone to the Apple TV itself, and it passes your Wi-Fi and iTunes credentials over and sets everything up instantaneously.
  2. Better and Better Keyboards (Jesse Vincent) — It suffered from the same problem as every other 3D-printed keyboard I’d made to date – When I showed it to someone, they got really excited about the fact that I had a 3D printer. In contrast, whenever I showed someone one of the layered acrylic prototype keyboards I’d built, they got excited about the keyboard.
  3. Bamboo.io — open source modular web service for dataset storage and retrieval.
  4. state.js -- Open source JavaScript state machine supporting most UML 2 features.

November 20 2013

AOL vs. "People Plus": How free are APIs?

Claiming misuse of an API, AOL has sent the makers of the program "People Plus" a cease-and-desist letter. "People Plus" draws on the "Crunchbase" database, which, however, is licensed under Creative Commons.

"People Plus" is an app that could probably only have come out of California. If you are at a tech-world event and don't know whom you are talking to, or why, it tells you on your smart glasses: this is venture capitalist Tim Chang of the Mayfield Fund, for example. There are also features such as "Find investors nearby." Call it LinkedIn plus augmented reality, as shown in the promotional video.

To answer the questions "Who am I talking to? Why are we talking?", the program searches event check-ins, among other things, as well as company databases such as Crunchbase, a side project of Techcrunch that, like the blog itself, has belonged to AOL since 2010. Use of the database is now the subject of a licensing dispute: Crunchbase has placed its database, built on the Wikipedia model, under a Creative Commons license, so under the terms of that license (attribution) People Plus may use it without asking.

Terms of use vs. Creative Commons license

The API to the database, however, comes with terms of use from Crunchbase that effectively take back the freedoms granted by the Creative Commons license. In them, Crunchbase reserves the right to review uses that are "more competitive than complementary" ("CrunchBase reserves the right to continually review and evaluate all uses of the API, including those that appear more competitive than complementary in nature.")

Accordingly, in early November People Plus received a letter from AOL's lawyers demanding that it stop using Crunchbase. But as far as usage rights are concerned, the long-form Creative Commons licenses are clear: additional restrictions beyond the license are invalid; changes can only be made case by case through a "written agreement":

This License constitutes the entire agreement between the parties with respect to the Work licensed here. (…) Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.

According to a developer's post on Hacker News, the API was initially open to all users; registration with additional terms was only introduced later.

CC stays CC

In a statement to "Wired," Creative Commons general counsel Diane Peters concedes that companies can attach terms of use to their APIs, but points out that content that has already been published cannot be pulled back. What's out is out: that is the gist of the principle that a usage right granted under Creative Commons cannot be revoked retroactively. AOL could change the license now, but data that has already been released would remain free.

If the dispute ends up in court, one of the questions could be: can the use of a database be separated from the use of the API through which that database is read? At the moment, though, it looks as if AOL and the People Plus makers want to settle without the courts. People Plus has suspended its use of the API for now, but without removing the old data from its service. Techcrunch and the Crunchbase project, for their part, would hardly look good fighting courtroom skirmishes with startups.

Above all, the dispute makes clear that the owner, AOL, apparently wants to prohibit uses of the database in order to slow down a potential competitor that uses the data in new ways, even as the company prides itself on its "Open Philosophy" under the motto "Disrupt AOL!". Trying to prevent "disruption" from the outside with supplementary agreements layered on top of a Creative Commons license is unlikely to succeed, though; enabling new uses is precisely what the license is for.

August 08 2013

Diffbot helps computers understand web pages - Technology Review

http://www.technologyreview.com/news/428056/a-startup-hopes-to-help-computers-understand-web-pages

Technology Review looks at Diffbot - http://www.diffbot.com - a program that aims to understand web pages so they can be put to better use. Its latest advances let it, for example, detect when a page is about a product and extract information such as the price, so that a site can adjust to the competition. The program is accessible through programming interfaces... Tags: internetactu2net (...)

#sémantique #medias #marketing #APi #interfacesdeprogrammation

May 27 2013

February 15 2013

Masking the complexity of the machine

The Internet has thrived on abstraction and modularity. Web services hide their complexity behind APIs and standardized protocols, and these clean interfaces make it easy to turn them into modules of larger systems that can take advantage of the most intelligent solution to each of many problems.

The Internet revolutionized the software-software interface; the industrial Internet will revolutionize the software-machine interface and, in doing so, will make machines more accessible. I’m using “access” very broadly here — interfaces will make machines accessible to innovators who aren’t necessarily experts in physical machinery, in the same way that the Google Maps API makes interactive mapping an accessible feature to developers who aren’t expert cartographers and front-end developers. And better access for people who write software means wider applications for those machines.

I’ve recently encountered a couple of widely different examples that illustrate this idea. These come from very different places — an aerospace manufacturer that has built strong linkages between airplanes and software, and an advanced enthusiast who has built new controllers for a pair of industrial robots — but they both involve the development of interfaces that make machines accessible.

The Centaur, built by Aurora Flight Sciences, is an optionally-piloted aircraft: it can be flown remotely, as a drone, or by a certified pilot sitting in the plane, which satisfies U.S. restrictions against domestic drone use. Customers include defense agencies and scientists, who in some cases need a technician onboard to monitor equipment, and in others send the plane on long trips well beyond a human's comfort and safety limitations.

John Langford, Aurora’s founder, described his company’s work to me and in the process offered a terrific characterization of what the industrial Internet does: “We’re masking the complexity of the machine.”

The intelligence that Aurora layers onto its planes reduces the entire flight process to an API. The Centaur can even be flown from the pilot’s seat in the plane through the remote-operator control. In other words, Aurora has so comprehensively captured the mechanism of flight in its software that a pilot might as well fly the airplane he’s sitting in through the digital pipeline rather than directly through the flight deck’s physical links.

A highly-evolved interface between airplane and its software means that the software can draw insight from the plane, reading control settings as well as sensors to improve its piloting performance. “An experienced human pilot might have [flown] 10,000 to 20,000 hours,” says Langford. “We already have operating systems that have hundreds of thousands of flying hours on them. Every anomaly gets built into the memory of the system. As the systems learn, you only have to see something once in order to know how to respond. The [unmanned aircraft] has flight experience that no human pilot will ever build up in his lifetime.”

The simplified interface between humans and the Centaur’s combined machinery and software might eventually make flight vastly more accessible. “What we think the robotic revolution really does is remove operating an air vehicle from the priesthood that it’s part of today, and makes it accessible to people with lower levels of training,” he says.


Trammell Hudson's PUMA robotic arm setup at NYC Resistor, with laptop running kinematics library, homemade controller stack, and robot.

I saw a different kind of revolutionary accessibility at work when I visited Trammell Hudson at NYC Resistor, a hardware collective in Brooklyn. I came across Hudson through a blog post he wrote detailing his rehabilitation of a pair of industrial robots — reverse-engineering their controls and building his own new controller stack in place of the PLCs that had operated them before they were salvaged from a factory with wire cutters.

“The arm itself has no smarts — just motors and quadrature encoders,” he says. (Even the arm’s current position is stored in the controller’s memory, not the robot’s.) Hudson had to write his own smarts for the robot, from scratch — intelligence that, when the robot was new, resided in purpose-built controllers the size of mini-fridges but that today can be built from open-source software libraries and run on an inexpensive microprocessor.

The robot’s kinematics — the spatial intelligence that decides how to get the robot’s hand from one place to another by repositioning six different joints — run on Hudson’s laptop. He’s interested in building those mathematical models directly into a controller that could be built from widely-available parts by anyone else with a similar robot, which could give second lives to thousands of high-quality industrial automation components by taking discarded machines and assigning new intelligence to them.
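As a toy illustration of what such a kinematics layer computes (a simplified two-joint planar arm, not Hudson's actual six-joint controller code), forward kinematics is just a chain of rotations and link offsets:

```python
# Toy forward kinematics for a 2-joint planar arm -- a simplified stand-in for
# the 6-joint math a real controller has to do; not Hudson's actual code.
import math

L1, L2 = 0.4, 0.3  # link lengths in metres (illustrative values)

def forward_kinematics(theta1, theta2):
    """Return the (x, y) position of the hand for the given joint angles (radians)."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

# A real controller inverts this mapping (inverse kinematics) to choose joint
# angles that put the hand at a requested position.
print(forward_kinematics(math.radians(30), math.radians(45)))
```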

“The hardware itself is very durable,” Hudson told me. “The software is where the interesting things are happening, and the controllers age very rapidly.”

Hudson’s remarkable feat of Saturday-afternoon electrical engineering was made possible by open-source microcontrollers, software libraries, and hardware interfaces (and, naturally, his own ingenuity). But he told me the most important factor in the success of his project was the rise of an online community that has an extraordinarily specialized and sophisticated understanding of electronics. “The ease of finding information now is incredible,” he said. “Some guy posted the correct voltage for releasing the arm’s brake, and I was able to find it in a few minutes and avoid damaging anything.”

“We went through a white-collar dark ages in the 1980s,” Hudson said. “People stopped building things. No one took shop class.” Now hardware components, abstracted and modularized, have become accessible to anyone with a technical mindset, who can improve the physical world by writing more intelligence onto it.

In an earlier reverse-engineering project, Hudson wrote his own firmware, which became Magic Lantern, for Canon’s 5D Mark II digital SLR camera. “I have a 4 by 5 [inch] camera from the 1890s — with my Canon 5D Mark II attached to the back,” he says. “The hardware on the old camera is still working fine, but the software on the 5D is way better than chemical film.”


This is a post in our industrial Internet series, an ongoing exploration of big machines and big data. The series is produced as part of a collaboration between O’Reilly and GE.

September 09 2012

The many sides to shipping a great software project

Chris Vander Mey, CEO of Scaled Recognition, and author of a new O’Reilly book, Shipping Greatness, lays out in this video some of the deep lessons he learned during his years working on some very high-impact and high-priority projects at Google and Amazon.

Chris takes a very expansive view of project management, stressing the crucial decisions and attitudes that leaders need to take at every stage from the team’s initial mission statement through the design, coding, and testing to the ultimate launch. By merging technical, organizational, and cultural issues, he unravels some of the magic that makes projects successful.

Highlights from the full video interview include:

  • Some of the projects Chris has shipped. [Discussed at the 0:30 mark]
  • How to listen to your audience while giving a presentation. [Discussed at the 1:24 mark]
  • Deadlines and launches. [Discussed at the 6:40 mark]
  • Importance of keeping team focused on user experience of launch. [Discussed at the 12:15 mark]
  • Creating an API, and its relationship to requirements and Service Oriented Architectures. [Discussed at the 15:27 mark]
  • What integration testing can accomplish. [Discussed at the 22:36 mark]

You can view the entire conversation in the following video:

June 01 2012

March 23 2012

Top Stories: March 19-23, 2012

Here's a look at the top stories published across O'Reilly sites this week.

Why StreetEasy rolled its own maps
Google's decision to start charging for its Maps API is leading some companies to mull other options. StreetEasy's Sebastian Delmont explains why and how his team made a change.

What is Dart?
Dart is a new structured web programming platform designed to enable complex, high-performance apps for the modern web. Kathy Walrath and Seth Ladd, members of Google's developer relations team, explain Dart's purpose and its applications.

My Paleo Media Diet
Jim Stogdill is tired of running on the info treadmill, so he's changing his media habits. His new approach: "Where I can, adapt to my surroundings; where I can't, adapt my surroundings to me."


The unreasonable necessity of subject experts
We can't forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data. Subject experts are the ones who find the stories data wants to tell.

Direct sales uncover hidden trends for publishers
A recent O'Reilly customer survey revealed unusual results (e.g. laptops/desktops remain popular ereading devices). These sorts of insights are made possible by O'Reilly's direct sales channel.


Where Conference 2012 is where the people working on and using location technologies explore emerging trends in software development, tools, business strategies and marketing. Save 20% on registration with the code RADAR20.

March 22 2012

Four short links: 22 March 2012

  1. Stamen Watercolour Maps -- I saw a preview of this a week or two ago and was in awe. It is truly the most beautiful thing I've seen a computer do. It's not just a clever hack, it's art. Genius. And they're CC-licensed.
  2. Screens Up Close -- gorgeous microscope pictures of screens, showing how great the iPad's retina display is.
  3. Numbers API -- CUTE! Visit it, even if you're not a math head, it's fun.
  4. China Now Leads the World in New iOS and Android Device Activations (Flurry) -- interesting claim, but the graphs make me question their data. Why have device activations in the US plummeted in January and February even as Chinese activations grew? Is this an artifact of collection or is it real?

March 05 2012

Profile of the Data Journalist: The API Architect

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference.

Jacob Harris (@harrisj) is an interactive news developer based in New York City. Our interview follows.

Where do you work now? What is a day in your life like?

I work in the Interactive Newsroom team at the New York Times. A day in my life is usually devoted to coding rather than meetings. Currently, I am almost exclusively devoted to the NYT elections coverage, where I switch between the operations of loading election results from the AP and building internal APIs that provide data to the various parts of elections.nytimes.com. I also sometimes help fix problems in our server stack when they arise or sometimes get involved in other projects if they need me.

How did you get started in data journalism? Did you get any special degrees or certificates?

I have a classical CS education, with a combined B.A./M.Eng from MIT. I have no journalism background or experience. I never even worked for my college newspaper, or any other. I do have a profound skepticism and contrarian nature that does help me fit in well with the journalists.

Did you have any mentors? Who? What were the most important resources they shared with you?

I don't have any specific mentors. But that doesn't mean I haven't been learning from anybody. We're in a very open team and we all usually learn things from each other. Currently, several of the frontend guys are tolerating my new forays into Javascript. Soon, the map guys will learn to bear my questions with patience.

What does your personal data journalism "stack" look like? What tools could you not live without?

Our actual web stack is built on top of EC2, with Phusion Passenger and Ruby on Rails serving our apps. We also use haproxy as a load balancer. Varnish is an amazing cache that everybody should use. On my own machine, I do my coding currently in Sublime Text 2. I use Pivotal Tracker to track my coding tasks. I could probably live with a different editor, but don't take my server stack away from me.

What data journalism project are you the most proud of working on or creating?

I have two projects I'm pretty proud of working on. Last year, I helped out with the Wikileaks War Logs reporting. We built an internal news app for the reporters to search the reports, see them on a map, and tag the most interesting ones. That was an interesting learning experience.

One of the unique things I figured out was how to extract MGRS coordinates from within the reports to geocode the locations inside of them. From this, I was able to pin down the locations of individual homicides within Baghdad more precisely than the reports' own geocoding. I built a demo, pitched it to graphics, and we built an effective and sobering look at the devastation inflicted on Baghdad by the violence.
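A rough sketch of that kind of extraction, assuming plain-text reports and the third-party mgrs package for the conversion; the regular expression and sample text are illustrative, not the Times' actual code:

```python
# Illustrative sketch: pull MGRS grid references out of report text and convert
# them to lat/lon. Assumes `pip install mgrs`; the regex and sample text are
# simplified stand-ins, not newsroom code.
import re
import mgrs

report = "At 1340 hrs an incident was reported at 38SMB4595374615 near the market."

# Rough pattern: grid zone + band letter + 100 km square + a run of digits.
MGRS_PATTERN = re.compile(r"\b\d{1,2}[C-HJ-NP-X][A-HJ-NP-Z]{2}\d{4,10}\b")

converter = mgrs.MGRS()
for ref in MGRS_PATTERN.findall(report):
    lat, lon = converter.toLatLon(ref)  # some mgrs versions expect bytes: ref.encode()
    print(ref, "->", round(lat, 5), round(lon, 5))
```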

This year, I am working on my third election as part of Interactive News. Although we are proud of our team's work in 2008 and 2010, we've been trying some new ways of presenting our election coverage online and new ways of architecting all of our data sources so that it's easier to build new stuff. It's been gratifying to see how internal APIs combine with novel storytelling formats and modern browser technologies this year.

Where do you turn to keep your skills updated or learn new things?

Usually, I just find out about things by following all the other news app developers on Twitter. We're a small ecosystem with lots of sharing. It's great how everybody learns from each other. I have created a Twitter list @harrisj/news-hackers to help keep tabs on all the cool stuff being done out there. (If you know someone who should be on it, let me know.)


Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?

We live in a world of data. Our reporting should do a better job of presenting and investigating that data. I think it's been an incredible time for the world of news applications lately. A few years back, it was just an achievement to put data online in a browsable way.

These days, news applications are at a whole other level. Scott Klein of ProPublica put it best when he described all good data stories as including both the "near" (individual cases, examples) and the "far" (national trends, etc.).

In an article, the reporter would pick a few compelling "nears" for the story. As a reader, I also would want to know how my school is performing or how polluted my water supply is.

This is what news applications can do: tell the stories that are found in the data, but also allow the readers to investigate the stories in the data that are highly important to them.

This interview has been edited and condensed for clarity.

December 19 2011

Six API predictions for 2012

For businesses, APIs are clearly evolving from a nice-to-have to a must-have. Externalization of back-end functionality so that apps can interact with systems, not just people, has become critical.

As we move into 2012, several API trends are emerging.

Enterprise APIs becoming mainstream

I see a lot of discussion about Facebook, Twitter and other public APIs. However, the excitement of these public APIs hides the real revolution. Namely, enterprises of all sizes are API-enabling their back-end systems. This widens the aperture on how back-end systems are used, opening them not just to apps built by the enterprise, but also to apps built by partners and independent developers.

For example, several large telecom enterprises, like AT&T, are embracing APIs because, even with their abundant resources, they cannot match what the world outside the enterprise can do for them — build apps that, in the end, bring in more business. Today, I estimate that 10% of enterprises are doing APIs, and another 10% are considering it. In 2012, I predict that these percentages are more likely to be 30% and 80%, respectively, reflecting the pace at which APIs are going mainstream.

API-centric architectures will be different from portal-centric or SOA-centric architectures

Websites (portals) are for people integration. Service-oriented architectures (SOA) are for app-to-app integration. While both websites and SOA use back-end systems through "internal" APIs, the new API world focuses on integration with apps and developers, not with people (via portals) or processes (via SOA). There are three specific things that are different:

  1. Enterprises need to think outside-in as opposed to inside-out. In an outside-in model, one would start with easy consumption (read REST) of perhaps "chatty" APIs and then improve upon them. This is in contrast to thinking performance first and ease of use second.
  2. Enterprises have to be comfortable handling unpredictable demand and rapidly changing usage patterns as opposed to the more predictable patterns in the enterprise software environment.
  3. Enterprises will need to make websites and even some internal processes clients of the "new" API layer instead of having them continue to use back-end systems directly. In this way, APIs will become the de facto and default way of accessing the back-end systems. Also, increasingly, the API layer will be delivered through a cloud model to handle the more rapid and evolving provisioning requirements.

Data-centric APIs increasingly common

Siri and WolframAlpha are great examples of data-centric APIs. There is a huge market for data, and today it is mostly made available through custom feeds (for example, Dun & Bradstreet) or through a sea of xls/csv files on a website (for example, Data.gov). The former is a highly paid model, and the latter is a free-for-all model. Clearly, a new model is emerging in the middle, and in fact it already has. This is the model in which data is brokered by APIs and free and freemium models will co-exist. Expect to see more examples of enterprises for which data is the primary business and where using the data through apps is the new business model.

The first thing enterprises like this are doing is to API-enable their data. Now, RESTifying data is not easy, and there are as many schools of thought on how best to do it as there are data providers. However, I expect some combination of conventional and de facto standards, such as the Open Data Protocol (OData), to become increasingly common. I do not believe that the semantic web or the Resource Description Framework (RDF) model of data interchange is the answer. It goes against the grain of ease of use and adoption.
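For a sense of what consuming an OData-style endpoint looks like, here is a small sketch against the public OData demo service at services.odata.org; the exact query options supported vary by provider and protocol version:

```python
# Sketch of querying an OData endpoint with plain HTTP -- here the public
# Northwind demo service; real data providers expose their own entity sets.
import requests

BASE = "https://services.odata.org/V4/Northwind/Northwind.svc"

resp = requests.get(
    f"{BASE}/Products",
    params={
        "$filter": "UnitPrice gt 50",        # server-side filtering
        "$select": "ProductName,UnitPrice",  # only the fields we need
        "$top": "5",
    },
    timeout=10,
)
resp.raise_for_status()
for item in resp.json()["value"]:
    print(item["ProductName"], item["UnitPrice"])
```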

Many enterprises will implement APIs just to get analytics

A common theme in enterprise technologies is that spending happens first on business automation and second on business optimization. The former enables bottom-line improvements; the latter enables top-line improvements. The API-adoption juggernaut is currently focused on business automation. However, as more and more traffic flows through the APIs, analytics on these APIs provides an increasingly better view of the performance of the enterprise, thereby benefiting both IT and business optimization. If this trend continues and if business optimization is the ultimate goal, a logical conclusion is that APIs become a means to the end of optimization. Therefore, all enterprises focused on business optimization must implement APIs so they have one "choke point" from which a lot of business optimization analytics can derive data.

APIs optimized for the mobile developer

Mobile apps are becoming recognized as the primary driver for API development and adoption. There are many different devices, and each has its own requirements. Most mobile apps have been developed for iPhone (iOS) and Android devices, but the next big trend is HTML5/JavaScript for apps that can run on any device.

Mobile devices in general need to receive less data in API responses and should not have to make repeated API calls to perform simple tasks. Inefficient APIs make things worse for the app developer and the API provider because problems are multiplied by mobile demand patterns (many small API requests) and concurrency (the sheer number of devices hitting the API at once). In 2012, many providers will realize they need to:

  • Let developers filter the size and content of the API response before it's returned to the app.
  • Give developers the right format for their app environment — plist for iOS and JSONP for HTML5/JavaScript.
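A minimal sketch of those two ideas in a toy endpoint, using Flask purely for illustration; the resource, field names, and parameter names are assumptions rather than any particular provider's API:

```python
# Toy API endpoint illustrating response filtering and JSONP for mobile clients.
# Flask is used only for illustration; the resource and parameter names are made up.
import json
from flask import Flask, Response, jsonify, request

app = Flask(__name__)

BOOK = {"id": 42, "title": "Shipping Greatness", "pages": 250, "description": "..."}

@app.route("/books/42")
def get_book():
    # ?fields=title,pages lets the client trim the payload before it leaves the server.
    fields = request.args.get("fields")
    data = {k: v for k, v in BOOK.items() if not fields or k in fields.split(",")}

    # ?callback=fn wraps the JSON for HTML5/JavaScript clients that need JSONP.
    callback = request.args.get("callback")
    if callback:
        return Response(f"{callback}({json.dumps(data)})",
                        mimetype="application/javascript")
    return jsonify(data)

if __name__ == "__main__":
    app.run()
```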

OAuth 2.0 as the default security model

Apps are the new intermediaries in the digital world, enabling buyers and sellers to meet in ways that make the most sense. In the context of APIs, the buyer is the end-user and the seller is the API provider. Good apps are the ones that can package the provider's API in a great user experience that encourages the end user to participate. The growth of apps as intermediaries with valued services like Salesforce.com, Twitter, Facebook, eBay, and others requires a way for users to try the app for the first time without compromising their private data and privileges.

OAuth 2.0 makes it easy for end users to adopt new apps because they can test them out. If they don't like or don't trust an app, users can terminate the app's access to their account. In 2012, this will be the default choice for securing APIs that enable end-users to interact through apps with their valued services.
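For concreteness, this is roughly what the OAuth 2.0 authorization-code flow looks like from an app's side; every URL, client ID, and scope below is a placeholder for values a real provider would document:

```python
# Sketch of the OAuth 2.0 authorization-code flow. All URLs, client IDs and
# scopes are placeholders -- substitute what your API provider documents.
import urllib.parse
import requests

AUTHORIZE_URL = "https://provider.example.com/oauth2/authorize"
TOKEN_URL = "https://provider.example.com/oauth2/token"
CLIENT_ID, CLIENT_SECRET = "my-app", "s3cret"
REDIRECT_URI = "https://myapp.example.com/callback"

# Step 1: send the user to the provider to approve (or later revoke) access.
consent_url = AUTHORIZE_URL + "?" + urllib.parse.urlencode({
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "read_profile",
})
print("Visit:", consent_url)

# Step 2: after approval the provider redirects back with ?code=..., which the
# app exchanges for an access token -- the user's password never touches the app.
code = input("Paste the code from the redirect: ")
token = requests.post(TOKEN_URL, data={
    "grant_type": "authorization_code",
    "code": code,
    "redirect_uri": REDIRECT_URI,
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
}, timeout=10).json()

# Step 3: call the API with the token; the user can revoke it at any time.
print(requests.get("https://provider.example.com/api/me",
                   headers={"Authorization": f"Bearer {token['access_token']}"},
                   timeout=10).json())
```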

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20


November 25 2011

Top Stories: November 21-25, 2011

Here's a look at the top stories published across O'Reilly sites this week.

Congress considers anti-piracy bills that could cripple Internet industries
With the SOPA and PROTECT IP acts, members of the U.S. Congress have advanced legislation that could undermine Internet industries and harm freedom of expression online.

Jonathan's Card: Lessons from a social experiment
Jonathan Stark raised eyebrows last summer when he made his Starbucks card available for anyone to use. Here, Stark looks back on the "Jonathan's Card" experiment and examines its lessons.


Exposing content via APIs
In this TOC podcast, Fluidinfo CEO Terry Jones says the real world is "writable" and he describes how APIs can offer powerful publishing solutions.

EPUB 3: Building a standard on unstable ground
"What is EPUB 3?" author Matt Garrish explains how EPUB 3 is shaped by web standards and how it addresses accessibility. He also shares his thoughts on Amazon's KF8 and why EPUB will stay one step ahead of the competition.



Tools of Change for Publishing, being held February 13-15 in New York, is where the publishing and tech industries converge. Register to attend TOC 2012.

September 16 2011

Four short links: 16 September 2011

  1. A Quick Buck by Copy and Paste -- scorching review of O'Reilly's Gamification by Design title. tl;dr: reviewer, he does not love. Tim responded on Google Plus. Also on the gamification wtfront, Mozilla Open Badges. It talks about establishing a part of online identity, but to me it feels a little like a Mozilla Open Gradients project would: cargocult-confusing the surface for the substance.
  2. Google+ API Launched -- the first piece of a Google+ API has been released. It provides read-only programmatic access to people, posts, checkins, and shares. Activities are retrieved as triples of (subject, verb, object), which is semweb cute and ticks the social object box, but is unlikely in present form to reverse declining numbers of users.
  3. Cube -- open source time-series visualization software from Square, built on MongoDB, Node, and Redis. As Artur Bergman noted, the bigger news might be that Square is using MongoDB (known meh).
  4. Tenzing -- an SQL implementation on top of Map/Reduce. Tenzing supports a mostly complete SQL implementation (with several extensions) combined with several key characteristics such as heterogeneity, high performance, scalability, reliability, metadata awareness, low latency, support for columnar storage and structured data, and easy extensibility. Tenzing is currently used internally at Google by 1000+ employees and serves 10000+ queries per day over 1.5 petabytes of compressed data. In this paper, we describe the architecture and implementation of Tenzing, and present benchmarks of typical analytical queries. (via Raphaël Valyi)

September 01 2011

Strata Week: What happens when 200,000 hard drives work together?

Here are a few of the data stories that caught my attention this week.

IBM's record-breaking data storage array

IBM Research is building a new data storage array that's almost 10 times larger than anything that's been built before. The array comprises 200,000 hard drives working together, with a storage capacity of 120 petabytes — that's 120 million gigabytes. To give you some idea of the capacity of the new "drive," writes MIT Technology Review, "a 120-petabyte drive could hold 24 billion typical five-megabyte MP3 files or comfortably swallow 60 copies of the biggest backup of the Web, the 150 billion pages that make up the Internet Archive's WayBack Machine."

Data storage at that scale creates a number of challenges, including — no surprise — cooling such a massive system. Other problems include handling failure, backups and indexing. The new storage array will benefit from other research IBM has been doing to boost supercomputers' data access. Its General Parallel File System (GPFS) was designed with this massive volume in mind: it spreads files across multiple disks so that many parts of a file can be read or written at once. The system has already shown what it can do, setting a new scanning speed record last month by indexing 10 billion files in just 43 minutes.
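As a toy illustration of the striping idea (this is the concept, not GPFS itself), dealing a file's blocks round-robin across several disks lets many of them be read or written at once:

```python
# Toy illustration of striping: split a file into fixed-size blocks and deal
# them round-robin across several "disks". This shows the idea behind parallel
# file systems like GPFS; it is not an implementation of GPFS.
BLOCK_SIZE = 4  # bytes per block, tiny for demonstration
NUM_DISKS = 3

data = b"abcdefghijklmnopqrstuvwxyz"
disks = [[] for _ in range(NUM_DISKS)]

for i in range(0, len(data), BLOCK_SIZE):
    block = data[i:i + BLOCK_SIZE]
    disks[(i // BLOCK_SIZE) % NUM_DISKS].append(block)

# Each disk now holds every third block, so all three can be read in parallel
# and the blocks reassembled in order.
for n, d in enumerate(disks):
    print(f"disk {n}: {d}")
```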

IBM's new 120-petabyte drive was built at the request of an unnamed client that needed a new supercomputer for "detailed simulations of real-world phenomena."

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code ORM30

Infochimps' new Geo API

The data marketplace Infochimps released a new Geo API this week, giving developers access to a number of disparate location-related datasets via one API with a unified schema.

According to Infochimps, the API addresses several pain points that those working with geodata face:

  1. Difficulty in integrating several different APIs into one unified app
  2. Lack of ability to display all results when zoomed out to a large radius
  3. Limitation of only being able to use lat/long

To address these issues, Infochimps has created a new simple schema to help make data consistent and unified when drawn from multiple sources. The company has also created a "summarizer" to intelligently cluster and better display data. And finally, it has also enabled the API to handle queries other than just those traditionally associated with geodata, namely latitude and longitude.

As we seek to pull together and analyze all types of data from multiple sources, this move toward a unified schema will become increasingly important.

Hurricane Irene and weather data

The arrival of Hurricane Irene last week reiterated the importance not only of emergency preparedness but of access to real-time data — weather data, transportation data, government data, mobile data, and so on.

Screenshot from the New York Times' interactive Hurricane Irene tracking map. See the full version.

As Alex Howard noted here on Radar, crisis data is becoming increasingly social:

We've been through hurricanes before. What's different about this one is the unprecedented levels of connectivity that now exist up and down the East Coast. According to the most recent numbers from the Pew Internet & American Life Project, for the first time, more than 50% of American adults use social networks. 35% of American adults have smartphones. 78% of American adults are connected to the Internet. When combined, those factors mean that we now see earthquake tweets spread faster than the seismic waves themselves. The growth of an Internet of things is an important evolution. What we're seeing this weekend is the importance of an Internet of people.

Got data news?

Feel free to email me.

Hard drive photo: Hard Drive by walknboston, on Flickr


Four short links: 1 September 2011

  1. A Chart Engine -- Android charting engine.
  2. The Illusion of Asymmetric Insight -- we are driven to create and form groups and then believe others are wrong just because they are others.
  3. Urban Mapping API -- add rich geographic data to web and non-web applications.
  4. Tell Us A Story, Victoria -- a university science story-telling contest.

August 02 2011

Scaling Google+

Last week at OSCON, Google social web engineer Joseph Smarr sat down for an interview with me about Google+, the long-awaited social network that the search engine giant launched earlier this summer.

We covered a lot of ground during the interview. Smarr connected what he and others learned from Plaxo Pulse, where he was the CTO, to how the Circles tool in Google+ builds granular control into public and private sharing. He also said scalability comes in different flavors — it's not just about infrastructure, but rapidly scaling the user interface with feedback. Finally, we talked about the future of the Google+ platform and the possibilities of an API (more on that below).

When asked about what surprised him the most, Smarr pointed to the high rate of public sharing on Google+, versus how the social network had been used internally at Google before launch. "People are getting these incredibly high engagement discussions," he said.

The Google+ API

Smarr said the Google+ team is spending a lot of time thinking about an API, drawing from what they learned from Google Buzz and the experiences of other Internet companies building platforms.

"On the one hand, clearly the goal is to not just have another social network, to really help not only Google's products but to make the web in general more social, more open, more connected, and APIs are a crucial piece of that," said Smarr. "We actually got a far way along the road with the Buzz APIs, and not only having a lot of access to the activities and the graph and so forth, but with a lot of these modern standards, like pubsubhubbub and Webfinger and so forth. That's the style of thing we'd like to bring. "

The challenge for the Google+ team working on the API, as Smarr explained, is that the devil is in the details.

"One of things people seem to really like about Google+ right now is it's 100% authentic," Smarr said. "Every piece of content was created by a real person sitting in front of Google+ and deciding who to share with. Balancing the obvious need to get more content in and out from more sources while maintaining that authenticity is something that we're spending a lot of time playing and iterating and coming up with. I think you'll see things trickle out over time as we get bits and pieces we're happy with."

Web 2.0 Summit, being held October 17-19 in San Francisco, will examine "The Data Frame" — focusing on the impact of data in today's networked economy.

Save $300 on registration with the code RADAR

Google+ and the identity issue

Smarr also talked about the nature of identity on Google+. Those following the launch of the service know that Google+ and pseudonymity is a hot-button issue. As the Electronic Frontier Foundation's Jillian C. York noted in an essay making a case for pseudonyms, Google+ has changed some of its processes, moving from immediate account deactivation to warning users about the issue and giving them an opportunity to align their Google+ username with its "real name" policy. This week, Kaliya "@IdentityWoman" Hamlin became the most recent person to have her Google+ account suspended. (Given her work and role in the digital identity space, Hamlin's use case is likely to be an interesting one.)

"There's cases where that authenticity and knowing that this is a real person with whatever name they tend to be called in the real world is really a feature," Smarr said during our discussion. "It changes the tone of discussions, it helps you find people you know in the real world. And so, wanting to make sure that there's a space that is preserved and promoted is really important. On any of these social networks, it's not enough to write the code, you have to make the right community. Lots of networks choose different approaches to how they do that, and they all have different consequences. It's not that one is inherently better or or more valid than the others, it's just that if you don't do anything about it, it will kind of take its own course."

There are clearly some gray areas here, particularly given Google's global reach into parts of the world where using your real identity to share content could literally be life-threatening. "Obviously there are a lot of cases where being able to share things not under your real identity is valuable and necessary, and Google has a lot of products like this today, like YouTube," said Smarr. "If you're posting videos of authoritarian governments during a revolution, you may not want to use your real name, and that seems pretty valid. Whether or not that type of use case will be supported in the Google+ as you know it today is something that we're all thinking through and figuring out, but it's not meant to stop you from doing that in other products."

Smarr had one other comment on identity that goes to the difficulty of creating social networks in domains that may be hostile to free expression: "It's not just enough to offer the ability to post under a pseudonymous identifier. If you're going to make the commitment that we're not going to out your real identity, that actually takes a lot of work, especially if you're using your real account to log in and then posting under a pseudonym. We feel a real responsibility that if we're going to make the claim to people 'it's safe, you're not going to get outed,' then we really need to think through the architecture and make sure there aren't any loopholes where all of a sudden you get outed. That's actually a hard thing to do in software … we don't want to do it wrong, and so we'd rather wait until we get it right."

As I said at the end of the interview, if anyone is going to solve the challenge of enabling its users to securely and anonymously connect to its social network, Google would have to be near the top of the list. One potential direction might be further integrating the Tor Project and the Android operating system in the context of a Google+ API. What's clear now, however, is that if Google+ looks like a social backbone for the Internet, there's still a lot of growth ahead.





May 18 2011

Four short links: 18 May 2011

  1. The Future of the Library (Seth Godin) -- We need librarians more than we ever did. What we don't need are mere clerks who guard dead paper. Librarians are too important to be a dwindling voice in our culture. For the right librarian, this is the chance of a lifetime. Passionate railing against a straw man. The library profession is diverse, but huge numbers of them are grappling with the new identity of the library in a digital age. This kind of facile outside-in "get with the Internet times" message is almost laughably displaying ignorance of actual librarians, as much as "the book is dead!" displays ignorance of books and literacy. Libraries are already much more than book caves, and already see themselves as navigators to a world of knowledge for people who need that navigation help. They disproportionately serve the under-privileged, they are public spaces, they are brave and constant battlers at the front line of freedom to access information. This kind of patronising "wake up and smell the digital roses!" wank is exactly what gives technologists a bad name in other professions. Go back to your tribes of purple cows, Seth, and leave librarians to get on with helping people find, access, and use information.
  2. An Old Word for a New World (PDF) -- paper on how "innovation", which used to be pejorative, has now come to be laudable. (via Evgeny Morozov)
  3. AlchemyAPI -- free (as in beer) entity extraction API. (via Andy Baio)
  4. Referrals by LinkedIn -- the thing with social software is that outsiders can have strong visibility into the success of your software, in a way that antisocial software can't.

May 13 2011

Winners of the writable API competition

Last month we ran a developer competition around the newly released Fluidinfo writable API for O'Reilly books and authors. The three judges — Tim O'Reilly, O'Reilly editor Mike Loukides, and O'Reilly GM Joe Wikert — have today announced the winners.

First prize: Book Chirpa

Book Chirpa

Mark McSpadden gets first prize for Book Chirpa. Mark wins an OSCON package that includes a full conference pass, coach airfare from within the US, and 4 nights hotel accommodation. Book Chirpa was "built to explore what people on Twitter are saying about O'Reilly books." It can show you the stream of O'Reilly book mentions, trending books, or a virtual library of all O'Reilly books mentioned on Twitter.

Second prize: Skillshelves

Skillshelf

Jonas Neubert gets second prize for Skillshelves. Jonas wins his choice of either an iPad 2 or a Xoom tablet. Skillshelves lets you "Show the world what tech topics you are an expert in — simply by making a list of O'Reilly books in your bookshelf."

Third prize: FluidCV

FluidCV

Eric Seidel gets third prize for FluidCV. Eric wins his choice of $500 worth of O'Reilly ebooks and/or videos. FluidCV pulls together information for your CV from tags in Fluidinfo, allowing the dynamic construction of a CV just by tagging relevant objects in Fluidinfo. Tag an O'Reilly book in Fluidinfo and the book cover and associated skill automatically appears in your CV. Eric's own FluidCV can be seen here.

Congratulations to the winners and many thanks to all who entered.

[Disclosure: Tim O'Reilly is an investor in Fluidinfo.]





May 11 2011

Four short links: 11 May 2011

  1. webshell -- command-line tool for debugging/exploring APIs, open sourced (Apache v2) and written in node.js. (via Sean Coates)
  2. sample -- command-line filter for random sampling of input. Useful when you've got heaps of data and want to run your algorithms on a random sample of it. (A minimal sketch of the idea appears after this list.) (via Scott Vokes)
  3. Yale Offers Open Access To PD Materials in Collections -- The goal of the new policy is to make high quality digital images of Yale's vast cultural heritage collections in the public domain openly and freely available. No license will be required for the transmission of the images & no limitations imposed on their use. (via Fiona Rigby)
  4. Resistance to Putting Lectures Online (Sydney Morning Herald) -- lecturers are worried that their off-the-cuff mistakes would be mocked on YouTube (they will be), but also that students wouldn't attend lectures. Nobody seems to have asked whether students actually learn from lectures.
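The core trick behind a streaming sampler like the one in item 2 is reservoir sampling; here is a generic stdin filter sketching the idea (not the linked tool's own code):

```python
#!/usr/bin/env python3
# Minimal reservoir-sampling filter: prints k uniformly random lines from stdin
# in a single pass. A generic illustration, not the linked tool's own code.
import random
import sys

k = int(sys.argv[1]) if len(sys.argv) > 1 else 10
reservoir = []

for i, line in enumerate(sys.stdin):
    if i < k:
        reservoir.append(line)
    else:
        j = random.randint(0, i)      # line i survives with probability k/(i+1)
        if j < k:
            reservoir[j] = line

sys.stdout.writelines(reservoir)
```

Piped as `cat big.log | python sample.py 100`, it prints 100 lines chosen uniformly at random from the log, however large the input.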
