February 12 2014

Searching for the software stack for the physical world

When I flip through a book on networking, one of the first things I look for is the protocol stack diagram. An elegant representation of the protocol stack can help you make sense of where to put things, separate out important mental concepts, and help explain how a technology is organized.

I’m no stranger to the idea of trying to diagram out a protocol space; I had a successful effort back when the second edition of my book on 802.11 was published. I’ve been a party to several awesome conversations recently about how to organize the broad space that’s referred to as the “Internet of Things.”

IoT Stack

Let’s start at the bottom, with the hardware layer, which is labeled Things. These are devices that aren’t typically thought of as computers. Sure, they wind up using a computing ecosystem, but they are not really general-purpose computers. These devices are embedded devices that interact with the physical world, such as sensors and motors. Primarily, they will either be the eyes and ears of the overall system, or the channel for actions on the physical world. They will be designed around low power consumption, and therefore will use a low throughput communication channel. If they communicate with a network, it will typically be through a radio interface, but the tyranny of limited power consumption means that the network interface will usually be limited in some way.

Things provide their information or are instructed to act on the world by connecting through a Network Transport layer. Networking allows reports from devices to be received and acted on. In this model, the network transport consists of a set of technologies that move data around, and is a combination of the OSI data link, network, and transport layers. Mapping into technologies that we use, it would be TCP/IP on Wi-Fi for packet transport, with data carried over HTTP through a REST-style interface.

The Data layer aggregates many devices into an overall collective picture. In the IoT world, users are interacting with the physical world and asking questions like “what is the temperature in this room?” That question isn’t answered by just one device (or if it is, the temperature of the room is likely to fluctuate wildly). Each component of the hardware layer at the bottom of the stack contributes a piece of the overall contextual picture. A light bulb can be on or off, but to determine the desired state, you might mix in overall power consumption of the building, how many people are present, the locations of those people, total demand on the electrical grid, and possibly even the preferences of the people in an area. (I once toured an FAA regional air traffic control center, and the groups of controllers that serve each sub-region of airspace could customize the lighting, ranging from normal lighting to quite dim lighting.)
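To make the aggregation idea concrete, here is a minimal sketch of how a data layer might answer “what is the temperature in this room?” from several devices. The sensor names, the readings, and the choice of the median as the combining rule are all assumptions made for illustration; nothing here describes a particular product.

```python
from statistics import median


def room_temperature(readings):
    """Combine per-sensor readings (degrees Celsius) into one room-level value.

    The median is used so that a single sensor sitting in direct sunlight or
    next to a vent does not drag the room-level answer around.
    """
    if not readings:
        raise ValueError("no sensor readings available")
    return median(readings.values())


# Hypothetical sensor names and values, for illustration only.
readings = {"window": 27.5, "desk": 21.0, "door": 20.5, "ceiling": 21.5}
print(room_temperature(readings))  # prints 21.25
```

The window sensor reads several degrees high, but the room-level answer stays close to what the other three devices report, which is the point of asking the collective rather than any one device.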

The value of each device at the base level depends on how much unique context it can add to the overall picture, and on how well it enables software to operate on concepts of the physical world like room temperature or whether I am asleep. Looking back on it, the foundation for the data layer was being laid years ago, as is obvious reading section three of Tim O’Reilly’s “What is Web 2.0?” essay from 2005. In this world, you are either contributing to or working with context, or you are not doing very interesting work.

Sharing context widely requires APIs. If the real-world import of a piece of data is only apparent when it is combined with other sources of data, an application needs to be able to mash up several data sources to create its own unique picture of the world. APIs enable programmers to build context that represents what is important to users and build a cognitively significant aggregation. “Room temperature” may depend on getting data from temperature, humidity, and sunlight sensors, perhaps in several locations in a room.

In addition to reporting up, APIs need to enable control to flow down the stack. If “room temperature” is a complex state that depends on data from several sensors, it may require acting on several aspects of a climate control system to change: the heater, air conditioner, fan, and maybe even whether the blinds in a room are open or closed.
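A rough sketch of control flowing down the stack might look like the following. One room-level goal is translated into commands for the heater, air conditioner, fan, and blinds mentioned above. The function name, the command strings, and the half-degree tolerance are hypothetical choices for the example, not part of any real climate control API.

```python
def plan_climate_actions(current_c, target_c, sunlight_on_window, tolerance=0.5):
    """Translate one room-level goal into commands for several devices.

    Returns a dict mapping a (hypothetical) device name to a command string.
    """
    actions = {
        "heater": "off",
        "air_conditioner": "off",
        "fan": "off",
        "blinds": "unchanged",
    }
    if current_c > target_c + tolerance:
        # Too warm: cool actively and, if the sun is on the window, shade the room.
        actions["air_conditioner"] = "on"
        actions["fan"] = "on"
        if sunlight_on_window:
            actions["blinds"] = "closed"
    elif current_c < target_c - tolerance:
        # Too cold: heat, and let sunlight help if there is any.
        actions["heater"] = "on"
        if sunlight_on_window:
            actions["blinds"] = "open"
    return actions


print(plan_climate_actions(current_c=25.0, target_c=21.0, sunlight_on_window=True))
```

The application asks for one thing ("make the room 21 degrees") and the API layer decides which physical devices need to change, which is the downward counterpart of aggregating sensor data on the way up.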

Designing APIs is hard; in addition to getting the data operations and manipulation right, you need to design for performance and security, and to enable applications over time. A good place to start is O’Reilly’s API strategy guide.

Finally, we reach the top of the stack when we get to Applications. Early applications allowed you to control the physical world like a remote control. Nice, but by no means the end of the story. One of the smartest technology analysts I know often challenges me by saying, “It’s great that you can report anomalous behavior. If you know something is wrong, do something about it!” The “killer app” in this world is automation: being able to work without explicit instructions from the user, or to continuously fine-tune and optimize the real world based on what has flowed up the stack.

Unlike my previous effort in depicting the world of 802.11, this stack is very much a work in progress. Please leave your thoughts in the comments.

September 05 2013

Four short links: 5 September 2013

  1. Bezos at the Post (Washington Post) — “All businesses need to be young forever. If your customer base ages with you, you’re Woolworth’s,” added Bezos. [...] “The number one rule has to be: Don’t be boring.” (via Julie Starr)
  2. How Carnegie-Mellon Increased Women in Computer Science to 42% — outreach, admissions based on potential not existing advantage, making CS classes practical from the start, and peer support.
  3. Summingbird (Github) — Twitter open-sourced library that lets you write streaming MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms like Storm and Scalding.
  4. aws-cli (Github) — commandline for Amazon Web Services. (via AWS Blog)

July 11 2013

Four short links: 11 July 2013

  1. Sifted — 7 minute animation set in a point cloud world, using photogrammetry in film-making. My brilliant cousin Ben wrote the software behind it. See this newspaper article and tv report for more.
  2. Vehicle Tech Out of Sync with Drivers’ Devices — Ford Motor Co. has its own system. Apple Inc. is working with one set of automakers to design an interface that works better with its iPhone line. Some of the same car companies and others have joined the Car Connectivity Consortium, which is working with the major Android phone brands to develop a different interface. FFS. “… you are changing your phone every other year, and the top-of-mind apps are continuously changing.” That’s why Chevrolet, Mini and some other automakers are starting to offer screens that mirror apps from a smartphone.
  3. Incentives in Notice and Takedown (PDF) — findings summarised in Blocking and Removing Illegal Child Sexual Content: Analysis from a Technical and Legal Perspective: financial institutions seemed to be relatively successful at removing phishing websites while it took on average 150 times longer to remove child pornography.
  4. OpenCV for Processing (Github) — OpenCV for Processing is based on the official OpenCV Java bindings. Therefore, in addition to a suite of friendly functions for all the basics, you can also do anything that OpenCV can do. And a book from O’Reilly, and it’ll be CC-licensed. All is win. (via Greg Borenstein)

April 24 2013

Four short links: 25 April 2013

  1. Alcatraz — package manager for Xcode. (via Hacker News)
  2. Scarfolk Council — clever satire, the concept being a UK town stuck in 1979. Tupperware urns, “put old people down at birth”. The 1979 look is gorgeous. (via BoingBoing)
  3. Stop Designing Fragile Web APIs — It is possible to design your API in a manner that reduces its fragility and increases its resilience to change. The key is to design your API around its intent. In the SOA world, this is also referred to as business-orientation.
  4. @life100yearsago (Twitter) — account that tweets out fragments of New Zealand journals and newspapers and similar historic documents, as part of celebrating the surprising and the commonplace during WWI. My favourite so far: “Wizard” stones aeroplane. (via NDF)

February 15 2013

Masking the complexity of the machine

The Internet has thrived on abstraction and modularity. Web services hide their complexity behind APIs and standardized protocols, and these clean interfaces make it easy to turn them into modules of larger systems that can take advantage of the most intelligent solution to each of many problems.

The Internet revolutionized the software-software interface; the industrial Internet will revolutionize the software-machine interface and, in doing so, will make machines more accessible. I’m using “access” very broadly here — interfaces will make machines accessible to innovators who aren’t necessarily experts in physical machinery, in the same way that the Google Maps API makes interactive mapping an accessible feature to developers who aren’t expert cartographers and front-end developers. And better access for people who write software means wider applications for those machines.

I’ve recently encountered a couple of widely different examples that illustrate this idea. These come from very different places — an aerospace manufacturer that has built strong linkages between airplanes and software, and an advanced enthusiast who has built new controllers for a pair of industrial robots — but they both involve the development of interfaces that make machines accessible.

The Centaur, built by Aurora Flight Sciences, is an optionally-piloted aircraft: it can be flown remotely, as a drone, or by a certified pilot sitting in the plane, which satisfies U.S. restrictions against domestic drone use. Customers include defense agencies and scientists, who might need a technician onboard to monitor equipment in some cases but in others send the plane on long trips well beyond a human’s comfort and safety limitations.

John Langford, Aurora’s founder, described his company’s work to me and in the process offered a terrific characterization of what the industrial Internet does: “We’re masking the complexity of the machine.”

The intelligence that Aurora layers onto its planes reduces the entire flight process to an API. The Centaur can even be flown from the pilot’s seat in the plane through the remote-operator control. In other words, Aurora has so comprehensively captured the mechanism of flight in its software that a pilot might as well fly the airplane he’s sitting in through the digital pipeline rather than directly through the flight deck’s physical links.

A highly-evolved interface between airplane and its software means that the software can draw insight from the plane, reading control settings as well as sensors to improve its piloting performance. “An experienced human pilot might have [flown] 10,000 to 20,000 hours,” says Langford. “We already have operating systems that have hundreds of thousands of flying hours on them. Every anomaly gets built into the memory of the system. As the systems learn, you only have to see something once in order to know how to respond. The [unmanned aircraft] has flight experience that no human pilot will ever build up in his lifetime.”

The simplified interface between humans and the Centaur’s combined machinery and software might eventually make flight vastly more accessible. “What we think the robotic revolution really does is remove operating an air vehicle from the priesthood that it’s part of today, and makes it accessible to people with lower levels of training,” he says.

Trammell Hudson's PUMA robotic arm setup at NYC Resistor, with laptop running kinematics library, homemade controller stack, and robot.

I saw a different kind of revolutionary accessibility at work when I visited Trammell Hudson at NYC Resistor, a hardware collective in Brooklyn. I came across Hudson through a blog post he wrote detailing his rehabilitation of a pair of industrial robots — reverse-engineering their controls and building his own new controller stack in place of the PLCs that had operated them before they were salvaged from a factory with wire cutters.

“The arm itself has no smarts — just motors and quadrature encoders,” he says. (Even the arm’s current position is stored in the controller’s memory, not the robot’s.) Hudson had to write his own smarts for the robot, from scratch — intelligence that, when the robot was new, resided in purpose-built controllers the size of mini-fridges but that today can be built from open-source software libraries and run on an inexpensive microprocessor.

The robot’s kinematics — the spatial intelligence that decides how to get the robot’s hand from one place to another by repositioning six different joints — run on Hudson’s laptop. He’s interested in building those mathematical models directly into a controller that could be built from widely-available parts by anyone else with a similar robot, which could give second lives to thousands of high-quality industrial automation components by taking discarded machines and assigning new intelligence to them.
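To give a feel for what a kinematics library does, here is a deliberately simplified sketch: a flat, two-joint arm instead of the six joints on Hudson’s robot. The function names and link lengths are assumptions for illustration, and this is textbook planar geometry, not Hudson’s code.

```python
import math


def forward(l1, l2, theta1, theta2):
    """Hand position (x, y) for a planar two-link arm, angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse(l1, l2, x, y):
    """Joint angles that put the hand at (x, y).

    Returns one of the two mirror-image solutions; raises ValueError if the
    target is farther away (or closer in) than the arm can reach.
    """
    cos_theta2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_theta2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = math.acos(cos_theta2)
    theta1 = math.atan2(y, x) - math.atan2(
        l2 * math.sin(theta2), l1 + l2 * math.cos(theta2)
    )
    return theta1, theta2


# Ask for joint angles, then check that they really reach the target.
theta1, theta2 = inverse(1.0, 1.0, 1.0, 1.0)
print(forward(1.0, 1.0, theta1, theta2))
```

A six-joint arm needs the same two pieces, a forward model and a way to invert it, but in three dimensions and with many more possible solutions to choose among.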

“The hardware itself is very durable,” Hudson told me. “The software is where the interesting things are happening, and the controllers age very rapidly.”

Hudson’s remarkable feat of Saturday-afternoon electrical engineering was made possible by open-source microcontrollers, software libraries, and hardware interfaces (and, naturally, his own ingenuity). But he told me the most important factor in the success of his project was the rise of an online community that has an extraordinarily specialized and sophisticated understanding of electronics. “The ease of finding information now is incredible,” he said. “Some guy posted the correct voltage for releasing the arm’s brake, and I was able to find it in a few minutes and avoid damaging anything.”

“We went through a white-collar dark ages in the 1980s,” Hudson said. “People stopped building things. No one took shop class.” Now hardware components, abstracted and modularized, have become accessible to anyone with a technical mindset, who can improve the physical world by writing more intelligence onto it.

In an earlier reverse-engineering project, Hudson wrote his own firmware, which became Magic Lantern, for Canon’s 5D Mark II digital SLR camera. “I have a 4 by 5 [inch] camera from the 1890s — with my Canon 5D Mark II attached to the back,” he says. “The hardware on the old camera is still working fine, but the software on the 5D is way better than chemical film.”


This is a post in our industrial Internet series, an ongoing exploration of big machines and big data. The series is produced as part of a collaboration between O’Reilly and GE.

September 18 2012

Four short links: 18 September 2012

  1. The Rapture of the Nerds (Charlie Stross, Cory Doctorow) — available for download and purchase under a CC-A-NC-ND license.
  2. Amazon Maps API — if there is an API layer of general use to developers, Amazon will build it. They want to be the infrastructure for the web. Tim identified “the Internet Operating System”, and Amazon figured out how to put a pricetag on every syscall.
  3. Hoektronics — open source 3d printer queue management. (via Daniel Suarez)
  4. The Machine Gaze (Will Wiles) — Converging, leapfrogging technologies evoke new emotional responses within us, responses that do not yet have names. (via James Bridle)

September 04 2012

True data liberation with IFTTT and Google Drive

An example IFTTT action archives tweets to Evernote

The web service IFTTT (If this, then that) accesses popular web applications via their APIs, and lets users create new actions based on changes. For instance, actions such as “upload photos to Flickr when I add them to my Dropbox folder”, or “send me email when frost is forecast”.
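The “if this, then that” pattern is easy to sketch. The toy below pairs a trigger with an action and runs every matching rule against an incoming event. The `Rule` class, the event fields, and the action strings are invented for illustration; this is not IFTTT’s API, and a real service would call the web applications’ APIs instead of returning strings.

```python
class Rule:
    """One 'if this, then that' pairing: a trigger test and an action."""

    def __init__(self, trigger, action):
        self.trigger = trigger
        self.action = action


def run_rules(rules, event):
    """Run the action of every rule whose trigger matches the event."""
    results = []
    for rule in rules:
        if rule.trigger(event):
            results.append(rule.action(event))
    return results


# Hypothetical events and actions modeled on the two examples above.
rules = [
    Rule(
        trigger=lambda e: e["type"] == "forecast" and e["low_c"] <= 0,
        action=lambda e: f"email: frost forecast, low of {e['low_c']} C",
    ),
    Rule(
        trigger=lambda e: e["type"] == "new_file" and e["name"].endswith(".jpg"),
        action=lambda e: f"upload {e['name']} to photo service",
    ),
]

print(run_rules(rules, {"type": "forecast", "low_c": -2}))
print(run_rules(rules, {"type": "new_file", "name": "cat.jpg"}))
```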

I had been tempted to classify IFTTT as merely an interesting toy for playing with social media. Granted, it’s nice that I can archive all my tweets into an Evernote note, but so what? However, IFTTT’s growth in features is showing it to be more than a bauble. The service is becoming an empowering tool that gives users more control over their own data, previously often accessible by programmers alone.

This evolution crystallized for me as IFTTT announced their Google Drive integration. In addition to supporting file storage functionality, much as it does with Dropbox, IFTTT allows you to add a new row to a Google Spreadsheet. Example applications might include copying all your bit.ly bookmarks (“bitmarks”) into a spreadsheet, or logging all the videos you upload to YouTube.

Why is this a big deal? Because for the first time, it gives IFTTT users fine-grained access to their data in a simple database. This data can be freely manipulated in Google Spreadsheets without the need for coding skills.

Web applications with APIs can provide a false openness. They empower developers to create value from data, under the terms the application provider dictates, but do little for regular users. The pledge of data liberation is a great thing, but again, what can the average user do with that data?

IFTTT is a great example of a solution that gives our data back, and lets us do useful work with it. It’s the kind of tool we’ll need more of as we adapt to the digital nervous system in our lives.

  • Business users should also check out Zapier, a similar offering to IFTTT, but with more focus on business web services.

Example uses of Google Drive in IFTTT

July 31 2012

Four short links: 31 July 2012

  1. Christchurch’s Shot at Being Innovation Central (Idealog) — Christchurch, rebuilding a destroyed CBD after earthquakes, has released plans for the new city. I hope there’s budget for architects and city developers to build visible data, sensors, etc. so the Innovation Precinct doesn’t become the Tech Ghetto.
  2. Torque Pro (Google Play Store) — a vehicle / car performance/diagnostics tool and scanner that uses an OBD II Bluetooth adapter to connect to your OBD2 engine management/ECU. Can lay out your dashboards, track performance via GPS, and more. (via Steve O’Grady)
  3. Drone Pilots (NY Times) — at the moment, the stories are all about the technology helping our boys valiantly protecting the nation. Things will get interesting when the new technology is used against us (we just saw the possibility of this with 3D printing guns). (via Dave Pell)
  4. Avalon (GitHub) — A cloud based translation and localization utility for Python which combines human and machine translation. There’s also a how-to. (via Brian McConnell)

July 30 2012

Four short links: 30 July 2012

  1. pathod — A pathological HTTP daemon for testing and torturing client software. (via Hacker News)
  2. A Walk Through Twitter’s Walled Garden (The Realtime Report) — nice breakdown of Twitter’s business model choice and consequences. Twitter wants you to be able to see the pictures and read the articles shared in its Tweets, without leaving the garden. Costolo told the Los Angeles Times that “Twitter is heading in a direction where its 140-character messages are not so much the main attraction but rather the caption to other forms of content.” (You know all the traffic that Twitter’s been driving to web sites? Don’t count on it being there next year.) (via Jim Stogdill)
  3. My Computing Environment (Jesse Vincent) — already have a set of those gloves on order.
  4. How Speedo Created a Record-Breaking Swimsuit (Scientific American) — A new 3-D printer at Aqualab fabricated prototypes of the cap and goggles for testing within hours, rather than sending drawings to a manufacturer and waiting weeks or months. “In the past we couldn’t do many changes to the original design,” Santry says. “With this process, we completely revolutionized the goggle from scratch.” (via Eric Ries)

July 25 2012

Four short links: 26 July 2012

  1. Drones Over Somalia are Hazard to Air Traffic (Washington Post) — In a recently completed report, U.N. officials describe several narrowly averted disasters in which drones crashed into a refugee camp, flew dangerously close to a fuel dump and almost collided with a large passenger plane over Mogadishu, the capital. (via Jason Leopold)
  2. Sequel Pro — free and open source Mac app for managing MySQL databases. It’s an update of CocoaMySQL.
  3. Neural Network Improves Accuracy of Least Invasive Breast Cancer Test — nice use of technology to make lives better, for which the creator won the Google Science Fair. Oh yeah, she’s 17. (via Miss Representation)
  4. Free Harder to Find on Amazon — so much for ASINs being permanent and unchangeable. Amazon “updated” the ASINs for a bunch of Project Gutenberg books, which means they’ve lost all the reviews, purchase history, incoming links, and other juice that would have put them at the top of searches for those titles. Incompetence, malice, greed, or a purely innocent mistake? (via Glyn Moody)

May 04 2012

Developer Week in Review: Are APIs intellectual property?

Returning after a brief hiatus due to my annual spring head cold, welcome back to your weekly dose of all things programming. Last week, I was attending the Genomes, Environments and Traits conference (I'm a participant in the Personal Genome Project), when I got notified that WWDC registration had opened up. I ended up having to type in my credit card information on my iPhone while listening to the project organizers discuss what they were doing with the saliva I had sent them. The conference itself was very interesting (although I was coming down with the aforementioned cold, so I wasn't at the top of my game). The cost to sequence a genome is plummeting — it's approaching $1,000 a pop — and it has the potential to totally revolutionize how we think about health care.

It's also an interesting example of big data, but not how we normally think about it. An individual genome isn't all that big in the scheme of things (it's about 3GB uncompressed per genome), but there are huge computational challenges involved in relating individual variations in the genome to phenotype variations (in other words, figuring out what variations are responsible for traits or diseases).

While all the West Coast developers who slept through the WWDC registration period lick their wounds, here's the rest of the news.

APIs are copyrightable, unless they aren't?

These days, I feel like you need to consider a minor in law to go with your computer science degree. In the latest news from the front, we have conflicting opinions regarding the status of APIs. On the one hand, the judge in the Oracle versus Google lawsuit has instructed the jury they should assume that APIs are copyrightable. As the linked article discusses, this could have ominous implications for any third-party re-implementation of a programming language or other software that is not open source.

Over in Europe, however, a new ruling has stated that programming languages and computer functionality are not copyrightable. So, depending on which side of the ocean you live on, APIs are either open season, or off limits. No word yet as to the legal status of APIs on the Falkland Islands ...

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

Code to make your head hurt.

For those of you who like to celebrate the perversities of life, it's hard to beat the International Obfuscated C Code Contest, which just released its 2011 winners. For your viewing pleasure, we have programs that compute pi, chart histograms, and even judge programs for obfuscation, all written in a manner that will have code reviewers running to the toilet with terminal bouts of nausea.

And speaking of C ...

We tend to focus a lot of attention on emerging languages, partially because many of them have novel features, and partially because the grass is always greener in a different language. It's instructive to step back sometimes and take a look at what people are actually using. The latest TIOBE Programming Community Index, which estimates the popularity of each language largely from search engine results, has a new top dog, and it's our old friend C. In fact, when you factor in C#, C++ and Objective-C, C-related languages pretty much own the category. Java has now fallen to the second position, and you have to go all the way down to sixth place to find a scripting language, PHP.

Importantly, all the hot new languages, like Erlang and Scala, don't even make the top 20, and you only need half-a-percentage point to get in that list. As much as we like the new darlings on the block, the old veterans still are where most of the action (and money) is.

Got news?

Please send tips and leads here.

February 28 2012

Four short links: 28 February 2012

  1. Designing RESTful Interfaces (Slideshare) -- extremely good presentation on how to build HTTP APIs.
  2. Manipulating History for Fun and Profit -- if you want to make websites that are AJAX-responsive but without breaking the back button or preventing links, read this.
  3. Why Textbooks Are So Broken (Salon) -- Let's say a publisher hires a developer for a certain low-bid fee to produce seven supplemental math books for grades 3-8. The product specs call for each student book and teacher guide to have page counts of roughly 100 pages and 80 pages, respectively. The publisher wants these seven books ready for press in five weeks—over 1,400 pages. To put this in perspective, in the not too recent past at least six months would be allotted for a project of this size. But publishers customarily shrink their deadlines to get a jump on the competition, especially in today's math market. Unreasonable turnaround times are part of the new normal, something that almost guarantees a lack of quality right out of the gate.
  4. exmobaby -- wireless biosensor baby pyjamas send ECG, skin temperature, and movement data via Zigbee. (via Jo Komisarczuk)

February 07 2012

Unstructured data is worth the effort when you've got the right tools


It's dawning on companies that data analysis can yield insights and inform business decisions. As data-driven benefits grow, so do our demands about what more data can tell us and what other types we can mine.

During her PhD studies, Alyona Medelyan (@zelandiya) developed Maui, an open source tool that performs as well as professional librarians in identifying main topics in documents. Medelyan now leads the research and development of API-based products at Pingar.

Pingar senior software researcher Anna Divoli (@annadivoli) studied sentence extraction for semi-automatic annotation of biological databases. Her current research focuses on developing methodologies for acquiring knowledge from textual data.

"Big data is important in many diverse areas, such as science, social media, and enterprise," observes Divoli. "Our big data niche is analysis of unstructured text." In the interview below, Medelyan and Divoli describe their work and what they see on the horizon for unstructured data analysis.

How did you get started in big data?

Anna Divoli: I began working with big data as it relates to science during my PhD. I worked with bioinformaticians who mined proteomics data. My research was on mining information from the biomedical literature that could serve as annotation in a database of protein families.

Alyona Medelyan: Like Anna, I mainly focus on unstructured data and how it can be managed using clever algorithms. During my PhD in natural language processing and data mining, I started applying such algorithms to large datasets to investigate how time-consuming data analysis and processing tasks can be automated.

What projects are you working on now?

Alyona Medelyan: For the past two years at Pingar, I've been developing solutions for enterprise customers who accumulate unstructured data and want to search, analyze, and explore this data efficiently. We develop entity extraction, text summarization, and other text analytics solutions to help scrub and interpret unstructured data in an organization.

Anna Divoli: We're focusing on several verticals that struggle with too much textual data, such as bioscience, legal, and government. We also strive to develop language-independent solutions.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

What are the trends and challenges you're seeing in the big data space?

Anna Divoli: There are plenty of trends that span various aspects of big data, such as making the data accessible from mobile devices, cloud solutions, addressing security and privacy issues, and analyzing social data.

One trend that is pertinent to us is the increasing popularity of APIs. Plenty of APIs exist that give access to large datasets, but there are also powerful APIs that manage big data efficiently, such as text analytics, entity extraction, and data mining APIs.

Alyona Medelyan: The great thing about APIs is that they can be integrated into existing applications used inside an organization.

With regard to the challenges, enterprise data is very messy, inconsistent, and spread out across multiple internal systems and applications. APIs like the ones we're working on can bring consistency and structure to a company's legacy data.

The presentation you'll be giving at the Strata Conference will focus on practical applications of mining unstructured data. Why is this an important topic to address?

Anna Divoli: Every single organization in every vertical deals with unstructured data. Tons of text is produced daily — emails, reports, proposals, patents, literature, etc. This data needs to be mined to allow fast searching, easy processing, and quick decision making.

Alyona Medelyan: Big data often stands for structured data that is collected into a well-defined database — who bought which book in an online bookstore, for example. Such databases are relatively easy to mine because they have a consistent form. At the same time, there is plenty of unstructured data that is just as valuable, but it's extremely difficult to analyze it because it lacks structure. In our presentation, we will show how to detect structure using APIs, natural language processing and text mining, and demonstrate how this creates immediate value for business users.

Are there important new tools or projects on the horizon for big data?

Alyona Medelyan: Text analytics tools are very hot right now, and they improve daily as scientists come up with new ways of making algorithms understand written text more accurately. It is amazing that an algorithm can detect names of people, organizations, and locations within seconds simply by analyzing the context in which words are used. The trend for such tools is to move toward recognition of further useful entities, such as product names, brands, events, and skills.

Anna Divoli: Also, entity relation extraction is an important trend. A relation that consistently connects two entities in many documents is important information in science and enterprise alike. Entity relation extraction helps detect new knowledge in big data.

Other trends include detecting sentiment in social data, integrating multiple languages, and applying text analytics to audio and video transcripts. The number of videos grows at a constant rate, and transcripts are even more unstructured than written text because there is no punctuation. That's another exciting area on the horizon!

Who do you follow in the big data community?

Alyona Medelyan: We tend to follow researchers in areas that are used for dealing with big data, such as natural language processing, visualization, user experience, human-computer information retrieval, and the semantic web. Two of them are also speaking at Strata this year: Daniel Tunkelang and Marti Hearst.


This interview was edited and condensed.


December 06 2011

Stickers as sensors

Rather than ask people to integrate bulky or intrusive sensors into their lives, GreenGoose's upcoming system (pre-orders start on Dec. 15; systems ship on Jan. 1) will instead provide small stickers with built-in Internet-connected sensors. Tip a water bottle and the attached GreenGoose sticker logs it through a small base station that plugs into your wireless router. Feed the dog, go for a walk, clean the house — GreenGoose has designs on all of it. No special skills required.

GreenGoose founder Brian Krejcarek calls his company's sensors "elegantly playful." In the following short interview, Krejcarek explains how the GreenGoose stickers will work and how he hopes people will use the data they acquire from their everyday activities.

What will GreenGoose stickers measure?

Brian Krejcarek: Our sensors measure things you do based on how you interact with an object. This interaction correlates to a signature of forces that our sensors try to match against known patterns that represent a specific behavior around the use of the object. When there's a match, then we send a little wireless message from the sensor to the Internet.

For example, you can put a sensor sticker on a medicine bottle or water bottle. There are certain patterns here — tip, dispense, return upright — that the sensors can pick up.
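As a rough illustration of the matching idea described above, the sketch below counts how often a stream of already-classified events contains the tip, dispense, return-upright sequence. This is not GreenGoose's actual firmware: the event labels, the `count_matches` function, and the rule that unrelated events are ignored as noise are all assumptions made for the example.

```python
from typing import Iterable, List

# Hypothetical event labels; the interview only names the
# tip -> dispense -> return upright sequence.
POUR_PATTERN = ["tip", "dispense", "upright"]


def count_matches(events: Iterable[str], pattern: List[str] = POUR_PATTERN) -> int:
    """Count in-order, non-overlapping occurrences of `pattern` in `events`.

    Events that are not the next expected step are ignored as noise,
    except that seeing the first step again restarts the match.
    """
    matches = 0
    position = 0  # index of the next step we expect to see
    for event in events:
        if event == pattern[position]:
            position += 1
            if position == len(pattern):
                matches += 1  # full behavior recognized
                position = 0
        elif event == pattern[0]:
            position = 1  # the object was tipped again; start over
    return matches


if __name__ == "__main__":
    stream = ["tip", "jostle", "dispense", "upright", "tip", "upright"]
    print(count_matches(stream))
```

In this toy version, each full match would be the point at which a sensor sends its "little wireless message"; the final tip followed directly by upright does not count because nothing was dispensed.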

Stickers are great for this because they're simple, flexible and they easily stick to curved things like bottles. Also, existing objects or things around the house can be enabled with sensing capabilities by just sticking on a sticker. We're taking everyday things and making them more fun. We're also lowering barriers to adopting sensors by treating them in a playful way.

We're trying to make it really easy. There are no batteries to recharge or USB cables or software to worry about. The sensors last more than a year, and the range is over 200 feet, so it's completely in the background.

We're finding all kinds of new applications for these sensors. We're going to be launching with sensors that target pets — measuring when you feed your pets or walk the dog, for instance. We've got about 50 or so other sensors in development right now that we will fairly quickly release over time.

The GreenGoose sensor/sticker system.

How are you applying gamification?

Brian Krejcarek: We're keeping the gamification side of this really simple to start. It's all about making people smile and sharing a laugh as they do ordinary things throughout their day. No points, levels, or badges, necessarily. We're first going to roll out a simple application ourselves around these pet sensors, but developers will have immediate access to the API and data they generate. We invite those developers to start layering on their own game mechanics. GreenGoose is a platform play.

How do you think people will use the data your sensors gather?

Brian Krejcarek: We hope that the use of the data fits nicely into applications that help people have more fun with everyday things they do. Think families and kids. Toward that end, we've got a bunch of sensors on the way for toys and doing things around the house.

What lies ahead for GreenGoose?

Brian Krejcarek: Plans going forward include launching the previously mentioned sensor kits around pets, releasing an open API to developers, and launching a sensor around physical movement (exercise) as a little card that can slip into your wallet or purse. We affectionately call it a "get-up-off-your-bum" sensor. No calorie tracking, or graphs and charts. More sensors will be released shortly afterward, too.

This interview was edited and condensed.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20


November 21 2011

Exposing content via APIs

This post is part of the TOC podcast series, which we'll be featuring here on Radar in the coming months. You can also subscribe to the free TOC podcast through iTunes.


Publishers and authors obviously have a sense of how they intend their content to be used, but what if there are other ways of accessing and consuming content that a publisher and author didn't even consider? It reminds me of that great Henry Ford quote: "If I'd asked people what they wanted, they would have said 'a faster horse'." The point is, sometimes we just don't know what we want. That's where exposing content via APIs can help. As we talk about in this interview with Fluidinfo CEO Terry Jones (@terrycojones), APIs enable developers to work with your content like a box of Legos, building solutions you may never have dreamed of.

Key points from the full video interview (below) include:

  • What's an API? — Just as user interfaces enable access to information by users, APIs enable access to information by programmers. [Discussed at the 0:54 mark.]
  • The "read-only" model is not the future — Publishers have grown accustomed to a one-way communication. We produce content but generally don't let users enhance or modify that content. That may have worked well in the print world, but the digital world demands more. As Terry notes, the real world is "writable." [Discussed at 5:15.]
  • Publishers are just starting to recognize audience signals — There's value in not only detecting these signals, but also in acting on them. [Discussed at 10:55.]
  • Reading has always been a social activity — Much takes place in isolation, but think about why page numbers exist, for example. [Discussed at 12:10.]
  • How do you manage control in an open API access model? — It's not as scary as you might think. There are plenty of control mechanisms that can and should exist when exposing your content via APIs. [Discussed at 13:45.]
  • Mobile changes everything — Simple paywall access via a browser isn't the best solution. Mobile offers a completely new opportunity to distribute and monetize content ... but it has to be done correctly, of course. [Discussed at 18:50.]
  • Why not just offer access via HTML5? — HTML5 is a good delivery mechanism, but APIs are more like offering a toolbox for building even more powerful solutions. [Discussed at 28:16.]

You can view the entire interview in the following video.

TOC NY 2012 — O'Reilly's TOC Conference, being held Feb. 13-15, 2012, in New York City, is where the publishing and tech industries converge. Practitioners and executives from both camps will share what they've learned and join together to navigate publishing's ongoing transformation.

Register to attend TOC 2012



Jonathan's Card: Lessons from a social experiment

Earlier this summer, author Jonathan Stark (@jonathanstark) launched a social experiment by releasing his Starbucks card to the general public. Based on the "take a penny, leave a penny" tray near some stores' cash registers, Stark encouraged people to use his Starbucks card — to spend the money on it and/or to add cash back to it. While Stark never put any stipulations on the process, some observers were taken aback when another developer, Sam Odio, explained how to use Jonathan's card to buy an iPad.

It's been several months since Starbucks shut down the experiment, and now that the frenzy around it has subsided, I asked Stark a few questions about what motivated him to begin the project and what he learned in the process.

Why did you launch the Jonathan's Card experiment?

Jonathan Stark: The motivation stemmed from my underlying belief that the vast majority of people are good. An opportunity to test this belief in public and on a global scale clicked with me at a very deep level. I couldn't have articulated this at the time, but it became very clear in retrospect.

For what it's worth, here's how the experiment got started:

I had been testing various mobile payment solutions while doing research for a client project. Starbucks' iPhone app was pretty cutting edge at the time, and I liked it. I wanted to test the app on an Android phone, but Starbucks had not yet released their Android app, so I took a screenshot of the in-app barcode on my iPhone and emailed the picture to my Android device. Sure enough, the barcode reader at the Starbucks point-of-sale (POS) system was able to read the picture of the barcode on my Android phone. This blew my mind because I had essentially emailed money to myself and bought physical goods with a digital photo.

A screenshot of Jonathan Stark's Starbucks card.

As far as I knew, this was unprecedented. So, I did what any self-respecting geek would do: I blogged about it.

In the blog post, I invited readers to download the card image to their smartphones and see if it worked for them elsewhere in the US and around the world. It did work all over the US and in a handful of places outside the US. People who used it were amazed and delighted. It was really fun giving out free coffee, so I reloaded the card online a few times. Eventually it got a bit pricey, so I figured it'd be a once in a while thing.

Then one Saturday night, I noticed that my card balance had gone up. This freaked me out because the app is linked to my debit card and I thought someone might have guessed my starbucks.com password and was emptying my checking account. Fortunately, this was not the case. What had actually happened was that one of my friends discovered that he could anonymously add money onto my card using the picture of the barcode, either in person at the POS or by entering the number at starbucks.com.

At this point, my head exploded. I instantly realized that I could use the picture of the card to create a worldwide "pay it forward" experiment. I was up all night building a landing page that described the experiment, gave instructions on how to use the card, and how to donate to the card. I also wrote a script that scraped starbucks.com every minute for the current card balance — whenever the balance changed, the card would tweet its balance. When the card balance went to $0, it would tweet for help with a link to the instructions on how to donate.
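The polling logic described above can be sketched in a few lines. This is not Stark's original script: `fetch_balance` and `post` are hypothetical callables standing in for the page scraper and the Twitter client, `HELP_URL` is a placeholder, and posting the very first reading is an assumption made so the feed starts with a known balance.

```python
import time
from decimal import Decimal
from typing import Callable, Optional

HELP_URL = "http://example.com/how-to-donate"  # placeholder, not the real page


def message_for(previous: Optional[Decimal], current: Decimal) -> Optional[str]:
    """Return the text to post, or None when the balance is unchanged."""
    if previous is not None and current == previous:
        return None  # nothing changed since the last poll
    if current == 0:
        return f"The card is empty. Here is how to donate: {HELP_URL}"
    return f"Current balance: ${current:.2f}"


def watch(fetch_balance: Callable[[], Decimal],
          post: Callable[[str], None],
          polls: int,
          interval_seconds: float = 60.0) -> None:
    """Poll the balance `polls` times and post whenever it changes."""
    previous: Optional[Decimal] = None
    for i in range(polls):
        current = fetch_balance()
        text = message_for(previous, current)
        if text is not None:
            post(text)
        previous = current
        if i < polls - 1:
            time.sleep(interval_seconds)  # once a minute by default
```

Keeping the decision in `message_for` separate from the loop makes the behavior easy to check without touching the network or waiting a minute between polls.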

What surprised you the most about the experiment?

Jonathan Stark: There were a lot of surprises. It's hard to say what surprised me most. Here's a list of biggies:

  • That Starbucks let the experiment go on for as long as it did. Sharing the card goes against the company's terms of use, and it could have been killed right away.
  • I was surprised how many people were perfectly comfortable with the concept of buying things with their phones. It seems to me that the average smartphone user is more willing to accept the "mobile wallet" concept than industry analysts would lead you to believe. I expected more people to have security concerns. I think I only got two questions about that.
  • How fast and huge something gets when it goes viral. I was getting contacted by network TV producers within days once the experiment took on a life of its own.
  • How addictive the Twitter feed was. By the end, @jonathanscard had more than 9,000 followers, many of whom later told me that they were watching it like TV, cheering when someone would make a big donation, booing when someone would spend $100 at a pop.
  • How generous most people are. I was amazed how many people were willing to throw $10, $20, even $50 into the pool to buy a coffee for some anonymous stranger. In one week, more than $19,000 went through the card.
  • How accommodating Starbucks baristas are. We heard stories about people bringing all sorts of wacky stuff up to be scanned: digital cameras, laptops, iPads, and so on. People who didn't have any mobile devices even took to printing the barcode out and scanning it like a coupon.

What are the broader implications from this experiment?

Jonathan Stark: There is no doubt in my mind that the experiment would not have taken off like it did without the Twitter feed. It was addictive, interactive, and simple. Once the community grew and started to engage with each other we had to create a Facebook page to allow people to have threaded conversations. Twitter became the card's data feed and Facebook was where people talked about it. Both were critical but in very different ways.

Starbucks doesn't have an API, which I think is a big missed opportunity. Retailers want to make sticky and engaging loyalty programs, right? One great way to do that would be to publish an API that allows third-party developers to build on top of a loyalty program in all sorts of delightful and unexpected ways. One thing everyone was asking for during the experiment was a heat map of where the purchase activity was taking place. Because there was no API, I couldn't provide this — which is too bad because it probably would have become viral in its own right.


This interview was edited and condensed.


November 03 2011

Developer Week in Review: The hijacking of an insulin pump

It was a great week at the Turner household! Although we love our house, we've frequently said to each other, "You know what we could really use? A 25-foot-long tree limb wrapped in power lines blocking our driveway." Well, this weekend Mother Nature decided to help us fill this void in our landscaping, and threw in some ornamental cherry firewood as well (chainsawing not included). Thankfully, I spent the extra bucks on Saturday to get our LPG tank topped off, so I've got generator power for 10-14 days. Given we're on day four with no power in sight, that was a good decision.

It could have been worse, of course. For example ...

A scene from an upcoming technothriller

Plucky researcher Ann McManna walked across the room toward the podium, ready to reveal the details of the fiendish plot she had uncovered to the waiting reporters. Now the world would know about the conspiracy to corner the world supply of macadamia nuts. Her heart pounded with excitement, her mouth was dry and she perspired, in spite of the air conditioning that was making the room practically an ice box. As she approached the stage, she bumped against a table, stumbling and suddenly having trouble seeing her path through blurry eyes. Something was wrong, but she couldn't focus, couldn't identify what was happening to her, even as she collapsed to the ground. Minutes later, the paramedics would close the eyelids of her corpse.

Some fanciful invention of Tom Clancy or Robin Cook? Not anymore, thanks to research by McAfee's Barnaby Jack, presented at this year's Hacker Halted conference. Using some custom software and a special antenna, Jack was able to control Medtronic insulin pumps as far as 300 feet from the controller. He was able to disable the tones that warn a user that insulin is being pumped, and trigger a 25-unit bolus of insulin. In some circumstances, this could kill a victim.

As networked computers appear in more life-critical items, this is a good reminder that security should be job No. 1, not something to think about if you have time. Too many proprietary device manufacturers seem to depend on security through obscurity, rather than security in depth.


The first taste is free, but you'll be back

One of the perils of depending on public APIs from for-profit companies is that they may get turned into a profit center down the road. Users of the Google Maps API learned that lesson recently, as Google announced that high-volume users will no longer have free access to the APIs starting next year. Before you start panicking, the definition of high-volume will be more than 25,000 calls a day (2,500 if you use the custom styling features), and the rate over 25,000 is $4/1,000 calls. Google claims that less than 1% of all users will run up against this limit.
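To get a feel for what those numbers mean for a given site, here is a small back-of-the-envelope calculation. It assumes that only calls above the daily limit are billed and that billing is pro rata rather than in whole blocks of 1,000; the announcement as summarized here does not spell that out, and the function name and parameters are made up for the example.

```python
def daily_overage_cost(calls_per_day: int,
                       free_limit: int = 25_000,
                       rate_per_thousand: float = 4.0) -> float:
    """Estimate the daily charge in dollars for calls above the free limit.

    Assumes pro rata billing of calls over `free_limit` only.
    Pass free_limit=2_500 for the custom styling case.
    """
    billable = max(0, calls_per_day - free_limit)
    return billable / 1000 * rate_per_thousand


if __name__ == "__main__":
    for calls in (20_000, 30_000, 100_000):
        print(calls, daily_overage_cost(calls))
```

Under these assumptions, a site making 100,000 calls a day would pay for 75,000 of them, or $300 a day.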

The problem with using beta or "free" services in your products is that, unless the terms of use specifically say that it will be free forever, you have no contractual agreement to lean on, and the provider is able at any point to change how (or even if) the service is provided.

Linus Torvalds vs. C++

Linux progenitor Linus Torvalds has a reputation for diplomacy and fence building — that's practically the only way to herd the stampede of cats that is the Linux developer community. But when he gets upset, the results can peel the paint off the walls.

We got a good example this week, as Torvalds responded to a complaint about the fact that the git source control system was written in pure C, rather than C++. In a nutshell, Torvalds called C++ a lousy language that attracts substandard programmers and leads to sloppy, unmaintainable code. In general, I tend to take any blanket condemnation of a programming language as hyperbole, but Torvalds seems to genuinely loathe C++. We'll have to see if his anger against the language alienates any of the kernel developer base, or if people will just shrug it off as Linus being Linus.

Got news?

Please send tips and leads here.


October 12 2011

Four short links: 12 October 2011

  1. Steve Yegge's Google Platforms Rant -- epic. Read it.
  2. Guidelines for Securing Open Source Software (EFF) -- advice from the team that audited some commonly-used open source libraries. Avoid giving the user options that could compromise security, in the form of modes, dialogs, preferences, or tweaks of any sort. As security expert Ian Grigg puts it, there is "only one Mode, and it is Secure." Ask yourself if that checkbox to toggle secure connections is really necessary? When would a user really want to weaken security? To the extent you must allow such user preferences, make sure that the default is always secure. (via BoingBoing)
  3. Ladder of Abstraction -- a visual and interactive exploration of design that will delight as well as inform. (via Sacha Judd)
  4. On "Build It And They Will Come" -- I wasn't saying "build it and they will come"—I was saying "don't build it and they can't come". Wonderfully captures the idea that success can't be guaranteed, but failure is easy to ensure. (via Ed Yong)


July 27 2011

Four short links: 27 July 2011

  1. ContentFlow -- Javascript library to provide CoverFlow-like behaviour.
  2. Twilio Client SDK -- 1/4 cent/minute API-to-API calls, embeddable in browser apps.
  3. Postel's Principle Reconsidered (ACM) -- The Robustness Principle was formulated in an Internet of cooperators. The world has changed a lot since then. Everything, even services that you may think you control, is suspect. Excellent explanation of how interoperability and security are harder than they should be because of Postel's Law ("Be conservative in what you do, be liberal in what you accept from others.", RFC 793). (via Mike Olson)
  4. HTTP Pipelining on Mobiles -- HTTP pipelining has a much higher adoption amongst mobile browsers. Opera Mini, Opera Mobile and the Android browser all use HTTP pipelining by default. Together they account for about 40% of mobile browsing. If you’re developing a mobile site, your site is experiencing HTTP pipelining daily, and you should understand how it works. (via John Clegg)
