

October 25 2013

Programming with feedback

Everyone knows what feedback is. It’s when sound systems suddenly make loud, painful screeching sounds. And that answer is correct, at least partly.

Control theory, the study and application of feedback, is a discipline with a long history. If you’ve studied electrical or mechanical engineering, you’ve probably confronted it. Although there’s an impressive and daunting body of mathematics behind control theory, the basic idea is simple. Whenever you have a varying signal, you can use feedback to control the signal, giving you a consistent output. Screaming amps at a concert are nothing but a special case in which things have gone wrong.

We use control theory all the time, without even thinking about it. We couldn’t walk if it weren’t for our body’s instinctive use of feedback; upsetting that feedback system (for example, by spinning to become dizzy) makes you fall. When you’re driving a car, you ease off the accelerator when it’s going too fast. You press the accelerator when it’s going too slow. If you undercorrect, you’ll end up going too fast (or stopping); if you overcorrect, you’ll end up jerking forward, slamming on the brakes, then jerking forward again — possibly with disastrous consequences. Cruise control is nothing more than a robotic implementation of the same feedback loop.
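The cruise-control loop just described can be sketched in a few lines. This is a toy proportional controller with invented numbers (setpoint, gain), not a real automotive system: measure the speed, compare it with the setpoint, and correct in proportion to the error.

```python
# Toy proportional controller for the cruise-control feedback loop.
# All numbers here are invented for illustration.

def cruise_control_step(speed, setpoint, gain=0.5):
    """Return a speed adjustment proportional to the current error."""
    error = setpoint - speed      # positive when we are going too slow
    return gain * error           # the gain decides how hard we correct

def simulate(setpoint=100.0, steps=50):
    """Apply the feedback step repeatedly, starting from a standstill."""
    speed = 0.0
    for _ in range(steps):
        speed += cruise_control_step(speed, setpoint)
    return speed
```

With gain=0.5 each step closes half the remaining gap, so the speed settles on the setpoint. Push the gain above 2.0 in this model and every correction overshoots by more than the previous error: the jerky overcorrection described above.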

A few months ago, Philipp Janert and I wondered why control theory is almost never used by software developers. There are a few exceptions, mostly in embedded systems (for example, cruise control). The answer may be simple enough: these days, relatively few software developers come up through the ranks as electrical or mechanical engineers, so most have never been exposed to the theory. But that isn't a very good answer, even if it's correct. In his work, Philipp has found control theory useful in applications as different as ad placement and supply chain management. And we're willing to bet that, given the exposure, other programmers will find control theory a valuable tool.

Why? Software developers have been trying to build adaptive systems for years. We have lots of ideas about collecting data, analyzing it, and building models of how those systems behave. The problem with those models is that they’re prescriptive. They might tell you the optimal speed to travel to get from point A to point B, but they can’t make real-time adjustments to maintain your speed in response to actual conditions. In contrast, feedback systems are all about responding to actual conditions (and not about optimization). Modern software systems need to react to real-world conditions, such as radical variations in load; they can allocate more processors, memory, storage, and even network connectivity in response to changing conditions. For software to react intelligently and without constant human intervention, feedback isn’t an option; it’s a necessity.
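To make the contrast concrete, here is a minimal, hypothetical sketch of that kind of reactive allocation. Nothing here is a real autoscaling API; the point is only that the controller consults measured load, not a prescriptive model:

```python
# Hypothetical feedback-driven allocator: it reacts only to measured
# load, with no model of future demand. The thresholds and the notion
# of "workers" are invented for illustration.

def scale_step(workers, measured_load, target_per_worker=10.0):
    """Return a new worker count from the observed load alone."""
    utilization = measured_load / workers
    if utilization > 1.2 * target_per_worker:
        return workers + 1                       # too hot: allocate
    if utilization < 0.8 * target_per_worker and workers > 1:
        return workers - 1                       # too cold: release
    return workers                               # inside the deadband

workers = 1
for load in [5, 40, 80, 80, 80, 30, 30, 10]:     # measured load per tick
    workers = scale_step(workers, load)          # the feedback loop
```

The deadband (0.8x to 1.2x the target) keeps the controller from flapping between two sizes on noisy load, the same undercorrect/overcorrect trade-off as in the cruise-control example.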

Feedback Control for Computer Systems invites software developers to explore the uses of feedback. We ultimately don’t know why software developers haven’t discovered feedback. But now there’s no excuse: it’s an elegant and effective way to control complex, dynamic processes, and an important tool for anyone interacting with the real world.

January 09 2012

The hidden language and "wonderful experience" of product reviews

How do reviews, both positive and negative, influence the price of a product on Amazon? What phrases used by reviewers make us more or less likely to complete a purchase? These are some of the questions that computer scientist Panagiotis Ipeirotis, an associate professor at New York University's Stern School of Business, set out to investigate by analyzing the text in thousands of reviews on Amazon. Ipeirotis continues to research this space.

Ipeirotis' findings are surprising: consumers will pay more for the same product if the seller's reviews are good, certain types of negative reviews actually boost sales, and spelling plays an important role.

Our interview follows.

How important are product reviews on Amazon? Can they give sellers more pricing power?

Panagiotis Ipeirotis: The reviews have a significant effect. When buying online, customers are not only purchasing the product, they're also inherently buying the guarantee of a seamless transaction. Customers read the feedback left by other buyers to evaluate the reputation of the seller. Since customers are willing to pay more to buy from merchants with a better reputation — something we call the "reputation premium" — that feedback tends to affect the prices a merchant can charge in the future.

What are some of the most influential phrases?

Panagiotis Ipeirotis: "Never received" is a killer phrase in terms of reputation. It reduced the price a seller can charge by an average of $7.46 in the products examined. "Wonderful experience" is one of the most positive, increasing the price a seller can charge by $5.86 for the researched products.

How can very positive reviews be bad for sales?

Panagiotis Ipeirotis: Extremely positive reviews that contain no concrete details tend to be perceived as non-objective — written by fanboys or spammers. We observed this mainly in the context of product reviews, where superlative phrases like "Best camera!" with no further details are actually seen negatively.

Can a negative review ever be good for sales?

Panagiotis Ipeirotis: It can when the review is overly negative or criticizes aspects of the product that are not its primary purpose — the video quality in an SLR camera, for example. Or, when customers have unreasonable expectations: "Battery life lasts only for two days of shooting." Readers interpret these types of negative comments as "This is good enough for me," and it decreases their uncertainty about the product.

What is the effect of badly written reviews on sales?

Panagiotis Ipeirotis: Reviews containing spelling and grammatical errors consistently result in suboptimal outcomes, like lower sales or lower response rates. That was a fascinating but, in retrospect, expected finding. This holds true in a wide variety of settings, from reviews of electronics to hotels. It's even the case when examining email correspondence about a decision, such as whether or not to hire a contractor.

We don't know the exact reason yet, but the effect is very systematic. There are several possible explanations:

  • Readers think that the customers who buy this product are uneducated, so they don't buy it.
  • Reviews that are badly written are considered unreliable and therefore increase the uncertainty about the product.
  • Badly written reviews are unsuccessful attempts to spam and are a signal that even the other good reviews may not be authentic.

What's the relationship between the product attributes discussed in reviews and the attributes that lead to sales?

Panagiotis Ipeirotis: We observed that the aspects of a product that drive the online discussion are not necessarily the ones that define consumer decisions to buy it. For example, "zoom" tends to be discussed a lot for small point-and-shoot cameras. However, very few people are influenced by the zoom capabilities when it comes down to deciding which camera to buy.

This interview was edited and condensed.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20


January 06 2012

Top Stories: January 2-6, 2012

Here's a look at the top stories published across O'Reilly sites this week.

The feedback economy
We're moving beyond an information economy. The efficiencies and optimizations that come from constant and iterative feedback will soon become the norm for businesses and governments.

Epatients: The hackers of the healthcare world
The epatient community uses digital tools and the connective power of the Internet to empower patients. Here, Fred Trotter offers epatient resources and first steps.

The three topics that will define the developer world in 2012
It's a brand new year, time to look ahead to the stories that will have developers talking in 2012. Mobile will remain a hot topic, the cloud is absorbing everything, and jobs appear to be heading back to the U.S.

Understanding randomness is a double-edged sword
While Leonard Mlodinow's "The Drunkard's Walk" offers a good introduction to probabilistic thinking, it carries two problems: First, it doesn't uniformly account for skill. Second, when we're talking probability and statistics, we're talking about interchangeable events.

Traditional vs self-publishing: Neither is the perfect solution
In this video podcast, author Dan Gillmor talks about the pros and cons of traditional publishing versus self-publishing.


Tools of Change for Publishing, being held February 13-15 in New York, is where the publishing and tech industries converge. Register to attend TOC 2012.

January 04 2012

The feedback economy

Military strategist John Boyd spent a lot of time understanding how to win battles. Building on his experience as a fighter pilot, he broke down the process of observing and reacting into something called an Observe, Orient, Decide, and Act (OODA) loop. Combat, he realized, consisted of observing your circumstances, orienting yourself to your enemy's way of thinking and your environment, deciding on a course of action, and then acting on it.

The Observe, Orient, Decide, and Act (OODA) loop.

The most important part of this loop isn't included in the OODA acronym, however. It's the fact that it's a loop. The results of earlier actions feed back into later, hopefully wiser, ones. Over time, the fighter "gets inside" their opponent's loop, outsmarting and outmaneuvering them. The system learns.

Boyd's genius was to realize that winning requires two things: being able to collect and analyze information better, and being able to act on that information faster, incorporating what's learned into the next iteration. Today, what Boyd learned in a cockpit applies to nearly everything we do.

Data-obese, digital-fast

In our always-on lives we're flooded with cheap, abundant information. We need to capture and analyze it well, separating digital wheat from digital chaff, identifying meaningful undercurrents while ignoring meaningless social flotsam. Clay Johnson argues that we need to go on an information diet, and makes a good case for conscious consumption. In an era of information obesity, we need to eat better. There's a reason they call it a feed, after all.

It's not just an overabundance of data that makes Boyd's insights vital. In the last 20 years, much of human interaction has shifted from atoms to bits. When interactions become digital, they become instantaneous, interactive, and easily copied. It's as easy to tell the world as to tell a friend, and a day's shopping is reduced to a few clicks.

The move from atoms to bits reduces the coefficient of friction of entire industries to zero. Teenagers shun e-mail as too slow, opting for instant messages. The digitization of our world means that trips around the OODA loop happen faster than ever, and continue to accelerate.

We're drowning in data. Bits are faster than atoms. Our jungle-surplus wetware can't keep up. At least, not without Boyd's help. In a society where every person, tethered to their smartphone, is both a sensor and an end node, we need better ways to observe and orient, whether we're at home or at work, solving the world's problems or planning a play date. And we need to be constantly deciding, acting, and experimenting, feeding what we learn back into future behavior.

We're entering a feedback economy.

The big data supply chain

Consider how a company collects, analyzes, and acts on data.

The big data supply chain.

Let's look at these components in order.

Data collection

The first step in a data supply chain is to get the data in the first place.

Information comes in from a variety of sources, both public and private. We're a promiscuous society online, and with the advent of low-cost data marketplaces, it's possible to get nearly any nugget of data relatively affordably. From social network sentiment, to weather reports, to economic indicators, public information is grist for the big data mill. Alongside this, we have organization-specific data such as retail traffic, call center volumes, product recalls, or customer loyalty indicators.

The legality of collection is often a bigger obstacle than acquiring the data itself. Some data is heavily regulated — HIPAA governs healthcare, while PCI restricts financial transactions. In other cases, the act of combining data may be illegal because it generates personally identifiable information (PII). For example, courts have ruled differently on whether IP addresses are PII, and the California Supreme Court ruled that zip codes are. Navigating these regulations imposes serious constraints on what can be collected and how it can be combined.

The era of ubiquitous computing means that everyone is a potential source of data, too. A modern smartphone can sense light, sound, motion, location, nearby networks and devices, and more, making it a perfect data collector. As consumers opt into loyalty programs and install applications, they become sensors that can feed the data supply chain.

With big data, collection is often challenging because of the sheer volume of information or the speed with which it arrives, both of which demand new approaches and architectures.

Ingesting and cleaning

Once the data is collected, it must be ingested. In traditional business intelligence (BI) parlance, this is known as Extract, Transform, and Load (ETL): the act of putting the right information into the correct tables of a database schema and manipulating certain fields to make them easier to work with.

One of the distinguishing characteristics of big data, however, is that the data is often unstructured. That means we don't know the inherent schema of the information before we start to analyze it. We may still transform the information — replacing an IP address with the name of a city, for example, or anonymizing certain fields with a one-way hash function — but we may hold onto the original data and only define its structure as we analyze it.
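As one concrete (and simplified) illustration of that kind of field-level transform, the sketch below anonymizes sensitive fields with a one-way hash while leaving the rest of the record, and its as-yet-undefined structure, alone. Field names and values are invented:

```python
# Simplified field-level transform: one-way hash the identifying
# fields, keep everything else untouched. Field names are invented.

import hashlib

def anonymize(record, fields=("ip", "email")):
    """Return a copy of the record with sensitive fields hashed."""
    out = dict(record)
    for field in fields:
        if field in out:
            digest = hashlib.sha256(out[field].encode("utf-8")).hexdigest()
            out[field] = digest[:16]   # stable token, not reversible
    return out

event = {"ip": "203.0.113.7", "path": "/checkout", "email": "a@example.com"}
clean = anonymize(event)
```

Because the hash is deterministic, the same visitor always maps to the same token, so counts and joins still work on the anonymized copy while the original can be kept for later re-analysis.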

Hardware

The information we've ingested needs to be analyzed by people and machines. That means hardware, in the form of computing, storage, and networks. Big data doesn't change this, but it does change how it's used. Virtualization, for example, allows operators to spin up many machines temporarily, then destroy them once the processing is over.

Cloud computing is also a boon to big data. Paying by consumption destroys the barriers to entry that would prohibit many organizations from playing with large datasets, because there's no up-front investment. In many ways, big data gives clouds something to do.

Platforms

Where big data is new is in the platforms and frameworks we create to crunch large amounts of information quickly. One way to speed up data analysis is to break the data into chunks that can be analyzed in parallel. Another is to build a pipeline of processing steps, each optimized for a particular task.
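A minimal sketch of the chunk-and-parallelize idea, using only Python's standard library. Threads keep the example self-contained; a CPU-bound analysis would swap in ProcessPoolExecutor, which has the same interface:

```python
# Sketch of "split into chunks, analyze in parallel, recombine".

from concurrent.futures import ThreadPoolExecutor

def chunked(seq, size):
    """Split seq into consecutive chunks of at most size items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def analyze_chunk(chunk):
    return sum(chunk)                # stand-in for real per-chunk work

def parallel_sum(data, workers=4):
    chunks = chunked(data, max(1, len(data) // workers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(analyze_chunk, chunks))   # map, then combine
```

The same shape (partition, map over the parts, combine the partial results) underlies MapReduce-style platforms; the pipeline alternative mentioned above replaces the uniform worker with a sequence of specialized stages.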

Big data is often about fast results, rather than simply crunching a large amount of information. That's important for two reasons:

  1. Much of the big data work going on today is related to user interfaces and the web. Suggesting what books someone will enjoy, or delivering search results, or finding the best flight, requires an answer in the time it takes a page to load. The only way to accomplish this is to spread out the task, which is one of the reasons why Google has nearly a million servers.
  2. We analyze unstructured data iteratively. As we first explore a dataset, we don't know which dimensions matter. What if we segment by age? Filter by country? Sort by purchase price? Split the results by gender? This kind of "what if" analysis is exploratory in nature, and analysts are only as productive as their ability to explore freely. Big data may be big. But if it's not fast, it's unintelligible.

Much of the hype around big data companies today is a result of the retooling of enterprise BI. For decades, companies have relied on structured relational databases and data warehouses, many of which can't handle the exploration, lack of structure, speed, and massive sizes of big data applications.

Machine learning

One way to think about big data is that it's "more data than you can go through by hand." For much of the data we want to analyze today, we need a machine's help.

Part of that help happens at ingestion. For example, natural language processing tries to read unstructured text and deduce what it means: Was this Twitter user happy or sad? Is this call center recording good, or was the customer angry?

Machine learning is important elsewhere in the data supply chain. When we analyze information, we're trying to find the signal within the noise, to discern patterns. Humans can't find signal well by themselves. Just as astronomers use algorithms to scan the night sky for signals, then verify any promising anomalies themselves, so too can data analysts use machines to find interesting dimensions, groupings, or patterns within the data. Machines can work at a lower signal-to-noise ratio than people.
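A toy version of that scan-then-verify division of labor: the machine flags points that deviate strongly from the mean, and only those few candidates go to a human. Real detectors are far more sophisticated; this shows only the shape of the idea, on invented data:

```python
# Flag values whose z-score exceeds a threshold and hand only those
# few candidates to a human reviewer. The signal values are invented.

import statistics

def flag_anomalies(values, threshold=2.5):
    """Return indices of values far from the mean, in stdev units."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [i for i, v in enumerate(values)
            if stdev and abs(v - mean) / stdev > threshold]

signal = [10, 11, 9, 10, 12, 10, 95, 11, 10, 9]
candidates = flag_anomalies(signal)      # the machine's short list
```

Out of ten points, the analyst only has to look at one, which is how machines let people work at a much lower signal-to-noise ratio than they could unaided.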

Human exploration

While machine learning is an important tool to the data analyst, there's no substitute for human eyes and ears. Displaying the data in human-readable form is hard work, stretching the limits of multi-dimensional visualization. While most analysts work with spreadsheets or simple query languages today, that's changing.

Creve Maples, an early advocate of better computer interaction, designs systems that take dozens of independent data sources and display them in navigable 3D environments, complete with sound and other cues. Maples' studies show that when we feed an analyst data in this way, they can often find answers in minutes instead of months.

This kind of interactivity requires the speed and parallelism explained above, as well as new interfaces and multi-sensory environments that allow an analyst to work alongside the machine, immersed in the data.

Storage

Big data takes a lot of storage. In addition to the actual information in its raw form, there's the transformed information; the virtual machines used to crunch it; the schemas and tables resulting from analysis; and the many formats that legacy tools require so they can work alongside new technology. Often, storage is a combination of cloud and on-premise storage, using traditional flat-file and relational databases alongside more recent, post-SQL storage systems.

During and after analysis, the big data supply chain needs a warehouse. Comparing year-on-year progress or changes over time means we have to keep copies of everything, along with the algorithms and queries with which we analyzed it.

Sharing and acting

All of this analysis isn't much good if we can't act on it. As with collection, this isn't simply a technical matter — it involves legislation, organizational politics, and a willingness to experiment. The data might be shared openly with the world, or closely guarded.

The best companies tie big data results into everything from hiring and firing decisions, to strategic planning, to market positioning. While it's easy to buy into big data technology, it's far harder to shift an organization's culture. In many ways, big data adoption isn't a hardware retirement issue, it's an employee retirement one.

We've seen similar resistance to change each time there's a big shift in information technology. Mainframes, client-server computing, packet-based networks, and the web all had their detractors. A NASA study into the failure of Ada concluded that its proponents had over-promised, and that the language lacked a supporting ecosystem to help it flourish. Big data, and its close cousin, cloud computing, are likely to encounter similar obstacles.

A big data mindset is one of experimentation, of taking measured risks and assessing their impact quickly. It's similar to the Lean Startup movement, which advocates fast, iterative learning and tight links to customers. But while a small startup can be lean because it's nascent and close to its market, a big organization needs big data and an OODA loop to react well and iterate fast.

The big data supply chain is the organizational OODA loop. It's the big business answer to the lean startup.

Measuring and collecting feedback

Just as John Boyd's OODA loop is mostly about the loop, so big data is mostly about feedback. Simply analyzing information isn't particularly useful. To work, the organization has to choose a course of action from the results, then observe what happens and use that information to collect new data or analyze things in a different way. It's a process of continuous optimization that affects every facet of a business.

Replacing everything with data

Software is eating the world. Verticals like publishing, music, real estate and banking once had strong barriers to entry. Now they've been entirely disrupted by the elimination of middlemen. The last film projector rolled off the line in 2011: movies are now digital from camera to projector. The Post Office stumbles because nobody writes letters, even as Federal Express becomes the planet's supply chain.

Companies that get themselves on a feedback footing will dominate their industries, building better things faster for less money. Those that don't are already the walking dead, and will soon be little more than case studies and colorful anecdotes. Big data, new interfaces, and ubiquitous computing are tectonic shifts in the way we live and work.

A feedback economy

Big data, continuous optimization, and replacing everything with data pave the way for something far larger, and far more important, than simple business efficiency. They usher in a new era for humanity, with all its warts and glory. They herald the arrival of the feedback economy.

The efficiencies and optimizations that come from constant, iterative feedback will soon become the norm for businesses and governments. We're moving beyond an information economy. Information on its own isn't an advantage, anyway. Instead, this is the era of the feedback economy, and Boyd is, in many ways, the first feedback economist.


November 03 2011

Four short links: 3 November 2011

  1. Feedback Without Frustration (YouTube) -- Scott Berkun at the HIVE conference talks about how feedback fails, and how to get it successfully. He is so good.
  2. Americhrome -- history of the official palette of the United States of America.
  3. Discovering Talented Musicians with Musical Analysis (Google Research blog) -- very clever: they do acoustical analysis and then train up a machine learning engine by asking humans to rate some tracks. Then they set it loose on YouTube and it finds people who are good but not yet popular. My favourite: I'll Follow You Into The Dark by a gentleman with a wonderful voice.
  4. Dark Sky (Kickstarter) -- hyperlocal hyper-realtime weather prediction. Uses radar imagery to figure out what's going on around you, then tells you what the weather will be like for the next 30-60 minutes. Clever use of data plus software.

July 14 2011

Four short links: 14 July 2011

  1. Digging into Technology's Past -- stories of the amazing work behind the visual 6502 project and how they reconstructed and simulated the legendary 6502 chip. To analyze and then preserve the 6502, James treated it like the site of an excavation. First, he needed to expose the actual chip by removing its packaging of essentially “billiard-ball plastic.” He eroded the casing by squirting it with very hot, concentrated sulfuric acid. After cleaning the chip with an ultrasonic cleaner—much like what’s used for dentures or contact lenses—he could see its top layer.
  2. Leaflet -- BSD-licensed lightweight Javascript library for interactive maps, using the Open Street Map.
  3. Too Many Public Works Built on Rosy Scenarios (Bloomberg) -- a feedback loop with real data being built to improve accuracy estimating infrastructure project costs. He would like to see better incentives -- punishment for errors, rewards for accuracy -- combined with a requirement that forecasts not only consider the expected characteristics of the specific project but, once that calculation is made, adjust the estimate based on an “outside view,” reflecting the cost overruns of similar projects. That way, the “unexpected” problems that happen over and over again would be taken into consideration. Such scrutiny would, of course, make some projects look much less appealing -- which is exactly what has happened in the U.K., where “reference-class forecasting” is now required. “The government stopped a number of projects dead in their tracks when they saw the forecasts,” Flyvbjerg says. “This had never happened before.”
  4. Neurovigil Gets Cash Injection To Read Your Mind (FastCompany) -- "an anonymous American industrialist and technology visionary" put tens of millions into this company, which has hardware to gather mineable data. iBrain promises to open a huge pipeline of data with its powerful but simple brain-reading tech, which is gaining traction thanks to technological advances. But the other half of the potentially lucrative equation is the ability to analyze the trove of data coming from iBrain. And that's where NeuroVigil's SPEARS algorithm enters the picture. Not only is the company simplifying collection of brain data with a device that can be relatively comfortably worn during all sorts of tasks--sleeping, driving, watching advertising--but the combination of iBrain and SPEARS multiplies the efficiency of data analysis. (via Vaughan Bell)

July 07 2011

Four short links: 7 July 2011

  1. Commodore 64 PC -- gorgeous retro look with fairly zippy modern internals. (via Rob Passarella)
  2. Designing Github for Mac -- a retrospective from the author of the excellent Mac client for github. He talks about what he learned and its origins, design, and development. Remember web development in 2004? When you had to create pixel-perfect comps because every element on screen was an image? That’s what developing for Cocoa is. Drawing in code is slow and painful. Images are easier to work with and result in more performant code. Remember these days? This meant my Photoshop files had to be a lot more fleshed out than I’ve been accustomed to in recent years. I usually get about 80% complete in Photoshop (using tons of screenshotting & layer flattening), then jump into code and tweak to completion. But with Cocoa, I ended up fleshing out that last 20% in Photoshop.
  3. Feedback Loops (Wired) -- covers startups and products that use feedback loops to help us change our behaviour. The best sort of delivery device “isn’t cognitively loading at all,” he says. “It uses colors, patterns, angles, speed—visual cues that don’t distract us but remind us.” This creates what Rose calls “enchantment.” Enchanted objects, he says, don’t register as gadgets or even as technology at all, but rather as friendly tools that beguile us into action. In short, they’re magical. (via Joshua Porter)
  4. continuous.io -- hosted continuous integration. (via Jacob Kaplan-Moss)

March 04 2011

Computers are looking back at us

As researchers work to increase human-computer interactivity, the lines between the real and digital worlds are blurring. Augmented reality (AR), still in its infancy, may be set to explode. As the founders of Bubbli, a startup developing an AR iPhone app, said in a recent Silicon Valley Blog post by Barry Bazzell: "Once we understand reality through a camera lens ... the virtual and real become indistinguishable."

Eyetracking

Kevin Kelly, co-founder and senior maverick at Wired magazine, recently pointed out in a keynote speech at TOC 2011 that soon the computers we're looking at would be looking back at us (the image above is from Kelly's presentation).

"Soon" turns out to be now: Developers at Swedish company Tobii Technology have created 20 computers that are controlled by eye movement. Tom Simonite described the technology in a recent post for MIT's Technology Review:

The two cameras below the laptop's screen use infrared light to track a user's pupils. An infrared light source located next to the cameras lights up the user's face and creates a "glint" in the eyes that can be accurately tracked. The position of those points is used to create a 3-D model of the eyes that is used to calculate what part of the screen the user is looking at; the information is updated 40 times per second.
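Tobii's system fits a full 3-D model of the eye; the sketch below shows only the simpler idea underneath: the pupil-to-glint vector shifts as the eye rotates, and a calibration step maps that vector onto screen coordinates. Everything here, including the linear model and the coordinates, is an illustrative simplification:

```python
# Toy gaze mapper. A real tracker fits a 3-D eye model; a linear fit
# from two calibration points stands in for it here. All coordinates
# are invented.

def calibrate(samples):
    """Fit screen = a * vector + b per axis from two known points.

    samples: [(pupil_glint_vector, known_screen_point), ...]
    """
    (v0, p0), (v1, p1) = samples[0], samples[-1]
    a = (p1[0] - p0[0]) / (v1[0] - v0[0])
    c = (p1[1] - p0[1]) / (v1[1] - v0[1])
    b = p0[0] - a * v0[0]
    d = p0[1] - c * v0[1]
    return lambda v: (a * v[0] + b, c * v[1] + d)

# Calibrate by looking at two known corners of a 1920x1080 screen...
gaze = calibrate([((-1.0, -1.0), (0, 0)), ((1.0, 1.0), (1920, 1080))])
# ...then each tracked vector maps to a screen position. In Tobii's
# prototype, this estimate is refreshed 40 times per second.
```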

The exciting part here is a comment in the article by Barbara Barclay, general manager of Tobii North America:

We built this conceptual prototype to see how close we are to being ready to use eye tracking for the mass market ... We think it may be ready.

Tobii also released a video explaining the system.

This type of increased interaction has potential across industries — intelligent search, AR, personalization, recommendations, to name just a few channels. Search models built on interaction data gathered directly from the user could also augment the social aggregation that search engine companies are currently focused on. Engines could incorporate what you like with what you see and do.





