January 27 2014

Four short links: 27 January 2014

  1. Druid — open source clustered data store (not key-value store) for real-time exploratory analytics on large datasets.
  2. It’s Time to Engineer Some Filter Failure (Jon Udell) — Our filters have become so successful that we fail to notice: we don’t control them, they have agendas, and they distort our connections to people and ideas. That idea that algorithms have agendas is worth emphasising. Reality doesn’t have an agenda, but the deployer of a similarity metric has decided what features to look for, what metric they’re optimising, and what to do with the similarity data. These are all choices with an agenda.
  3. Capstone — open source multi-architecture disassembly engine.
  4. The Future of Employment (PDF) — We note that this prediction implies a truncation in the current trend towards labour market polarization, with growing employment in high and low-wage occupations, accompanied by a hollowing-out of middle-income jobs. Rather than reducing the demand for middle-income occupations, which has been the pattern over the past decades, our model predicts that computerisation will mainly substitute for low-skill and low-wage jobs in the near future. By contrast, high-skill and high-wage occupations are the least susceptible to computer capital. (via The Atlantic)

January 15 2014

Four short links: 15 January 2014

  1. Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
  2. mcl — Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs. A toy sketch of the expansion/inflation loop follows this list.
  3. Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
  4. Map Projections Illustrated with a Face (Flowing Data) — really neat, wish I’d had these when I was getting my head around map projections.
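
The expansion/inflation loop at the heart of MCL is short enough to sketch. Below is a toy NumPy version, for intuition only and not the tuned mcl implementation linked above: expansion lets flow spread through the graph, inflation sharpens it, and the surviving attractor rows define the clusters.

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, iterations=50):
    # Add self-loops and column-normalise to get a stochastic flow matrix.
    M = adjacency + np.eye(len(adjacency))
    M = M / M.sum(axis=0)
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)   # expansion: let flow spread
        M = M ** inflation                         # inflation: strengthen strong currents
        M = M / M.sum(axis=0)                      # re-normalise columns
    # Rows that still carry mass are attractors; their non-zero columns form clusters.
    return {frozenset(np.flatnonzero(row > 1e-6)) for row in M if row.max() > 1e-6}

# Toy graph: two triangles joined by a single bridging edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(mcl(A))   # expect roughly {0, 1, 2} and {3, 4, 5}
```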

December 26 2013

Four short links: 26 December 2013

  1. Nest Protect Teardown (Sparkfun) — initial teardown of another piece of domestic industrial Internet.
  2. Logs — The distributed log can be seen as the data structure which models the problem of consensus. Not kidding when he calls it “real-time data’s unifying abstraction”.
  3. Mining the Web to Predict Future Events (PDF) — Mining 22 years of news stories to predict future events. (via Ben Lorica)
  4. Nanocubes — a fast datastructure for in-memory data cubes developed at the Information Visualization department at AT&T Labs – Research. Nanocubes can be used to explore datasets with billions of elements at interactive rates in a web browser, and in some cases it uses sufficiently little memory that you can run a nanocube in a modern-day laptop. (via Ben Lorica)

August 12 2013

Four short links: 14 August 2013

  1. bookcision — bookmarklet to download your Kindle highlights. (via Nelson Minar)
  2. Algorithm for a Perfectly Balanced Photo Gallery — remember this when it comes time to lay out your 2013 “Happy Holidays!” card.
  3. Long Stories (Fast Company Labs) — Our strategy was to still produce feature stories as discrete articles, but then to tie them back to the stub article with lots of prominent links, again taking advantage of the storyline and context we had built up there, making our feature stories sharper and less full of catch-up material.
  4. Massachusetts Software Tax (Fast Company Labs) — breakdown of why this crappily-written law is bad news for online companies. Laws are the IEDs of the Internet: it’s easy to make massively value-destroying regulation and hard to get it fixed.

August 01 2013

Four short links: 2 August 2013

  1. Unhappy Truckers and Other Algorithmic Problems — Even the insides of vans are subjected to a kind of routing algorithm; the next time you get a package, look for a three-letter code, like “RDL.” That means “rear door left,” and it is so the driver has to take as few steps as possible to locate the package. (via Sam Minnee)
  2. Fuel3D: A Sub-$1000 3D Scanner (Kickstarter) — a point-and-shoot 3D imaging system that captures extremely high resolution mesh and color information of objects. Fuel3D is the world’s first 3D scanner to combine pre-calibrated stereo cameras with photometric imaging to capture and process files in seconds.
  3. Corporate Open Source Anti-Patterns (YouTube) — Brian Cantrill’s talk, slides here. (via Daniel Bachhuber)
  4. Hacking for Humanity (The Economist) — Getting PhDs and data specialists to donate their skills to charities is the idea behind the event’s organizer, DataKind UK, an offshoot of the American nonprofit group.

June 04 2013

Four short links: 4 June 2013

  1. WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica)
  2. sregex (Github) — A non-backtracking regex engine library for large data streams. See also slide notes from a YAPC::NA talk. (via Ivan Ristic)
  3. Bobby Tables — a guide to preventing SQL injections; a minimal parameterised-query sketch follows this list. (via Andy Lester)
  4. Deep Learning Using Support Vector Machines (Arxiv) — we are proposing to train all layers of the deep networks by backpropagating gradients through the top level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop’s face expression recognition challenge. (via Oliver Grisel)
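
The advice in the Bobby Tables guide boils down to one habit: never splice user input into SQL text; pass it as a bound parameter instead. A minimal sqlite3 illustration of the difference (the table and the input are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"   # hostile input, straight out of the xkcd strip

# Vulnerable: string formatting splices attacker text into the SQL itself.
# query = "INSERT INTO students (name) VALUES ('%s')" % name

# Safe: a parameterised query. The driver passes the value out-of-band,
# so it can never be interpreted as SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
print(conn.execute("SELECT name FROM students").fetchall())
```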

March 08 2013

Four short links: 8 March 2013

  1. mlcomp — a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
  2. Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
  3. What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
  4. Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.

January 11 2013

Four short links: 11 January 2013

  1. How to Redesign Your App Without Pissing Everybody Off (Anil Dash) — the basic straightforward stuff that gets your users on-side. Anil’s making a career out of being an adult.
  2. Clockwork Raven (Twitter) — open source project to send data analysis tasks to Mechanical Turkers.
  3. Updates from the Tour in China (Bunnie Huang) — my dream geek tourism trip: going around Chinese factories and bazaars with MIT geeks.
  4. How to Implement an Algorithm from a Scientific Paper — I have implemented many complex algorithms from books and scientific publications, and this article sums up what I have learned while searching, reading, coding and debugging. (via Siah)

October 17 2012

Four short links: 17 October 2012

  1. Beyond Goods and Services: The Unmeasured Rise of the Data-Driven Economy — excellent points about data as neither good nor service, and how data use goes unmeasured by economists and thus doesn’t influence policy. According to statistics from the Bureau of Economic Analysis, real consumption of ‘internet access’ has been falling since the second quarter of 2011. In other words, according to official U.S. government figures, consumer access to the Internet—including mobile—has been a drag on economic growth for the past year and a half. (via Mike Loukides)
  2. How Crooks Turn Even Crappy Hacked PCs Into Money (Brian Krebs) — show to your corporate IT overlords, or your parents, to explain why you want them to get rid of the Windows XP machines. (via BoingBoing)
  3. Open Data Structures — an open content textbook (Java and C++ editions; CC-BY licensed) on data structures. (via Hacker News)
  4. Mobiforge — test what gets sent back to mobile browsers. This site sends the HTTP headers that a mobile browser would. cf yesterday’s Responsivator. (via Ronan Cremin)
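
For a rough local approximation of that kind of check, you can fetch the same page with and without a mobile User-Agent and compare what comes back. The UA string below is just an illustrative example, not what Mobiforge itself sends:

```python
import requests

# An illustrative mobile User-Agent string; swap in the device you actually care about.
MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) "
             "AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile Safari/8536.25")

desktop = requests.get("http://example.com")
mobile = requests.get("http://example.com", headers={"User-Agent": MOBILE_UA})

# If the server does device detection, the two responses will differ.
print(len(desktop.text), len(mobile.text))
```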

October 16 2012

Four short links: 16 October 2012

  1. cir.ca — news app for iPhone, which lets you track updates and further news on a given story. (via Andy Baio)
  2. DataWrangler (Stanford) — an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data. From the Stanford Visualization Group.
  3. Responsivator — see how websites look at different screen sizes.
  4. Accountable Algorithms (Ed Felten) — When we talk about making an algorithmic public process open, we mean two separate things. First, we want transparency: the public knows what the algorithm is. Second, we want the execution of the algorithm to be accountable: the public can check to make sure that the algorithm was executed correctly in a particular case. Transparency is addressed by traditional open government principles; but accountability is different.
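
One concrete building block for that kind of accountability is commit-and-reveal: the agency publishes a hash of its random seed before running a draw, then publishes the seed afterwards, so anyone can re-run the algorithm and confirm the outcome. A toy sketch of the idea (my illustration, not code from Felten's post):

```python
import hashlib
import random

# --- before the draw: commit to a random seed by publishing only its hash ---
seed = random.SystemRandom().randrange(2**128)
commitment = hashlib.sha256(str(seed).encode()).hexdigest()
print("published in advance:", commitment)

# --- the draw itself --------------------------------------------------------
applicants = ["alice", "bob", "carol", "dave"]
winners = random.Random(seed).sample(applicants, 2)

# --- after the draw: publish the seed; anyone can check both facts ----------
assert hashlib.sha256(str(seed).encode()).hexdigest() == commitment   # seed matches commitment
assert random.Random(seed).sample(applicants, 2) == winners           # result follows from seed
print("winners:", winners)
```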

October 08 2012

Four short links: 8 October 2012

  1. Beware the Drones (Washington Times) — the temptation to send difficult to detect, unmanned aircraft into foreign airspace with perceived impunity means policymakers will naturally incline towards aggressive use of drones and hyperactive interventionism, leading us to a future that is ultimately plagued by more, not less warfare and conflict. This. Also, what I haven’t seen commented on with the Israeli air force shooting down a (presumably Hezbollah) drone: low cost of drones vs high cost of maintaining an air force to intercept, means this is asymmetric unmanned warfare.
  2. Scanbooth (github) — a collection of software for running a 3D scanning booth. Greg Borenstein said to me, “we need tools to scan and modify before 3D printing can take off.” (via Jeremy Herrman)
  3. Bitcoin’s Value is Decentralization (Paul Bohm) — Bitcoin isn’t just a currency but an elegant universal solution to the Byzantine Generals’ Problem, one of the core problems of reaching consensus in Distributed Systems. Until recently it was thought to not be practically solvable at all, much less on a global scale. Irrespective of its currency aspects, many experts believe Bitcoin is brilliant in that it technically made possible what was previously thought impossible. (via Mike Loukides)
  4. Blue Collar Coder (Anil Dash) — I am proud of, and impressed by, Craigslist’s ability to serve hundreds of millions of users with a few dozen employees. But I want the next Craigslist to optimize for providing dozens of jobs in each of the towns it serves, and I want educators in those cities to prepare young people to step into those jobs. Time for a Massively Multiplayer Online Economy, as opposed to today’s fun economic games of Shave The Have-Nots and Race To The Oligarchy.

August 17 2012

Wall Street’s robots are not out to get you

Technology is critical to today’s financial markets. It’s also surprisingly controversial. In most industries, increasing technological involvement is progress, not a problem. And yet, people who believe that computers should drive cars suddenly become Luddites when they talk about computers in trading.

There’s widespread public sentiment that technology in finance just screws the “little guy.” Some of that sentiment is due to concern about a few extremely high-profile errors. A lot of it is rooted in generalized mistrust of the entire financial industry. Part of the problem is that media coverage on the issue is depressingly simplistic. Hyperbolic articles about the “rogue robots of Wall Street” insinuate that high-frequency trading (HFT) is evil without saying much else. Very few of those articles explain that HFT is a catchall term that describes a host of different strategies, some of which are extremely beneficial to the public market.

I spent about six years as a trader, using automated systems to make markets and execute arbitrage strategies. From 2004-2011, as our algorithms and technology became more sophisticated, it was increasingly rare for a trader to have to enter a manual order. Even in 2004, “manual” meant instructing an assistant to type the order into a terminal; it was still routed to the exchange by a computer. Automating orders reduced the frequency of human “fat finger” errors. It meant that we could adjust our bids and offers in a stock immediately if the broader market moved, which enabled us to post tighter markets. It allowed us to manage risk more efficiently. More subtly, algorithms also reduced the impact of human biases — especially useful when liquidating a position that had turned out badly. Technology made trading firms like us more profitable, but it also benefited the people on the other sides of those trades. They got tighter spreads and deeper liquidity.

Many HFT strategies have been around for decades. A common one is exchange arbitrage, which Time magazine recently described in an article entitled “High Frequency Trading: Wall Street’s Doomsday Machine?”:

A high-frequency trader might try to take advantage of minuscule differences in prices between securities offered on different exchanges: ABC stock could be offered for one price in New York and for a slightly higher price in London. With a high-powered computer and an ‘algorithm,’ a trader could buy the cheap stock and sell the expensive one almost simultaneously, making an almost risk-free profit for himself.

It’s a little bit more difficult than that paragraph makes it sound, but the premise is true — computers are great for trades like that. As technology improved, exchange arb went from being largely manual to being run almost entirely via computer, and the market in the same stock across exchanges became substantially more efficient. (And as a result of competition, the strategy is now substantially less profitable for the firms that run it.)
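
In sketch form, the check an exchange-arbitrage system runs is simply whether one venue's bid exceeds another venue's ask by more than the round-trip cost. The toy function below illustrates only the premise; it ignores currency conversion, latency, and available size, and the quotes and fee are invented:

```python
def exchange_arb(quotes, fee_per_share=0.002):
    """Flag a cross-exchange arbitrage whenever one venue's bid exceeds
    another venue's ask by more than the round-trip costs."""
    signals = []
    for buy_venue, buy_quote in quotes.items():
        for sell_venue, sell_quote in quotes.items():
            if buy_venue == sell_venue:
                continue
            edge = sell_quote["bid"] - buy_quote["ask"] - 2 * fee_per_share
            if edge > 0:
                signals.append((f"buy {buy_venue}", f"sell {sell_venue}", round(edge, 4)))
    return signals

quotes = {
    "NYSE": {"bid": 10.00, "ask": 10.01},
    "LSE":  {"bid": 10.03, "ask": 10.04},   # same stock quoted slightly higher
}
print(exchange_arb(quotes))   # [('buy NYSE', 'sell LSE', 0.016)]
```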

Market making — posting both a bid and an offer in a security and profiting from the bid-ask spread — is presumably what Knight Capital was doing when it experienced “technical difficulties.” The strategy dates from the time when exchanges were organized around physical trading pits. Those were the bad old days, when there was little transparency and automation, and specialists and brokers could make money ripping off clients who didn’t have access to technology. Market makers act as liquidity providers, and they are an important part of a well-functioning market. Automated trading enables them to manage their orders efficiently and quickly, and helps to reduce risk.
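
A market maker's core quoting decision can be sketched in a few lines: post a bid and an offer around an estimate of fair value, and lean the quotes against current inventory so the book doesn't build a one-sided position. This is a hypothetical illustration of the idea, not how Knight or any particular firm actually quoted:

```python
def make_quotes(fair_value, half_spread, inventory, max_inventory, skew=0.01):
    """Toy market-making quote: bid and offer around fair value, leaned
    against inventory (long inventory pushes both quotes down, making it
    easier to sell and harder to buy more)."""
    lean = skew * (inventory / max_inventory)
    bid = fair_value - half_spread - lean
    ask = fair_value + half_spread - lean
    return round(bid, 2), round(ask, 2)

print(make_quotes(fair_value=25.00, half_spread=0.02, inventory=0,   max_inventory=1000))
print(make_quotes(fair_value=25.00, half_spread=0.02, inventory=800, max_inventory=1000))
```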

So how do those high-profile screw-ups happen? They begin with human error (or, at least, poor judgment). Computerized trading systems can amplify these errors; it would be difficult for a person sending manual orders to simultaneously botch their markets in 148 different companies, as Knight did. But it’s nonsense to make the leap from one brokerage experiencing severe technical difficulties to claiming that automated market-making creates some sort of systemic risk. The way the market handled the Knight fiasco is how markets are supposed to function — stupidly priced orders came in, the market absorbed them, the U.S. Securities and Exchange Commission (SEC) and the exchanges adhered to their rules regarding which trades could be busted (ultimately letting most of the trades stand and resulting in a $440 million loss for Knight).

There are some aspects of HFT that are cause for concern. Certain strategies have exacerbated unfortunate feedback loops. The Flash Crash illustrated that an increase in volume doesn’t necessarily mean an increase in real liquidity. Nanex recently put together a graph (or a “horrifying GIF”) showing the sharply increasing number of quotes transmitted via automated systems across various exchanges. What it shows isn’t actual trades, but it does call attention to a problem called “quote spam.” Algorithms that employ this strategy generate a large number of buy and sell orders that are placed in the market and then are canceled almost instantly. They aren’t real liquidity; the machine placing them has no intention of getting a fill — it’s flooding the market with orders that competitor systems have to process. This activity leads to an increase in short-term volatility and higher trading costs.
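
One crude way to screen for this behaviour is a message-to-fill (order-to-trade) ratio: how many orders and cancellations a participant sends for each execution it actually gets. A toy calculation with made-up numbers:

```python
def order_to_trade_ratio(events):
    """Count new orders and cancels per fill. A participant sending thousands
    of messages per execution is adding traffic, not liquidity."""
    orders = sum(1 for e in events if e == "new")
    cancels = sum(1 for e in events if e == "cancel")
    fills = sum(1 for e in events if e == "fill")
    return (orders + cancels) / max(fills, 1)

# One participant's (hypothetical) message stream during a busy second.
stream = ["new", "cancel"] * 500 + ["new", "fill"]
print(order_to_trade_ratio(stream))   # ~1001 messages per fill
```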

The New York Times just ran an interesting article on HFT that included data on the average cost of trading one share of stock. From 2000 to 2010, it dropped from $.076 to $.035. Then it appears to have leveled off, and even increased slightly, to $.038 in 2012. If (as that data suggests) we’ve arrived at the point where the “market efficiency” benefit of HFT is outweighed by the risk of increased volatility or occasional instability, then regulators need to step in. The challenge is determining how to disincentivize destabilizing behavior without negatively impacting genuine liquidity providers. One possibility is to impose a financial transaction tax, possibly based on how long the order remains in the market or on the number of orders sent per second.

Rethinking regulation and market safeguards in light of new technology is absolutely appropriate. But the state of discourse in the mainstream press — mostly comprised of scare articles about “Wall Street’s terrifying robot invasion” — is unfortunate. Maligning computerized strategies because they are computerized is the wrong way to think about the future of our financial markets.

Photo: ABOVE by Lyfetime, on Flickr


June 04 2012

Can Future Advisor be the self-driving car for financial advice?

Last year, venture capitalist Marc Andreessen famously wrote that software is eating the world. The impact of algorithms upon media, education, healthcare and government, among many other verticals, is just beginning to be felt, and with still unfolding consequences for the industries disrupted.

Whether it's the prospect of IBM's Watson offering a diagnosis to a patient or Google's self-driving car taking over on the morning commute, there are going to be serious concerns raised about safety, power, control and influence.

Doctors and lawyers note, for good reason, that their public appearances on radio, television and the Internet should not be viewed as medical or legal advice. While financial advice may not pose the same threat to a citizen as an incorrect medical diagnosis or treatment, poor advice could have pretty significant downstream outcomes.

That risk isn't stopping a new crop of startups from looking for a piece of the billions of dollars paid every year to financial advisors. Future Advisor launched in 2010 with the goal of providing better financial advice through the Internet using data and algorithms. They're competing against startups like Wealthfront and Betterment, among others.

Not everyone is convinced of the validity of this algorithmically mediated approach to financial advice. Mike Alfred, the co-founder of BrightScope (which has liberated financial advisor data itself), wrote in Forbes this spring that online investment firms are wrong about financial advisors:

"While singularity proponents may disagree with me here, I believe that some professions have a fundamentally human component that will never be replaced by computers, machines, or algorithms. Josh Brown, an independent advisor at Fusion Analytics Investment Partners in NYC, recently wrote that 'for 12,000 years, anywhere someone has had wealth through the history of civilization, there's been a desire to pay others for advice in managing it.' In some ways, it's no different from the reason why many seek out the help of a psychiatrist. People want the comfort of a human presence when things aren't going well. A computer arguably may know how to allocate funds in a normal market environment, but can it talk you off the cliff when things go to hell? I don't think so. Ric Edelman, Chairman & CEO of Edelman Financial Services, brings up another important point. According to him, 'most consumers are delegators and procrastinators, and need the advisor to get them to do what they know they need to do but won't do if left on their own'."

To get the other side of this story, I recently talked with Bo Lu (@bolu), one of the two co-founders of Future Advisor. Lu explained how the service works, where the data comes from and whether we should fear the dispassionate influence of our new robotic financial advisor overlords.

Where did the idea for Future Advisor come from?

Lu: The story behind Future Advisor is one of personal frustration. We started the company in 2010 when my co-founder and I were working at Microsoft. Our friends who had reached their mid-20s were really making money for the first time in their lives. They were now being asked to make decisions, such as "Where do I open an IRA? What do I do with my 401K?" As is often the case, they went to the friend who had the most experience, which in this case turned out to be me. So I said, "Well, let's just find you guys a good financial advisor and then we'll do this," because somehow in my mind, I thought, "Financial advisors do this."

It turned out that all of the financial advisors we found fell into two distinct classes. One was folks who were really nice but who essentially, in very kind words, said, "Maybe you'd be more comfortable at the lower stakes table." We didn't meet any of their minimums. You needed a million dollars or at least a half million to get their services.

The other kinds of financial advisors who didn't have minimums immediately started trying to sell my friends term life insurance and annuities. I'm like, "These guys are 25. There's no reason for you to be doing this." Then I realized there was a misalignment of incentives there. We noticed that our friends were making a small set of the same mistakes over and over again, such as not having the right diversification for their age and their portfolio, or paying too much in mutual fund fees. Most people didn't understand that mutual funds charged fees and were not being tax efficient. We said, "Okay, this looks like a data problem that we can help solve for you guys." That's the genesis out of which Future Advisor was born.

What problem are you working on solving?

Bo Lu: Future Advisor is really trying to do one single thing: deliver on the vision that high-quality financial advice should be able to be produced cheaply and, thus, be broadly accessible to everyone.

If you look at the current U.S. market of financial advisors and you multiply the number of financial advisors in the U.S. — which is roughly a quarter-million people — by what is generally accepted to be a full book of clients, you'll realize that even at full capacity, the U.S. advisor market can serve only about 11% of U.S. households.

In serving that 11% of U.S. households, the advisory market for retail investing makes about $20 billion. This is a classic market where a service is extremely expensive but in being so can only serve a small percentage of the addressable market. As we walked into this, we realized that we're part of something bigger. If you look at 60 years ago, a big problem was that everyone wanted a color television and they just weren't being manufactured quickly or cheaply enough. Manufacturing scale has caught up to us. Now, everything you want you generally can have because manufactured things are cheap. Creating services is still extremely expensive and non-scalable. Healthcare as a service, education as a service and, of course, financial services, financial advising service comes to mind. What we're doing is taking information technology, like computer science, to scale a service in the way the electrical engineering of our forefathers scaled manufacturing.

How big is the team? How are you working together?

Bo Lu: The team has eight people in Seattle. It's almost exactly half finance and half engineering. We unabashedly have a bunch of engineers from MIT, which is where my co-founder went to school, essentially sucking the brains out of the finance team and putting them in software. It's really funny because a lot of the time when we design an algorithm, we actually just sit down and say, "Okay, let's look at a bunch of examples and see what the intuitive decisions are of science people and then try to encode them."

We rely heavily on the existing academic literature in both computational finance and economics because a lot of this work has been done. The interesting thing is that the knowledge is not the problem. The knowledge exists, and it's unequivocal in the things that are good for investors. Paying less in fees is good for investors. Being more tax efficient is good for investors. How to do that is relatively easy. What's hard for the industry for a long time has been to scalably apply those principles in a nuanced way to everybody's unique situation. That's something that software is uniquely good at doing.

How do you think about the responsibility of providing financial advice that traditionally has been offered by highly certified professionals who've taken exams, worked at banks, and are expensive to get to because of that professional experience?

Bo Lu: There's a couple of answers to that question, one of which is the folks on our team have the certifications that people look for. We've got certified financial advisors*, CFAs, which is a private designation on the team. We have math PhDs from the University of Washington on the team. The people who create the software are the caliber of people that you would want to be sitting down with you and helping you with your finances in the first place.

The second part of that is that we ourselves are a registered investment advisor. You'll see many websites that on the bottom say, "This is not intended to be financial advice." We don't say that. This is intended to be financial advice. We're registered federally with the SEC as a registered investment advisor and have passed all of the exams necessary.

*In the interview, Lu said that FutureAdvisor has 'certified financial advisors'. In this context, CFA stood for something else: the Future Advisor team includes Simon Moore, a chartered financial analyst, who advises the startup on the design of its investing algorithms.

Where does the financial data behind the site come from?

Bo Lu: From the consumer side, the site has only four steps. These four steps are very familiar to anyone who's used a financial advisor before. A client signs up for the products. It's a free web service, designed to help everyone. In step one, they answer a couple of questions about their personal situation: age, how much they make, when they want to retire. Then they're asked the kinds of questions that good financial advisors ask, such as your risk tolerance. Here, you start to see that we rely on academic work as much as possible.

There is a great set of work out of the University of Kentucky on risk tolerance questionnaires. Whereas most companies just use some questionnaire they came up with internally, we went and scoured literature to find exact questions that were specifically worded — and have been tested under those wordings to yield statistically significant deviations in determining risk tolerance. So we use those questions. With that information, the algorithm can then come up with a target portfolio allocation for the customer.
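
As a purely hypothetical illustration of that last step, and not FutureAdvisor's actual model, a target allocation might combine an age-based rule of thumb with the questionnaire's risk score (assumed here to sit on a 1-10 scale):

```python
def target_allocation(age, risk_score):
    """Hypothetical sketch only: start from the classic '110 minus age'
    equity heuristic and tilt it by the questionnaire's risk score."""
    base_equity = max(0, min(100, 110 - age))
    tilt = (risk_score - 5) * 4                      # +/- 20 points across the 1-10 scale
    equity = max(0, min(100, base_equity + tilt))
    return {"equities": equity, "bonds": 100 - equity}

print(target_allocation(age=27, risk_score=8))   # young, risk-tolerant -> equity heavy
print(target_allocation(age=60, risk_score=3))   # older, cautious -> bond heavy
```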

In step two, the customer can synchronize or import data from their existing financial institutions into the software. We use Yodlee, which you've written about before. It's the same technology that Mint used to import detailed data about what you already hold in your 401K, in your IRA, and in all of your other investment accounts.

Step three is the dashboard. The dashboard shows your investments at a level that makes sense, rather than the way current brokerages do, where when you log in they tell you how much money you have, list the funds you hold, and show how much they've changed in the last 24 hours of trading. We answer four questions on the dashboard.

  1. Am I on track?
  2. Am I well-diversified for this goal?
  3. Am I overpaying in hidden fees in my mutual funds?
  4. Am I as tax efficient as I could be?

We answer those four questions and then in the final step of the process, we give algorithmically-generated, step-by-step instructions about how to improve your portfolio. This includes specific advice like "sell this many shares of Fund X to buy this many shares of Fund Y" in your IRA. When consumers see this, they can go and, with this help, clean up their portfolios. It's kind of like diagnosis and prescription for your portfolio.

There are three separate streams of data underlying the product. One is the Yodlee stream, which is detailed holdings data from hundreds of financial institutions. Two is data about what's in a fund. That comes from Morningstar. Morningstar, of course, gets it from the SEC because mutual funds are required to disclose this. So we can tell, for example, if a fund is an international fund or a domestic fund, what the fees are, and what it holds. The third is a dataset we have to tie in ourselves: 401K data from the Department of Labor.

On top of this triad of datasets sits our algorithm, which has undergone six to eight months of beta testing with customers. (We launched the product in March 2012.) That algorithm asks, "Okay, given these three datasets, what is the current state of your portfolio? What is the minimum number of moves to reduce both transaction costs and any capital gains that you might incur to get you from where you are to roughly where you need to be?" That's how the product works under the covers.
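
A heavily simplified sketch of that final computation might look like the function below. It only compares current dollar weights to a target and proposes trades large enough to be worth their transaction cost; the tax-lot and capital-gains logic Lu describes is deliberately left out, and none of this is FutureAdvisor's actual code:

```python
def rebalance(holdings, target_weights, min_trade=100.0):
    """Propose trades that move current dollar holdings toward target weights,
    skipping moves too small to justify their transaction cost."""
    total = sum(holdings.values())
    trades = {}
    for fund, weight in target_weights.items():
        gap = weight * total - holdings.get(fund, 0.0)
        if abs(gap) >= min_trade:
            trades[fund] = round(gap, 2)   # positive = buy, negative = sell
    return trades

holdings = {"US large cap": 42000, "International": 3000, "Bonds": 5000}
target   = {"US large cap": 0.55, "International": 0.25, "Bonds": 0.20}
print(rebalance(holdings, target))
```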

What's the business model?

Bo Lu: You can think of it as similar to Redfin. Redfin allows individual realtors to do more work by using algorithms to help them do all of the repetitive parts. Our product and the web service is free and will always be free. Information wants to be free. That's how we work in software. It doesn't cost us anything for an additional person to come and use the website.

The way that Future Advisor makes money is that we charge for advisor time. A small percentage of customers will have individual questions about their specific situation or want to talk to a human being and have them answer some questions. This is actually good in two ways.

One, it helps the transition from a purely human service to what we think will eventually be an almost purely digital service. People who are somewhere along that continuum of wanting someone to talk to but don't need someone full-time to talk to can still do that.

Two, those conversations are a great way for us to find out, in aggregate, what the things are that the software doesn't yet do or doesn't do well. Overall, if we take a ton of calls that are all the same, then it means there's an opportunity for the software to step in, scale that process, and help people who don't want to call us or who can't afford to call us to get that information.

What's the next step?

Bo Lu: This is a problem that has a dramatic possible impact attached to it. Personal investing, what the industry calls "retail investing," is a closed-loop system. Money goes in, and it's your money, and it stays there for a while. Then it comes out, and it's still your money. There's very little additional value creation by the financial advisory industry.

It may sound like I'm going out on a limb to say this, but it's generally accepted that the value creation of you and I putting our hard-earned money into the market is actually done by companies. Companies deploy that capital, they grow, and they return that capital in the form of higher stock prices or dividends, fueling the engine of our economic growth.

There are companies across the country and across the world adding value to people's lives. There's little to no value to be added by financial advisors trying to pick stocks. It's actually academically proven that there's negative value to be added there because it turns out the only people who make money are financial advisors.

This is a $20 billion market. But really what that means is that it's a $20 billion tax on individual American investors. If we're successful, we're going to reduce that $20 billion tax to a much smaller number by orders of magnitude. The money that's saved is kept by individual investors, and they keep more of what's theirs.

Because of the size of this market and the size of the possible impact, we are venture-backed because we can really change the world for the better if we're successful. There are a bunch of great folks in the Valley who have done a lot of work in money and the democratization of software and money tools.

What's the vision for the future of your startup?

Bo Lu: I was just reading your story about smart disclosure a little while ago. There's a great analogy in there that I think applies aptly to us. It's maps. The first maps were paper. Today if you look at the way a retail investor absorbs information, it's mostly paper. They get a prospectus in the mail. They have a bunch of disclosures they have to sign — and the paper is extremely hard to read. I don't know if you've ever tried to read a prospectus; it's something that very few of us enjoy. (I happen to be one of them, but I understand if not everyone's me.) They're extremely hard to parse.

Then we moved on to the digital age of folks taking the data embedded in those prospectuses and making them available. That was Morningstar, right? Now we're moving into the age of folks taking that data and mating it with other data, such as 401K data and your own personal financial holdings data, to make individual personalized recommendations. That's Future Advisor the way it is today.

But just as maps moved from paper maps to Google Maps, it didn't stop there. It moves and has moved to autonomous cars. There will be a day when you and I don't ever have to look at a map because, rather than the map being a tool to help me make the decision to get somewhere, the map will be a part of a service I use that just gets the job done. It gets me from point A to point B.

In finance, the job is to invest my money properly. Steward it so that it grows, so that it's there for me when I retire. That's our vision as well. We're going to move from being an information service to actually doing it for you. It's just a default way so that if you do nothing, your financial assets are well taken care of. That's what we think is the ultimate vision of this: Everything works beautifully and you no longer have to think about it.

We're now asked to make ridiculous decisions about spreading money between a checking account, an IRA, a savings account and a 401K, which really make no sense to most of us. The vision is to have one pot of money that invests itself correctly, that you put money into when you earn money. You take money out when you spend it. You don't have to make any decisions that you were never trained nor educated to make about your own personal finances because it just does the right thing. The self-driving car is our vision.

Connecting the future of personal finance with an autonomous car is an interesting perspective. Just as with outsourcing driving, however, there's the potential for negative outcomes. Do you have any concerns about the algorithm going awry?

Bo Lu: We are extremely cognizant of the weighty matters that we are working with here. We have a ton of testing that happens internally. You could even criticize us, as a software development firm, in that we're moving slower than other software development firms. We're not going to move as quickly as Twitter or Foursquare because, to be honest, if they mess up, it's not that big a deal. We're extremely careful about it.

At the same time, I think the Google self-driving car analogy is apt because people immediately say, "Well, what if the car gets into an accident?" Those kinds of fears exist in all fields that matter.


Analysis: Why this matters

"The analogy that comes to mind for me isn't the self-driving car," commented Mike Loukides, via email. "It's personalized medicine."

One of the big problems in health care is that to qualify treatments, we do testing over a very wide sample, and reject it if it doesn't work better than a placebo. But what about drugs that are 100% effective on 10% of the population, but 0% effective on 90%? They're almost certainly rejected. It strikes me that what Future Advisor is doing isn't so much helping you to go on autopilot, but getting beyond generic prescriptions and generating customized advice, just as a future MD might be able to do a DNA sequence in his office and generate a custom treatment.

The secret sauce for Future Advisor is the combination of personal data, open government data and proprietary algorithms. The key to realizing value, in this context, is combining multiple data streams with a user interface that's easy for a consumer to navigate. That combination has long been known by another name: It's a mashup. But the mashups of 2012 have something that those of 2002 didn't have, at least in volume or quality: data.

Future Advisor, Redfin (real estate) or Castlight (healthcare) are all interesting examples of entrepreneurs creating data products from democratized government data. Future Advisor uses data from consumers and the U.S. Department of Labor, Redfin synthesizes data from economists and government agencies, and Castlight uses health data from the U.S. Department of Health and Human Services. In each case, they provide a valuable service and/or product by making sense of that data deluge.


May 21 2012

Four short links: 21 May 2012

  1. Objectivist-C -- very clever. In Objectivist-C, each program is free to acquire as many resources as it can, without interference from the operating system. (via Tim O'Reilly)
  2. Zynga and Facebook Stock Oddities (The Atlantic) -- signs of robotrading, a reminder that we're surrounded by algorithms and only notice them when they go awry.
  3. The Final ROFLcon and Mobile's Impact on Internet Culture (Andy Baio) -- These days, memes spread faster and wider than ever, with social networks acting as the fuel for mass distribution. But it's possible we may see less mutation and remixing in the near future. As Internet usage shifts from desktops and laptops to mobile devices and tablets, the ability to mutate memes in a meaningful way becomes harder.
  4. Oh Mi Bod -- I was impressed to learn that one can buy vibrators that can be controlled from an iPhone. Insert iBone joke here. (via Cary Gibson)

May 02 2012

Four short links: 2 May 2012

  1. Punting on SxSW (Brad Feld) -- I came across this old post and thought: if you can make money by being a dick, or make money by being a caring family person, why would you choose to be a dick? As far as I can tell, being a dick is optional. Brogrammers, take note. Be more like Brad Feld, who prioritises his family and acts accordingly.
  2. Probabilistic Structures for Data Mining -- readable introduction to useful algorithms and datastructures showing their performance, reliability, and resource trade-offs; a tiny Bloom filter sketch follows this list. (via Hacker News)
  3. Dataset -- a Javascript library for transforming, querying, manipulating data from different sources.
  4. Many HTTPS Servers are Insecure -- 75% still vulnerable to the BEAST attack.
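
As a taste of the material behind link 2, here is the canonical probabilistic structure, a tiny Bloom filter: constant memory, fast membership tests, a tunable false-positive rate, and never a false negative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item, set into one bit field."""
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= (1 << pos)

    def __contains__(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print("alice@example.com" in bf)   # True
print("bob@example.com" in bf)     # almost certainly False
```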

February 07 2012

Unstructured data is worth the effort when you've got the right tools


It's dawning on companies that data analysis can yield insights and inform business decisions. As data-driven benefits grow, so do our demands about what more data can tell us and what other types we can mine.

During her PhD studies, Alyona Medelyan (@zelandiya) developed Maui, an open source tool that performs as well as professional librarians in identifying main topics in documents. Medelyan now leads the research and development of API-based products at Pingar.

Pingar senior software researcher Anna Divoli (@annadivoli) studied sentence extraction for semi-automatic annotation of biological databases. Her current research focuses on developing methodologies for acquiring knowledge from textual data.

"Big data is important in many diverse areas, such as science, social media, and enterprise," observes Divoli. "Our big data niche is analysis of unstructured text." In the interview below, Medelyan and Divoli describe their work and what they see on the horizon for unstructured data analysis.

How did you get started in big data?

Anna Divoli: I began working with big data as it relates to science during my PhD. I worked with bioinformaticians who mined proteomics data. My research was on mining information from the biomedical literature that could serve as annotation in a database of protein families.

Alyona Medelyan: Like Anna, I mainly focus on unstructured data and how it can be managed using clever algorithms. During my PhD in natural language processing and data mining, I started applying such algorithms to large datasets to investigate how time-consuming data analysis and processing tasks can be automated.

What projects are you working on now?

Alyona Medelyan: For the past two years at Pingar, I've been developing solutions for enterprise customers who accumulate unstructured data and want to search, analyze, and explore this data efficiently. We develop entity extraction, text summarization, and other text analytics solutions to help scrub and interpret unstructured data in an organization.

Anna Divoli: We're focusing on several verticals that struggle with too much textual data, such as bioscience, legal, and government. We also strive to develop language-independent solutions.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

What are the trends and challenges you're seeing in the big data space?

Anna Divoli: There are plenty of trends that span various aspects of big data, such as making the data accessible from mobile devices, cloud solutions, addressing security and privacy issues, and analyzing social data.

One trend that is pertinent to us is the increasing popularity of APIs. Plenty of APIs exist that give access to large datasets, but there are also powerful APIs that manage big data efficiently, such as text analytics, entity extraction, and data mining APIs.

Alyona Medelyan: The great thing about APIs is that they can be integrated into existing applications used inside an organization.

With regard to the challenges, enterprise data is very messy, inconsistent, and spread out across multiple internal systems and applications. APIs like the ones we're working on can bring consistency and structure to a company's legacy data.

The presentation you'll be giving at the Strata Conference will focus on practical applications of mining unstructured data. Why is this an important topic to address?

Anna Divoli: Every single organization in every vertical deals with unstructured data. Tons of text is produced daily — emails, reports, proposals, patents, literature, etc. This data needs to be mined to allow fast searching, easy processing, and quick decision making.

Alyona Medelyan: Big data often stands for structured data that is collected into a well-defined database — who bought which book in an online bookstore, for example. Such databases are relatively easy to mine because they have a consistent form. At the same time, there is plenty of unstructured data that is just as valuable, but it's extremely difficult to analyze it because it lacks structure. In our presentation, we will show how to detect structure using APIs, natural language processing and text mining, and demonstrate how this creates immediate value for business users.
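
As a small, concrete illustration of that kind of entity extraction, the snippet below uses the open source spaCy library as a stand-in (Pingar's own API is not shown here) to pull people, organisations, and places out of a sentence:

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Alyona Medelyan and Anna Divoli of Pingar are speaking at Strata in Santa Clara.")

# Entities are detected from context alone; labels include PERSON, ORG, GPE, etc.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
```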

Are there important new tools or projects on the horizon for big data?

Alyona Medelyan: Text analytics tools are very hot right now, and they improve daily as scientists come up with new ways of making algorithms understand written text more accurately. It is amazing that an algorithm can detect names of people, organizations, and locations within seconds simply by analyzing the context in which words are used. The trend for such tools is to move toward recognition of further useful entities, such as product names, brands, events, and skills.

Anna Divoli: Also, entity relation extraction is an important trend. A relation that consistently connects two entities in many documents is important information in science and enterprise alike. Entity relation extraction helps detect new knowledge in big data.

Other trends include detecting sentiment in social data, integrating multiple languages, and applying text analytics to audio and video transcripts. The number of videos grows at a constant rate, and transcripts are even more unstructured than written text because there is no punctuation. That's another exciting area on the horizon!

Who do you follow in the big data community?

Alyona Medelyan: We tend to follow researchers in areas that are used for dealing with big data, such as natural language processing, visualization, user experience, human computer information retrieval, as well as the semantic web. Two of them are also speaking at Strata this year: Daniel Tunkelang and Marti Hearst.


This interview was edited and condensed.

