April 14 2011

3 big challenges in location development

With the goal of indexing the entire web by location, Fwix founder and Where 2.0 speaker Darian Shirazi (@darian314) has had to dig into a host of location-based development issues. In the following interview, he discusses the biggest challenges in location and how Fwix is addressing them.


What are the most challenging aspects of location development?

Darian Shirazi: There are three really big challenges. The first, probably the least difficult of the three, is getting accurate real-time information about locations. Crawling the web in real time is difficult, especially if you're analyzing millions of pieces of data; it's difficult even for a company like Google, simply because there's so much data out there. In local, real-time crawling matters even more, because you need to know what's happening near you right now. That requires a distributed crawling system, and that's tough to build.

The second problem, and the most difficult one we've had to solve, is entity extraction: taking a news article and figuring out which locations it mentions or is about. If you see an article that mentions five of the best restaurants in the Mission District, analyzing that content and noting, for example, that "Hog 'N Rocks" is a restaurant on 19th and Mission is really tough. It requires us to understand linguistically what an entity is, and what is a proper noun and what isn't. Then you get into all of these edge conditions where a restaurant like Hog 'N Rocks might be called "Hogs & Rocks" or "Hogs and Rocks" or "H 'N Rocks." You want to catch those entities so you can say, "This article is about these restaurants and these lat/longs."
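To make the spelling problem concrete, here is a minimal Python sketch of one small piece of it: collapsing surface variants of a name into a single matching key before lookup. The rules (treating `&`, `and`, `'n`, and a lone `n` as the same connector, and dropping a trailing `s` from longer words) are illustrative assumptions, not Fwix's tagger, and `normalize_name` is a hypothetical helper.

```python
import re

# Hypothetical rule set: "&", "and", "'n", "'n'" and a lone "n" all count as one connector.
_CONNECTOR = re.compile(r"\s*(?:&|\band\b|'n'?|\bn\b)\s*")

def normalize_name(name: str) -> str:
    """Collapse common spelling variants of a place name into one matching key."""
    s = name.lower().replace("\u2019", "'")  # fold curly apostrophes into straight ones
    s = _CONNECTOR.sub(" and ", s)           # unify connectors to the word "and"
    s = re.sub(r"[^a-z0-9 ]", "", s)         # drop any remaining punctuation
    words = s.split()
    # Crude de-pluralization: "hogs" -> "hog", "rocks" -> "rock"; short words are untouched.
    words = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    return " ".join(words)

for variant in ["Hog 'N Rocks", "Hogs & Rocks", "Hogs and Rocks", "H 'N Rocks"]:
    print(f"{variant!r:18} -> {normalize_name(variant)!r}")
```

With these rules, "Hog 'N Rocks", "Hogs & Rocks", and "Hogs and Rocks" all normalize to `hog and rock`, but the abbreviation "H 'N Rocks" does not. Rules like these only cover predictable variation; abbreviations and nicknames still need explicit alias data.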

The third problem we've had to tackle is building a places taxonomy that you can match against. If you use SimpleGeo's API or Google Places' API, you won't get the detailed taxonomy required to match the identified entities. You won't get the different spellings of certain restaurants, and you won't necessarily know that, colloquially, "Dom and Vinnie's Pizza Shop" is just called "Dom's." Matching entities against the taxonomy is quite difficult and requires structuring the data so that matching against aliases is fast.
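One common way to make alias matching fast is to precompute a hash map from every normalized name and alias to the places it can refer to, so that each lookup is a single dictionary access. The sketch below assumes that structure; `Place`, `AliasIndex`, `alias_key`, and the ID scheme are hypothetical, and the coordinates are placeholders rather than a real location.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Place:
    place_id: str                      # hypothetical ID scheme
    name: str
    lat: float
    lon: float
    aliases: list[str] = field(default_factory=list)

def alias_key(text: str) -> str:
    """Lowercase, turn punctuation runs into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

class AliasIndex:
    """Map every known name and alias to the places it may refer to."""

    def __init__(self, places: list[Place]) -> None:
        self._by_key: dict[str, list[Place]] = {}
        for place in places:
            for alias in [place.name, *place.aliases]:
                bucket = self._by_key.setdefault(alias_key(alias), [])
                if place not in bucket:    # a name and alias may share a key
                    bucket.append(place)

    def lookup(self, mention: str) -> list[Place]:
        """Average constant-time lookup of a mention found in text."""
        return self._by_key.get(alias_key(mention), [])

# Placeholder coordinates, not a real location.
doms = Place("place-001", "Dom and Vinnie's Pizza Shop", 0.0, 0.0, aliases=["Dom's"])
index = AliasIndex([doms])
print(index.lookup("dom's"))
```

Lookups return a list because one alias (say, "Dom's") can legitimately point at several places; choosing among them, for example by distance from other locations in the same article, would be a separate step.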

Identifying and extracting entities, like restaurants, is a challenge for location developers.

How are you dealing with those challenges?

Darian Shirazi: We have a number of different taggers in the system, and over time we've worked out which ones are good at identifying which kinds of entities. Some taggers are very good at tagging cities, some are better at tagging businesses, and some are really good at telling a person, a place, and a thing apart. So we apply these quorum taggers to the data to decide whether a tag is detected at all and whether it's valid.
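The quorum idea can be sketched as a vote: each tagger proposes a label for a text span (or abstains), and a label is accepted only when enough taggers agree. This is a minimal Python sketch under that assumption; the tagger signature, the toy taggers, and the fixed `quorum` threshold are hypothetical, and a production system might instead weight each tagger by how reliable it is for a given entity type.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical tagger interface: given a text span, return a label such as
# "city", "business", or "person", or None to abstain.
Tagger = Callable[[str], Optional[str]]

def quorum_tag(span: str, taggers: list[Tagger], quorum: int) -> Optional[str]:
    """Return the most-voted label if at least `quorum` taggers agree, else None."""
    labels = [tagger(span) for tagger in taggers]
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None                      # every tagger abstained
    label, count = votes.most_common(1)[0]
    return label if count >= quorum else None

# Toy taggers for illustration only.
def city_gazetteer(span: str) -> Optional[str]:
    return "city" if span in {"Santa Clara", "San Francisco"} else None

def capitalized_place(span: str) -> Optional[str]:
    return "city" if span.istitle() else None

def business_guess(span: str) -> Optional[str]:
    return "business"

taggers = [city_gazetteer, capitalized_place, business_guess]
print(quorum_tag("Santa Clara", taggers, quorum=2))
print(quorum_tag("Dom's", taggers, quorum=2))
```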

To test it, you have a system that lets you input hints, and then you test each hint. The hints go into a queue with the other hints we're testing. We run a regression test and see whether the hint improved the tagging or made it worse. At this point, the process is really about moving the accuracy needle a quarter of a percent per week. That's just how this game goes; if you talk to the people at Google or Bing, they'll say the same thing.
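The hint workflow reads like a standard regression gate: measure accuracy on a fixed labeled set with and without a candidate hint, and keep the hint only if the score goes up. Below is a minimal Python sketch of that loop; modeling a hint as a function that wraps a tagger, and the toy `gold` examples, are assumptions for illustration rather than Fwix's actual format.

```python
from collections import deque
from typing import Callable, Optional

Tagger = Callable[[str], Optional[str]]
Hint = Callable[[Tagger], Tagger]   # hypothetical: a hint wraps and adjusts a tagger
Gold = list[tuple[str, Optional[str]]]

def accuracy(tagger: Tagger, gold: Gold) -> float:
    """Fraction of labeled examples the tagger labels exactly right."""
    return sum(tagger(text) == expected for text, expected in gold) / len(gold)

def run_hint_queue(base: Tagger, queue: deque, gold: Gold) -> tuple[Tagger, list[str]]:
    """Test queued hints one at a time; keep each only if it improves accuracy."""
    current, accepted = base, []
    while queue:
        name, hint = queue.popleft()
        candidate = hint(current)
        if accuracy(candidate, gold) > accuracy(current, gold):
            current = candidate
            accepted.append(name)
    return current, accepted

# Toy example: the base tagger knows pizza shops but not the nickname "Dom's".
def base(text: str) -> Optional[str]:
    return "business" if "Pizza" in text else None

def doms_alias_hint(tagger: Tagger) -> Tagger:
    return lambda text: "business" if text == "Dom's" else tagger(text)

def harmful_hint(tagger: Tagger) -> Tagger:
    return lambda text: "city"

gold: Gold = [
    ("Dom and Vinnie's Pizza Shop", "business"),
    ("Dom's", "business"),
    ("Santa Clara", "city"),
]
hints = deque([("doms", doms_alias_hint), ("all-cities", harmful_hint)])
tagger, accepted = run_hint_queue(base, hints, gold)
print(accepted)
```

A note on scale: resolving a quarter-percent change needs a large regression set, since on a 400-example set a single example already moves accuracy by 0.25%.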



At Where 2.0 you'll be talking about an "open places database." What is that?


Darian Shirazi: A truly open database is a huge initiative, and something that we're working toward. I can't really give details as to exactly what it's going to be, but we're working with a few partners to come up with an open places database that is actually complete.

We think that an open places database is a lot more than just a list of places: it's a list of places and content, a list of places and the reviews associated with those businesses, a list of parks and the people who have checked in at those parks, and so on. An open places database, in our minds, is also something you can contribute to. We want users, developers, and everyone else to come back to us and say, "Dom and Vinnie's is really just called Dom's." We also want to be able to give information to people in any format. For example, if you contact us, we'll give you a data dump of our places database, a fully licensed copy, if you want it.

Where 2.0: 2011, being held April 19-21 in Santa Clara, Calif., will explore the intersection of location technologies and trends in software development, business strategies, and marketing.

Save 25% on registration with the code WHR11RAD

How do you see location technology evolving?

Darian Shirazi: Looking toward the future, I think augmented reality is going to be a big deal. I don't mean augmented reality in the sense of a game, or in the sense of Yelp's Monocle, which is a small add-on to their app that shows reviews in the camera view. I mean being at a location and seeing the metadata about that location. When you have a phone or a location-enabled device, you should be able to get a sense of what's going on right there, and of the context. That's the key.

This interview was edited and condensed.





March 09 2011

Why location data is a mess, and what can be done about it

Between identifying relevant and accurate data sources, harmonizing data from multiple sources, and finding new ways to store and manipulate that data, location technology can be messy, says SimpleGeo's Chris Hutchins (@hutchins). But there are ways to clean it up. Hutchins explains how in the following interview.


What makes location data messy?

Chris Hutchins: The primary reasons are:

  • The ever-complicated restrictions, licenses, and use rights that come with different datasets. These can include requirements to use a company's map tiles, to share back all derivative works, and to display sponsored listings or advertisements alongside the data.
  • Conflating records that represent the same location/business/place between multiple datasets is an incredibly arduous process.
  • With small datasets, spatial queries are quite simple. However, as datasets grow exponentially in size, indexing that data to enable fast queries becomes difficult.
  • Location is usually an opinion, not a fact. For example, there are very strong views about where neighborhoods start and end.
  • The nature of location-based information requires all technology to handle real-time requests against datasets that are always changing.
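To make the conflation point concrete, a common heuristic is to treat two records from different datasets as the same place when they are physically close and their names are similar. The sketch below is one such heuristic in Python, not SimpleGeo's method; the 75-meter and 0.6 similarity thresholds are arbitrary assumptions, `difflib.SequenceMatcher` stands in for a real name matcher, and the sample records and coordinates are made up.

```python
import math
from difflib import SequenceMatcher
from typing import NamedTuple

class Record(NamedTuple):
    source: str   # which dataset the record came from
    name: str
    lat: float
    lon: float

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(a: Record, b: Record) -> float:
    """Great-circle distance between two records, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def same_place(a: Record, b: Record, max_dist_m: float = 75.0, min_name_sim: float = 0.6) -> bool:
    """Heuristic: records match if they are close together and similarly named."""
    if haversine_m(a, b) > max_dist_m:
        return False
    sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return sim >= min_name_sim

def conflate(left: list[Record], right: list[Record]) -> list[tuple[Record, Record]]:
    """Pair each left record with its first matching right record (O(n*m))."""
    pairs = []
    for a in left:
        for b in right:
            if same_place(a, b):
                pairs.append((a, b))
                break
    return pairs

# Made-up sample records.
a = Record("dataset_a", "Sample Pizza Shop", 37.7600, -122.4190)
b = Record("dataset_b", "Sample Pizza", 37.7601, -122.4191)       # roughly 14 m from a
c = Record("dataset_b", "Sample Pizza Shop", 37.7700, -122.4190)  # roughly 1.1 km from a
print(conflate([a], [c, b]))
```

The nested loop compares every pair, which is exactly what stops working as datasets grow; in practice you would first bucket records with a spatial index (a grid, geohash, or R-tree) and compare only records in nearby buckets.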

What can be done to clean up location data?

Chris Hutchins: Part of cleaning up is understanding the situation. By being aware of the limitations of certain databases or of the restrictions that some datasets require, you can better understand your capabilities.

Specifically related to data, ensuring that your data source is providing clean and up-to-date data means you won't be sending end users to the wrong location or giving them false information. Also, as more companies understand what their core competency is — and what it isn't — they learn to trust other companies to handle the things that require a more niche expertise. Understanding that this technology is new and learning to embrace tools and services in their infancy will certainly give you an edge with location data.




What are the most challenging aspects of location-aware development?


Chris Hutchins: The primary challenges we hear about are a lack of fast and accurate tools for storing, manipulating and querying spatial data, and the fact that most data is expensive and comes with restrictive terms of use. Today's geospatial infrastructure platforms are antiquated, so building the back-end infrastructure for applications takes a long time and requires some very niche skills.

How is SimpleGeo Places being used?

Chris Hutchins: SimpleGeo Places is a free database of business listings and points of interest (POI), which applications use to get an up-to-date view of local businesses without having to manage a large and changing spatial database in-house. Most current POI databases have restrictive terms of use and are expensive. We believe that this has impeded innovation in location-aware services and applications, so SimpleGeo provides a certain amount of Places data usage at no cost to developers, and the data will always be free of restrictive licensing.

What future developments do you see for location technology?

Chris Hutchins: The future of location is context, where apps will be better at giving you relevant information based on real-time information about where you are and what's around you. I'm really looking forward to a world where by knowing where I've been in the past, the things my friends like, the weather, and more, applications will be able to pinpoint where I might be interested in going and what I might be interested in doing, as well as getting me there.

This interview was edited and condensed.




