June 09 2011

Why Facebook isn't the best home for your public events

In an earlier episode of this series I discussed how Facebook events can flow through elmcity hubs by way of Facebook's search API. Last week I added another, and more direct, method. Now you can use a Facebook iCalendar URL (the export link at the bottom of Facebook's Events page) to route your public events through an elmcity hub.

The benefit, of course, is convenience. If you're promoting a public community event, Facebook is a great way to get the word out and keep track of who's coming. Ideally you should only have to write down the event data once. If you can enter the data in Facebook and then syndicate it elsewhere, that seems like a win.

In Syndicating Facebook events I explain how this can work. But I also suggest that your Facebook account might not be the best authoritative home for your public event data. Let's consider why not.

Here's a public event that I'm promoting:

Facebook public event

Here's how it looks in a rendering of the Keene elmcity hub:

Rendering of the Keene elmcity hub

And here's the link to the End of the world (again) event:

Did you click it? If so, one of two things happened. If you were logged into Facebook, you saw the event. If not, you saw this:

Facebook login page

Is this a public event or not? It depends on what you mean by public. In this case the event is public within Facebook but not available on the open web. The restriction is problematic. Elmcity hubs are transparent conduits: they reveal their sources, curators do their work out in the open, and communities served by elmcity hubs can see how those hubs are constituted. Quasi-public URLs like this one aren't in the spirit of the project.

My end-of-the-world event is obviously an illustrative joke. But consider two other organizations whose events appear in that elmcity screenshot: the Gilsum Church and the City of Keene. These organizations are currently using Google Calendar to manage their public events. They use Google Calendar's widget to display events on their websites, and they route Google Calendar's iCalendar feeds through the elmcity hub.

Now that elmcity can receive iCalendar feeds from Facebook, the church and the city could use their Facebook accounts, instead of Google Calendar, to manage their public events. Should they? I think not. Public information should be really public, not just quasi-public.

What's more, organizations should strive to own and control their online identities (and associated data) to the extent they can. From that perspective, using services like Google Calendar or Hotmail Calendar is also problematic. But you have choices. While it's convenient to use the free services of Google Calendar or Hotmail Calendar, and I recommend both, I regard them as training wheels. An organization that cares about owning its identity and data, as all ultimately should, can use any standard calendar system to publish a feed to a URL served by a host that it pays and trusts, using an Internet domain name that it paid for and owns.

Either way, how could an organization manage its public event stream using standard calendar software while still tapping into Facebook's excellent social dynamics? Here's what I'd like to see:

Example Facebook login page

It's great that Facebook offers outbound iCalendar feeds. I'd also like to see it accept inbound feeds. And that should work everywhere, by the way, not just for Facebook and not just for calendar events. Consider photos. I should be able to pay a service to archive and manage my complete photo stream. If I choose to share some of those photos on Facebook and others on Flickr, both should syndicate the photos from my online archive using a standard feed protocol -- say Atom, or if richer type information is needed, OData.

The elmcity project is, above all, an invitation to explore what it means to be the authoritative source of your own data. Among other things, it means that we should expect services to be able to use our data without owning our data. And that services should be able to acquire our data not only by capturing our keystrokes, but also by syndicating from URLs that we claim as our authoritative sources.


December 22 2010

How will the elmcity service scale? Like the web!

During a recent talk at Harvard's Berkman Center, Scott MacLeod asked (via the IRC backchannel): "How does the elmcity service scale?" He wondered, in particular, whether the service could support an online university like the World University and School that might produce an unlimited number of class schedules.

My short answer was that the elmcity service scales like the web. But what does that really mean? I promised Scott that I'd spell it out here. We'll start with an analogy. As I mentioned in The power of informal contracts, the elmcity project envisions a web of calendar feeds that's analogous to the blogosphere's web of RSS and Atom feeds. We take for granted that the blogosphere scales like the web. A blog feed is just a special kind of web page. Anybody can create a blog and publish its feed at some URL. Why not calendars too? We haven't thought about them in the same way, but the ICS (iCalendar) files that our calendar programs export are the moral equivalents of the RSS and Atom feeds that our blog publishing tools export. Anybody can create a calendar and publish its feed at some URL.
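
To make the analogy concrete, here is a minimal sketch of reading events out of an ICS feed using only the Python standard library. The sample feed, the example.org names, and the parse_events helper are illustrative assumptions, not part of the elmcity service. A real aggregator would use a full iCalendar parser that handles time zones, recurrence, and escaping.

```python
# A tiny iCalendar reader: enough to show that an ICS file is just a
# text feed of events published at a URL, much as RSS is a feed of posts.

SAMPLE_ICS = (
    "BEGIN:VCALENDAR\r\n"
    "VERSION:2.0\r\n"
    "PRODID:-//Example Org//Example Calendar//EN\r\n"
    "BEGIN:VEVENT\r\n"
    "UID:event-1@example.org\r\n"
    "DTSTART:20110610T190000Z\r\n"
    "SUMMARY:Community potluck at the\r\n"
    "  town hall\r\n"
    "END:VEVENT\r\n"
    "END:VCALENDAR\r\n"
)


def unfold(text):
    """Join folded lines: a line that starts with a space or tab
    continues the previous line (the leading character is dropped)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_events(text):
    """Return one dict of property name -> raw value per VEVENT."""
    events, current = [], None
    for line in unfold(text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";TZID=..." to keep the sketch small.
            current[name.split(";", 1)[0].upper()] = value
    return events


events = parse_events(SAMPLE_ICS)
```

In practice the text would come from fetching the calendar's URL. The point is only that any program able to read a URL can consume the feed.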

These webs -- of HTML pages, of blog feeds, of calendar feeds -- are notionally webs of peers. We can all publish, and we can all read, without relying on a central authority or privileged hub. There are, to be sure, powerful centralized services. My blog, for example, is one of millions hosted at, aggregated by Bloglines and Google Reader, and indexed by Google and Bing. But these services, while convenient, are optional. So long as we can publish our blogs somewhere online, advertise their URLs, and get the DNS to resolve their domain names, we can have a working blogosphere. The necessary and sufficient condition is that we can all publish resources (e.g., pages and feeds), and that we can all access those resources.

For the calendarsphere that I envision, a service like elmcity is likewise optional. Let's suppose that the World University and School succeeds wildly. At any given moment there are tens of thousands of courses on offer, each with its own course page and also with its own calendar. Instructors publish course pages using any web publishing tool, and also publish calendars using any calendar publishing tool -- Google Calendar, or Outlook, or Apple iCal, or another calendar program. Students pick schedules of courses, bookmark the course pages, and load the course calendars into any of these same calendar programs. The calendar software merges the separate course calendars and combines them with the students' personal calendars. These calendar programs are thus aggregators of calendar feeds in the same way that feedreaders like NetNewsWire or Google Reader are aggregators of blog feeds.
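
The merging that calendar programs do can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular calendar program works: events are plain dicts, the uid, start, and title field names are hypothetical, and two events with the same uid are treated as the same event.

```python
from datetime import datetime


def merge_calendars(*calendars):
    """Combine several calendars into one timeline ordered by start time.
    If the same uid appears more than once, the later copy wins."""
    merged = {}
    for calendar in calendars:
        for event in calendar:
            merged[event["uid"]] = event
    return sorted(merged.values(), key=lambda event: event["start"])


# Two hypothetical course calendars plus a personal calendar.
biology = [
    {"uid": "bio-1", "start": datetime(2011, 1, 12, 9, 0), "title": "Biology lecture"},
]
history = [
    {"uid": "hist-1", "start": datetime(2011, 1, 10, 14, 0), "title": "History seminar"},
]
personal = [
    {"uid": "me-1", "start": datetime(2011, 1, 11, 18, 0), "title": "Dinner with friends"},
]

timeline = merge_calendars(biology, history, personal)
```

Each calendar stays at its own URL under its publisher's control. Only the reader's view is merged.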

Given a baseline web of peers, it's useful to be able to merge our individual views of them into pooled spaces. NetNewsWire is a personal feedreader, but Google Reader is social. In the pool created by Google Reader, data finds data and people find people. The elmcity service aims to create that same kind of effect in the realm of public calendar events. When we pool our separate calendars, we publicize the events that we are promoting, we discover events that others are promoting, and we see all our public events on common timelines.

What constrains our ability to scale out pools of calendars? Let's continue the analogy to the blogosphere. Google Reader constitutes one pooled space for blog feeds, Bloglines another. Because the data aggregated by these services conforms to open standards (i.e., RSS and Atom), other services can create blog pools too. Likewise in the calendarsphere, Google Calendar is one way to pool calendars, the elmcity service is another, Calagator is a third. Others can play too.

How can we scale these providers of calendar pools? Along one axis, each provider needs to be able to grow its computing power. Google Calendar scales on this axis by using Google's cloud platform. The elmcity service uses Azure, the Microsoft cloud platform. Note that elmcity, unlike Google Calendar, is an open source service. That means you could run your own instance of it, using your own Azure account, but you'd still be relying on the Azure compute fabric.

Calagator, based on Ruby on Rails, could be deployed either to a conventional hosting environment or to a cloud platform. It would thus scale, along the compute axis, as either environment allows. The elmcity service could be used in this way too. The service is written for Azure, but the core aggregation engine is independent of Azure and could be deployed to a conventional hosting environment.

For feed aggregators, another axis of scale is the number of feeds that can be processed. When that number grows, the time required to connect to many feeds and ingest their contents becomes a constraint. The elmcity service currently supports 50 calendar hubs. Thrice daily, each hub pulls data from Eventful, Upcoming, EventBrite, Facebook, and a list of iCalendar feeds. So far a single Azure worker role can easily do all this work. I'll dial up the number of workers if needed, but first I want to squeeze as much parallelism as I can out of each worker. To that end, I recently upgraded to the 4.0 version of the .NET Framework in order to exploit its dramatically simplified parallel processing. In this week's companion article I show how the elmcity service uses that new capability to optimize the time required to gather feeds from many sources.
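
For readers who want a feel for the idea without the .NET details, here is a rough Python analogue that uses a thread pool from the standard library. It is not the elmcity code. The gather_feeds and fetch_url names, the worker count, and the timeout are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Download one feed and return its text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def gather_feeds(urls, fetch=fetch_url, max_workers=8):
    """Fetch many feeds concurrently.
    Returns (results, errors), each a dict keyed by URL, so that one
    slow or broken feed does not stop the others from being gathered."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors
```

Because most of the time is spent waiting on the network, a handful of threads in one worker can overlap many requests before more workers are needed.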

Pub/sub networks can also scale by coalescing feeds. Consider a calendar hub operated, for some city, by the online arm of that city's newspaper. One model is flat. The newspaper runs a hub whose registry lists all the calendar feeds in town. But another model is hierarchical. In that model, there's a hub for arts and culture, a hub for sports and recreation, a hub for city government, and so on. Each hub gathers events from many feeds, and publishes the merged result on its own website for its own constituency. If the newspaper wants to include all those feeds, it can list them individually in its own registry. But why aggregate arts, sports, or recreation feeds more than once? The newspaper's uber-hub can, instead, reuse the arts, sports, and recreation feeds curated by those respective hubs, adding their merged outputs to its own set of curated feeds. Such reuse can cut down the computational time and effort required to propagate feeds throughout the network.
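
The hierarchical model can be sketched as hubs that accept other hubs as sources. The Feed and Hub classes below are hypothetical and exist only to show the reuse: a downstream hub reads an upstream hub's merged output instead of fetching the underlying feeds again.

```python
class Feed:
    """A leaf calendar feed. fetch_count records how often it is read."""

    def __init__(self, events):
        self.events = events
        self.fetch_count = 0

    def fetch(self):
        self.fetch_count += 1
        return list(self.events)


class Hub:
    """Merges its sources, which may be feeds or other hubs."""

    def __init__(self, sources):
        self.sources = sources
        self._merged = None

    def refresh(self):
        """Pull every source once and store the merged, de-duplicated result."""
        merged = []
        for source in self.sources:
            merged.extend(source.fetch())
        self._merged = sorted(set(merged))

    def fetch(self):
        """Serve the stored merge; downstream hubs never touch our sources."""
        if self._merged is None:
            self.refresh()
        return list(self._merged)


# Events are (start, title) tuples to keep the sketch small.
gallery = Feed([("2010-12-27", "Gallery opening")])
theater = Feed([("2010-12-28", "Community theater")])
council = Feed([("2010-12-29", "City council meeting")])

arts_hub = Hub([gallery, theater])
arts_hub.refresh()

# The newspaper's uber-hub lists the arts hub, not the arts feeds.
newspaper_hub = Hub([arts_hub, council])
newspaper_hub.refresh()
```

After both refreshes each leaf feed has been read exactly once, even though its events appear in two hubs.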

None of these mechanisms will matter, though, until a vibrant ecosystem of calendar feeds requires them. That's the ultimate constraint. Scaling the calendarsphere isn't a problem yet, but it would be a good problem to have. First, though, we've got to light up a whole bunch of feeds.


September 22 2010

Personal data stores and pub/sub networks

The elmcity project joins five streams of data about public calendar events. Four of them are well-known services: Facebook, EventBrite, Upcoming, and Eventful. They all work the same way. You sign up for a service, you post your events there, other people can go there to find out about your events. What they find, when they go there, are copies of your event data. If you want to promote an event in more than one place, you have to push a copy to each place. If you change the time or day of an event, you have to revisit all those places and push new copies to each.

The fifth stream works differently. It's a loosely-coupled network of publishers and subscribers. To join it you post events once to your own website, blog, or online calendar, in a way that yields two complementary outputs. For people, you offer HTML files that can be read and printed. For mechanized web services like elmcity, you offer iCalendar feeds that can be aggregated and syndicated. If you want to promote an event in more than one place, you ask other services to subscribe to your feed. If you change the time or day of the event, every subscriber sees the change.
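
Here is a minimal Python sketch of those two complementary outputs, generated from a single list of events. The field names, the example.org identifiers, and the helper functions are assumptions for illustration. A production feed would also fold long lines and handle time zones, which this sketch omits.

```python
from html import escape


def ics_escape(text):
    """Escape the characters that are special in iCalendar TEXT values."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def to_ics(events, prodid="-//Example Org//Example Calendar//EN"):
    """Render the events as an iCalendar feed for machines to aggregate."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:" + prodid]
    for event in events:
        lines += [
            "BEGIN:VEVENT",
            "UID:" + event["uid"],
            "DTSTAMP:" + event["updated"],
            "DTSTART:" + event["start"],
            "SUMMARY:" + ics_escape(event["title"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


def to_html(events):
    """Render the same events as an HTML list for people to read."""
    items = ["<li>%s: %s</li>" % (escape(e["start"]), escape(e["title"]))
             for e in events]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# One authoritative record, entered once.
EVENTS = [{
    "uid": "potluck-2010@example.org",
    "updated": "20100901T120000Z",
    "start": "20100925T170000Z",
    "title": "Potluck: music, food & dancing",
}]

ics_feed = to_ics(EVENTS)
html_page = to_html(EVENTS)
```

Changing the start time in EVENTS and regenerating both outputs is the whole update. Subscribers pick up the change the next time they read the feed.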

The first and best example of a decentralized pub/sub network is the blogosphere. My original blogging tool, Radio UserLand, embodied the pub/sub pattern. It made everything you wrote automatically available in two ways: as HTML for people to read, and as RSS for machines to process. What's more, Radio UserLand didn't just produce RSS feeds that other services could read and aggregate. It was itself an aggregator that pointed the way toward what became a vibrant ecosystem of applications -- and services -- that knew how to merge RSS streams. In that network the feeds we published flowed freely, and appeared in many contexts. But they always remained tethered to original sources that we stamped with our identities, hosted wherever we liked, and controlled ourselves. Every RSS feed that was published, no matter where it was published, contributed to a global pool of RSS feeds. Any aggregator could create a view of the blogosphere by merging a set of feeds, chosen from the global pool, based on subject, author, place, time, or combinations of these selectors.

Now social streams have largely eclipsed RSS readers, and the feed reading service I've used for years -- Bloglines -- will soon go dark. Dave Winer thinks the RSS ecosystem could be rebooted, and argues for centralized subscription handling on the next turn of the crank. Of course definitions tend to blur when we talk about centralized versus decentralized services. Consider FriendFeed. It's centralized in the sense that a single provider offers the service. But it can be used to create many RSS hubs that merge many streams for many purposes. In The power of informal contracts I showed how an instance of FriendFeed merges a particular set of RSS feeds to create a news service just for elmcity curators. The elmcity service itself has the same kind of dual nature. A single provider offers the service. But many curators can use it to spin up many event hubs, each tuned to a location or topic.

The early blogosphere proved that we could create and share many views drawn from the same pool of feeds. That's one of the bedrock principles that I hope we'll remember and carry forward to other pub/sub networks. Another principle is that we ought to control and syndicate our data. Radio UserLand, for example, was happy to host your blog, just as Twitter and Facebook are now happy to host your online social presence. But unlike Twitter and Facebook, Radio UserLand was just as happy to let you push your data to another host. To play in the syndication network your feed just had to exist -- it didn't matter where -- and be known to one or more hubs.

This notion of a cloud-based personal data store is only now starting to come into focus. When I was groping for a term to describe it back in 2007 I came up with hosted lifebits. More recently the Internet Identity Workshop gang have settled on personal data store, as recently described by Kaliya Hamlin and Phil Windley. The acronym is variously PDS or PDX, where X, as Kaliya says, stands for "store, service, locker, bank, broker, vault, etc." Phil elaborates:

The term itself is a problem. When you say "store" or "locker" people assume that this is a place to put things (not surprisingly). While there will certainly be data stored in the PDS, that really misses its primary purposes: acting as a broker for all the data you've got stored all over the place, and managing the metadata about that data. That is, it is a single place, but a place of indirection not storage. The PDS is the place where services that need access to your data will come for permission, metadata, and location.

The elmcity service aligns with that vision. If we require the calendar data for a city, town, or neighborhood to live in a single place of storage, we'll never agree to use the same place. Thus the elmcity service merges streams from Facebook, EventBrite, Upcoming, and Eventful. But those streams are fed by people who put copies of their events into them, one event at a time, once per stream. What if we managed our public calendar data canonically, in personal (or organizational) data stores fed from our own preferred calendar applications? These data stores would in turn feed downstream hubs like Facebook, EventBrite, Upcoming, and Eventful, all of which could -- although they currently don't -- receive and transmit such feeds. Other hubs, based on instances of the elmcity service or a similar system, would enable curators to create particular geographic or topical views.

I've identified a handful of common calendar applications that can publish calendar data at URLs accessible to any such hub, in a format (iCalendar) that enables automated processing. The short list includes Google Calendar, Outlook, Apple iCal, and Windows Live Calendar. But there are many others. Here's the full list of producers as captured so far by the elmcity service:

feed producer                                                       # of feeds
-//Google Inc//Google Calendar 70.9054//EN                          151
-//Meetup Inc//RemoteApi//EN                                        14
unknown                                                             14
iCalendar-Ruby                                                      6
e-vanced event management system                                    6
-//DDay.iCal//NONSGML Limited Event Feeds//NONSGML//EN              4
-// Inc//NONSGML R25//EN                                            3
-//Drupal iCal API//EN                                              3
-//Microsoft Corporation//Windows Live Calendar//EN                 3
-//Trumba Corporation//Trumba Calendar Services 0.11.6830//EN       2
-//herald-dispatch/calendar//NONSGML v1.0//EN                       1
-//WebCalendar-v1.1.2                                               1
Zvents Ical                                                         1
Coldfusion8                                                         1
-//Intand Corporation//Tandem for Schools//EN                       1
-//strange bird labs//Drupal iCal API//EN                           1
-//SchoolCenter/NONSGML Calendar v9.0//EN                           1
-//blogTO//NONSGML Toronto Events V1.0//EN                          1
-//Events at Stanford//iCal4j 1.0//EN                               1
-//University of California\\, Berkeley//UCB Events Calendar//EN    1
-//EVDB// Inc.//mySportSite//EN                                     1
Mobile Geographics Tides 3988 2010                                  1
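
A tally like the one above could be produced by reading the PRODID property that each iCalendar feed carries in its header. The Python sketch below is an assumption about how such a count might be made, not a description of what the elmcity service does. The producer_of and tally_producers names are hypothetical, and counting feeds without a PRODID as "unknown" is a guess at what that row means.

```python
from collections import Counter


def producer_of(ics_text):
    """Return the feed's PRODID value, or "unknown" if it has none.
    Ignores line folding and property parameters for brevity."""
    for line in ics_text.splitlines():
        if line.upper().startswith("PRODID:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def tally_producers(feeds):
    """Count how many feeds each producer accounts for."""
    return Counter(producer_of(text) for text in feeds)


# Three tiny stand-in feeds: two from one producer, one with no PRODID.
GOOGLE = "-//Google Inc//Google Calendar 70.9054//EN"
feeds = [
    "BEGIN:VCALENDAR\r\nPRODID:" + GOOGLE + "\r\nEND:VCALENDAR\r\n",
    "BEGIN:VCALENDAR\r\nPRODID:" + GOOGLE + "\r\nEND:VCALENDAR\r\n",
    "BEGIN:VCALENDAR\r\nVERSION:2.0\r\nEND:VCALENDAR\r\n",
]

tally = tally_producers(feeds)
```

Calling tally.most_common() lists producers from most to fewest feeds, which is the ordering the table uses.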

Google Calendar dominates overwhelmingly, but the long tail hints at the variety of event sources that could feed into a calendar-oriented pub/sub network. How much of the total event flow comes by way of this assortment of iCalendar sources, as compared to centralized sources? Here's the breakdown:

It's roughly half Eventful, a third Upcoming, a fifth iCalendar. There's negligible flow from EventBrite, which focuses on big events. The same goes for Facebook, where the focus, though it's evolving, remains on group rather than world visibility.

In a companion piece at O'Reilly Answers I show how I made this visualization. It's a nice example of another kind of pub/sub network, in this case one that's enabled by the OData protocol. For our purposes here, I just want to draw attention to the varying contributions made by the five streams to each of the hubs. The Eventful stream is strong almost everywhere. The Upcoming and iCalendar tributaries are only strong in some places. But where the iCalendar stream does flow powerfully, there's a curator who has mined one or more rich veins of data from a school system, or a city government, or a newspaper. Today the vast majority of these organizations think of the calendar information they push as text for people to read. Few realize it is also data for networks to syndicate. When that mindset changes, a river of data will be unleashed.

