December 22 2010

How will the elmcity service scale? Like the web!

During a recent talk at Harvard's Berkman Center, Scott MacLeod asked (via the IRC backchannel): "How does the elmcity service scale?" He wondered, in particular, whether the service could support an online university like the World University and School that might produce an unlimited number of class schedules.

My short answer was that the elmcity service scales like the web. But what does that really mean? I promised Scott that I'd spell it out here. We'll start with an analogy. As I mentioned in The power of informal contracts, the elmcity project envisions a web of calendar feeds that's analogous to the blogosphere's web of RSS and Atom feeds. We take for granted that the blogosphere scales like the web. A blog feed is just a special kind of web page. Anybody can create a blog and publish its feed at some URL. Why not calendars too? We haven't thought about them in the same way, but the ICS (iCalendar) files that our calendar programs export are the moral equivalents of the RSS and Atom feeds that our blog publishing tools export. Anybody can create a calendar and publish its feed at some URL.

These webs -- of HTML pages, of blog feeds, of calendar feeds -- are notionally webs of peers. We can all publish, and we can all read, without relying on a central authority or privileged hub. There are, to be sure, powerful centralized services. My blog, for example, is one of millions hosted at wordpress.com, aggregated by Bloglines and Google Reader, and indexed by Google and Bing. But these services, while convenient, are optional. So long as we can publish our blogs somewhere online, advertise their URLs, and get the DNS to resolve their domain names, we can have a working blogosphere. The necessary and sufficient condition is that we can all publish resources (e.g., pages and feeds), and that we can all access those resources.

For the calendarsphere that I envision, a service like elmcity is likewise optional. Let's suppose that the World University and School succeeds wildly. At any given moment there are tens of thousands of courses on offer, each with its own course page and also with its own calendar. Instructors publish course pages using any web publishing tool, and also publish calendars using any calendar publishing tool -- Google Calendar, or Outlook, or Apple iCal, or another calendar program. Students pick schedules of courses, bookmark the course pages, and load the course calendars into any of these same calendar programs. The calendar software merges the separate course calendars and combines them with the students' personal calendars. These calendar programs are thus aggregators of calendar feeds in the same way that feedreaders like NetNewsWire or Google Reader are aggregators of blog feeds.

Given a baseline web of peers, it's useful to be able to merge our individual views of them into pooled spaces. NetNewsWire is a personal feedreader, but Google Reader is social. In the pool created by Google Reader, data finds data and people find people. The elmcity service aims to create that same kind of effect in the realm of public calendar events. When we pool our separate calendars, we publicize the events that we are promoting, we discover events that others are promoting, and we see all our public events on common timelines.

What constrains our ability to scale out pools of calendars? Let's continue the analogy to the blogosphere. Google Reader constitutes one pooled space for blog feeds, Bloglines another. Because the data aggregated by these services conforms to open standards (i.e., RSS and Atom), other services can create blog pools too. Likewise in the calendarsphere, Google Calendar is one way to pool calendars, the elmcity service is another, Calagator is a third. Others can play too.

How can we scale these providers of calendar pools? Along one axis, each provider needs to be able to grow its computing power. Google Calendar scales on this axis by using Google's cloud platform. The elmcity service uses Azure, the Microsoft cloud platform. Note that elmcity, unlike Google Calendar, is an open source service. That means you could run your own instance of it, using your own Azure account, but you'd still be relying on the Azure compute fabric.

Calagator, based on Ruby on Rails, could be deployed either to a conventional hosting environment or to a cloud platform. It would thus scale, along the compute axis, as either environment allows. The elmcity service could be used in this way too. The service is written for Azure, but the core aggregation engine is independent of Azure and could be deployed to a conventional hosting environment.

For feed aggregators, another axis of scale is the number of feeds that can be processed. When that number grows, the time required to connect to many feeds and ingest their contents becomes a constraint. The elmcity service currently supports 50 calendar hubs. Thrice daily, each hub pulls data from Eventful, Upcoming, EventBrite, Facebook, and a list of iCalendar feeds. So far a single Azure worker role can easily do all this work. I'll dial up the number of workers if needed, but first I want to squeeze as much parallelism as I can out of each worker. To that end, I recently upgraded to the 4.0 version of the .NET Framework in order to exploit its dramatically simplified parallel processing. In this week's companion article I show how the elmcity service uses that new capability to optimize the time required to gather feeds from many sources.
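The elmcity service does this parallel gathering in C# with the .NET 4 parallel features. What follows is only a Python analogue of the idea, not a port of elmcity code. It uses a thread pool, because fetching feeds is I/O-bound: most of the time goes to waiting on remote servers, and threads let those waits overlap. The names `gather_feeds` and `http_fetch` are hypothetical, and the sketch assumes feed bodies are UTF-8.

```python
# Sketch only: a Python analogue of fetching many calendar feeds in parallel.
# The elmcity service does this in C# with .NET 4; none of these names come from it.
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def http_fetch(url: str, timeout: float = 30.0) -> str:
    """Download one feed body as text (UTF-8 assumed for simplicity)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def gather_feeds(urls: List[str],
                 fetch: Callable[[str], str] = http_fetch,
                 max_workers: int = 8) -> Dict[str, str]:
    """Fetch every URL concurrently; map each URL to its body or an error marker."""
    def safe_fetch(url: str) -> str:
        try:
            return fetch(url)
        except Exception as exc:  # one broken feed must not sink the whole run
            return f"ERROR: {exc}"

    # I/O-bound work: threads overlap the time spent waiting on slow servers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs each URL with its own body.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

With this approach, the wall-clock time for a hub's run tends toward the time of its slowest feed rather than the sum of all its feeds. Raising `max_workers` trades memory and open connections for that overlap.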

Pub/sub networks can also scale by coalescing feeds. Consider a calendar hub operated, for some city, by the online arm of that city's newspaper. One model is flat. The newspaper runs a hub whose registry lists all the calendar feeds in town. But another model is hierarchical. In that model, there's a hub for arts and culture, a hub for sports and recreation, a hub for city government, and so on. Each hub gathers events from many feeds, and publishes the merged result on its own website for its own constituency. If the newspaper wants to include all those feeds, it can list them individually in its own registry. But why aggregate arts, sports, or recreation feeds more than once? The newspaper's uber-hub can, instead, reuse the arts, sports, and recreation feeds curated by those respective hubs, adding their merged outputs to its own set of curated feeds. Such reuse can cut down the computational time and effort required to propagate feeds throughout the network.
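Here is a minimal sketch of the hierarchical model. It assumes each event carries an iCalendar-style UID, so an uber-hub can tell when two sources deliver the same event. `Event` and `Hub` are hypothetical types, not elmcity classes. A hub's sources can be raw feeds (lists of events) or other hubs, and a sub-hub contributes its already-merged output.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Event:
    uid: str       # iCalendar UID: the identity used to spot duplicates
    start: str     # ISO 8601 local time, so string order is chronological
    summary: str


@dataclass
class Hub:
    """A hypothetical hub whose sources are raw feeds (lists of events) or other hubs."""
    name: str
    sources: List[Union[List[Event], "Hub"]] = field(default_factory=list)

    def merged(self) -> List[Event]:
        by_uid: Dict[str, Event] = {}
        for source in self.sources:
            # A sub-hub contributes its merged output, not its individual raw feeds.
            events = source.merged() if isinstance(source, Hub) else source
            for ev in events:
                by_uid.setdefault(ev.uid, ev)  # first copy wins; later copies are duplicates
        return sorted(by_uid.values(), key=lambda e: e.start)


if __name__ == "__main__":
    concert = Event("a1", "2010-12-01T19:00", "Concert")
    game = Event("s1", "2010-12-02T10:00", "Game")
    arts = Hub("arts", [[concert]])
    sports = Hub("sports", [[game]])
    newspaper = Hub("newspaper", [arts, sports, [concert]])  # concert listed twice
    print([e.summary for e in newspaper.merged()])
```

For brevity the sketch calls `merged()` in process. In a real network the newspaper's hub would instead subscribe to the URL where each sub-hub publishes its merged feed, so the arts and sports feeds are only aggregated once.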

None of these mechanisms will matter, though, until a vibrant ecosystem of calendar feeds requires them. That's the ultimate constraint. Scaling the calendarsphere isn't a problem yet, but it would be a good problem to have. First, though, we've got to light up a whole bunch of feeds.







October 26 2010

A lesson in civics, public data, and computational principles

Among the elmcity hubs that started up last week were Santa Rosa, Calif. and Bellingham, Wash. Both towns' local newspapers, the Press Democrat and the Bellingham Herald, use a service called Zvents to manage their online calendars. The curators who started the Santa Rosa and Bellingham hubs, Sean Boisen and Tim Sawtell, wondered if they could subscribe these hubs to iCalendar feeds from Zvents.

At first glance the answer was yes. On the Press Democrat's site, for example, if you view the Arts & Crafts category, you'll find this encouraging cluster of icons and links:

An iCalendar feed? Sweet! But alas, while that "Save as iCal" does yield an iCalendar response, it's an empty shell:

BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
PRODID:Zvents Ical
END:VCALENDAR

Why? Beats me; if someone knows, I'd love to hear the explanation. Meanwhile, what about the corresponding RSS feed? I wasn't hopeful. In my work on the elmcity project I often see an error of the sort I discussed in "Developing intuitions about data." People tend to conflate the purposes of an RSS feed, which typically conveys headlines and links to people, and an iCalendar feed, which conveys dates and times to computers. This category error is so common that I've enshrined it in a slide I've used in several recent talks.

But I opened up the Press Democrat's RSS feed anyway, and here is what I found:

<item>
<title>Event: JLNS Holiday Home Tour &amp; Winter Market at Friedman Event Center, Sat, Nov 20 10:00a</title>
<description>The Junior League of Napa-Sonoma presents a tour of prestigious homes in Bennett Valley, all festively decorated, with all proceeds to benefit local charities</description>
<link>http://events.pressdemocrat.com/santa-rosa-ca/events/show/139081965-jlns-holiday-home-tour-winter-market</link>
<xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>
<xCal:dtend>2010-11-20 16:00:00 +0000</xCal:dtend>
<xCal:location>http://events.pressdemocrat.com/santa-rosa-ca/venues/show/672937-friedman-event-center</xCal:location>
</item>

Fig. 1: An event in the Press Democrat's RSS events feed

Even if you know about such things as XML, RSS, and xCal, pretend for a moment that you don't. Anyone can see that there is structure here: <xCal:dtstart>2010-11-20 10:00:00 +0000</xCal:dtstart>. That makes this feed very different from most RSS feeds that purport to represent calendar events, which typically look like this:

<item>
<title>Event: JLNS Holiday Home Tour &amp; Winter Market at Friedman Event Center, Sat, Nov 20 10:00a</title>
<description>The Junior League of Napa-Sonoma presents a tour of prestigious homes in Bennett Valley, all festively decorated, with all proceeds to benefit local charities</description>
<link>http://events.pressdemocrat.com/santa-rosa-ca/events/show/139081965-jlns-holiday-home-tour-winter-market</link>
</item>

Fig. 2: Same event in a typical RSS events feed



We humans have no trouble understanding Sat, Nov 20 10:00a. The year is omitted but we know what's meant. Likewise we can parse a wide range of alternatives, such as Saturday, November 20, at 10:00. Does that mean AM or PM? We just know that it's AM; a home tour wouldn't start on Saturday at 10PM. Conversely we just know that a blues band wouldn't start playing on Saturday at 10AM.

Since we aren't aware that we hold this tacit knowledge, it doesn't occur to us that computers lack it, or that as a result they require explicit rules and structure. But if you want your data to syndicate around the web, you've got to provide rule-based structure. Since iCalendar is the most ubiquitous format for event data, that's currently the best way to do it. Here's that same event in iCalendar:

BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
PRODID:Zvents Ical
BEGIN:VEVENT
DTSTART:20101120T100000
DTEND:20101120T160000
SUMMARY:Event: JLNS Holiday Home Tour & Winter Market at Friedman
  Event Center\, Sat\, Nov 20 10:00a
END:VEVENT
END:VCALENDAR

Fig. 3: Same event in an iCalendar feed
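The SUMMARY in Fig. 3 spans two physical lines because of iCalendar line folding. RFC 5545 says a content line should be no longer than 75 octets, not counting the line break. Longer lines are split, and each continuation line begins with a single space or tab. A reader unfolds them by deleting each line break together with the one whitespace character that follows it. Here is a minimal sketch of both directions. The function names are illustrative, and the folder counts octets rather than characters so it never splits a multi-byte UTF-8 character.

```python
from typing import List


def fold_line(line: str, limit: int = 75) -> str:
    """Fold one iCalendar content line so no physical line exceeds `limit` octets.

    Each continuation line starts with one space, which counts toward the limit.
    Splits fall between characters, never inside a multi-byte UTF-8 sequence.
    """
    chunks: List[str] = []
    current, size = "", 0
    for ch in line:
        width = len(ch.encode("utf-8"))
        if size + width > limit:
            chunks.append(current)
            current, size = " ", 1  # continuation marker
        current += ch
        size += width
    chunks.append(current)
    return "\r\n".join(chunks)


def unfold(ics_text: str) -> List[str]:
    """Undo folding: a line starting with a space or tab continues the previous one."""
    lines: List[str] = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]  # drop exactly one leading whitespace character
        else:
            lines.append(raw)
    return lines
```

Because unfolding removes only one whitespace character, a fold that lands between two words needs the space to survive somewhere. It can stay at the end of the first line or be doubled at the start of the continuation line, as in Fig. 3.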

A point that technologists often miss, when we fight religious wars amongst ourselves about competing formats -- RSS versus Atom, iCalendar versus xCalendar, and so on -- is that the existence of structure matters far more than the kind of structure. Fig. 1 and Fig. 3 are two species within the same genus. Fig. 2, though, belongs to another phylum altogether. If you're using the method shown in Fig. 2 to syndicate your data on the web, you're doing it wrong. That RSS feed is no more useful for the purpose than a PDF file, or an HTML file.

When I realized that Zvents produces RSS+xCal feeds, and that multiple newspaper sites rely on Zvents, I added support for that format to the elmcity service. A translator reads RSS+xCal and writes iCalendar. Because the Zvents flavor of RSS+xCal is well structured, it was trivial to create that translator.
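Here is a minimal sketch of such a translator, modeled on the item in Fig. 1. It is not the elmcity code. Several details are assumptions. The namespace URI bound to the `xCal:` prefix is assumed, since Fig. 1 shows only the prefix. The PRODID is a placeholder. Each item's `<link>` serves as its UID. Times in the Zvents `2010-11-20 10:00:00 +0000` style are converted to UTC. Items without a structured `dtstart` are skipped, because a bare headline carries nothing a calendar can use.

```python
# Sketch of an RSS+xCal -> iCalendar translator, modeled on the items in Fig. 1.
# Assumptions (not from the article): the xCal namespace URI, the PRODID, <link> as UID.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

XCAL_NS = "urn:ietf:params:xml:ns:xcal"  # assumed URI bound to the xCal: prefix


def to_ical_utc(value: str) -> str:
    """'2010-11-20 10:00:00 +0000' -> '20101120T100000Z'."""
    dt = datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S %z")
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def escape_text(text: str) -> str:
    """Escape an iCalendar TEXT value (RFC 5545): backslash, semicolon, comma, newline."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def rss_xcal_to_ical(rss_xml: str) -> str:
    root = ET.fromstring(rss_xml)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//rss-xcal-sketch//EN"]
    for item in root.iter("item"):
        start = item.findtext(f"{{{XCAL_NS}}}dtstart")
        if not start:
            continue  # no structured start time: just a headline, skip it
        end = item.findtext(f"{{{XCAL_NS}}}dtend")
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        lines += ["BEGIN:VEVENT",
                  f"UID:{link or title}",
                  f"DTSTAMP:{stamp}",
                  f"DTSTART:{to_ical_utc(start)}"]
        if end:
            lines.append(f"DTEND:{to_ical_utc(end)}")
        lines.append(f"SUMMARY:{escape_text(title)}")
        if link:
            lines.append(f"URL:{link}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    # Lines are left unfolded here; a production translator would fold any over 75 octets.
    return "\r\n".join(lines) + "\r\n"
```

The translation is trivial only because the input is structured. Every value the output needs sits in its own element, so the code never has to guess at a year, an AM/PM, or a time zone.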

This new feature for elmcity hubs creates some interesting opportunities. For example, since each Zvents feed is the result of a query, the set of these RSS+xCal feeds is unbounded. Here's one kind of query used on the Press Democrat's events page; it lists events in the "Dance" category.

http://events.pressdemocrat.com/search?cat=4&st=event

We can easily transform that URL into one that yields the corresponding RSS feed:

http://events.pressdemocrat.com/search?cat=4&st=event&rss=1

Observing this, Tim Sawtell was able to merge a set of categorized feeds into the Santa Rosa hub. In doing so, he illustrated a number of key principles that computational thinkers know and apply:

  1. query -- the feed is the output of an open-ended search
  2. data structure -- a structured representation of the search is available as RSS+xCal
  3. transformation -- from RSS+xCal to iCalendar
  4. abstraction and generalization -- what works for one category works for all
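The query-to-feed transformation above can be captured in one small function and then applied to any Zvents-style search URL. This is a sketch. It relies only on the `rss=1` convention observed above, not on any documented Zvents API. Only `cat=4` ("Dance") appears in the article, so any other category id would be hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_rss_url(search_url: str) -> str:
    """Turn a Zvents-style search URL into its RSS+xCal twin by setting rss=1."""
    parts = urlsplit(search_url)
    # Keep every existing parameter except a stale rss flag, then append rss=1.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "rss"]
    query.append(("rss", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Generalization: the same rule works for every category or free-text search.
    category_ids = [4]  # hypothetical list; only cat=4 ("Dance") is shown in the article
    for c in category_ids:
        print(to_rss_url(f"http://events.pressdemocrat.com/search?cat={c}&st=event"))
```

Removing an existing `rss` parameter before appending makes the function idempotent. Running it on a URL that already ends in `rss=1` returns that URL unchanged.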

Even more is possible. Suppose you're a grief counselor in Santa Rosa, and you would like to provide your clientele with a comprehensive list of support resources. Here's a useful search:

http://events.pressdemocrat.com/search?swhat=bereavement

It yields two recurring events for two different support groups at Hospice By The Bay.

Free Hospice By The Bay Drop-in Group Supports Newly Bereaved
Join others who are beginning the journey through grief at a free, ongoing, drop-in ...
10/26/2010 Tuesday, 12:00p to 1:00p (repeats 9 times)
Hospice By The Bay, Sonoma CA

Hospice By The Bay Support Group for Spousal/Partner Loss
Hospice By The Bay offers an eight-week support group to help adults who have lost ...
10/26/2010 Tuesday, 10:00a to 11:30a
Hospice By The Bay, Sonoma CA

Fig. 4: Bereavement support group meetings in Santa Rosa, via the Press Democrat

Here's a transformation of that search URL that yields an RSS+xCal data feed:

http://events.pressdemocrat.com/search?swhat=bereavement&rss=1

That feed can now be further transformed into an iCalendar feed and included in an elmcity hub, or in any other cloud-based service or device-based app that reads iCalendar feeds. If you wanted to create a bereavement category in an elmcity hub you'd be off to a great start! But where else would you look? There's plenty of information about public events on the web today. But only a tiny fraction of it exists as structured data that can flow through syndicated networks. Most of it lives in PDF files, or HTML files, that are only valuable to people who find their way to the sites that serve up those files.

In an effort to visualize this iceberg of unstructured information below the waterline of the data web, I added a feature to the elmcity service that searches for recurring events. It works by looking for the kinds of phrases that we humans use in our discourse: first Monday of every month at 9PM or 2nd and 4th Tuesday, 6:15-7:45 pm. In this week's companion article I show how that search harvests pages containing these terms from Google and Bing. Here, let's consider a few of the 3,500 items found when running that kind of search for Santa Rosa:

google: 1253
bing: 2023
google_and_bing: 292

1. Hannah Caratti, Pre-Licensed Professional, Santa Rosa, CA 95404 ... (google)

Every Monday at 6pm - 7pm $20 - $30 per session. Meditation & Stress Reduction Group ... Chronic Pain or Illness Therapist in Santa Rosa, CA ...

2. Bob Greenberg, Marriage & Family Therapist, Santa Rosa, CA 95404 ... (google)

Every Monday at 12am - 12am $40+ per session. An in depth group for adult ...

3. Classes at the Women's Health and Birth Center (google)

Every Monday (except for holiday Mondays). Group/walk-in from 12 noon ... Women's Health and Birth Center since 1993, 583 Summerfield Road Santa Rosa, CA 95405.

4. North Bay Bereavement and Grief Support Programs (google)

Every Monday, Noon-1:30 p.m.. Back to top ... 547 Mendocino Avenue, Santa Rosa, CA 95401 (Parking garage 521 7th Street) ...

Fig. 5: Unstructured event data for Santa Rosa
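The article doesn't publish the exact phrases the service searches for. Here is a minimal sketch of the phrase-matching half of the idea, using a hypothetical regular expression that recognizes the shapes quoted above: "Every Monday", "first Monday of every month", "2nd and 4th Tuesday". Harvesting candidate pages from Google and Bing is a separate step that isn't shown.

```python
import re
from typing import List

# Hypothetical patterns; the article does not publish the service's actual search phrases.
DAY = r"(?:mon|tues|wednes|thurs|fri|satur|sun)days?"
ORDINAL = r"(?:first|second|third|fourth|last|1st|2nd|3rd|4th)"
RECURRENCE = re.compile(
    rf"\b(?:every\s+{DAY}"                                  # "Every Monday"
    rf"|{ORDINAL}(?:\s*(?:,|and|&)\s*{ORDINAL})*\s+{DAY})"  # "2nd and 4th Tuesday"
    r"\b",
    re.IGNORECASE,
)


def find_recurrences(text: str) -> List[str]:
    """Return every recurrence phrase found in a snippet of human-readable text."""
    return [m.group(0) for m in RECURRENCE.finditer(text)]


if __name__ == "__main__":
    print(find_recurrences("Every Monday at 6pm - 7pm $20 - $30 per session."))
    print(find_recurrences("2nd and 4th Tuesday, 6:15-7:45 pm"))
```

Patterns like these find the iceberg but don't melt it. A match shows that a page describes a recurring event. It says nothing reliable about the year, the time zone, or exceptions such as "except for holiday Mondays".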

Investigating the fourth item, North Bay Bereavement and Grief Support Programs, we find a bunch of events represented in an unstructured way:

Bereaved Parents: For parents whose young or adult child has died.
2nd and 4th Thursdays, 6:00 - 7:30 p.m.

Family and Caregiver Support Groups: For adults whose loved one has a life-threatening illness.
Every Tuesday, 4:00-5:30 p.m.

Survivors of Suicide: For those who have lost a loved one to suicide.
Every Monday, Noon-1:30 p.m.

People in Grief: For people whose loved one has died.
Every Wednesday, 6:00-7:30 p.m.

Partner Loss - Evening: For adults whose spouse or partner has died.
2nd and 4th Tuesday, 6:15-7:45 p.m.

Partner Loss - Daytime: For adults whose spouse or partner has died.
Every Wednesday, 11:00 a.m. - 12:30 p.m.

Fig. 6: Unstructured event data about bereavement support groups in Santa Rosa
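Turning Fig. 6 into data comes down to translating each human phrase into an iCalendar recurrence rule. "Every Tuesday" becomes a weekly rule. "2nd and 4th Thursdays" becomes a monthly rule with numbered weekdays, which RFC 5545 expresses as `BYDAY=2TH,4TH`. Here is a minimal sketch that handles only the phrase shapes shown in Fig. 6. The lookup tables and `to_rrule` are hypothetical, and this is not elmcity code. A usable event still needs a DTSTART and DTEND alongside the rule.

```python
import re
from typing import Optional

# Hypothetical lookup tables for the phrase shapes in Fig. 6; not elmcity code.
DAYS = {"monday": "MO", "tuesday": "TU", "wednesday": "WE", "thursday": "TH",
        "friday": "FR", "saturday": "SA", "sunday": "SU"}
ORDINALS = {"first": 1, "1st": 1, "second": 2, "2nd": 2, "third": 3, "3rd": 3,
            "fourth": 4, "4th": 4, "last": -1}


def to_rrule(phrase: str) -> Optional[str]:
    """Translate 'Every Tuesday' or '2nd and 4th Thursdays' into an RFC 5545 RRULE line."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    # "thursdays" -> "thursday"; rstrip leaves "tuesday" alone because it ends in "y".
    day = next((DAYS[w.rstrip("s")] for w in words if w.rstrip("s") in DAYS), None)
    if day is None:
        return None
    nths = [ORDINALS[w] for w in words if w in ORDINALS]
    if nths:
        # Numbered weekdays within each month, e.g. BYDAY=2TU,4TU
        return "RRULE:FREQ=MONTHLY;BYDAY=" + ",".join(f"{n}{day}" for n in nths)
    if "every" in words:
        return f"RRULE:FREQ=WEEKLY;BYDAY={day}"
    return None


if __name__ == "__main__":
    print(to_rrule("2nd and 4th Tuesday, 6:15-7:45 p.m."))
```

This is exactly the translation a calendar program performs when someone picks "repeat on the 2nd and 4th Tuesday" from a menu. That's why entering such events into an ordinary calendar, once, is enough to produce structured data.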

I'm sure the Press Democrat would love to include these events on its calendar. It can't, though, because there's only one way for Sutter VNA and Hospice to get its support group meetings onto the Press Democrat's calendar. Somebody has to log into the site and input the data.

That model has never worked well, and it never will. The folks at Sutter VNA and Hospice only want to input that information once, on their own website. And that's all they should be expected to do! Their site ought to be the authoritative source for both human-readable information about events and machine-readable data that can syndicate to the Press Democrat or to any other site that needs it.

Unfortunately the Sutter VNA folks don't know about this dual possibility, and don't realize that they could achieve it using Google Calendar, or Hotmail Calendar, or any other single source of human-readable text and machine-readable data about public events.

Likewise, the Press Democrat does not realize that it could subscribe to a data feed from Sutter VNA, once, and thereafter automatically receive a stream of data as comprehensive and accurate as the authoritative source wishes to provide.

This model for collective information management relies on principles that computational thinkers know and apply, including:

  1. pub/sub -- the communication pattern is publish/subscribe
  2. indirection -- event data is passed by reference, not by value, from publisher to subscriber
  3. syndication -- a loosely-coupled network of publishers and subscribers

How might we teach these kinds of principles to the Sutter VNAs and Press Democrats of the world? Maybe we can start by teaching them to the kids we think are digital natives, but who don't actually learn these principles -- because we haven't formulated them and don't teach them.

If you teach in a middle school or a high school, here's an interesting civics lesson you could try. Spin up an elmcity hub for your town, point kids at the unstructured iceberg revealed by the search feature, and show them how to use a service like Google Calendar or Hotmail Calendar to convert unstructured event information into structured event data that can syndicate through the hub.

The task can easily be parallelized by carving the list of search results into chunks, and assigning chunks to individual students or teams of students. Working together they should soon be able to produce a substantial calendar of events that won't appear in any existing online directory. That calendar will be both a valuable civic contribution and a lesson in underlying principles.

For extra credit, have the students engage with the sources and explain the principles to them. The script might go like this:

Dear Mr. Jones,

We're students at the Jefferson Middle School, and we're working on a class project to improve the amount and quality of online event information for our community. We noticed that the following information is available on your website: [EXAMPLES].

However, these events aren't published in a form that enables them to show up automatically elsewhere -- for example, on the Herald's site, or the Chamber of Commerce site, or on people's personal calendars. To show how that can work, we have reformulated your information as a data feed. You can see it merged together with other data feeds here: [EXAMPLE].

This is just a demonstration. We're not the appropriate source for your data, you are. As part of our class project, we're reaching out to organizations like yours to show them how they can publish their own event information in two ways: as text for people to read, and also as data for computers to process and for networks to syndicate.

We know that sounds complicated, but it's really just a way of applying the ordinary calendar software that you probably already have and use. May we contact the person in your organization who's responsible for the events page on your website, and make a presentation about how you could be publishing event information in a more useful way?

Sincerely,

Kayla Smith, Tim Miller, Samantha Williams
Jefferson Middle School Civic Data Project

If you're not a teacher yourself, but you know teachers who might like to try this project-based exercise in civic data gathering and computational thinking, by all means invite them to contact me. I'll be happy to help set up the exercise, support it, and document the outcome.




September 22 2010

Personal data stores and pub/sub networks

The elmcity project joins five streams of data about public calendar events. Four of them are well-known services: Facebook, EventBrite, Upcoming, and Eventful. They all work the same way. You sign up for a service, you post your events there, other people can go there to find out about your events. What they find, when they go there, are copies of your event data. If you want to promote an event in more than one place, you have to push a copy to each place. If you change the time or day of an event, you have to revisit all those places and push new copies to each.

The fifth stream works differently. It's a loosely-coupled network of publishers and subscribers. To join it you post events once to your own website, blog, or online calendar, in a way that yields two complementary outputs. For people, you offer HTML files that can be read and printed. For mechanized web services like elmcity, you offer iCalendar feeds that can be aggregated and syndicated. If you want to promote an event in more than one place, you ask other services to subscribe to your feed. If you change the time or day of the event, every subscriber sees the change.

The first and best example of a decentralized pub/sub network is the blogosphere. My original blogging tool, Radio UserLand, embodied the pub/sub pattern. It made everything you wrote automatically available in two ways: as HTML for people to read, and as RSS for machines to process. What's more, Radio UserLand didn't just produce RSS feeds that other services could read and aggregate. It was itself an aggregator that pointed the way toward what became a vibrant ecosystem of applications -- and services -- that knew how to merge RSS streams. In that network the feeds we published flowed freely, and appeared in many contexts. But they always remained tethered to original sources that we stamped with our identities, hosted wherever we liked, and controlled ourselves. Every RSS feed that was published, no matter where it was published, contributed to a global pool of RSS feeds. Any aggregator could create a view of the blogosphere by merging a set of feeds, chosen from the global pool, based on subject, author, place, time, or combinations of these selectors.

Now social streams have largely eclipsed RSS readers, and the feed reading service I've used for years -- Bloglines -- will soon go dark. Dave Winer thinks the RSS ecosystem could be rebooted, and argues for centralized subscription handling on the next turn of the crank. Of course definitions tend to blur when we talk about centralized versus decentralized services. Consider FriendFeed. It's centralized in the sense that a single provider offers the service. But it can be used to create many RSS hubs that merge many streams for many purposes. In The power of informal contracts I showed how an instance of FriendFeed merges a particular set of RSS feeds to create a news service just for elmcity curators. The elmcity service itself has the same kind of dual nature. A single provider offers the service. But many curators can use it to spin up many event hubs, each tuned to a location or topic.

The early blogosphere proved that we could create and share many views drawn from the same pool of feeds. That's one of the bedrock principles that I hope we'll remember and carry forward to other pub/sub networks. Another principle is that we ought to control and syndicate our data. Radio UserLand, for example, was happy to host your blog, just as Twitter and Facebook are now happy to host your online social presence. But unlike Twitter and Facebook, Radio UserLand was just as happy to let you push your data to another host. To play in the syndication network your feed just had to exist -- it didn't matter where -- and be known to one or more hubs.

This notion of a cloud-based personal data store is only now starting to come into focus. When I was groping for a term to describe it back in 2007, I came up with hosted lifebits. More recently the Internet Identity Workshop gang have settled on personal data store, as described by Kaliya Hamlin and Phil Windley. The acronym is variously PDS or PDX, where X, as Kaliya says, stands for "store, service, locker, bank, broker, vault, etc." Phil elaborates:

The term itself is a problem. When you say "store" or "locker" people assume that this is a place to put things (not surprisingly). While there will certainly be data stored in the PDS, that really misses its primary purposes: acting as a broker for all the data you've got stored all over the place, and managing the metadata about that data. That is, it is a single place, but a place of indirection not storage. The PDS is the place where services that need access to your data will come for permission, metadata, and location.

The elmcity service aligns with that vision. If we require the calendar data for a city, town, or neighborhood to live in a single place of storage, we'll never agree to use the same place. Thus the elmcity service merges streams from Facebook, EventBrite, Upcoming, and Eventful. But those streams are fed by people who put copies of their events into them, one event at a time, once per stream. What if we managed our public calendar data canonically, in personal (or organizational) data stores fed from our own preferred calendar applications? These data stores would in turn feed downstream hubs like Facebook, EventBrite, Upcoming, and Eventful, all of which could -- although they currently don't -- receive and transmit such feeds. Other hubs, based on instances of the elmcity service or a similar system, would enable curators to create particular geographic or topical views.

I've identified a handful of common calendar applications that can publish calendar data at URLs accessible to any such hub, in a format (iCalendar) that enables automated processing. The short list includes Google Calendar, Outlook, Apple iCal, and Windows Live Calendar. But there are many others. Here's the full list of producers as captured so far by the elmcity service:

# of feeds  feed producer
       151  -//Google Inc//Google Calendar 70.9054//EN
        14  -//Meetup Inc//RemoteApi//EN
        14  unknown
         6  iCalendar-Ruby
         6  e-vanced event management system
         5  -//DDay.iCal//NONSGML ddaysoftware.com//EN
         4  -//Last.fm Limited Event Feeds//NONSGML//EN
         3  -//openmikes.org/NONSGML openmikes.org//EN
         3  -//CollegeNET Inc//NONSGML R25//EN
         3  -//Drupal iCal API//EN
         3  -//Microsoft Corporation//Windows Live Calendar//EN
         2  -//Trumba Corporation//Trumba Calendar Services 0.11.6830//EN
         1  -//herald-dispatch/calendar//NONSGML v1.0//EN
         1  -//WebCalendar-v1.1.2
         1  Zvents Ical
         1  Coldfusion8
         1  -//Intand Corporation//Tandem for Schools//EN
         1  -//strange bird labs//Drupal iCal API//EN
         1  -//SchoolCenter/NONSGML Calendar v9.0//EN
         1  -//blogTO//NONSGML Toronto Events V1.0//EN
         1  -//Events at Stanford//iCal4j 1.0//EN
         1  -//University of California\, Berkeley//UCB Events Calendar//EN
         1  -//EVDB//www.eventful.com//EN
         1  -//mySportSite Inc.//mySportSite//EN
         1  Mobile Geographics Tides 3988 2010
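Each producer string in that table is the value of an iCalendar feed's PRODID property. Here is a minimal sketch of how such a tally could be built from a set of feed bodies. It is not the elmcity code. It assumes that feeds lacking a PRODID are what the table counts as "unknown", and for brevity it ignores line folding and property parameters on PRODID.

```python
from collections import Counter
from typing import Iterable


def prodid_of(ics_text: str) -> str:
    """Return a feed's PRODID value, or 'unknown' if it has none (assumed meaning)."""
    for line in ics_text.splitlines():
        if line.upper().startswith("PRODID:"):
            return line[len("PRODID:"):].strip()
    return "unknown"


def tally_producers(feeds: Iterable[str]) -> Counter:
    """Count feeds per producer, as in the table above."""
    return Counter(prodid_of(text) for text in feeds)


if __name__ == "__main__":
    feeds = [
        "BEGIN:VCALENDAR\r\nPRODID:-//Google Inc//Google Calendar 70.9054//EN\r\nEND:VCALENDAR",
        "BEGIN:VCALENDAR\r\nPRODID:Zvents Ical\r\nEND:VCALENDAR",
        "BEGIN:VCALENDAR\r\nEND:VCALENDAR",
    ]
    for producer, n in tally_producers(feeds).most_common():
        print(f"{n:>10}  {producer}")
```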

Google Calendar dominates overwhelmingly, but the long tail hints at the variety of event sources that could feed into a calendar-oriented pub/sub network. How much of the total event flow comes by way of this assortment of iCalendar sources, as compared to centralized sources? Here's the breakdown:



It's roughly half Eventful, a third Upcoming, and a fifth iCalendar. There's negligible flow from EventBrite, which focuses on big events, and likewise from Facebook, where the focus, though evolving, remains on group rather than world visibility.

In a companion piece at O'Reilly Answers I show how I made this visualization. It's a nice example of another kind of pub/sub network, in this case one that's enabled by the OData protocol. For our purposes here, I just want to draw attention to the varying contributions made by the five streams to each of the hubs. The Eventful stream is strong almost everywhere. The Upcoming and iCalendar tributaries are only strong in some places. But where the iCalendar stream does flow powerfully, there's a curator who has mined one or more rich veins of data from a school system, or a city government, or a newspaper. Today the vast majority of these organizations think of the calendar information they push as text for people to read. Few realize it is also data for networks to syndicate. When that mindset changes, a river of data will be unleashed.





August 03 2010

Lessons learned building the elmcity service

In 1995 I started writing a column for BYTE about the development of the magazine's website, plus some early examples of what we now call web services and social media. When I started, I knew very little about Apache, Perl, and the Common Gateway Interface. But I was lucky to be able to learn by doing, by explaining what I learned to my readers, and by relaying what they were teaching me. Because I came to the project with a beginner's mind, the column became a launchpad for a lot of people who were just getting started on web development.

Nowadays I'm working on another web project, the elmcity calendar aggregator. And I came to this project with a different kind of beginner's mind. I had built a first version of the service a few years back in Python, on Linux, using the Django framework. After I joined Microsoft I decided to recreate it on Azure. I started in Python -- specifically, IronPython. But Azure was brand new at the time, and not very friendly to IronPython. So I switched to C# and .NET. I knew more about that environment than I had once known about Perl and CPAN, but not a whole lot more. That inexperience qualifies me to write another series of learning-by-doing essays, and that's what this will be.

The code, which is under an Apache 2.0 license, will live on github. I'll discuss it in detail over on O'Reilly Answers. In this space, I'll reflect on larger themes: building and operating a cloud service in 2010, in a way that cooperates with other services and straddles two different cultures.

You know the cultural stereotypes. In the open source realm, services written in dynamically-typed languages like Python and Ruby wrangle streams of open data for the public good. In the enterprise zone, services written in statically-typed languages like C# and Java manage proprietary data for profit. What happens when you mix open source goals, styles, and attitudes with Microsoft tools, languages, and frameworks? You get a cultural mashup. That's what the elmcity project is, and what this series will explore.

Recently I had dinner with Adrian Holovaty. He's the force behind Django, the popular Python-based web development framework, and EveryBlock, an engine for hyperlocal news and information. Adrian asked me what it's like to build software the way I've been doing it for the last year: in C# (and IronPython), on Azure, using Visual Studio Express. I picked the first example that came into my head: "When I rename a variable or method," I said, "it gets automatically renamed across the whole project." Adrian's response was: "I've never used a tool like that, so I don't know what I'm missing."

Of course it goes both ways. A lot of developers on the Microsoft side of the fence have never used Django, or Rails, and they don't know what they're missing either.

If you've followed my work over the years, you know I've always been a best-of-both-worlds pragmatist. So this will be an atypical narrative about C# and .NET development. I see through the lens of Perl, Python, HTTP, and REST, with a bias toward The Simplest Thing That Could Possibly Work.

You shouldn't have to drink a gallon of Kool-Aid, and then have a brain transplant, in order to start producing useful results. Back in the BYTE era I was struck by how little I actually had to learn about Perl and CGI in order to accomplish my goals. Likewise, I've barely scratched the surface of C#, .NET, Visual Studio, and Azure as I've developed the elmcity service.

I claim that's a good thing. There are many more services needing to be built than there are Adrian Holovatys available to build them. One of Microsoft's great strengths has always been the empowerment of the average developer. It should be possible for a useful service to be built, maintained, and evolved by somebody who isn't a great programmer. And trust me, I'm not. But the languages, tools, framework, and platform that I'm using for this project have enabled me to be better than I otherwise would be.

Finally, this series is about the wider goals of the elmcity project. It was born of my frustration with the web's longstanding failure to outperform posters on shop windows and kiosks as a source of information about goings-on in our cities, towns, and neighborhoods. I'm trying to bootstrap an ecosystem of iCalendar feeds that's analogous to the existing network of RSS and Atom feeds. The elmcity service is an example of what Rohit Khare memorably called syndication-oriented architecture. It embraces that style by syndicating with other services such as delicious and FriendFeed. And it will ultimately succeed only when everyone involved in the events ecosystem -- event owners and promoters as well as print and online aggregators -- can plug into a network of syndicated data feeds. So I'll talk about lessons learned while building and running the service, but also about why we need to broadly enable -- and popularize! -- a decentralized style of social information management. Because it's not just about events and calendars. We're all becoming publishers and consumers of many different kinds of data. Centralized repositories won't work. We have to learn how to network our data.
