
October 22 2013

Mining the social web, again

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever.

While we’re seeing more and more cynicism about the value of data, and particularly “big data,” that cynicism isn’t shared by most people who actually work with data. Data has undoubtedly been overhyped and oversold, but the best way to arm yourself against the hype machine is to start working with data yourself, to find out what you can and can’t learn. And there’s no shortage of data around. Everything we do leaves a cloud of data behind it: Twitter, Facebook, Google+ — to say nothing of the thousands of other social sites out there, such as Pinterest, Yelp, Foursquare, you name it. Google is doing a great job of mining your data for value. Why shouldn’t you?

There are few better ways to learn about mining social data than by starting with Twitter; Twitter is really a ready-made laboratory for the new data scientist. And this book is without a doubt the best and most thorough approach to mining Twitter data out there. But that’s only a starting point. We hear a lot in the press about sentiment analysis and mining unstructured text data; this book shows you how to do it. If you need to mine the data in web pages or email archives, this book shows you how. And if you want to understand how people collaborate on projects, Mining the Social Web is the only place I’ve seen that analyzes GitHub data.

All of the examples in the book are available on GitHub. In addition to the example code, which is bundled into IPython notebooks, Matthew has provided a VirtualBox VM that installs Python, all the libraries you need to run the examples, the examples themselves, and an IPython server. Checking out the examples is as simple as installing VirtualBox, installing Vagrant, cloning the second edition’s GitHub repository, and typing “vagrant up.” (This quick start guide summarizes all of that.) You can execute the examples for yourself in the virtual machine; modify them; and use the virtual machine for your own projects, since it’s a fully functional Linux system with Python, Java, MongoDB, and other necessities pre-installed. You can view this as a book with accompanying examples in a particularly nice package, or you can view the book as “premium support” for an open source project that consists of the examples and the VM.
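The quick start really does come down to a handful of commands. Here is a sketch; the repository URL is illustrative, so use the link given in the book or quick start guide:

```shell
# Prerequisites: install VirtualBox and Vagrant from their respective sites.

# Clone the second edition's example repository (URL shown is illustrative).
git clone https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition.git
cd Mining-the-Social-Web-2nd-Edition/vagrant

# Download the base box and provision Python, the libraries, and the
# IPython server inside the VM.
vagrant up

# Once provisioning finishes, open the IPython Notebook server in a
# browser (typically forwarded to a local port such as 8888).
```

From there the notebooks run entirely inside the VM, so nothing needs to be installed on the host beyond VirtualBox and Vagrant themselves.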

If you want to engage with the data that’s surrounding you, Mining the Social Web is the best place to start. Use it to learn, to experiment, and to build your own data projects.

September 24 2013

Four short links: 25 September 2013

  1. Salesforce Architecture — Our search tier runs on commodity Linux hosts, each of which is augmented with a 640 GiB PCI-E flash drive which serves as a caching layer for search requests. These hosts get their data from a shared SAN array via an NFS file system. Search indexes are stored on the flash drive to enable greater performance for search throughput. Architecture porn.
  2. Gerrit Code Review (Github) — tool for doing code reviews on Github codebases. (via Chris Aniszczyk)
  3. Humanize (Github) — Javascript to turn “first” into a list position, format numbers, generate plurals in English, etc. (via Pete Warden)
  4. Users vs Apps (Tim Bray) — the wrong thing being shared with the wrong people, even once, can ruin a trust relationship forever. Personally, I’m pretty hard-line about this one. I’m currently refusing to update the Android app from my bank, CIBC, because it wants access to my contacts. You know what the right amount of “social” content is in my relationship with my bank? Zero, that’s what.

August 21 2013

GitHub, the GPL, and the tangle of open source licenses

A large share of code is put online by developers without any license, for example on GitHub. That in itself creates problems, but behind it lies a broader development: copyleft models such as the GPL's also presuppose strong copyright. For many developers, both are too inflexible and therefore no longer attractive, argues Armin Ronacher.

The General Public License (GNU GPL) was long the cornerstone of the open source movement; at least, one could get that impression. On closer inspection, the open source world has always consisted of many licenses, of which the GNU GPL was only a small part. But in recent years it has become increasingly clear that many developers, for various reasons, have built up an open hatred of these licenses.

What is astonishing is how little licensing is discussed these days. For me, the topic became relevant again because of GitHub. As a source code host, GitHub is currently a center of the open source movement, yet at the same time it hosts more software licensed contrary to its purpose than in line with it. GitHub has tried to change that and introduced a license chooser. I consider that a very bad idea, especially because it reopens the topic of the GPL and all the questions that follow from it.

This piece is therefore about the history of open source licenses, about what appears to be changing, and about what we can do to improve the situation.

GPL: the story so far

Before version 3 of the General Public License (GPLv3) was published, the GPLv2 was the most widely used copyleft license. Copyleft and the GNU GPL were regarded as one and the same. The General Public License is a very restrictive license, because it does not merely lay down a handful of conditions and permit everything else; instead, it enumerates rights in the manner of a whitelist. For this reason, GPL compatibility has always been a subject of debate. GPL compatibility is about whether a license can be downgraded into compatibility with the GPL. For most licenses this was possible, but some licenses contain clauses that make it impossible. A well-known example is the Apache License 2.0, which was regarded as GPL-incompatible because of its additional restrictions concerning patents; the same goes for some versions of the Mozilla Public License.

When a new version of the GPL was then drafted in 2007, the question of GPL compatibility gained yet another layer of complexity: because of how the GPL licenses work, different versions are not compatible with one another. That is not particularly surprising. But if you look at how the ecosystem was supposed to work and how software is actually licensed, it has enormous consequences.

For there is a great deal of code that, depending on how you look at it, is licensed under either GPLv2 or GPLv3. The reason is that GPL code can be licensed under a specific version or under “any later version.” And how is a later version defined? By the GPL itself. If a developer specifies that a certain license version “or any later version” applies to a piece of software, downstream users have a choice: they can follow the terms of that version or those of a later one.

Three camps in the GPL world

There are therefore currently three camps: the first, which has stayed with GPLv2; the second, which has upgraded to GPLv3; and the third, which uses either GPLv2 or GPLv3 depending on context. Annoyance with GPLv3 was loudest around Linux and BusyBox: both decided that the only applicable license is GPLv2. On the other side, a large part of the GNU codebase was moved to GPLv3 a few years ago.

The result is that GNU and Linux now live in different worlds. Ironically, “GNU/Linux” now stands for a license conflict. Since most GNU projects are under GPLv3 and Linux will always remain under GPLv2, there can be no more code sharing between these projects.

Probably the biggest problem that GPLv3 poses for companies is the part of the license known as “anti-tivoization”: an additional section of conditions for the case in which software becomes part of a consumer device. At its core, it requires that modified software must run on an unmodified device. The license demands that the signing keys be disclosed and that the user manual contain information on how modified software can be installed. And it must be ensured that modified software actually runs on the device. At least the license does not require the manufacturer to honor the warranty afterwards.

In general, these license terms are a major problem for companies. Apple, for example, sells devices with a locked bootloader in the iPad and iPhone. It would therefore be impossible for Apple to comply with the GPLv3 terms without abandoning its security systems entirely. But this does not affect only Apple: you will not find GPLv3 software in any app store. The license restrictions of Google's Play Store and similar distribution systems are likewise incompatible with GPLv3.

The anti-GPL movement

Alongside these developments in the GPL environment, there are others. Not all of them had comparable impact, but they have led many developers to see the GPL in a different light. Android and other projects are now trying to get rid of the whole GPL system. Android goes very far here and offers a GPL-free userspace. Its licensing guidelines express a general preference for the Apache License 2.0, with exceptions such as kernel modules.

So why is there suddenly so much fear of the GPL? Partly because the GPL has always been a radical license, above all because it lacks any reassignment of rights. There is, for example, a clause known as the “GPLv2 death penalty.” It states that anyone who violates the license terms automatically loses the license, and it remains revoked until a new one has been explicitly granted. Without a single binding rights holder, however, that would mean having to ask everyone who contributed to the code for a new license.

Beyond that, it has become clear that some people even believe the Free Software Foundation cannot be trusted. There are two factions here: first, those who believe in Richard Stallman's ideology; second, those who find the GPLv2 license fine but disagree with the direction in which it is evolving. Linus Torvalds is clearly a member of the latter faction. It exists because the Free Software Foundation is deeply trapped in its own world, one in which cloud computing is the devil's work, smartphones are nothing but tracking devices, and Android is something the GPL must prevent. There are GPL supporters who do not support the Free Software Foundation's current outlook. Even some GNU projects contradict the goals of GNU and the Free Software Foundation. The GnuTLS project, for instance, split off from GNU in December 2012.

Code without a license

According to an informal, non-scientific survey by Aaron Williamson of the Software Freedom Law Center, only 15 percent of all repositories contain license files, and only about 25 percent mention the license in the README file. For this, Williamson examined 28 percent of the oldest GitHub repositories; only a third of all projects carried a copyleft license. Among the licensed repositories, the clear majority were under either an MIT/BSD or an Apache 2 license.
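A check of this kind is easy to reproduce on any set of local repository clones. A minimal sketch (the file-name heuristics here are assumptions for illustration, not Williamson's actual methodology):

```python
import os

# File names commonly used for license files in a repository root.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying"}

def has_license_file(repo_path):
    """True if the repository root contains a recognizable license file."""
    return any(name.lower() in LICENSE_NAMES for name in os.listdir(repo_path))

def mentions_license_in_readme(repo_path):
    """True if a README in the repository root mentions a license."""
    for name in os.listdir(repo_path):
        if name.lower().startswith("readme"):
            with open(os.path.join(repo_path, name), errors="ignore") as f:
                if "license" in f.read().lower():
                    return True
    return False

def survey(repos):
    """Return (share with a license file, share mentioning one in a README)."""
    n = len(repos)
    with_file = sum(has_license_file(r) for r in repos)
    in_readme = sum(mentions_license_in_readme(r) for r in repos)
    return with_file / n, in_readme / n
```

Run over a directory of clones, `survey` yields exactly the two percentages the survey reports: repositories with a license file, and repositories that at least mention a license in their README.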

These are not satisfying results: the trend of putting code online without any license statement is worrying and raises questions. But it shows less that developers know nothing about licenses than that they consider them unimportant and negligible. That is why I regard GitHub's new license-chooser tool as problematic. When you create a new repository, a license selection dialog now appears, but without any explanation of what each license means. “Apache v2 License,” “GPLv2,” and “MIT” are highlighted. Two of these licenses, however, Apache and GPLv2, are not compatible with each other.

Screenshot: the license chooser on GitHub

But if developers previously did not take the time to add a license to a repository, the tool will now simply lead them to pick one without thinking about the consequences of their choice. Given all the different versions of the GPL and the legal implications that come with them, I fear the new license-chooser tool will only make the situation worse.

The tangle of license compatibility

Once the GPL enters the picture, licensing stops being fun: too many things and interactions have to be considered. Factor in the differing interpretations of the license, and it gets even worse.

But this is not only a GPL problem: the Apache software license is also quite a mouthful. I am sure that not everyone who has put code under that license knows its implications. The MIT license, by contrast, comprises just two clauses and a warranty disclaimer, yet even here not everyone is clear about how it interacts with different jurisdictions.

The implicit assumption is that American law somehow applies, which is not always the case. Open source development is international, and not every country is the same. Germany and Austria, for example, have few provisions on copyright assignment as such and no mechanisms for transferring it. Instead, usage rights are granted, which the rights holder can sublicense. Since none of this appears in the license statements, I sometimes wonder whether such formalities could one day come back to bite me.

Licenses for the mashup generation

I believe something new is happening in my generation right now, and it is probably the most important reason the GPL is in decline: my generation wants a more limited copyright than we have today, with shorter protection terms. Interestingly, that is exactly what Richard Stallman does not want. He is painfully aware that copyleft, too, is built on copyright and can therefore only be enforced with strong copyright behind it.

Someone who releases software under a BSD or MIT license would probably not mind if copyright were abolished or severely curtailed. Richard Stallman's world, by contrast, would collapse. He has said, for instance, that the Pirate Party would turn out to be a boomerang for the free software movement.

The new generation, however, has a changed view of sharing and of money. It wants to make sharing content and software easy while at the same time enabling independent monetization. It is the generation that uploads remixes to YouTube, that creates commented walkthroughs for video games, and that has learned in many other ways to work with other people's content.

A first-aid kit for licenses

We should think about simplifying our software licensing environment, because otherwise we cannot gauge what will hit us in a few years. Making the implications of software licenses clear, and helping people choose the license that fits their goals, would be an interesting undertaking. That could include, for example, diagrams that point out compatibility problems; that make clear what happens when statements from contributors to a piece of software are missing; and what happens when rights holders die or can no longer be found.

I am sure a good user-experience designer could make the basics of licensing easy to grasp in ten minutes. The information would have to be vetted by a lawyer and by members of the community in order to soundly assess the consequences for the ecosystem. For now, at any rate, I believe GitHub's license chooser is a very poor solution to the problem of code being published without a license. It may even be harmful as long as the effects of each license are not made clear.

This article is an abridged version of Armin Ronacher's post “Licensing in a Post Copyright World.” Translation: Anne-Christin Mook. License: CC BY-NC-SA.

July 25 2013

Four short links: 25 July 2013

  1. More Git and GitHub Secrets (Zach Holman) — wizard’s tricks. (via Rowan Crawford)
  2. Building a Keyboard from Scratch (Jesse Vincent) — for the connoisseur.
  3. Practicing Deployment (Laura Thomson) — you should build the capability for continuous deployment, even if you never intend to continuously deploy.
  4. 3D Printed Atoms (Thingiverse) — customize and 3d-print a Bohr model of any atom.

March 19 2013

The City of Chicago wants you to fork its data on GitHub

GitHub has been gaining new prominence as the use of open source software in government grows.

Earlier this month, in a piece exploring GitHub’s role in government, I included a few thoughts from Chicago’s chief information officer, Brett Goldstein, about the city’s use of GitHub.

While Goldstein says that Chicago’s open data portal will remain the primary means through which Chicago releases public sector data, publishing open data on GitHub is an experiment that will be interesting to watch, in terms of whether it affects reuse or collaboration around it.

In a followup email, Goldstein, who also serves as Chicago’s chief data officer, shared more about why the city is on GitHub and what they’re learning. Our discussion follows.

The City of Chicago is on GitHub.

What has your experience on GitHub been like to date?

Brett Goldstein: It has been a positive experience so far. Our local developer community is very excited by the MIT License on these datasets, and we have received positive reactions from outside of Chicago as well.

This is a new experiment for us, so we are learning along with the community. For instance, GitHub was not built to be a data portal, so it was difficult to upload our buildings dataset, which was over 2GB. We are rethinking how to deploy that data more efficiently.

Why use GitHub, as opposed to some other data repository?

Brett Goldstein: GitHub provides the ability to download, fork, make pull requests, and merge changes back to the original data. This is a new experiment, where we can see if it’s possible to crowdsource better data. GitHub provides the necessary functionality. We already had a presence on GitHub, so it was a natural extension to that as a complement to our existing data portal.
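The workflow Goldstein describes is the standard GitHub loop, applied to data files rather than source code. A sketch, assuming a fork of a city data repository already exists under your account (the repository and file names here are illustrative):

```shell
# Clone your fork of the city's data repository (names are illustrative).
git clone https://github.com/<your-username>/bike-racks.git
cd bike-racks

# Fix an error you spotted in the data on a topic branch.
git checkout -b fix-rack-coordinates
$EDITOR data/bike-racks.json
git commit -am "Correct coordinates for a mislocated rack"

# Push the branch, then open a pull request against the city's repository.
# If the city merges it, the crowdsourced fix flows back to the source data.
git push origin fix-rack-coordinates
```

The interesting experiment is the last step: whether members of the public will actually send corrections back, and whether the city's review process can absorb them.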

Why does it make sense for the city to use or publish open source code?

Brett Goldstein: Three reasons. First, it solves issues with incorporating data in open source and proprietary projects. The city’s data is available to be used publicly, and this step removes any remaining licensing barriers. These datasets were targeted because they are incredibly useful in the daily life of residents and visitors to Chicago. They are the most likely to be used in outside projects. We hope this data can be incorporated into existing projects. We also hope that developers will feel more comfortable developing applications or services based on an open source license.

Second, it fits within the city’s ethos and vision for data. These datasets are items that are visible in daily life — transportation and buildings. It is not proprietary data and should be open, editable, and usable by the public.

Third, we engage in projects like this because they ultimately benefit the people of Chicago. Not only do our residents get better apps when we do what we can to support a more creative and vibrant developer community, they also will get a smarter and more nimble government using tools that are created by sharing data.

We open source many of our projects because we feel the methodology and data will benefit other municipalities.

Is anyone pulling it or collaborating with you? Have you used that code? Would you, if it happened?

Brett Goldstein: We collaborated with Ian Dees, who is a significant contributor to OpenStreetMap, to launch this idea. We anticipate that buildings data will be integrated in OpenStreetMap now that it’s available with a compatible license.

We have had 21 forks and a handful of pull requests fixing some issues in our README. We have not had a pull request fixing the actual data.

We do intend to merge requests to fix the data and are working on our internal process to review, reject, and merge requests. This is an exciting experiment for us, really at the forefront of what governments are doing, and we are learning along with the community as well.

Is anyone using the open data that wasn’t before, now that it’s JSON?

Brett Goldstein: We seem to be reaching a new audience with posting data on GitHub, working in tandem with our heavily trafficked data portal. A core goal of this administration is to make data open and available. We have one of the most ambitious open data programs in the country. Our portal has over 400 datasets that are machine readable, downloadable and searchable. Since it’s hosted on Socrata, basic analysis of the data is possible as well.
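Part of what makes the JSON releases useful is that any language's standard tooling can consume them directly. A minimal sketch in Python (the record fields here are invented for illustration and are not Chicago's actual schema):

```python
import json

# A snippet shaped like an open-data export; real field names will differ.
raw = """[
  {"name": "Chicago Cultural Center", "ward": 42,
   "latitude": 41.8837, "longitude": -87.6249},
  {"name": "Harold Washington Library", "ward": 4,
   "latitude": 41.8764, "longitude": -87.6281}
]"""

buildings = json.loads(raw)

# A simple aggregation: count records per ward.
per_ward = {}
for b in buildings:
    per_ward[b["ward"]] = per_ward.get(b["ward"], 0) + 1

print(per_ward)  # prints {42: 1, 4: 1}
```

In practice you would read the JSON file from a clone of the repository rather than an inline string, but the parsing and aggregation steps are the same.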

March 08 2013

GitHub gains new prominence as the use of open source within governments grows

When it comes to government IT in 2013, GitHub may have surpassed Twitter and Facebook as the most interesting social network.

GitHub’s profile has been rising recently, from a Wired article about open source in government, to its high profile use by the White House and within the Consumer Financial Protection Bureau. This March, after the first White House hackathon in February, the administration’s digital team posted its new API standards on GitHub. In addition to the U.S., code from the United Kingdom, Canada, Argentina and Finland is also on the platform.

“We’re reaching a tipping point where we’re seeing more collaboration not only within government agencies, but also between different agencies, and between the government and the public,” said GitHub head of communications Liz Clinkenbeard, when I asked her for comment.

Overall, 2012 was a breakout year for the use of GitHub by government, with more than 350 government code repositories by year’s end.

Total number of government repositories on GitHub.

In January 2012, the British government committed the code for GOV.UK to GitHub.

NASA, after its first commit, added 11 more code repositories over the course of the year.

In September, the new Open Gov Foundation published the code for the MADISON legislative platform. In December, the U.S. Code went on GitHub.

GitHub’s profile was raised further in Washington this week when Ben Balter was announced as the company’s federal liaison. Balter made some open source history last year, when he was part of the federal government’s first agency-to-agency pull request. He also was a big part of giving the White House some much-needed geek cred when he coded the administration’s digital government strategy in HTML5.

Balter will be GitHub’s first government-focused employee. He won’t, however, be saddled with an undecipherable title. In a sly dig at the slow-moving institutions of government, and in keeping with GitHub’s love for octocats, Balter will be the first “Government Bureaucat,” focused on “helping government to do all sorts of governmenty things, well, more awesomely,” wrote GitHub CIO Scott Chacon.

Part of Balter’s job will be to evangelize the use of GitHub’s platform as well as open source in government, in general. The latter will come naturally to him, given how he and the other Presidential Innovation Fellows approached their work.

“Virtually everything the Presidential Innovation Fellows touched was open sourced,” said Balter when I interviewed him earlier this week. “That’s everything from better IT procurement software to internal tools that we used to streamline paperwork. Even more important, much of that development (particularly RFPEZ) happened entirely in the open. We were taking the open source ethos and applying it to how government solutions were developed, regardless whether or not the code was eventually public. That’s a big shift.”

Balter is a proponent of social coding in the open as a means of providing some transparency to interested citizens. “You can go back and see why an agency made a certain decision, especially when tools like these are used to aid formal decision making,” he said. “That can have an empowering effect on the public.”

Forking code in city hall and beyond

There’s notable government activity beyond the Beltway as well.

The City of Chicago is now on GitHub, where chief data officer and city CIO Brett Goldstein is releasing open data as JSON files, along with open source code.

Both Goldstein and Philadelphia chief data officer Mark Headd are also laudably participating in conversations about code and data on Hacker News threads.

“Chicago has released over 400 datasets using our data portal, which is located at data.cityofchicago.org,” Headd wrote on Hacker News. While Goldstein says that the city’s portal will remain the primary way they release public sector data, publishing data on GitHub is an experiment that will be interesting to watch, in terms of whether it affects reuse.

“We hope [the datasets on GitHub] will be widely used by open source projects, businesses, or non-profits,” wrote Goldstein. “GitHub also allows an on-going collaboration with editing and improving data, unlike the typical portal technology. Because it’s an open source license, data can be hosted on other services, and we’d also like to see applications that could facilitate easier editing of geographic data by non-technical users.”

Headd is also on GitHub in a professional capacity, where he and his colleagues have been publishing code to a City of Philadelphia repository.

“We use [GitHub] to share some of our official city apps,” commented Headd on the same Hacker News thread. “These are usually simple web apps built with tools like Bootstrap and jQuery. We’ll be open sourcing more of these going forward. Not only are we interested in sharing the code for these apps, we’re actively encouraging people to fork, improve and send pull requests.”

While there’s still a long road ahead for widespread code sharing between the public and government, the economic circumstances of cities and agencies could create the conditions for more code sharing inside government. In a TED Talk last year, Clay Shirky suggested that adopting open source methods for collaboration could even transform government.

A more modest (although still audacious) goal would be to simply change how government IT is done.

“I’ve often said, the hardest part of being a software developer is training yourself to Google the problem first and see if someone else has already solved it,” said Balter during our interview. “I think we’re going to see government begin to learn that lesson, especially as budgets begin to tighten. It’s a relative ‘app store’ of technology solutions just waiting to be used or improved upon. That’s the first step: rather than going out to a contractor and reinventing the wheel each time, it’s training ourselves that we’re part of a larger ecosystem and to look for prior art. On the flip side, it’s about contributing back to that commons once the problem has been solved. It’s about realizing you’re part of a community. We’re quickly approaching a tipping point where it’s going to be easier for government to work together than alone. All this means that a taxpayer’s dollar can go further, do more with less, and ultimately deliver better citizen services.”

Some people may understandably bridle at including open source code and open data under the broader umbrella of “open government,” particularly if such efforts are not balanced by adherence to good government principles around transparency and accountability.

That said, there’s reason to hail collaboration around software and data as bonafide examples of 21st century civic participation, where better platforms for social coding enable improved outcomes. The commits and pulls of staff and residents on GitHub may feel like small steps, but they represent measurable progress toward more government not just of the people, but with the people.

“Open source in government is nothing new,” said Balter. “What’s new is that we’re finally approaching a tipping point at which, for federal employees, it’s going to be easier to work together, than work apart. Whereas before, ‘open source’ often meant compiling, zipping, and uploading, when you fuse the internal development tools with the external publishing tools, and you make those tools incredibly easy to use, participating in the open source community becomes trivial. Often, it can be more painful for an agency to avoid it completely. I think we’re about to see a big uptick in the amount of open source participation, and not just in the traditional sense. Open source can be between business units within an agency. Often the left hand doesn’t know what the right is doing between agencies. The problems agencies face are not unique. Often the taxpayer is paying to solve the same problem multiple times. Ultimately, in a collaborative commons with the public, we’re working together to make our government better.”

February 01 2013

Four short links: 1 February 2013

  1. Icon Fonts are Awesome — yes, yes they are. (via Fog Creek)
  2. What the Rails Security Issue Means for Your Startup — excellent, clear, emphatic advice on how and why security matters and what it looks like when you take it seriously.
  3. The Indiepocalypse (Andy Baio) — We’re at the beginning of an indiepocalypse — a global shift in how culture is made, from a traditional publisher model to independently produced and distributed works.
  4. China, GitHub, and MITM — No browser would prevent the authorities from using their ultimate tool though: certificates signed by the China Internet Network Information Center. CNNIC is controlled by the government through the Ministry of Industry and Information Technology. They are recognized by all major browsers as a trusted Certificate Authority. If they sign a fake certificate used in a man-in-the-middle attack, no browser will warn of any unusual activity. The discussion of how GitHub (or any site) could be MITM’d is fascinating, as are the pros and cons for a national security agency to co-opt the certificate-signing NIC.
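One way to see which certificate chain you are actually being served, and whether an issuer you do not expect appears in it, is OpenSSL's client mode; a sketch:

```shell
# Print the issuer, subject, and fingerprint of the certificate
# presented for github.com on this connection.
openssl s_client -connect github.com:443 -servername github.com </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -fingerprint

# Compare the fingerprint against one obtained over a connection you trust;
# a mismatch can indicate a man-in-the-middle with a CA-signed fake certificate.
```

This is exactly the kind of manual check that a CNNIC-signed fake certificate would defeat at the browser-warning level but not at the fingerprint-comparison level.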

December 12 2012

Four short links: 12 December 2012

  1. Kiwi Bond Films Are The Most Violent (Peter Griffin) — it wasn’t always furry-footed plucky adventurers in Middle Earth, my friends. Included to show that you can take an evidence-based approach to almost any argument.
  2. Are Githubbers Taking Open Source Seriously? — nearly 140 of the 175 projects analyzed contain such easily findable license information, or more precisely 78%. Or, alternatively, 22% of Github projects don’t have easily findable license information. zomg. (via Simon Phipps)
  3. The Oh Shit (Matt Jones) — the condition of best-laid plans meeting reality. When all the drawings, sections, detailed drawings and meticulous sourcing in the world clash with odd corners of the physical world, weather, materials and not least the vagaries of human labour. It’s what Bryan Boyer calls the “Matter Battle”. He puts it beautifully: “One enters a Matter Battle when there is an attempt to execute the desires of the mind in any medium of physical matter.”
  4. Text Messages Direct to your Contact Lens (The Telegraph) — I want this so bad. It’s a future I can believe in. Of course, the free ones will have spam.

November 21 2012

Four short links: 21 November 2012

  1. gboom — commandline tool for making gists.
  2. Pixel Based Websites — great collection of Javascript tools for working with sprites and backgrounds.
  3. Indie Game The Movie: Case Study — lessons learned, lots of detail, about the self-publishing crowdfunding success story of this documentary. Last piece in the series busts the myth that only big name people can make it work. (via Andy Baio)
  4. Adobe Proto — tablet app for making prototypes and wireframes. (via Josh Clark)
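For the curious, gboom-style gist creation comes down to a single POST to the GitHub Gists API. A sketch of the payload shape and the call (the token is a placeholder; the field names follow GitHub's public API):

```python
import json
import urllib.request

def build_gist_payload(files, description="", public=False):
    """Shape the body POST /gists expects:
    {"description": ..., "public": ..., "files": {"name": {"content": ...}}}"""
    return {
        "description": description,
        "public": public,
        "files": {name: {"content": body} for name, body in files.items()},
    }

def create_gist(token, files, description="", public=False):
    # Network call; needs a real personal access token.
    req = urllib.request.Request(
        "https://api.github.com/gists",
        data=json.dumps(build_gist_payload(files, description, public)).encode(),
        headers={"Authorization": "token " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes the new gist's URL
```

A wrapper like gboom mostly adds the conveniences around this: reading files from disk, remembering your credentials, and printing the resulting URL.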

September 21 2012

Four short links: 21 September 2012

  1. Business Intelligence on Farms — Machines keep track of all kinds of data about each cow, including the chemical properties of its milk, and flag when a particular cow is having problems or could be sick. The software can compare current data with historical patterns for the entire herd, and relate to weather conditions and other seasonal variations. Now a farmer can track his herd on his iPad without having to get out of bed, or even from another state. (via Slashdot)
  2. USAxGITHUB — monitor activity on all the US Federal Government’s github repositories. (via Sarah Milstein)
  3. Rethinking Robotics — $22k general purpose industrial robot. “‘It feels like a true Macintosh moment for the robot world,’ said Tony Fadell, the former Apple executive who oversaw the development of the iPod and the iPhone. Baxter will come equipped with a library of simple tasks, or behaviors — for example, a “common sense” capability to recognize it must have an object in its hand before it can move and release it.” (via David ten Have)
  4. Shift Labs — Shift Labs makes low-cost medical devices for resource-limited settings. [Crowd]Fund the manufacture and field testing of the Drip Clip [...] a replacement for expensive pumps that dose fluid from IV bags.
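The herd-monitoring software in the first item boils down to comparing a current reading against historical patterns and flagging outliers. A toy version of that flagging logic (the threshold and numbers are invented for illustration, not the vendor's actual model):

```python
from statistics import mean, stdev

def flag_anomaly(history, current, threshold=2.0):
    """Flag a reading more than `threshold` standard deviations from the
    historical mean: a crude stand-in for the pattern comparison the
    article describes (e.g. milk-chemistry readings per cow)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold
```

A real system would segment by cow, season, and weather rather than pooling the whole herd, but the shape of the check is the same.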

July 25 2012

Inside GitHub’s role in community-building and other open source advances

In this video interview, Matthew McCullough of GitHub discusses what they’ve learned over time as they grow and watch projects develop there. Highlights from the full video interview include:

  • How GitHub builds on Git’s strengths to allow more people to collaborate on a project [Discussed at the 00:30 mark]
  • The value of stability and simple URLs [Discussed at the 02:05 mark]
  • Forking as the next level of democracy for software [Discussed at the 04:02 mark]
  • The ability to build a community around a GitHub repo [Discussed at the 05:05 mark]
  • GitHub for education, and the use of open source projects for university work [Discussed at the 06:26 mark]
  • The value of line-level comments in source code [Discussed at the 09:36 mark]
  • How to be a productive contributor [Discussed at the 10:53 mark]
  • Tools for Windows users [Discussed at the 11:56 mark]

You can view the entire conversation in the following video:


July 23 2012

Four short links: 23 July 2012

  1. Unmanned Systems North America 2012 — huge tradeshow for drones. (via Directions Magazine)
  2. On Thneeds and the Death of Display Ads (John Battelle) — the video interstitial. Once anathema to nearly every publisher on the planet, this full page unit is now standard on the New York Times, Wired, Forbes, and countless other publishing sites. And while audiences may balk at seeing a full-page video ad after clicking from a search engine or other referring agent, the fact is, skipping the ad is about as hard as turning the page in a magazine. And in magazines, full page ads work for marketers. If you’d raised a kid on AdBlocker, and then at age 15 she saw the ad-filled Internet for the first time, she’d think her browser had been taken over by malware. (via Tim Bray)
  3. The Most Important Social Network: GitHub — I suspect that GitHub’s servers now contain the world’s largest corpus of commentary around intellectual production.
  4. Crowdfunded Genomics — a girl with a never-before-seen developmental disorder had her exome (the useful bits of DNA) sequenced, and a never-before-seen DNA mutation found. The money for it was raised by crowdfunding. (via Ed Yong)

May 24 2012

Jon Loeliger offers some practices to use with Git

After finishing the second edition of "Version Control with Git," author Jon Loeliger talked to me about some of the advice he offers and how to use Git effectively as changes to code pile up.

Highlights from the full video interview include:

  • What's new in Git since the first edition of the book? [Discussed at the 0:38 mark]
  • Importance of understanding concepts behind Git [Discussed at the 2:40 mark]
  • How to manage complicated branching [Discussed at the 3:33 mark]
  • Aspects of Github beyond storage [Discussed at the 6:22 mark]

You can view the entire conversation in the following video:

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR


April 09 2012

The Consumer Financial Protection Bureau shares code built for the people with the people

Editor's Note: This guest post is written by Matthew Burton, the acting deputy chief information officer of the Consumer Financial Protection Bureau (@CFPB). The quiet evolution in government IT has been a long road, with many forks. In the original version of this piece, published on the CFPB's blog, Burton needed to take the time to explain what open source software is because many people in government and citizens in the country still don't understand it, unlike readers here at Radar. That's why the post below includes a short section outlining the basics of open source. — Alex Howard.


The Consumer Financial Protection Bureau (CFPB) was fortunate to be born in the digital era. We've been able to rethink many of the practices that make financial products confusing to consumers and certain regulations burdensome for businesses. We've also been able to launch the CFPB with a state-of-the-art technical infrastructure that's more stable and more cost-effective than an equivalent system was just 10 years ago.

Many of the things we're doing are new to government, which has made them difficult to achieve. But the hard part lies ahead. While our current technology is great, those of us on the CFPB's Technology & Innovation team will have failed if we're still using the same tools 10 years from now. Our goal is not to tie the Bureau to 2012's technology, but to create something that stays modern and relevant — no matter the year.

Good internal technology policies can help, especially the policy that governs our use of software source code. We are unveiling that policy today.

Source code is the set of instructions that tells software how to work. This is distinct from data, which is the content that a user inputs into the software. Unlike data, most users never see software source code; it works behind the scenes while the users interact with their data through a more intuitive, human-friendly interface.

Some software lets users modify its source code, so that they can tweak the code to achieve their own goals if the software doesn't specifically do what users want. Source code that can be freely modified and redistributed is known as "open-source software," and it has been instrumental to the CFPB's innovation efforts for a few reasons:

  • It is usually very easy to acquire, as there are no ongoing licensing fees. Just pay once, and the product is yours.
  • It keeps our data open. If we decide one day to move our website to another platform, we don't have to worry about whether the current platform is going to keep us from exporting all of our data. (Only some proprietary software keeps its data open, but all open source software does so.)
  • It lets us use tailor-made tools without having to build those tools from scratch. This lets us do things that nobody else has ever done, and do them quickly.

Until recently, the federal government was hesitant to adopt open-source software due to a perceived ambiguity around its legal status as a commercial good. In 2009, however, the Department of Defense made it clear that open source software products are on equal footing with their proprietary counterparts.

We agree, and the first section of our source code policy is unequivocal: We use open-source software, and we do so because it helps us fulfill our mission.

Open-source software works because it enables people from around the world to share their contributions with each other. The CFPB has benefited tremendously from other people's efforts, so it's only right that we give back to the community by sharing our work with others.

This brings us to the second part of our policy: When we build our own software or contract with a third party to build it for us, we will share the code with the public at no charge. Exceptions will be made when source code exposes sensitive details that would put the Bureau at risk for security breaches; but we believe that, in general, hiding source code does not make the software safer.

We're sharing our code for a few reasons:

  • First, it is the right thing to do: the Bureau will use public dollars to create the source code, so the public should have access to that creation.
  • Second, it gives the public a window into how a government agency conducts its business. Our job is to protect consumers and to regulate financial institutions, and every citizen deserves to know exactly how we perform those missions.
  • Third, code sharing makes our products better. By letting the development community propose modifications, our software will become more stable, more secure, and more powerful with less time and expense from our team. Sharing our code positions us to maintain a technological pace that would otherwise be impossible for a government agency.

The CFPB is serious about building great technology. This policy will not necessarily make that an easy job, but it will make the goal achievable.

Our policy is available in three formats: HTML, for easy access; PDF, for good presentation; and as a GitHub Gist, which will make it easy for other organizations to adopt a similar policy and will allow the public to easily track any revisions we make to the policy.

If you're a coder, keep an eye on our GitHub account. We'll be releasing code for a few projects in the coming weeks.


October 04 2011

Four short links: 4 October 2011

  1. jfdi.asia -- Singaporean version of TechStars, with 100-day program ("the bootcamp") Jan-Apr 2012. Startups from anywhere in the world can apply, and will want to because Singapore is the gateway to Asia. They'll also have mentors from around the world.
  2. Oracle NoSQLdb -- Oracle want to sell you a distributed key-value store. It's called "Oracle NoSQL" (as opposed to PostgreSQL, which is SQL No-Oracle). (via Edd Dumbill)
  3. Facebook Browser -- interesting thoughts about why the browser might be a good play for Facebook. I'm not so sure: browsers don't lend themselves to small teams, and search advertising doesn't feel like a good fit with Facebook's existing work. Still, it makes me grumpy to see browsers become weapons again.
  4. Bitbucket -- a competitor to Github, from the folks behind the widely-respected Jira and Confluence tools. I'm a little puzzled, to be honest: Github doesn't seem to have weak spots (the way, for example, that Sourceforge did).

July 07 2011

Four short links: 7 July 2011

  1. Commodore 64 PC -- gorgeous retro look with fairly zippy modern internals. (via Rob Passarella)
  2. Designing Github for Mac -- a retrospective from the author of the excellent Mac client for github. He talks about what he learned and its origins, design, and development. Remember web development in 2004? When you had to create pixel-perfect comps because every element on screen was an image? That’s what developing for Cocoa is. Drawing in code is slow and painful. Images are easier to work with and result in more performant code. Remember these days? This meant my Photoshop files had to be a lot more fleshed out than I’ve been accustomed to in recent years. I usually get about 80% complete in Photoshop (using tons of screenshotting & layer flattening), then jump into code and tweak to completion. But with Cocoa, I ended up fleshing out that last 20% in Photoshop.
  3. Feedback Loops (Wired) -- covers startups and products that use feedback loops to help us change our behaviour. The best sort of delivery device “isn’t cognitively loading at all,” he says. “It uses colors, patterns, angles, speed—visual cues that don’t distract us but remind us.” This creates what Rose calls “enchantment.” Enchanted objects, he says, don’t register as gadgets or even as technology at all, but rather as friendly tools that beguile us into action. In short, they’re magical. (via Joshua Porter)
  4. continuous.io -- hosted continuous integration. (via Jacob Kaplan-Moss)

April 18 2011

Four short links: 18 April 2011

  1. Your Community is Your Best Feature -- Gina Trapani's CodeConf talk: useful, true, and moving. There's not much in this world that has all three of those attributes.
  2. Metrics Everywhere -- another CodeConf talk, this time explaining Yammer's use of metrics to quantify the actual state of their operations. Nice philosophical guide to the different ways you want to measure things (gauges, counters, meters, histograms, and timers). I agree with the first half, but must say that it will always be an uphill battle to craft a panegyric that will make hearts and minds soar at the mention of "business value". Such an ugly phrase for such an important idea. (via Bryce Roberts)
  3. On Earthquakes in Tokyo (Bunnie Huang) -- Personal earthquake alarms are quite popular in Tokyo. Just as lightning precedes thunder, these alarms give you a few seconds warning to an incoming tremor. The alarm has a distinct sound, and this leads to a kind of pavlovian conditioning. All conversation stops, and everyone just waits in a state of heightened awareness, since the alarm can’t tell you how big it is—it just tells you one is coming. You can see the fight or flight gears turning in everyone’s heads. Some people cry; some people laugh; some people start texting furiously; others just sit and wait. Information won't provoke the same reaction in everyone: for some it's impending doom, for others another day at the office. Data is not neutral; it requires interpretation and context.
  4. AccentuateUs -- Firefox plugin to Unicodify text (so if you type "cafe", the software turns it into "café"). The math behind it is explained on the dataists blog. There's an API and other interfaces, even a vim plugin.
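AccentuateUs itself uses a statistical model (explained in the dataists post); as a toy illustration of the same task, here is a dictionary-lookup sketch, with an invented mini-lexicon:

```python
# Tiny stand-in lexicon; the real system learns these mappings from data
# and uses context to disambiguate, rather than a fixed table.
LEXICON = {"cafe": "café", "uber": "über", "senor": "señor"}

def accentuate(text):
    """Replace each word with its accented form when the lexicon has one.
    Words the lexicon doesn't know pass through unchanged."""
    return " ".join(LEXICON.get(word.lower(), word) for word in text.split())
```

The interesting part of the real problem is ambiguity ("resume" vs. "résumé" depends on context), which is why the production system is statistical rather than a lookup table like this.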

April 07 2011

Four short links: 7 April 2011

  1. The Freight Train That is Android -- Google’s aim is defensive not offensive. They are not trying to make a profit on Android or Chrome. They want to take any layer that lives between themselves and the consumer and make it free (or even less than free). [...] In essence, they are not just building a moat; Google is also scorching the earth for 250 miles around the outside of the castle to ensure no one can approach it. (via Fred Wilson)
  2. Group Think (New York Magazine) -- Big Idea tomes typically pull promiscuously from behavioral economics, cognitive science, and evolutionary psychology. They coin phrases the way Zimbabwe prints bills. They relish upending conventional wisdom: Not thinking becomes thinking, everything bad turns out to be good, and the world is—go figure—flat. (With Gladwell’s Blink, this mania for the counterintuitive runs top-speed into a wall, crumples to the ground, and stares dizzily at the little birds circling overhead. This is, let me remind you, a best-selling book about the counterintuitive importance of thinking intuitively.) A piercing take on pop science/fad management books.
  3. Product Design at GitHub -- Every employee at GitHub is a product designer. We only hire smart people we trust to make our product better. We don’t have managers dictating what to work on. We don’t require executive signoff to ship features. Executives, system administrators, developers, and designers conceive, ship, and remove features alike. (via Simon Willison)
  4. Linus on Android Headers Claims -- "seems totally bogus". I blogged the Android headers claim earlier, and have been meaning to run this rather definitive "ignore it, it was noise" note. Apologies for showing you crap that was wrong: that's why I try not to show weather-report "news", but to find projects that illustrate trends.
