
August 12 2011

Four short links: 12 August 2011

  1. Hippocampus Text Adventure -- written as an exercise in learning Python, you explore the hippocampus. It's simple, but I like the idea of educational text adventures. (Well, educational in that you learn about more than the axe-throwing behaviour of the cave-dwelling dwarf)
  2. Pandas -- BSD-licensed Python data analysis library.
  3. Building Lanyrd -- Simon Willison's talk (with slides) about the technology under Lanyrd and the challenges in building with and deploying it.
  4. Electronic Skin Monitors Heart, Brain, and Muscles (Discover Magazine blogs) -- this is a freaking awesome proof-of-concept. Interview with the creator of a skin-mounted sensor that attaches like a sticker, is flexible, inductively powered, and much more. This represents a major step forward in possibilities for personal data-gathering. (via Courtney Johnston)

August 08 2011

Four short links: 8 August 2011

  1. Bulbflow -- a Python framework for graph databases: it's like an ORM for graphs. (via Joshua Schachter)
  2. Nomograms -- the lost art of graphical computing. (via John D Cook)
  3. Web Intents -- adding Android-style Intents to the web. Services register their intention to be able to handle an action on the user's behalf. Applications request to start an Action of a certain verb (share, edit, view, pick etc) and the system will find the appropriate Services for the user to use based on the user's preference.
  4. Finagle (GitHub) -- Twitter's asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language. Finagle provides a rich set of tools that are protocol independent.

May 17 2011

Four short links: 17 May 2011

  1. Sorting Out 9/11 (New Yorker) -- the thorniest problem for the 9/11 memorial was the ordering of the names. Computer science to the rescue!
  2. Tagger -- Python library for extracting tags (statistically significant words or phrases) from a piece of text.
  3. Free Science, One Paper at a Time (Wired) -- Jonathan Eisen's attempt to collect and distribute his father's scientific papers (which were written while a federal employee, so in the public domain), thwarted by old-fashioned scientific publishing. “But now,” says Jonathan Eisen, “there’s this thing called the Internet. It changes not just how things can be done but how they should be done.”
  4. Internet Archive Launches Physical Archive -- I'm keen to see how this develops, because physical storage has problems that digital does not. I'd love to see the donor agreement require the donor to give the archive full rights to digitize and distribute under open licenses. That'd put the Internet Archive a step in front of traditional archives, museums, libraries, and galleries, whose donor agreements typically let donors place arbitrary specifications on use and reuse ("must be inaccessible for 50 years", "no commercial use", "no use that compromises the work", etc.), all of which are barriers to wholesale digitization and reuse.

April 25 2011

Four short links: 25 April 2011

  1. E-Referral Evaluation Interim Findings -- in general good, but note this: The outstanding system issues are an ongoing source of frustration and concern, including [...] automated data uptake from the GP [General Practitioner=family doctor] PMS [Patient Management System], that sometimes has clearly inaccurate or contradictory information. When you connect systems, you realize the limitations of the data in them.
  2. c64iphone (GitHub) -- the source to an iPhone/iPad app from the store, released under GPLv3. It incorporates the Frodo emulator. Sweet Freedom.
  3. mlpy -- machine learning Python library, a high-performance Python package for predictive modeling. It makes extensive use of NumPy to provide fast N-dimensional array manipulation and easy integration of C code. (via Joshua Schachter)
  4. What is The Truth Behind 9 Out of 10 Startups Fail? (Quora) -- some very interesting pointers and statistics, such as Hall and Woodward (2007) analyze a dataset of all VC-backed firms and show the highly skewed distribution of outcomes. VC revenue averages $5 million per VC-backed company. Founding team averages $9 million per VC-backed company (most from small probability of great success). The economically rational founding team would sell at time of VC funding for $900,000 to avoid the undiversified risk. (via Hacker News)

March 21 2011

Four short links: 21 March 2011

  1. Javascript Trie Performance Analysis (John Resig) -- if you program in Javascript and you're not up to John's skill level (*cough*) then you should read this and follow along. It's a ride-along in the brain of a master.
  2. Think Stats -- an introduction to statistics for Python programmers. (via Edd Dumbill)
  3. Bolefloor -- they build curvy wooden floors. Instead of straightening naturally curvy wood (which is wasteful), they use CV and CAD/CAM to figure the smallest cuts to slot strips of wood together. It's gorgeous, green, and geeky. (via BoingBoing)
  4. Extracting Article Text from HTML Documents -- everyone's doing it, now you know how. It's the theory behind the lovingly hand-crafted magic of readability. (via Hacker News)

March 17 2011

March 02 2011

Four short links: 2 March 2011

  1. Unicode in Python, Completely Demystified -- a good introduction to Unicode in Python, which helped me with some code. (via Hacker News)
  2. A Ban on Brain-Boosting Drugs (Chronicle of Higher Education) -- Simply calling the use of study drugs "unfair" tells us nothing about why colleges should ban them. If such drugs really do improve academic performance among healthy students (and the evidence is scant), shouldn't colleges put them in the drinking water instead? After all, it would be unfair to permit wealthy students to use them if less privileged students can't afford them. As we start to hack our bodies and minds, we'll face more questions about legitimacy and ethics of those actions. Not, of course, about using coffee and Coca-Cola, ubiquitous performance-enhancing stimulants that are mysteriously absent from bans and prohibitions.
  3. Copywrongs -- Matt Blaze spits the dummy on IEEE and ACM copyright policies. In particular, the IEEE is explicitly preventing authors from distributing copies of the final paper. We write scientific papers first and last because we want them read. When papers were disseminated solely in print form it might have been reasonable to expect authors to donate the copyright in exchange for production and distribution. Today, of course, this model seems, at best, quaintly out of touch with the needs of researchers and academics who no longer desire or tolerate the delay and expense of seeking out printed copies of far-flung documents. We expect to find it on the open web, and not hidden behind a paywall, either.
  4. On the Engineering of SaaS -- An upgrade process, for example, is an entirely different beast. Making it robust and repeatable is far less important than making it quick and reversible. This is because the upgrade only ever happens once: on your install. Also, it only ever has to work right in one, exact variant of the environment: yours. And while typical customers of software can schedule an outage to perform an upgrade, scheduling downtime in SaaS is nearly impossible. So, you must be able to deploy new releases quickly, if not entirely seamlessly, and in the event of failure, rollback just as rapidly.

February 02 2011

Four short links: 2 February 2011

  1. Seven Foundational Visualization Papers -- seven classics in the field that are cited and useful again and again.
  2. Git Immersion -- a "walking tour" of Git inspired by the premise that to know a thing is to do it. Cf Learn Python the Hard Way or even NASA's Planet Makeover. We'll see more and more tutorials that require participation because you don't get muscle memory by reading. (NASA link via BoingBoing)
  3. Readability -- strips out ads and sends money to the publishers you like. I'd never thought of a business model as something that's imposed from the outside quite like this, but there you go.
  4. Quora's Technology Examined (Phil Whelan) -- In this blog post I will delve into the snippets of information available on Quora and look at Quora from a technical perspective. What technical decisions have they made? What does their architecture look like? What languages and frameworks do they use? How do they make that search bar respond so quickly? Lots of Python. (via Joshua Schachter on Delicious)

December 22 2010

Developer Year in Review: Programming Languages

Continuing our look at the year in development, let's move on to the exciting land of languages. We'll finish off next week with operating systems.

Java: Strategic asset or red-headed stepchild?

Watching Oracle's machinations around Java can be more than a little confusing. One minute, they're talking about forking it into free and commercial versions, a potential slap in the face to the open source community. Then they refused to let Apache's Harmony project have access to key testing suites to certify the Java alternative. But then Oracle ended the year on their hands and knees begging Apache to stay in the JCP (and failing).

Meanwhile, we saw yet another "that's not really Java" lawsuit. This time Oracle was suing Google over the Android implementation. Evidently, having Bill Gates and Steve Ballmer as dire enemies wasn't good enough for Larry Ellison, so he's trying to add Sergey Brin and Larry Page to his list as well.

On a side note, has anyone noticed how Java basically took over the mobile space? Of the three major smartphone platforms (sorry Windows, you have a ways to go before you make that list again ...), two of them run Java of some sort. If you add in J2ME, which is inside many of the "clamshell" phones, Java is the dominant player in mobile.

It was also a good year for the JVM, as JVM-powered languages such as Clojure, Groovy and Scala leveraged the omnipresence of Java to gain traction.

I see your 8 cores, and raise you 8

Functional programming continues to gain in popularity, mainly as programmers try to come to terms with how to leverage all the multi-threaded power available to them in modern hardware. Along with the aforementioned Scala, Erlang and Haskell have also seen commercial deployments increase.

Francesco Cesarini gave a great talk at OSCON on how Erlang can help developers. Unfortunately, there was no transcript, because it had no side effects. (Trust me, the functional programmers in the readership are falling over laughing.)

In other language news ...

Perl: Perl 6 still lags "Duke Nukem Forever" as far as being promised software still awaiting final shipment, but only by three years.

PHP: With adding PHP to their language arsenal, you can now run PHP on all the major cloud-based platforms (the others being Amazon, Windows and Google.)

Ruby: No new major version of Ruby this year, nor any earth-shattering news, but it continues to be the language that all the cool kids use.

Python: Release 3.2 is on track for a Q1 2011 release. "Python" is also a lousy word to put into a Google News search, unless you enjoy reading about people smuggling snakes through customs and DPW workers making unexpected discoveries in sewers.

That's it for this week. I'll take a look at the year in operating systems in the next edition. Suggestions are always welcome, so please send tips or news here.


December 09 2010

Strata Gems: Make beautiful graphs of your Twitter network

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: Explore and visualize graphs with Gephi.

Where better to start analyzing social networks than with your own? Using the graphing tool Gephi and a little bit of Python script, you can analyze your own Twitter network, revealing the inherent structure among those you follow. It's also a fun way to learn more about network analysis.

Inspired by the LinkedIn Gephi graphs, I analyzed my Twitter friend network. I took everybody that I followed on Twitter, and found out who among them followed each other. I've shared the Python code I used to do this on

To use the script, you need to create a Twitter application and use command-line OAuth authentication to get the tokens to plug into the script. Writing about that is a bit gnarly for this post, but the easiest way I've found to authenticate a script with OAuth is by using the oauth command-line tool that ships with the Ruby OAuth gem.

The output of my Twitter-reading tool is a graph, in GraphML, suitable for import into Gephi. The graph has a node for each person, and an edge for each "follows" relationship. On initial load into Gephi, the graph looks a bit like a pile of spider webs, not showing much information.
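As a rough illustration of that output format (not the script described above), here is a minimal sketch that writes a "follows" graph as GraphML using only the Python standard library. The input shape (a dict mapping each person to the set of people they follow), the function name `follows_to_graphml`, and the sample names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def follows_to_graphml(follows):
    """Build a GraphML document from a dict mapping each person
    to the set of people they follow (hypothetical input shape)."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="twitter", edgedefault="directed")

    # One node per person, whether they appear as follower or followee.
    people = set(follows)
    for followed in follows.values():
        people.update(followed)
    for name in sorted(people):
        ET.SubElement(graph, "node", id=name)

    # One directed edge per "follows" relationship.
    for follower in sorted(follows):
        for followed in sorted(follows[follower]):
            ET.SubElement(graph, "edge", source=follower, target=followed)

    return ET.tostring(root, encoding="unicode")


# Made-up sample data, standing in for what a Twitter API call would return.
sample = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": set(),
}
graphml_text = follows_to_graphml(sample)
```

Saving `graphml_text` to a file with a `.graphml` extension should give you something Gephi can open, though a real export would probably also carry attributes such as display names.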

I wanted to show a couple of things in the graph: cluster closely related people, and highlight who are the well-connected people. To find related groups of people, you can use Gephi to analyze the modularity of the network, and then color nodes according to the discovered communities. To find the well-connected people, run the "Degree Power Law" statistic in Gephi, which will calculate the betweenness centrality for each person, which essentially computes how much of a hub they are.
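If you want a quick sense of who the hubs are before opening Gephi, a much cruder measure is easy to compute by hand. The sketch below counts in-degree (how many people in the network follow each person). This is a simpler stand-in and not the betweenness centrality that Gephi reports; the function name and sample data are hypothetical.

```python
from collections import Counter


def in_degree(follows):
    """Count, for each person, how many people in the network follow them."""
    # Start everyone at zero so people with no followers still appear.
    counts = Counter({person: 0 for person in follows})
    for follower, followed in follows.items():
        for person in followed:
            counts[person] += 1
    return counts


# Made-up sample: each key follows the people in its set.
sample = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": {"alice"},
    "dave": {"carol"},
}
ranking = in_degree(sample).most_common()
```

In this sample, "carol" comes out on top because three of the four people follow her. In-degree only counts direct followers, so it can miss people who bridge otherwise separate communities, which is the kind of hub betweenness centrality is designed to find.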

These steps are neatly laid out in a great slide deck from Sociomantic Labs on analyzing Facebook social networks. Follow the tips there and you'll end up with a beautiful graph of your network that you can export to PDF from Gephi.

Overview of my social graph: click to view the full PDF version

The final result for my network is shown above. If you download the full PDF, you'll notice there are several communities, which I'll explain for interest. The mass of pink is predominantly my O'Reilly contacts, dark green shows the Strata and data community, the lime green the Mono and GNOME worlds, mustard shows the XML and open source communities. The balance of purple is assorted technologist friends.

Finally my sporting interests are revealed: the light blue are cricket fans and commentators, the red Formula 1 motor racing. Unsurprisingly, Tim O'Reilly, Stephen Fry and Miguel de Icaza are big hubs in my network. Your own graphs will reveal similar clusters of people and interests.

If this has whetted your appetite, you can discover more about mining social networks at Matthew Russell's Strata session, Unleashing Twitter Data For Fun And Insight.

September 30 2010

Four short links: 30 September 2010

  1. Learn Python The Hard Way -- Zed Shaw's book on programming Python, written as 52 exercises: Each exercise is one or two pages and follows the exact same format. You type each one in (no copy-paste!), make it run, do the extra credit, and then move on. If you get stuck, at least type it in and skip the extra credit for later. This is brilliant—you learn by doing, and this book is all doing.
  2. When The Revolution Comes They Won't Recognize it (Anil Dash) -- nails the importance of Makers. Dale Dougherty and the dozens of others who have led Maker Faire, and the culture of "making", are in front of a movement of millions who are proactive about challenging the constrictions that law and corporations are trying to place on how they communicate, create and live. The lesson that simply making things is a radical political act has enormous precedence in political history.
  3. Truthy -- project tracking suspicious memes on Twitter.
  4. UK Open Government License -- standard license for open government information in the UK.

September 20 2010

Four short links: 20 September 2010

  1. The Tracks of Bizarre Robot Traders (The Atlantic) -- I love the idea that these mysterious effect-less trades might simply be there to slow down competitors' analytic systems because every millisecond matters.
  2. MS Paint Adventures -- a weird mashup of MS Paint and text adventure games.
  3. tablib -- a format-agnostic tabular dataset library for Python. (via joshua on delicious)
  4. Password Reuse (XKCD) -- so very true.

August 04 2010

Four short links: 4 August 2010

  1. FuXi -- Python-based, bi-directional logical reasoning system for the semantic web from the folks at the Open Knowledge Foundation. (via About Inferencing)
  2. Harness the Power of Being an Internet -- I learn by trying to build something, there's no other way I can discover the devils-in-the-details. Unfortunately that's an incredibly inefficient way to gain knowledge. I basically wander around stepping on every rake in the grass, while the A Students memorize someone else's route and carefully pick their way across the lawn without incident. My only saving graces are that every now and again I discover a better path, and faced with a completely new lawn I have an instinct for where the rakes are.
  3. Stack Overflow's Curated Folksonomy -- community-driven tag synonym system to reduce the chaos of different names for the same thing. (via Skud)
  4. Image Deblurring using Inertial Measurement Sensors (Microsoft Research) -- using Arduino to correct motion blur. (via Jon Oxer)

June 02 2010

Four short links: 2 June 2010

  1. Wikileaks Launched on Stolen Documents (Wired) -- Wired claims the first set of documents was obtained by running a Tor node that users connected to ("exit node") and saving the plaintext that was sent to the users, without their knowledge. Reminds me of the adage that nothing big in Silicon Valley starts without being some degree of evil first: YouTube turning a blind eye to copyright infringement, Facebook games and spam, etc.
  2. VC Investments in Education -- Cleantech investors are chasing a 3x larger market than Education and yet are putting 50-60x the money to work chasing those returns.
  3. Cells: A Massively Multi-Agent Python Programming Game -- a sweet-looking update on the old Core War game.
  4. Google IO 2010 Session Videos Online -- I'm keen to learn more about BigData and Prediction APIs, which seem to me an eminently sensible move by Google to play to their strengths.

May 10 2010

Four short links: 10 May 2010

  1. zxing -- barcode library for iPhone, Android, Java, and more.
  2. Guido's Python -- how the compiler and interpreter see your Python programs. It wasn't until I had this level of knowledge of Perl that I really knew what the hell I was doing. (via Hacker News)
  3. UK Election Data -- this was posted on the eve of the UK election and talks about the new data they had this election. There's been a lot of talk about Internet use by candidates to whip up votes, and by government to boost citizens, but this is data that helps citizens decide who to vote for. Very cool.
  4. Why We Should Learn the Language of Data (Wired) -- We often say, rightly, that literacy is crucial to public life: If you can’t write, you can’t think. The same is now true in math. Statistics is the new grammar. (via imran on Twitter)

March 05 2010

Four short links: 5 March 2010

  1. Rapportive -- a simple social CRM built into Gmail. They replace the ads in Gmail with photos, bio, and info from social media sites. (via ReadWrite Web)
  2. Best Practices in Web Development with Django and Python -- great set of recommendations. (via Jon Udell's article on checklists)
  3. Think Like a Statistician Without The Math (Flowing Data) -- Finally, and this is the most important thing I've learned, always ask why. When you see a blip in a graph, you should wonder why it's there. If you find some correlation, you should think about whether or not it makes any sense. If it does make sense, then cool, but if not, dig deeper. Numbers are great, but you have to remember that when humans are involved, errors are always a possibility. This is basically how to be a scientist: know the big picture, study the details to find deviations, and always ask "why".
  4. WoW Armory Data Mining -- a blog devoted to data mining on the info from the WoW Armory, which has a lot of data taken from the servers. It's baseball statistics for World of Warcraft. Fascinating! (via Chris Lewis)

February 25 2010

Four short links: 25 February 2010

  1. like python -- lets you write Python in Valleygirl, LOLCAT, fratboy, and rap. Still not a handle on writing Perl in Latin. (via Hacker News)
  2. Belief In Climate Change Hinges On Worldview (NPR) -- applicable beyond climate change. Whether you get what you want depends on how it's framed and how it's delivered. The paper cited is available for PDF download.
  3. gheat -- add a heatmap layer to a Google Map. For more on its design and implementation, read Chad Whitacre's blog.
  4. TrueType VT220 Font -- turns out it's not as simple as a straight bitmap. This article explains how scanline gaps and a dot-stretching circuit create the look we old-timers remember. (via rgs on Delicious)

January 01 2010

Four short links: 1 January 2010

  1. Measuring Type -- clever way to measure which font uses more ink.
  2. Vowpal Wabbit -- fast learning software from Yahoo! Research and Hunch. Code available in git. (via zecharia on Delicious)
  3. Literature Review on Indexing Time-Series Data -- a graduate student's research work included this literature review of papers on indexing time-series data. (via jpatanooga on Delicious)
  4. igraph -- programming library for manipulating graph data, with the usual algorithms (minimum spanning tree, network flow, cliques, etc.) available in R, Python, and C.

December 29 2009

Four short links: 29 December 2009

  1. Turning The Page Online -- historic science books in high-resolution online. Hooke's Micrographia was the first view of the microscopic world, and his astonishingly detailed and beautiful illustrations are there to view and print.
  2. Detailed Psychology of Trolls -- You might be surprised to learn that Trolls readily engage in long debates with fellow Trolls - people, that is, whom they know to be perverse and cunning conversation hackers. Apparently, this does not detract them from wasting hours on fruitless debates that are blatantly rigged and full of sophistry. Few Trolls would be happy with debating only fellow Trolls (semi-literate teenagers and hard-boiled fundamentalists are so much tastier - even though they, too, might be trolling you). Yet most of them, every once in a while, enjoy having an absurd argument with another pig-head. Good on the "know your enemy" basis. (via MindHacks)
  3. Theme Issue -- a Royal Society publication ran a special open access issue focusing on "personal perspectives of the life sciences", where top scientists write about what they think is important. It's good to see more toes dipped into open access, but I'd love to see more journals (particularly those of professions and associations) move to an entirely open access model. (via SciBlogs)
  4. Invent Your Own Computer Games with Python (2ed) -- free ebook that teaches how to program in Python, using games as the motivating examples. Nominally for 10-12 year old children, but (naturally) accessible to adults too. I have not read it, but approve of the attempt.

December 08 2009

Four short links: 8 December 2009

  1. Python's Moratorium -- Python language designers have declared a moratorium on enhancement proposals (feature requests) while the world's Python programmers get used to the last batch of New And Shiny they shipped. I'm reasonably sure that the ALGOL designers went through exactly the same discussions, and I know Perl did too. So, don’t be afraid of it - don’t think that Python is evolutionarily dead - it’s not. We’re taking a stability and adoption break, a breather. We’re doing this to help users and developers, not to just be able to say “no” to every random idea sent to python-ideas, and not because we’re done. Reminds me of Perl god Jarkko Hietaniemi's signature file: "There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen."
  2. This Week's Finds in Mathematical Physics -- I can't meaningfully contribute to the math, but golly them pictures are purty! (via Hacker News)
  3. x86 Assembly Encounter -- To use a construction industry metaphor, an average x86 assembler has the complexity and usefulness of a hammer, while the DSP world is using high-speed mag-rail blast-o-matic nail guns with automatic feeders and superconducting magnets. [...] I find it ridiculous that the most popular computing platform in the world does not have a decent assembler. What’s even worse, from the discussions I’ve seen on the net, people are mostly interested in how fast the assembler is (?!) rather than how much time it saves the programmer. (via Hacker News)
  4. Finding Tennis Courts in Aerial Photos -- more hacking with computer vision techniques and publicly-available data. This is going to lead to good things (and some unpleasant surprises, as that which was formerly "too hard to find" ceases to be so). (via Simon Willison)
