January 15 2014

Four short links: 15 January 2014

  1. Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
  2. mcl: Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
  3. Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
  4. Map Projections Illustrated with a Face (Flowing Data) — really neat, wish I’d had these when I was getting my head around map projections.
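
The Markov Cluster Algorithm in item 2 alternates two steps on a column-stochastic matrix: expansion (squaring the matrix, which spreads flow along paths) and inflation (raising entries to a power and renormalizing, which strengthens strong flows and starves weak ones). Below is a minimal dense sketch of that idea in plain Python. It is not the mcl program's implementation, which works on sparse matrices with pruning; the function names, the default inflation of 2.0, the fixed iteration count, and the 1e-6 threshold are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise each entry to `power`, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy MCL on a square 0/1 adjacency matrix; returns sorted clusters."""
    n = len(adjacency)
    # Add self-loops so every node keeps some of its own flow.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still carries flow names one cluster: the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Calling `mcl` on a small adjacency matrix returns a list of node-index lists. The inflation parameter is the knob to play with: higher values tend to give finer-grained clusters.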

December 16 2013

Four short links: 17 December 2013

  1. WebGraph: a framework for graph compression aimed at studying web graphs. It provides simple ways to manage very large graphs, exploiting modern compression techniques. (via Ben Lorica)
  2. Learn to Program with Minecraft Plugins: You’ll need to add features to the game itself: learn how to build plugins for your own Minecraft server using the Java programming language. You don’t need to know anything about programming to get started—this book will teach you everything you need to know! Shameless Christmas stocking bait! (via Greg Borenstein)
  3. In Search of Perfection, Young Adults Turn to Adderall at Work (Al Jazeera) — “Adderall is just the tip of the iceberg,” Essig said. “There are lots more drugs coming down the pike. The way we set up our cultural model for dealing with psychologically performance-enhancing drugs is a real serious question.”
  4. Explain Shell — uses parsed manpages to explain a shell commandline. (via Tracy K Teal)

December 06 2013

Four short links: 6 December 2013

  1. Society of Mind — Marvin Minsky’s book now Creative-Commons licensed.
  2. Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary Biology: The concentration of research output is declining at the department level but increasing at the individual level. [...] We speculate that this may be due to changing patterns of collaboration, perhaps caused by the rising burden of knowledge and the falling cost of communication, both of which increase the returns to collaboration. Indeed, we report evidence that the propensity to collaborate is rising over time. (via Sciblogs)
  3. As Engineers, We Must Consider the Ethical Implications of our Work (The Guardian) — applies to coders and designers as well.
  4. Eyewire — a game to crowdsource the mapping of 3D structure of neurons.

July 05 2013

Four short links: 5 July 2013

  1. Quantitative Analysis of the Full Bitcoin Transaction Graph (PDF) — We analyzed all these large transactions by following in detail the way these sums were accumulated and the way they were dispersed, and realized that almost all these large transactions were descendants of a single transaction which was carried out in November 2010. Finally, we noted that the subgraph which contains these large transactions along with their neighborhood has many strange looking structures which could be an attempt to conceal the existence and relationship between these transactions, but such an attempt can be foiled by following the money trail in a sufficiently persistent way. (via Alex Dong)
  2. Majority of Gamers Today Can’t Finish Level 1 of Super Mario Bros — Nintendo test, and the President of Nintendo said in a talk, We watched the replay videos of how the gamers performed and saw that many did not understand simple concepts like bottomless pits. Around 70 percent died to the first Goomba. Another 50 percent died twice. Many thought the coins were enemies and tried to avoid them. Also, most of them did not use the run button. There were many other depressing things we noted but I can not remember them at the moment. (via Beta Knowledge)
  3. Bloat-Aware Design for Big Data Applications (PDF) — (1) merging and organizing related small data record objects into few large objects (e.g., byte buffers) instead of representing them explicitly as one-object-per-record, and (2) manipulating data by directly accessing buffers (e.g., at the byte chunk level as opposed to the object level). The central goal of this design paradigm is to bound the number of objects in the application, instead of making it grow proportionally with the cardinality of the input data. (via Ben Lorica)
  4. Poderopedia (Github) — originally designed for investigative journalists, the open src software allows you to create and manage entity profile pages that include: short bio or summary, sheet of connections, long newsworthy profiles, maps of connections of an entity, documents related to the entity, sources of all the information and news river with external news about the entity. See the announcement and website.
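
The bloat-aware design in item 3 comes down to keeping many small records in a few large byte buffers and reading fields straight out of those buffers. Here is a rough Python sketch of the idea. The paper is not about Python, and everything named here (the `PackedRecords` class, the record layout of a 32-bit id plus a 64-bit float) is a hypothetical stand-in chosen for illustration.

```python
import struct

# Hypothetical fixed-size record: little-endian uint32 id + float64 value (12 bytes).
RECORD = struct.Struct("<Id")


class PackedRecords:
    """Keeps all records in one bytearray instead of one object per record."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, rec_id, value):
        # Serialize the record onto the end of the shared buffer.
        self._buf += RECORD.pack(rec_id, value)

    def __len__(self):
        return len(self._buf) // RECORD.size

    def get(self, index):
        # Read one record directly from its byte offset.
        return RECORD.unpack_from(self._buf, index * RECORD.size)

    def total_value(self):
        # Scan the buffer record by record without building a list of objects.
        return sum(value for _, value in RECORD.iter_unpack(self._buf))
```

The number of long-lived objects stays at one buffer however many records are appended, which is the bound the paper is after. Python still creates short-lived tuples while unpacking, so treat this as a picture of the layout rather than a benchmark.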

May 13 2013

Four short links: 13 May 2013

  1. Exploiting a Bug in Google Glass — unbelievably detailed and yet easy-to-follow explanation of how the bug works, how the author found it, and how you can exploit it too. The second guide was slightly more technical, so when he returned a little later I asked him about the Debug Mode option. The reaction was interesting: he kind of looked at me, somewhat confused, and asked “wait, what version of the software does it report in Settings”? When I told him “XE4” he clarified “XE4, not XE3”, which I verified. He had thought this feature had been removed from the production units.
  2. Probability Through Problems — motivating problems to hook students on probability questions, structured to cover high-school probability material.
  3. Connbox — love the section “The importance of legible products” where the physical UI interacts seamlessly with the digital device … it’s glorious. Three amazing videos.
  4. The Index-Based Subgraph Matching Algorithm (ISMA): Fast Subgraph Enumeration in Large Networks Using Optimized Search Trees (PLoSONE) — The central question in all these fields is to understand behavior at the level of the whole system from the topology of interactions between its individual constituents. In this respect, the existence of network motifs, small subgraph patterns which occur more often in a network than expected by chance, has turned out to be one of the defining properties of real-world complex networks, in particular biological networks. [...] An implementation of ISMA in Java is freely available.
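
To make "subgraph enumeration" in item 4 concrete, here is a naive counter for the simplest pattern, the triangle, in an undirected graph. This is not ISMA and uses none of its index-based search trees; it is a brute-force stand-in, and the function name is made up for illustration. Deciding whether a pattern is a motif would also mean comparing the count against randomized networks, which this sketch does not attempt.

```python
from itertools import combinations


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for u in adj:
        # Only look at neighbors larger than u so each triangle is found once,
        # at its smallest node.
        larger = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(larger, 2):
            if w in adj[v]:
                count += 1
    return count
```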

April 01 2013

Four short links: 29 March 2013

  1. Titan 0.3 Out — graph database now has full-text, geo, and numeric-range index backends.
  2. Mozilla Security Community Do a Reddit AMA — if you wanted a list of sharp web security people to follow on Twitter, you could do a lot worse than this.
  3. Probabilistic Programming and Bayesian Methods for Hackers (Github) — An introduction to Bayesian methods + probabilistic programming in data analysis with a computation/understanding-first, mathematics-second point of view. All in pure Python. See also Why Probabilistic Programming Matters and Trends to Watch: Logic and Probabilistic Programming. (via Mike Loukides and Renee DiRestra)
  4. Open Source 3D-Printable Optics Equipment (PLOSone) — This study demonstrates an open-source optical library, which significantly reduces the costs associated with much optical equipment, while also enabling relatively easily adapted customizable designs. The cost reductions in general are over 97%, with some components representing only 1% of the current commercial investment for optical products of similar function. The results of this study make it clear that this method of scientific hardware development enables a much broader audience to participate in optical experimentation both as research and teaching platforms than previous proprietary methods.

November 23 2012

Four short links: 23 November 2012

  1. Trap Island — island on most maps doesn’t exist.
  2. Why I Work on Non-Partisan Tech (MySociety) — excellent essay. Obama won using big technology, but imagine if that effort, money, and technique were used to make things that were useful to the country. Political technology is not gov2.0.
  3. 3D Printing Patent Suits (MSNBC) — notable not just for incumbents keeping out low-cost competitors with patents, but also (as BoingBoing observed) Many of the key patents in 3D printing start expiring in 2013, and will continue to lapse through ’14 and ’15. Expect a big bang of 3D printer innovation, and massive price-drops, in the years to come. (via BoingBoing)
  4. GraphChi: can run very large graph computations on just a single machine, by using a novel algorithm for processing the graph from disk (SSD or hard drive). Programs for GraphChi are written in the vertex-centric model, proposed by GraphLab and Google’s Pregel. GraphChi runs vertex-centric programs asynchronously (i.e., changes written to edges are immediately visible to subsequent computation), and in parallel. GraphChi also supports streaming graph updates and removal of edges from the graph.
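
For a feel of the vertex-centric, asynchronous style described in item 4, here is a small in-memory sketch that finds connected components by letting each vertex adopt the smallest label among itself and its neighbors. It does not use GraphChi's API or its disk-based processing, and it keeps values on vertices rather than edges to stay short; the function names are hypothetical. The asynchronous part is that labels are updated in place, so a change made to one vertex is seen by vertices visited later in the same pass.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    label = {v: v for v in neighbors}

    def update(vertex):
        # The per-vertex program: read neighbor labels, keep the minimum.
        best = min([label[vertex]] + [label[n] for n in neighbors[vertex]])
        changed = best < label[vertex]
        label[vertex] = best  # written in place, visible to later updates
        return changed

    changed = True
    while changed:
        changed = False
        for vertex in sorted(neighbors):
            if update(vertex):
                changed = True
    return label
```

A synchronous, Pregel-style version would compute every new label from the previous pass's values and typically need more passes to converge on the same answer.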

August 10 2011

August 08 2011

Four short links: 8 August 2011

  1. Bulbflow -- a Python framework for graph databases: it's like an ORM for graphs. (via Joshua Schachter)
  2. Nomograms -- the lost art of graphical computing. (via John D Cook)
  3. Web Intents -- adding Android-style Intents to the web. Services register their intention to be able to handle an action on the user's behalf. Applications request to start an Action of a certain verb (share, edit, view, pick etc) and the system will find the appropriate Services for the user to use based on the user's preference.
  4. Finagle (GitHub) -- Twitter's asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language. Finagle provides a rich set of tools that are protocol independent.

July 01 2011

Four short links: 1 July 2011

  1. paper.js -- The Swiss Army Knife of Vector Graphics Scripting. MIT-licensed Javascript library that gives great demo.
  2. TileMill for Processing -- gorgeous custom maps in Processing. (via FlowingData)
  3. Research Assistant Wanted -- working with one of the authors of Mind Hacks on augmenting our existing senses with a form of "remote touch" generated by using artificial distance sensors, such as ultrasound, to stimulate tactile stimulators (vibrating pads) placed against the surface of the head. (via Vaughn Bell)
  4. GoldenORB -- a cloud-based open source project for massive-scale graph analysis, built upon best-of-breed software from the Apache Hadoop project modeled after Google’s Pregel architecture. (via BigData)

June 22 2011

Four short links: 22 June 2011

  1. DOM Snitch -- an experimental Chrome extension that enables developers and testers to identify insecure practices commonly found in client-side code. See also the introductory post. (via Hacker News)
  2. Spark -- Hadoop-alike in Scala. Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can outperform Hadoop by 30x. However, you can use Spark's convenient API for general data processing too. (via Hilary Mason)
  3. Bagel -- an implementation of the Pregel graph processing framework on Spark. (via Oliver Grisel)
  4. Week 315 (Matt Webb) -- read this entire post. It will make you smarter. The company’s decisions aren’t actually the shareholders’ decisions. A company has a culture which is not the simple sum of the opinions of the people in it. A CEO can never be said to perform an action in the way that a human body can be said to perform an action, like picking an apple. A company is a weird, complex thing, and rather than attempt (uselessly) to reduce it to people within it, it makes more sense - to me - to approach it as an alien being and attempt to understand its biology and momentums only with reference to itself. Having done that, we can then use metaphors to attempt to explain its behaviour: we can say that it follows profit, or it takes an innovative step, or that it is middle-aged, or that it treats the environment badly, or that it takes risks. None of these statements is literally true, but they can be useful to have in mind when attempting to negotiate with these bizarre, massive creatures. If anyone wonders why I link heavily to BERG's work, it's because they have some incredibly thoughtful and creative people who are focused and productive, and it's Webb's laser-like genius that makes it possible. They're doing a lot of subtle new things and it's a delight and privilege to watch them grow and reflect.

March 24 2011

Four short links: 24 March 2011

  1. Digital Subscription Prices -- the NY Times in context. Aie.
  2. Trinity -- Microsoft Research graph database. (via Hacker News)
  3. Data Science Toolkit -- prepackaged EC2 image of most useful data tools. (via Pete Warden)
  4. Snappy -- Google's open sourced compression library, as used in BigTable and MapReduce. Emphasis is on speed, with resulting lack of quality in filesize (20-100% bigger than zlib).

December 09 2010

Strata Gems: Make beautiful graphs of your Twitter network

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: Explore and visualize graphs with Gephi.

Where better to start analyzing social networks than with your own? Using the graphing tool Gephi and a little bit of Python script, you can analyze your own Twitter network, revealing the inherent structure among those you follow. It's also a fun way to learn more about network analysis.

Inspired by the LinkedIn Gephi graphs, I analyzed my Twitter friend network. I took everybody that I followed on Twitter, and found out who among them followed each other. I've shared the Python code I used to do this on

To use the script, you need to create a Twitter application and use command-line OAuth authentication to get the tokens to plug into the script. Writing about that is a bit gnarly for this post, but the easiest way I've found to authenticate a script with OAuth is by using the oauth command-line tool that ships with the Ruby OAuth gem.

The output of my Twitter-reading tool is a graph, in GraphML, suitable for import into Gephi. The graph has a node for each person, and an edge for each "follows" relationship. On initial load into Gephi, the graph looks a bit like a pile of spider webs, not showing much information.
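
For readers who want to see what such a file looks like, here is a minimal sketch that writes a "follows" graph as GraphML using only the Python standard library. It is not the script used for this post: the function name and the input dictionary are hypothetical, and it only keeps edges between people who are themselves in the dictionary, matching the "who among them followed each other" idea.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def follows_to_graphml(follows):
    """follows: dict mapping each person to the people they follow."""
    ET.register_namespace("", GRAPHML_NS)  # serialize as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph",
                          id="G", edgedefault="directed")
    people = set(follows)
    # One node per person.
    for person in sorted(people):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=person)
    # One directed edge per "follows" relationship inside the group.
    for source in sorted(people):
        for target in sorted(set(follows[source]) & people):
            ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge",
                          source=source, target=target)
    return ET.tostring(root, encoding="unicode")
```

Write the returned string to a file with a `.graphml` extension and open it in Gephi.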

I wanted to show a couple of things in the graph: cluster closely related people, and highlight the well-connected people. To find related groups of people, you can use Gephi to analyze the modularity of the network, and then color nodes according to the discovered communities. To find the well-connected people, run the "Degree Power Law" statistic in Gephi, which will calculate the betweenness centrality for each person, which essentially computes how much of a hub they are.
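
If you are curious what betweenness centrality measures, the sketch below computes it for a small unweighted graph using Brandes' algorithm, a standard method. It is offered as an illustration only and makes no claim about what Gephi runs internally. The function name and the input format (a dictionary from each node to its neighbors, with every node present as a key) are assumptions. Scores are left unnormalized, so in an undirected graph each pair of endpoints is counted in both directions.

```python
from collections import deque


def betweenness_centrality(neighbors):
    """neighbors: dict mapping every node to an iterable of adjacent nodes."""
    score = {v: 0.0 for v in neighbors}
    for source in neighbors:
        # Breadth-first search from source, counting shortest paths (sigma)
        # and remembering which nodes precede each node on those paths.
        order = []
        preds = {v: [] for v in neighbors}
        sigma = {v: 0 for v in neighbors}
        dist = {v: -1 for v in neighbors}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:  # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing credit to predecessors.
        delta = {v: 0.0 for v in neighbors}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score
```

Nodes that sit on many shortest paths between other nodes get high scores, which is why hubs that bridge communities stand out.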

These steps are neatly laid out in a great slide deck from Sociomantic Labs on analyzing Facebook social networks. Follow the tips there and you'll end up with a beautiful graph of your network that you can export to PDF from Gephi.

Overview of my social graph: click to view the full PDF version

The final result for my network is shown above. If you download the full PDF, you'll notice there are several communities, which I'll explain for interest. The mass of pink is predominantly my O'Reilly contacts, dark green shows the Strata and data community, the lime green the Mono and GNOME worlds, mustard shows the XML and open source communities. The balance of purple is assorted technologist friends.

Finally my sporting interests are revealed: the light blue are cricket fans and commentators, the red Formula 1 motor racing. Unsurprisingly, Tim O'Reilly, Stephen Fry and Miguel de Icaza are big hubs in my network. Your own graphs will reveal similar clusters of people and interests.

If this has whetted your appetite, you can discover more about mining social networks at Matthew Russell's Strata session, Unleashing Twitter Data For Fun And Insight.

December 08 2010

Strata Gems: Explore and visualize graphs with Gephi

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: Five data blogs you should read.

If you need to explore your data as a graph, then Gephi is a great place to start. An open source project, Gephi is the ideal tool for exploring data and analyzing networks.

Gephi is available for Windows, Linux and OS X. You can get started by downloading and installing Gephi, and playing with one of the example data sets.

Gephi is a sophisticated tool. A "Photoshop for data", it offers a rich palette of features, including those specialized for social network analysis.

Gephi screenshot

Graphs can be loaded and created using many common graph file formats, and explored interactively. Hierarchical graphs such as social networks can be clustered in order to extract meaning. Gephi's layout algorithms automatically give shape to a graph to help exploration, and you can tinker with the colors and layout parameters to improve communication and appearance.

Following the Photoshop metaphor, one of the most powerful aspects of Gephi is that it is extensible through plugins. Though the plugin ecosystem is just getting started, existing plugins let you export a graph for publication on the web and experiment with additional layouts. The AlchemyAPI plugin uses natural language processing to identify real world entities from graph data, and shows the promise of connecting Gephi to web services.

Earlier this year, DJ Patil from LinkedIn brought Gephi-generated graphs of LinkedIn social networks to O'Reilly's Foo Camp. Aside from importing the data, very little manipulation was needed inside Gephi. In this video he explains the social networks of several participants.

November 01 2010

