January 21 2014

Four short links: 21 January 2014

  1. On Being a Senior Engineer (Etsy) — Mature engineers know that no matter how complete, elegant, or superior their designs are, it won’t matter if no one wants to work alongside them because they are assholes.
  2. Control Theory (Coursera) — Learn about how to make mobile robots move in effective, safe, predictable, and collaborative ways using modern control theory. (via DIY Drones)
  3. US Moves Towards Open Access (WaPo) — Congress passed a budget that will make about half of taxpayer-funded research available to the public.
  4. NHS Patient Data Available for Companies to Buy (The Guardian) — Once live, organisations such as university research departments – but also insurers and drug companies – will be able to apply to the new Health and Social Care Information Centre (HSCIC) to gain access to the database, called care.data. If an application is approved then firms will have to pay to extract this information, which will be scrubbed of some personal identifiers but not enough to make the information completely anonymous – a process known as “pseudonymisation”. Recipe for disaster, as it has been repeatedly shown that it’s easy to identify individuals, given enough scrubbed data. Can’t see why the NHS just doesn’t make it an app in Facebook. “Nat’s Prostate status: it’s complicated.”
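The re-identification risk in the last link can be made concrete with a toy example. This is a sketch under stated assumptions, not a model of how HSCIC actually processed care.data: it replaces the direct identifiers (hypothetical `nhs_number` and `name` fields) with a keyed hash, leaves quasi-identifiers such as postcode, birth date and sex in the clear, and then links the scrubbed rows to a hypothetical public list on those quasi-identifiers. Every field name, record and the key itself are invented for illustration.

```python
import hashlib
import hmac

# Hypothetical key held by the data controller; invented for this illustration.
SECRET_KEY = b"example-key-not-for-real-use"
QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")


def pseudonymise(record: dict) -> dict:
    """Drop direct identifiers and add a keyed-hash pseudonym in their place."""
    digest = hmac.new(SECRET_KEY, record["nhs_number"].encode(), hashlib.sha256)
    scrubbed = {k: v for k, v in record.items() if k not in ("nhs_number", "name")}
    scrubbed["pseudonym"] = digest.hexdigest()[:16]
    return scrubbed


def link(scrubbed_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Map pseudonym -> name wherever the quasi-identifiers match exactly one public row."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    linked = {}
    for row in scrubbed_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            linked[row["pseudonym"]] = candidates[0]
    return linked


# Invented example records (hypothetical fields and people).
patients = [
    {"nhs_number": "0000000001", "name": "Alice", "postcode": "AB1 2CD",
     "birth_date": "1970-01-01", "sex": "F", "diagnosis": "asthma"},
    {"nhs_number": "0000000002", "name": "Bob", "postcode": "EF3 4GH",
     "birth_date": "1985-06-15", "sex": "M", "diagnosis": "diabetes"},
]
public_list = [
    {"name": "Alice", "postcode": "AB1 2CD", "birth_date": "1970-01-01", "sex": "F"},
    {"name": "Bob", "postcode": "EF3 4GH", "birth_date": "1985-06-15", "sex": "M"},
    {"name": "Carol", "postcode": "EF3 4GH", "birth_date": "1990-02-02", "sex": "F"},
]

scrubbed = [pseudonymise(p) for p in patients]
linked = link(scrubbed, public_list)
for row in scrubbed:
    print(linked.get(row["pseudonym"]), row["diagnosis"])
```

Neither scrubbed row carries a name, yet both are re-identified because each combination of postcode, birth date and sex is unique in the public list. The keyed hash protects only the identifier it replaces; it does nothing about the fields left in the clear.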

September 09 2013

Four short links: 11 September 2013

  1. On the NSA — intelligent unpacking of what the NSA crypto-weakening allegations mean.
  2. Overview of the 2013 OWASP Top 10 — rundown of web evil to avoid. (via Ecryption)
  3. Easy 6502 — teaches 6502 assembler, with an emulator built into the book. This is what programming non-fiction books will look like in the future.
  4. Kochiku — distributing automated test suites for faster validation in continuous integration.
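The Kochiku link doesn't describe how it splits a suite, so what follows is only a generic sketch of the idea, not Kochiku's algorithm: given per-file run times (assumed to come from an earlier build), assign the slowest files first, each to whichever worker currently has the least work. This greedy "longest processing time first" heuristic is a stand-in; the function name, file names and timings are invented.

```python
import heapq


def partition_tests(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Greedy longest-processing-time-first split of test files across workers."""
    heap = [(0.0, i) for i in range(workers)]  # (assigned seconds, worker index)
    buckets: list[list[str]] = [[] for _ in range(workers)]
    # Slowest files first; ties broken by name so the result is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        buckets[worker].append(name)
        heapq.heappush(heap, (load + seconds, worker))
    return buckets


# Hypothetical timings from a previous CI run, in seconds.
timings = {"a_spec.rb": 120, "b_spec.rb": 90, "c_spec.rb": 60, "d_spec.rb": 30, "e_spec.rb": 30}
print(partition_tests(timings, workers=2))
# [['a_spec.rb', 'd_spec.rb', 'e_spec.rb'], ['b_spec.rb', 'c_spec.rb']]
```

A real CI system would also need to refresh the timings as the suite changes and decide where to put files that have never run before.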

June 24 2013

Four short links: 24 June 2013

  1. Reading Runes in Animal Movement (YouTube) — accessible TEDxRiverTawe 2013 talk by Professor Rory Wilson, on his work tracking movements of animals in time and space. The value comes from high-resolution time series data: many samples/second, very granular.
  2. Best Science Writing Online 2012 (Amazon) — edited collection of the best blog posts on science from 2012. Some very good science writing happening online.
  3. Designing Effective Multimedia for Physics Education (PDF) — Derek Muller’s PhD thesis, summarised as “mythbusting beats lectures, hands down”. See also his TED@Sydney talk.
  4. Melomics — royalty-free computer-generated music, all genres, for sale (genius business model). Academic spinoff from Dr. Francisco J. Vico’s work at UMA in Spain.

May 29 2013

Four short links: 29 May 2013

  1. Quick Reads of Notable New Zealanders — notable for two reasons: (a) CC BY-NC licensed, and (b) gorgeous gorgeous web design. Not what one normally associates with Government web sites!
  2. svg.js — Javascript library for making and munging SVG images. (via Nelson Minar)
  3. Linkbot: Create with Robots (Kickstarter) — accessible and expandable modular robot. Loaded w/ absolute encoding, accelerometer, rechargeable lithium ion battery and ZigBee. (via IEEE Spectrum)
  4. The Promise and Peril of Real-Time Corrections to Political Misperceptions (PDF) — paper presenting results of an experiment comparing the effects of real-time corrections to corrections that are presented after a short distractor task. Although real-time corrections are modestly more effective than delayed corrections overall, closer inspection reveals that this is only true among individuals predisposed to reject the false claim. In contrast, individuals whose attitudes are supported by the inaccurate information distrust the source more when corrections are presented in real time, yielding beliefs comparable to those never exposed to a correction. We find no evidence of realtime corrections encouraging counterargument. Strategies for reducing these biases are discussed. So much for the Google Glass bullshit detector transforming politics. (via Vaughan Bell)

May 23 2013

Four short links: 23 May 2013

  1. Kindle Worlds Fine Print — Amazon’s fanfic publishing system has a few flaws: no pr0n, no slash (crossovers), and Amazon Publishing will acquire all rights to your new stories, including global publication rights, for the term of copyright. I can’t see this attracting pinboard’s most passionate users.
  2. Xbox One Won’t Allow Indies to Self-Publish Games — When it comes to self-publishing, Microsoft is the odd man out. Both Sony and Nintendo allow developers to publish their own games onto PlayStation Network and Nintendo Network, respectively. Microsoft’s position stands in stark contrast to Sony, which has been aggressively pursuing indie content for PS4. (via Andy Baio)
  3. 3D Printers for Peace Competition (Michigan Tech) — We are challenging the 3D printing community to design things that advance the cause of peace. This is an open-ended contest, but if you’d like some ideas, ask yourself what Mother Teresa, Martin Luther King, or Gandhi would make if they’d had access to 3D printing. (via BoingBoing)
  4. covim — Collaborative editing for vim. My dream of massively multiplayer troff can finally be realised.

April 02 2013

Four short links: 2 April 2013

  1. Analyzing mbostock’s queue.js — beautiful walkthrough of a small library, showing the how and why of good coding.
  2. What Job Would You Hire a Textbook To Do? (Karl Fisch) — notes from a Discovery Education “Beyond the Textbook” event. The issues Karl highlights for textbooks (why digital, etc.) are there for all books as we create this new genre.
  3. Neutralizing Open Access (Glyn Moody) — the publishers appear to have captured the UK group implementing the UK’s open access policy. At every single step of the way, the RCUK policy has been weakened. From being the best and most progressive in the world, it’s now considerably weaker than policies already in action elsewhere in the world, and hardly represents an increment on their 2006 policy. What’s at stake? Opportunity to do science faster, to provide source access to research for the public, and to redirect back to research the millions of pounds spent on journal subscriptions.
  4. Turn the Raspberry Pi into a VPN Server (LinuxUser) — One possible scenario for wanting a cheap server that you can leave somewhere is if you have recently moved away from home and would like to be able to easily access all of the devices on the network at home, in a secure manner. This will enable you to send files directly to computers, diagnose problems and other useful things. You’ll also be leaving a powered USB hub connected to the Pi, so that you can tell someone to plug in their flash drive, hard drive etc and put files on it for them. This way, they can simply come and collect it later whenever the transfer has finished.
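The queue.js walkthrough in the first link is about running asynchronous tasks with a cap on how many are in flight at once, then collecting their results. As a rough Python analogue (not a port of queue.js, whose API is built on JavaScript callbacks), a semaphore wrapped around `asyncio.gather` gives the same shape: at most `parallelism` tasks run concurrently, and results come back in the order the tasks were supplied. The `run_queue` and `make_task` names are invented for this sketch.

```python
import asyncio


async def run_queue(tasks, parallelism):
    """Run zero-argument coroutine functions, at most `parallelism` at a time."""
    semaphore = asyncio.Semaphore(parallelism)

    async def guarded(task):
        async with semaphore:  # waits while `parallelism` tasks are already running
            return await task()

    # gather returns results in input order; the first exception raised propagates.
    return await asyncio.gather(*(guarded(t) for t in tasks))


running = 0  # tasks currently executing (for demonstration only)
peak = 0     # highest concurrency observed


def make_task(n):
    async def task():
        global running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for real I/O
        running -= 1
        return n * n
    return task


results = asyncio.run(run_queue([make_task(n) for n in range(6)], parallelism=2))
print(results, peak)  # [0, 1, 4, 9, 16, 25] 2
```

The semaphore keeps the concurrency limit in one place, so the tasks themselves need no knowledge of the queue.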

February 28 2013

Four short links: 28 February 2013

  1. Myth of the Free Internet (The Atlantic) — equity of access is an important issue, but this good point is marred by hanging it off the problematic (beer? speech? downloads?) “free”. I’m on the council of InternetNZ, whose mission is to protect and promote the open and uncaptureable Internet. (A concept so good we had to make up a word for it.)
  2. Periodic Table of the SmartPhone (PDF, big) — from Scientific American article on Rare Earth Minerals in the Smartphone comes a link to this neat infographic showing where rare earth elements are used in the iPhone. (via Om Malik)
  3. Crazyflie Nano Preorders — 19g, 9cm x 9cm, 20min charge time for 7min flight time on this nano-quadrocopter. (via Wired)
  4. Changing Scientific Publishing (The Economist) — Nature buys an alternative journal publisher (30 titles in 14 scientific fields), which comes with an 80k-member social network for scientists. Macmillan are a clever bunch. (O’Reilly runs Science Foo Camp with Macmillan’s Digital Sciences and Google)

January 18 2013

Four short links: 18 January 2013

  1. Bruce Sterling Interview — It changed my work profoundly when I realized I could talk to a global audience on the Internet, although I was legally limited from doing that by national publishing systems. The lack of any global book market has much reduced my interest in publishing books. National systems don’t “publish” me, but rather conceal me. This especially happens to writers outside the Anglophone market, but I know a lot of them, and I’ve become sensitized to their issues. It’s one of the general issues of globalization.
  2. bAdmin — database of default usernames and passwords for popular software. (via Reddit /r/netsec)
  3. Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone (Uri Simonsohn) — I argue that requiring authors to post the raw data supporting their published results has, among many other benefits, that of making fraud much less likely to go undetected. I illustrate this point by describing two cases of fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster’s explanations for his anomalous results. (via The Atlantic)
  4. ScriptCraft — Javascript in Minecraft. Important because All The Kids play Minecraft. (via Javascript Weekly)

January 17 2013

Four short links: 17 January 2013

  1. Free Book Sifter — lists all the free books on Amazon, has RSS feeds and newsletters. (via BoingBoing)
  2. Whom the Gods Would Destroy, They First Give Realtime Analytics — a few key reasons why truly real-time analytics can open the door to a new type of (realtime!) bad decision making. [U]ser demographics could be different day over day. Or very likely, you could see a major difference in user behavior immediately upon releasing a change, only to watch it evaporate as users learn to use new functionality. Given all of these concerns, the conservative and reasonable stance is to only consider tests that last a few days or more.
  3. Web Book Boilerplate (Github) — uses plain old markdown and generates a well structured HTML version of your written words. Since it’s sitting on top of Pandoc and Grunt, you can easily make your books available for every platform. MIT-style license.
  4. Raspberry Pi Education Manual (PDF) — from Scratch to Python and HCI all via the Raspberry Pi. Intended to be informative and a series of lessons for teachers and students learning coding with the Raspberry Pi as their first device.

January 09 2013

Four short links: 9 January 2013

  1. Bitcoin in 2012, By The Numbers — Over the past year Bitcoin’s value when compared to the US Dollar, and most other currencies, increased steadily, though there was a large spike and subsequent dip in August. Interestingly, the current market cap is actually at a peak for 2012, exceeding the spike in August. This can be attributed to the fact that tens of thousands of Bitcoins have been introduced into the economy since August, though now at the slower rate of 25 per block.
  2. Man-Computer Symbiosis (JCR Licklider) — In short, it seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone. There will nevertheless be a fairly long interim during which the main intellectual advances will be made by men and computers working together in intimate association. Fascinating to read this 1960 paper on AI and the software/hardware augmentation of human knowledge work (just as the term “knowledge worker” was coined). (via Jim Stogdill)
  3. Papyrus — simple online editor and publisher for ebooks.
  4. howdoi (github) — commandline tool to search stackoverflow and show the code that best matches your request. This is genius.
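The "25 per block" in the Bitcoin link reflects the subsidy schedule: each block originally created 50 new bitcoins, and that amount halves every 210,000 blocks, with the first halving at block 210,000 on 28 November 2012. A minimal sketch of the schedule, modelled on the rule used by the reference client and working in satoshis (1 BTC = 100,000,000 satoshis) so that each halving is an exact integer shift; the constant and function names are chosen for this sketch.

```python
COIN = 100_000_000          # satoshis per bitcoin
HALVING_INTERVAL = 210_000  # blocks between halvings
INITIAL_SUBSIDY = 50 * COIN


def block_subsidy(height: int) -> int:
    """New coins, in satoshis, created by the block at `height`."""
    halvings = height // HALVING_INTERVAL
    # The reference client forces zero here because such a large shift is undefined
    # for 64-bit integers in C++; in Python the shift would already have reached zero.
    if halvings >= 64:
        return 0
    return INITIAL_SUBSIDY >> halvings


for h in (0, 209_999, 210_000, 420_000):
    print(h, block_subsidy(h) / COIN)
# 0 50.0
# 209999 50.0
# 210000 25.0
# 420000 12.5
```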

January 06 2013

Mark Twain on influence

In 1905 Mark Twain wrestled with the sort of request that many readers here have undoubtedly encountered: a new writer with the most tenuous of connections (her uncle was briefly a neighbor in a Nevada mining town) asks Twain to use his influence to get her manuscript published.

It never hurts to carry an introduction from a well-regarded intermediary, as long as your introducer can actually speak to the quality of your work. I think of Twain’s anguished reply every time I’m asked to recommend someone or something I don’t know — or am tempted to ask the same favor of someone else.

Twain’s message is ultimately optimistic: don’t simply try to accumulate influence. Instead, come up with a good idea and sell it on its merits. The world will listen.

The full text of Twain’s essay is below, via Project Gutenberg.

A HELPLESS SITUATION

Once or twice a year I get a letter of a certain pattern, a pattern that never materially changes, in form and substance, yet I cannot get used to that letter—it always astonishes me. It affects me as the locomotive always affects me: I say to myself, “I have seen you a thousand times, you always look the same way, yet you are always a wonder, and you are always impossible; to contrive you is clearly beyond human genius—you can’t exist, you don’t exist, yet here you are!”

I have a letter of that kind by me, a very old one. I yearn to print it, and where is the harm? The writer of it is dead years ago, no doubt, and if I conceal her name and address—her this-world address—I am sure her shade will not mind. And with it I wish to print the answer which I wrote at the time but probably did not send. If it went—which is not likely—it went in the form of a copy, for I find the original still here, pigeonholed with the said letter. To that kind of letters we all write answers which we do not send, fearing to hurt where we have no desire to hurt; I have done it many a time, and this is doubtless a case of the sort.

THE LETTER

X———, California, JUNE 3, 1879.

Mr. S. L. Clemens, Hartford, Conn.:

Dear Sir,—You will doubtless be surprised to know who has presumed to write and ask a favor of you. Let your memory go back to your days in the Humboldt mines—’62-’63. You will remember, you and Clagett and Oliver and the old blacksmith Tillou lived in a lean-to which was half-way up the gulch, and there were six log cabins in the camp—strung pretty well separated up the gulch from its mouth at the desert to where the last claim was, at the divide. The lean-to you lived in was the one with a canvas roof that the cow fell down through one night, as told about by you in Roughing It—my uncle Simmons remembers it very well. He lived in the principal cabin, half-way up the divide, along with Dixon and Parker and Smith. It had two rooms, one for kitchen and the other for bunks, and was the only one that had. You and your party were there on the great night, the time they had dried-apple-pie, Uncle Simmons often speaks of it. It seems curious that dried-apple-pie should have seemed such a great thing, but it was, and it shows how far Humboldt was out of the world and difficult to get to, and how slim the regular bill of fare was. Sixteen years ago—it is a long time. I was a little girl then, only fourteen. I never saw you, I lived in Washoe. But Uncle Simmons ran across you every now and then, all during those weeks that you and party were there working your claim which was like the rest. The camp played out long and long ago, there wasn’t silver enough in it to make a button. You never saw my husband, but he was there after you left, and lived in that very lean-to, a bachelor then but married to me now. He often wishes there had been a photographer there in those days, he would have taken the lean-to. He got hurt in the old Hal Clayton claim that was abandoned like the others, putting in a blast and not climbing out quick enough, though he scrambled the best he could. It landed him clear down on the trail and hit a Piute. For weeks they thought he would not get over it but he did, and is all right, now. Has been ever since. This is a long introduction but it is the only way I can make myself known. The favor I ask I feel assured your generous heart will grant: Give me some advice about a book I have written. I do not claim anything for it only it is mostly true and as interesting as most of the books of the times. I am unknown in the literary world and you know what that means unless one has some one of influence (like yourself) to help you by speaking a good word for you. I would like to place the book on royalty basis plan with any one you would suggest.

This is a secret from my husband and family. I intend it as a surprise in case I get it published.

Feeling you will take an interest in this and if possible write me a letter to some publisher, or, better still, if you could see them for me and then let me hear.

I appeal to you to grant me this favor. With deepest gratitude I thank you for your attention.

One knows, without inquiring, that the twin of that embarrassing letter is forever and ever flying in this and that and the other direction across the continent in the mails, daily, nightly, hourly, unceasingly, unrestingly. It goes to every well-known merchant, and railway official, and manufacturer, and capitalist, and Mayor, and Congressman, and Governor, and editor, and publisher, and author, and broker, and banker—in a word, to every person who is supposed to have “influence.” It always follows the one pattern: “You do not know me, but you once knew a relative of mine,” etc., etc. We should all like to help the applicants, we should all be glad to do it, we should all like to return the sort of answer that is desired, but—Well, there is not a thing we can do that would be a help, for not in any instance does that letter ever come from anyone who can be helped. The struggler whom you could help does his own helping; it would not occur to him to apply to you, stranger. He has talent and knows it, and he goes into his fight eagerly and with energy and determination—all alone, preferring to be alone. That pathetic letter which comes to you from the incapable, the unhelpable—how do you who are familiar with it answer it? What do you find to say? You do not want to inflict a wound; you hunt ways to avoid that. What do you find? How do you get out of your hard place with a contented conscience? Do you try to explain? The old reply of mine to such a letter shows that I tried that once. Was I satisfied with the result? Possibly; and possibly not; probably not; almost certainly not. I have long ago forgotten all about it. But, anyway, I append my effort:

THE REPLY

I know Mr. H., and I will go to him, dear madam, if upon reflection you find you still desire it. There will be a conversation. I know the form it will take. It will be like this:

MR. H. How do her books strike you?

MR. CLEMENS. I am not acquainted with them.

H. Who has been her publisher?

C. I don’t know.

H. She has one, I suppose?

C. I—I think not.

H. Ah. You think this is her first book?

C. Yes—I suppose so. I think so.

H. What is it about? What is the character of it?

C. I believe I do not know.

H. Have you seen it?

C. Well—no, I haven’t.

H. Ah-h. How long have you known her?

C. I don’t know her.

H. Don’t know her?

C. No.

H. Ah-h. How did you come to be interested in her book, then?

C. Well, she—she wrote and asked me to find a publisher for her, and mentioned you.

H. Why should she apply to you instead of me?

C. She wished me to use my influence.

H. Dear me, what has influence to do with such a matter?

C. Well, I think she thought you would be more likely to examine her book if you were influenced.

H. Why, what we are here for is to examine books—anybody’s book that comes along. It’s our business. Why should we turn away a book unexamined because it’s a stranger’s? It would be foolish. No publisher does it. On what ground did she request your influence, since you do not know her? She must have thought you knew her literature and could speak for it. Is that it?

C. No; she knew I didn’t.

H. Well, what then? She had a reason of some sort for believing you competent to recommend her literature, and also under obligations to do it?

C. Yes, I—I knew her uncle.

H. Knew her uncle?

C. Yes.

H. Upon my word! So, you knew her uncle; her uncle knows her literature; he endorses it to you; the chain is complete, nothing further needed; you are satisfied, and therefore—

C. No, that isn’t all, there are other ties. I know the cabin her uncle lived in, in the mines; I knew his partners, too; also I came near knowing her husband before she married him, and I did know the abandoned shaft where a premature blast went off and he went flying through the air and clear down to the trail and hit an Indian in the back with almost fatal consequences.

H. To him, or to the Indian?

C. She didn’t say which it was.

H. (With a sigh). It certainly beats the band! You don’t know her, you don’t know her literature, you don’t know who got hurt when the blast went off, you don’t know a single thing for us to build an estimate of her book upon, so far as I—

C. I knew her uncle. You are forgetting her uncle.

H. Oh, what use is he? Did you know him long? How long was it?

C. Well, I don’t know that I really knew him, but I must have met him, anyway. I think it was that way; you can’t tell about these things, you know, except when they are recent.

H. Recent? When was all this?

C. Sixteen years ago.

H. What a basis to judge a book upon! At first you said you knew him, and now you don’t know whether you did or not.

C. Oh yes, I know him; anyway, I think I thought I did; I’m perfectly certain of it.

H. What makes you think you thought you knew him?

C. Why, she says I did, herself.

H. She says so!

C. Yes, she does, and I did know him, too, though I don’t remember it now.

H. Come—how can you know it when you don’t remember it?

C. I don’t know. That is, I don’t know the process, but I do know lots of things that I don’t remember, and remember lots of things that I don’t know. It’s so with every educated person.

H. (After a pause). Is your time valuable?

C. No—well, not very.

H. Mine is.

So I came away then, because he was looking tired. Overwork, I reckon; I never do that; I have seen the evil effects of it. My mother was always afraid I would overwork myself, but I never did.

Dear madam, you see how it would happen if I went there. He would ask me those questions, and I would try to answer them to suit him, and he would hunt me here and there and yonder and get me embarrassed more and more all the time, and at last he would look tired on account of overwork, and there it would end and nothing done. I wish I could be useful to you, but, you see, they do not care for uncles or any of those things; it doesn’t move them, it doesn’t have the least effect, they don’t care for anything but the literature itself, and they as good as despise influence. But they do care for books, and are eager to get them and examine them, no matter whence they come, nor from whose pen. If you will send yours to a publisher—any publisher—he will certainly examine it, I can assure you of that.

December 04 2012

The MOOC movement is not an indicator of educational evolution

Somehow, recently, a lot of people have taken an interest in the broadcast of canned educational materials, and this practice — under a term that proponents and detractors have settled on, massive open online course (MOOC) — is getting a publicity surge. I know that the series of online classes offered by Stanford proved to be extraordinarily popular, leading to the foundation of Udacity and a number of other companies. But I wish people would stop getting so excited over this transitional technology. The attention drowns out two truly significant trends in progressive education: do-it-yourself labs and peer-to-peer exchanges.

In the current opinion torrent, Clay Shirky considers MOOCs one of the big disruptive technologies of our age, and Joseph E. Aoun, president of Northeastern University, writes (in a Boston Globe subscription-only article) that traditional colleges will have to deal with the MOOC challenge. Jon Bruner points out on Radar that non-elite American institutions could use a good scare (although I know a lot of people whose lives were dramatically improved by attending such colleges). The December issue of Communications of the ACM offers Professor Richard A. DeMillo from the Georgia Institute of Technology assessing the possible role of MOOCs in changing education, along with an editorial by editor-in-chief Moshe Y. Vardi culminating with, “If I had my wish, I would wave a wand and make MOOCs disappear.”

There’s a popular metaphor for this early stage of innovation: we look back to the time when film-makers made the first moving pictures with professional performers by setting up cameras before stages in theaters. This era didn’t last long before visionaries such as Georges Méliès, D. W. Griffith, Sergei Eisenstein, and Luis Buñuel uncovered what the new medium could do for itself. How soon will colleges get tired of putting lectures online and offer courses that take advantage of new media?

Two more appealing trends are already big. One is DIY courses, as popularized in the book Fab by Neil Gershenfeld at the MIT Media Lab. O’Reilly’s own Make projects are part of this movement. Fab courses represent the polar opposite of MOOCs in many ways. They are delivered in small settings to students whose dedication, inspiration, and talent have to match those of the teacher — the course asks a lot of everybody. But anecdotal reports suggest that DIY courses can be very powerful growth mechanisms in environments ranging from the top institutions (like MIT) to slums around the world. Teenagers are even learning to play with biological matter in labs such as BioCurious.

Fundamentally, DIY is a way to capture the theory of learning by doing, which goes back at least to John Dewey at the turn of the 20th century. The availability of 3D makers, cheap materials, fab software, and instructions over the Internet lends the theory a new practice.

“I believe in everything never yet said.”–Rainer Maria Rilke, Das Stunden-Buch

The other major trend cracking the foundations of education is peer-to-peer information exchange. This, like learning by doing, has plenty of history. The symposia of Ancient Greece (illustrated in fictional form by Plato) and the Talmudic discussions that underlay the creation of modern Judaism over 2,000 years ago show that human beings have long been used to learning from each other. Peer information exchange raged on centuries later in cafés and salons, beer halls and sewing circles. Experts were important, and everybody could recognize the arrival of a true expert, but he or she was just first among equals. A lot of students who sign up for MOOCs probably benefit from the online discussion forums as much as from the canned lectures and readings.

Wikipedia is a prominent example of peer-to-peer information exchange, and one that promulgates the contributions of experts, but one that also has trouble with sustainability. (They’re holding one of their fund-raisers now, and it’s a good time to donate.) This leads me to ask what business model colleges can apply in the face of both MOOCs and peer-to-peer knowledge. How do you mobilize a whole community to educate each other, while maintaining the value of expertise?

This challenge — not just a business challenge, but really the challenge of tapping expertise effectively — happens to be one that O’Reilly is dealing with in the field of publishing. We introduced the equivalent of filmed stage shows in the mid-1990s when we created the Safari Bookshelf to provide our books on a subscription-based website. The innovation was in the delivery model, which also delivered a shock to a publishing industry dependent on print sales.

But we knew that Safari Bookshelf barely dipped into the power of the web, which has grown more and more with advances in HTML, JavaScript, and mobile devices. Safari Bookshelf is much more than a collection of web pages with book content now. As a training tool, the web has exploded with other experiments. We offer an interactive school of technology also.

So the field of education will probably see lots of blended models along the way. It’s worth noting that proponents of open content have called for licensing models that reinforce the open promise of the courses. Some courses ask students to write their own textbooks and share them — but one asks where they get the information with which to write their peer-produced textbooks. In an earlier article I examined the difficulties of creating free, open textbooks that are actually usable for teaching. Such dilemmas just show that the investment of large amounts of time by experts is still a critical part of education — but applying the broadcast model to them may be less and less relevant.

October 22 2012

Four short links: 22 October 2012

  1. jq — command-line tool for JSON data.
  2. GAFFTA — Gray Area Foundation For The Arts. Non-profit running workshops and building projects around technology-driven arts. (via Roger Dennis)
  3. Power Pwn — looks like a power strip, is actually chock-full of pen-testing tools, WiFi, Bluetooth, and GSM. Beautifully evil. (via Jim Stogdill)
  4. Open Access Week — this week is Open Access week, raising awareness of the value of ubiquitous access to scientific publishing. (via Fabiana Kubke)

September 24 2012

Four short links: 24 September 2012

  1. Open Monograph Press — an open source software platform for managing the editorial workflow required to see monographs, edited volumes, and scholarly editions through internal and external review, editing, cataloguing, production, and publication. OMP will operate, as well, as a press website with catalog, distribution, and sales capacities. (via OKFN)
  2. Sensing Activity in Royal Shakespeare Theatre (NLTK) — sensing activity in the theatre, for graphing. Raw data available. (via Infovore)
  3. Why Journalists Love Reddit (GigaOM) — “Stories appear on Reddit, then half a day later they’re on Buzzfeed and Gawker, then they’re on the Washington Post, The Guardian and the New York Times. It’s a pretty established pattern.”
  4. Relatively Prime: The Toolbox — Kickstarted podcasts on mathematics. (via BoingBoing)

August 15 2012

Mining the astronomical literature

There is a huge debate right now about making academic literature freely accessible and moving toward open access. But what would be possible if people stopped talking about it and just dug in and got on with it?

NASA’s Astrophysics Data System (ADS), hosted by the Smithsonian Astrophysical Observatory (SAO), has quietly been working away since the mid-’90s. Without much, if any, fanfare amongst the other disciplines, it has moved astronomers into a world where access to the literature is just a given. It’s something they don’t have to think about all that much.

The ADS service provides access to abstracts for virtually all of the astronomical literature. But it also provides access to the full text of more than half a million papers, going right back to the start of peer-reviewed journals in the 1800s. The service has links to online data archives, along with reference and citation information for each of the papers, and it’s all searchable and downloadable.

Number of papers published in the three main astronomy journals each year. CREDIT: Robert Simpson

The existence of the ADS, along with the arXiv pre-print server, has meant that most astronomers haven’t seen the inside of a brick-built library since the late 1990s.

It also makes astronomy almost uniquely well placed for interesting data mining experiments, experiments that hint at what the rest of academia could do if they followed astronomy’s lead. The fact that the discipline’s literature has been scanned, archived, indexed and catalogued, and placed behind a RESTful API makes it a treasure trove, both for hypothesis generation and sociological research.

For example, the .Astronomy conference series is a small workshop that brings together the best and the brightest of the technical community: researchers, developers, educators and communicators. Billed as “20% time for astronomers,” it gives these people space to think about how new technologies affect both how research is done and how it is communicated to their peers and to the public.

[Disclosure: I'm a member of the advisory board to the .Astronomy conference, and I previously served as a member of the programme organising committee for the conference series.]

It should perhaps come as little surprise that one of the more interesting projects to come out of a hack day held as part of this year’s .Astronomy meeting in Heidelberg was work by Robert Simpson, Karen Masters and Sarah Kendrew that focused on data mining the astronomical literature.

The team grabbed and processed the titles and abstracts of all the papers from the Astrophysical Journal (ApJ), Astronomy & Astrophysics (A&A), and the Monthly Notices of the Royal Astronomical Society (MNRAS) since each of those journals started publication — and that’s 1827 in the case of MNRAS.

By the end of the day, they’d found some interesting results showing how various terms have trended over time. The results were similar to what’s found in Google Books’ Ngram Viewer.

The relative popularity of the names of telescopes in the literature. Hubble, Chandra and Spitzer seem to have taken turns in hogging the limelight, much as COBE, WMAP and Planck have each contributed to our knowledge of the cosmic microwave background in successive decades. References to Planck are still on the rise. CREDIT: Robert Simpson.
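
Trend curves like these come down to a normalised count: for each year, the share of papers whose title or abstract mentions a term. A minimal sketch, assuming the titles and abstracts have already been downloaded as (year, text) pairs; the function name `term_trend` and that input shape are hypothetical rather than the team's actual code, and plain substring matching is a deliberately crude stand-in for proper tokenisation:

```python
from collections import defaultdict


def term_trend(papers, term):
    """Return {year: fraction of that year's papers mentioning `term`}.

    `papers` is assumed to be an iterable of (year, text) pairs, where text
    is a title and/or abstract. Matching is case-insensitive substring
    matching, so "planck" would also match "Planckian".
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = term.lower()
    for year, text in papers:
        totals[year] += 1
        if needle in text.lower():
            hits[year] += 1
    # Dividing by the yearly total stops the overall growth in publication
    # volume from looking like growing interest in the term itself.
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

Called as `term_trend(papers, "planck")`, it gives one point per year, ready to plot as a single line like those in the telescope chart.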

Since the meeting, Robert has taken his initial results further, exploring both the astronomical literature and the new corpus of data he built from it. He’s tried various visualisations of the data, including word matrices for related terms and for various astrochemistry terms.

Correlation between terms related to Active Galactic Nuclei (AGN). The opacity of each square represents the strength of the correlation between the terms. CREDIT: Robert Simpson.
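
One plausible way to score how strongly two terms go together is the phi coefficient, which is simply the Pearson correlation between two yes/no “does this abstract mention the term?” vectors. Robert hasn’t said which measure his matrix uses, so treat this as an illustrative sketch; `term_correlations` is a hypothetical name and substring matching is again an assumption:

```python
import math
from itertools import combinations


def term_correlations(abstracts, terms):
    """Phi coefficient for every pair of terms across a list of abstracts.

    Returns {(term_a, term_b): phi}, with phi in [-1, 1]. Pairs where one
    term appears in every abstract or in none get 0.0, since phi is
    undefined there.
    """
    docs = [a.lower() for a in abstracts]
    n = len(docs)
    # Presence vector per term: True if the abstract mentions it.
    present = {t: [t.lower() in d for d in docs] for t in terms}
    result = {}
    for a, b in combinations(terms, 2):
        n_a = sum(present[a])
        n_b = sum(present[b])
        n_ab = sum(x and y for x, y in zip(present[a], present[b]))
        # phi = (n*n_ab - n_a*n_b) / sqrt(n_a*(n - n_a)*n_b*(n - n_b)),
        # algebraically equal to the usual 2x2 contingency-table form.
        denom = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
        result[(a, b)] = (n * n_ab - n_a * n_b) / denom if denom else 0.0
    return result
```

Mapping each value’s absolute size to a square’s opacity would give a matrix in the style of the AGN chart.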

He’s also taken a look at authorship in astronomy and is starting to find some interesting trends.

Fraction of astronomical papers published with one, two, three, four or more authors. CREDIT: Robert Simpson

You can see that single-author papers dominated for most of the 20th century. The decline begins around 1960, as two- and three-author papers start to make up a significant chunk of the whole. In 1978, author papers become more prevalent than single-author papers.
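
Shares like these need nothing more than a year and an author count for each paper. A sketch, assuming (year, number_of_authors) pairs as input; the helper name is hypothetical, and lumping everything from four authors upward into one bucket is an assumption chosen to match the one/two/three/four-or-more split in the chart:

```python
from collections import Counter, defaultdict


def author_count_fractions(papers, max_bucket=4):
    """Per year, the fraction of papers with 1, 2, ..., max_bucket+ authors.

    `papers` is assumed to be (year, n_authors) pairs. Papers with
    max_bucket or more authors all land in the last bucket.
    """
    by_year = defaultdict(Counter)
    for year, n_authors in papers:
        by_year[year][min(n_authors, max_bucket)] += 1
    fractions = {}
    for year, counts in sorted(by_year.items()):
        total = sum(counts.values())
        # Counter returns 0 for buckets with no papers that year.
        fractions[year] = {k: counts[k] / total for k in range(1, max_bucket + 1)}
    return fractions
```

Stacking the per-bucket fractions year by year reproduces the shape of an area chart like the one shown, with each year summing to 1.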

Compare the number of “active” research astronomers to the number of papers published each year (across all the major journals). CREDIT: Robert Simpson.

Here we see that people begin to outpace papers in the 1960s. This may reflect the fact that as the field becomes more technical and more specialised, it takes more people to write the same number of papers, which is an interesting result in its own right.

Interview with Robert Simpson: Behind the project and what lies ahead

I recently talked with Rob about the work he, Karen Masters, and Sarah Kendrew did at the meeting, and the work he’s been doing since with the newly gathered data.

What made you think about data mining the ADS?

Robert Simpson: At the .Astronomy 4 Hack Day in July, Sarah Kendrew had the idea to try to do an astronomy version of BrainSCANr, a project that generates new hypotheses in the neuroscience literature. I’ve had a go at mining ADS and arXiv before, so it seemed like a great excuse to dive back in.

Do you think there might be actual science that could be done here?

Robert Simpson: Yes, in the form of finding questions that were unexpected. With such large volumes of peer-reviewed papers being produced daily in astronomy, there is a lot being said. Most researchers can only try to keep up with it all — my daily RSS feed from arXiv is next to useless, it’s so bloated. In amongst all that text, there must be connections and relationships that are being missed by the community at large, hidden in the chatter. Maybe we can develop simple techniques to highlight potential missed links, i.e. generate new hypotheses from the mass of words and data.

Are the results coming out of the work useful for auditing academics?

Robert Simpson: Well, perhaps, but that would be tricky territory in my opinion. I’ve only just begun to explore the data around authorship in astronomy. One thing that is clear is that we can see a big trend toward collaborative work. In 2012, only 6% of papers were single-author efforts, compared with 70+% in the 1950s.

The above plot shows the average number of authors per paper since 1827. CREDIT: Robert Simpson.
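
A yearly average is easy to compute, but a handful of very large collaborations can pull the mean well above what a typical paper looks like, so it can be worth reporting the median alongside it. A hedged sketch, again assuming (year, number_of_authors) pairs and a hypothetical function name:

```python
from collections import defaultdict
from statistics import mean, median


def authors_per_paper_by_year(papers):
    """Return {year: (mean_authors, median_authors)} from (year, n_authors) pairs."""
    groups = defaultdict(list)
    for year, n_authors in papers:
        groups[year].append(n_authors)
    # The median is far less sensitive than the mean to rare papers
    # with hundreds of authors.
    return {year: (mean(counts), median(counts)) for year, counts in sorted(groups.items())}
```

If the two series drift apart in recent decades, that suggests a few huge collaborations rather than a general rise in team size.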

We can measure how large groups are becoming, and who is part of which groups. In that sense, we can audit research groups, and maybe individual people. The big issue is keeping track of people through variations in their names and affiliations. Identifying authors is probably a solved problem if we look at ORCID.
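
Until every author carries a persistent identifier such as an ORCID iD, a common first pass is to collapse each name string to a surname-plus-first-initial key. The sketch below is a hypothetical illustration, not part of Robert’s pipeline. It will wrongly merge different people who share a surname and initial, mishandle multi-word surnames, and cannot follow someone through a change of name:

```python
import re
import unicodedata


def author_key(name):
    """Collapse an author name to a naive 'surname initial' key.

    Accepts both "Surname, Given Names" and "Given Names Surname" forms.
    """
    # Fold accented characters to ASCII so "Müller" and "Muller" match.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii").strip()
    )
    if "," in ascii_name:
        # "Surname, Given Names" form.
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        # "Given Names Surname" form: treat the last token as the surname.
        tokens = ascii_name.split()
        surname, given = tokens[-1], " ".join(tokens[:-1])
    # Keep only the first letter of the given names, ignoring dots and spaces.
    initial = re.sub(r"[^A-Za-z]", "", given)[:1].lower()
    return f"{surname.lower()} {initial}".strip()
```

With this, `author_key("Simpson, R. J.")` and `author_key("Robert J. Simpson")` produce the same key, which is exactly the kind of matching (and over-matching) that makes identifiers like ORCID attractive.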

What about citations? Can you draw any comparisons with h-index data?

Robert Simpson: I haven’t looked at h-index stuff specifically, at least not yet, but citations are fun. I looked at the trends surrounding the term “dark matter” and saw something interesting. Mentions of dark matter rise steadily after it first appears in the late ’70s.

Compare the term “dark matter” with a few other related terms: “cosmology,” “big bang,” “dark energy,” and “wmap.” You can see cosmology has been getting more popular since the 1990s, and dark energy is a recent addition. CREDIT: Robert Simpson.

In the data, astronomy becomes more and more obsessed with dark matter — the term appears in 1% of all papers by the end of the ’80s and 6% today.

Looking at citations changes the picture. The community is writing papers about dark matter more and more each year, but they are getting fewer citations than they used to (the peak for this was in the late ’90s). These trends are normalised, so the only recency effect I can think of is that dark matter papers take more than 10 years to become citable. Either that or dark matter studies are currently in a trough for impact.
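
To make “normalised” concrete, here is one possible reading: divide the average citation count of papers mentioning a term by the average for all papers published in the same year. Comparing against same-year papers cancels the general fact that recent papers have had less time to collect citations, which is why any remaining recency effect would have to be specific to the topic. The exact normalisation Robert used isn’t stated, so this is an assumption, as are the hypothetical function name and the (year, abstract, citation_count) input shape:

```python
from collections import defaultdict


def relative_citation_rate(papers, term):
    """Per year: mean citations of papers mentioning `term` divided by the
    mean citations of all papers from that year.

    `papers` is assumed to be (year, abstract, citation_count) triples.
    A value above 1.0 means papers on the term are cited more than the
    typical paper from the same year.
    """
    all_cites = defaultdict(list)
    term_cites = defaultdict(list)
    needle = term.lower()
    for year, abstract, cites in papers:
        all_cites[year].append(cites)
        if needle in abstract.lower():
            term_cites[year].append(cites)
    rates = {}
    # Only years in which the term actually appears get a value.
    for year in sorted(term_cites):
        baseline = sum(all_cites[year]) / len(all_cites[year])
        term_mean = sum(term_cites[year]) / len(term_cites[year])
        rates[year] = term_mean / baseline if baseline else float("nan")
    return rates
```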

Can you see where work is dropped by parts of the community and picked up again?

Robert Simpson: Not yet, but I see what you mean. I need to build a better picture of the community and its components.

Can you build a social graph of astronomers out of this data? What about (academic) family trees?

Robert Simpson: Identifying unique authors is my next step, followed by creating fingerprints of individuals at a given point in time. When do people create their first-author papers, when do they have the most impact in their careers, stuff like that.
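
Once authors have been identified, a career fingerprint of the kind Robert describes could start as simply as this. A sketch under stated assumptions: each row is a hypothetical (author_key, year, author_position, citations) tuple, position 1 means first author, and “most impact” is approximated as the publication year whose papers went on to gather the most citations:

```python
from collections import defaultdict


def career_fingerprints(authorships):
    """Summarise careers from (author, year, position, citations) rows.

    Returns {author: {"first_first_author": year or None,
                      "peak_citation_year": year}}.
    """
    first_lead = {}
    cites_by_year = defaultdict(lambda: defaultdict(int))
    for author, year, position, cites in authorships:
        # Track the earliest year this author appears in first position.
        if position == 1 and (author not in first_lead or year < first_lead[author]):
            first_lead[author] = year
        cites_by_year[author][year] += cites
    fingerprints = {}
    for author, per_year in cites_by_year.items():
        # Year whose papers collected the most citations in total.
        peak_year = max(per_year, key=per_year.get)
        fingerprints[author] = {
            "first_first_author": first_lead.get(author),
            "peak_citation_year": peak_year,
        }
    return fingerprints
```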

What tools did you use? In hindsight, would you do it differently?

Robert Simpson: I’m using Ruby and Perl to grab the data, MySQL to store and query it, and JavaScript to display it (Google Charts and D3.js). I may still move the database part to MongoDB, because it was designed to store documents. Similarly, I may switch from ADS to arXiv as the data source. Using arXiv would allow me to grab the full text in many cases, even if it does introduce a peer-review issue.

What’s next?

Robert Simpson: My aim is still to attempt real hypothesis generation. I’ve begun the process by investigating correlations between terms in the literature, but I think the power will be in being able to compare all terms with all terms and looking for the unexpected. Terms may correlate indirectly (via a third term, for example), so the entire corpus needs to be processed and optimised to make it work comprehensively.
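
The indirect-correlation idea resembles Don Swanson’s “ABC” model of literature-based discovery: if term A appears alongside B, and B alongside C, but A and C never appear together, then the pair (A, C) is a candidate hidden link. This is a swapped-in, well-known technique rather than a description of Robert’s method, and the function name and substring matching are assumptions:

```python
from itertools import combinations


def abc_candidates(abstracts, terms, min_links=1):
    """Find term pairs (A, C) that never share an abstract but are both
    linked to at least `min_links` common bridge terms B.

    Returns a list of (A, C, sorted_bridge_terms) tuples.
    """
    docs = [a.lower() for a in abstracts]
    # Indices of the abstracts mentioning each term.
    occurs = {t: {i for i, d in enumerate(docs) if t.lower() in d} for t in terms}
    # Terms that co-occur directly with each term in at least one abstract.
    neighbours = {t: {u for u in terms if u != t and occurs[t] & occurs[u]} for t in terms}
    candidates = []
    for a, c in combinations(terms, 2):
        if occurs[a] & occurs[c]:
            continue  # already linked directly, so not a hidden connection
        bridges = neighbours[a] & neighbours[c]
        if len(bridges) >= min_links:
            candidates.append((a, c, sorted(bridges)))
    return candidates
```

Run over every term against every term, this grows quadratically with the vocabulary, which is one reason the whole corpus would need to be processed and optimised before it could work comprehensively.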

Science between the cracks

I’m really looking forward to seeing more results coming out of Robert’s work. This sort of analysis hasn’t really been possible before. It’s showing a lot of promise, both from a sociological angle, as a way to research how science is done and how that has changed, and ultimately as a hypothesis engine, something that can generate new science in and of itself. This is just a hack day experiment. Imagine what could be done if the literature were more open and this sort of analysis could be carried out across fields.

Right now, a lot of the most interesting science is being done in the cracks between disciplines, but the hardest part of that sort of work is often trying to understand the literature of the discipline that isn’t your own. Robert’s project offers a lot of hope that this may soon become easier.

August 06 2012

Four short links: 6 August 2012

  1. Deepflight Kickstarter — built like an aircraft, this submersible flies underwater. Saw footage of it at scifoo; looked mind-bogglingly fun. They’re kickstarting the aero(hydro?)batics test of maneuverability, and reward levels include trips in it.
  2. WeHi.tv — explains the discoveries of scientists at the Walter and Eliza Hall Institute through 3D animation. The beautiful work of Drew Berry, who also did animation for Björk’s Biophilia music app.
  3. A Communications Primer — Ray and Charles Eames (“Powers of Ten”) lay out the work of Claude Shannon, Norbert Wiener, and others for Mr and Ms Ordinary. (via Linda Doyle)
  4. Scientific Communication as Sequential Art (Bret Victor) — gloriously comprehensible rewrite (using interactive diagrams instead of math) of a classic social graph paper (Watts and Strogatz).