
January 21 2014

Four short links: 21 January 2014

  1. On Being a Senior Engineer (Etsy) — Mature engineers know that no matter how complete, elegant, or superior their designs are, it won’t matter if no one wants to work alongside them because they are assholes.
  2. Control Theory (Coursera) — Learn about how to make mobile robots move in effective, safe, predictable, and collaborative ways using modern control theory. (via DIY Drones)
  3. US Moves Towards Open Access (WaPo) — Congress passed a budget that will make about half of taxpayer-funded research available to the public.
  4. NHS Patient Data Available for Companies to Buy (The Guardian) — Once live, organisations such as university research departments – but also insurers and drug companies – will be able to apply to the new Health and Social Care Information Centre (HSCIC) for access to the database. If an application is approved, firms will have to pay to extract this information, which will be scrubbed of some personal identifiers – a process known as “pseudonymisation” – but not enough to make it completely anonymous. A recipe for disaster, as it has been repeatedly shown that it’s easy to identify individuals given enough scrubbed data. Can’t see why the NHS doesn’t just make it an app in Facebook. “Nat’s Prostate status: it’s complicated.”
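The re-identification risk flagged above can be made concrete with a sketch of the standard linkage attack: a "pseudonymised" dataset that keeps quasi-identifiers such as postcode, birth date, and sex can be joined against any public register carrying the same fields. All records and names below are invented for illustration.

```python
# Toy linkage attack: "pseudonymised" health records joined back to names
# via quasi-identifiers. All data here is invented for illustration.

pseudonymised_records = [
    {"postcode": "AB1 2CD", "dob": "1975-03-14", "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "EF3 4GH", "dob": "1982-11-02", "sex": "F", "diagnosis": "asthma"},
]

public_register = [  # e.g. an electoral roll with the same quasi-identifiers
    {"name": "Alice Example", "postcode": "EF3 4GH", "dob": "1982-11-02", "sex": "F"},
    {"name": "Bob Sample", "postcode": "AB1 2CD", "dob": "1975-03-14", "sex": "M"},
]

def reidentify(records, register):
    """Link scrubbed records back to names by joining on quasi-identifiers."""
    index = {(p["postcode"], p["dob"], p["sex"]): p["name"] for p in register}
    matches = []
    for r in records:
        key = (r["postcode"], r["dob"], r["sex"])
        if key in index:
            matches.append({"name": index[key], **r})
    return matches

for m in reidentify(pseudonymised_records, public_register):
    print(m["name"], "->", m["diagnosis"])
```

Stripping "some personal identifiers" while leaving fields like these intact is exactly why pseudonymisation falls short of anonymity.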

January 08 2014

Four short links: 8 January 2014

  1. Launching the Wolfram Connected Devices Project — Wolfram Alpha is cognition-as-a-service, which they hope to embed in devices. This data-powered Brain-in-the-Cloud play will pit them against Google, but G wants to own the devices and the apps and the eyeballs that watch them … interesting times ahead!
  2. How the USA Almost Killed the Internet (Wired) — “At first we were in an arms race with sophisticated criminals,” says Eric Grosse, Google’s head of security. “Then we found ourselves in an arms race with certain nation-state actors [with a reputation for cyberattacks]. And now we’re in an arms race with the best nation-state actors.”
  3. Intel Edison — SD-card sized, with low-power 22nm 400MHz Intel Quark processor with two cores, integrated Wi-Fi and Bluetooth.
  4. N00b 2 L33t, Now With Graphs (Tom Stafford) — open science research validating many of the findings on learning, tested experimentally via games. In the present study, we analyzed data from a very large sample (N = 854,064) of players of an online game involving rapid perception, decision making, and motor responding. Use of game data allowed us to connect, for the first time, rich details of training history with measures of performance from participants engaged for a sustained amount of time in effortful practice. We showed that lawful relations exist between practice amount and subsequent performance, and between practice spacing and subsequent performance. Our methodology allowed an in situ confirmation of results long established in the experimental literature on skill acquisition. Additionally, we showed that greater initial variation in performance is linked to higher subsequent performance, a result we link to the exploration/exploitation trade-off from the computational framework of reinforcement learning.

April 02 2013

Four short links: 2 April 2013

  1. Analyzing mbostock’s queue.js — beautiful walkthrough of a small library, showing the how and why of good coding.
  2. What Job Would You Hire a Textbook To Do? (Karl Fisch) — notes from a Discovery Education “Beyond the Textbook” event. The issues Karl highlights for textbooks (why digital, etc.) are there for all books as we create this new genre.
  3. Neutralizing Open Access (Glyn Moody) — the publishers appear to have captured the UK group implementing the UK’s open access policy. At every single step of the way, the RCUK policy has been weakened. From being the best and most progressive in the world, it’s now considerably weaker than policies already in action elsewhere in the world, and hardly represents an increment on their 2006 policy. What’s at stake? Opportunity to do science faster, to provide source access to research for the public, and to redirect back to research the millions of pounds spent on journal subscriptions.
  4. Turn the Raspberry Pi into a VPN Server (LinuxUser) — One possible scenario for wanting a cheap server that you can leave somewhere is if you have recently moved away from home and would like to be able to easily access all of the devices on the network at home, in a secure manner. This will enable you to send files directly to computers, diagnose problems and other useful things. You’ll also be leaving a powered USB hub connected to the Pi, so that you can tell someone to plug in their flash drive, hard drive etc and put files on it for them. This way, they can simply come and collect it later whenever the transfer has finished.

March 07 2013

March 05 2013

February 28 2013

Four short links: 28 February 2013

  1. Myth of the Free Internet (The Atlantic) — equity of access is an important issue, but this good point is marred by hanging it off the problematic (beer? speech? downloads?) “free”. I’m on the council of InternetNZ whose mission is to protect and promote the open and uncaptureable Internet. (A concept so good we had to make up a word for it)
  2. Periodic Table of the SmartPhone (PDF, big) — from Scientific American article on Rare Earth Minerals in the Smartphone comes a link to this neat infographic showing where rare earth elements are used in the iPhone. (via Om Malik)
  3. CrazyFlie Nano Preorders — 19g, 9cm x 9cm, 20min charge time for 7min flight time on this nano-quadrocopter. (via Wired)
  4. Changing Scientific Publishing (The Economist) — Nature buys an alternative journal publisher (30 titles in 14 scientific fields), which comes with an 80k-member social network for scientists. Macmillan are a clever bunch. (O’Reilly runs Science Foo Camp with Macmillan’s Digital Sciences and Google)

February 22 2013

White House moves to increase public access to scientific research online

Today, the White House responded to a We The People e-petition that asked for free online access to taxpayer-funded research.

As part of the response, John Holdren, the director of the White House Office of Science and Technology Policy, released a memorandum today directing agencies with “more than $100 million in research and development expenditures to develop plans to make the results of federally funded research publicly available free of charge within 12 months after original publication.”

The Obama administration has been considering access to federally funded scientific research for years, including a report to Congress in March 2012. The relevant e-petition, which had gathered more than 65,000 signatures, had gone unanswered since May of last year.

As Hayley Tsukayama notes in the Washington Post, the White House acknowledged the open access policies of the National Institutes of Health as a successful model for sharing research.

“This is a big win for researchers, taxpayers, and everyone who depends on research for new medicines, useful technologies, or effective public policies,” said Peter Suber, Director of the Public Knowledge Open Access Project, in a release. “Assuring public access to non-classified publicly-funded research is a long-standing interest of Public Knowledge, and we thank the Obama Administration for taking this significant step.”

Every federal agency covered by this memorandum will eventually need to “ensure that the public can read, download, and analyze in digital form final peer-reviewed manuscripts or final published documents within a timeframe that is appropriate for each type of research conducted or sponsored by the agency.”

An open government success story?

From the day they were announced, one of the biggest question marks about We The People e-petitions has been whether the administration would make policy changes or take public stances it had not taken before on a given issue.

While the memorandum and the potential outcomes from its release come with caveats (from the $100 million threshold to exemptions for national security or economic competition), this answer from the director of the White House Office of Science and Technology Policy, accompanied by a memorandum directing agencies to make a plan for public access to research, is a substantive outcome.

While there are many reasons to be critical of some open government initiatives, it certainly appears that today, We The People were heard in the halls of government.

An earlier version of this post appears on the Radar Tumblr, including tweets regarding the policy change. Photo Credit: ajc1 on Flickr.


October 22 2012

Four short links: 22 October 2012

  1. jq — command-line tool for JSON data.
  2. GAFFTA — Gray Area Foundation For The Arts. Non-profit running workshops and building projects around technology-driven arts. (via Roger Dennis)
  3. Power Pwn — looks like a power strip, is actually chock-full of pen-testing tools, WiFi, bluetooth, and GSM. Beautifully evil. (via Jim Stogdill)
  4. Open Access Week — this week is Open Access week, raising awareness of the value of ubiquitous access to scientific publishing. (via Fabiana Kubke)
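For a sense of what jq does: a filter like `jq '.[] | select(.stars > 100) | .name'` extracts matching fields from a JSON array in one line. A rough Python equivalent of that particular filter (the document and threshold are invented sample data):

```python
import json

# The jq filter `.[] | select(.stars > 100) | .name`, spelled out in Python.
# The document and threshold are invented sample data.
doc = '''[
  {"name": "jq",   "stars": 250},
  {"name": "yq",   "stars": 80},
  {"name": "gron", "stars": 120}
]'''

names = [repo["name"] for repo in json.loads(doc) if repo["stars"] > 100]
print(names)  # ['jq', 'gron']
```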

August 15 2012

Mining the astronomical literature

There is a huge debate right now about making academic literature freely accessible and moving toward open access. But what would be possible if people stopped talking about it and just dug in and got on with it?

NASA’s Astrophysics Data System (ADS), hosted by the Smithsonian Astrophysical Observatory (SAO), has quietly been working away since the mid-’90s. Without much, if any, fanfare amongst the other disciplines, it has moved astronomers into a world where access to the literature is just a given. It’s something they don’t have to think about all that much.

The ADS service provides access to abstracts for virtually all of the astronomical literature. But it also provides access to the full text of more than half a million papers, going right back to the start of peer-reviewed journals in the 1800s. The service has links to online data archives, along with reference and citation information for each of the papers, and it’s all searchable and downloadable.

Number of papers published in the three main astronomy journals each year. CREDIT: Robert Simpson

The existence of the ADS, along with the arXiv pre-print server, has meant that most astronomers haven’t seen the inside of a brick-built library since the late 1990s.

It also makes astronomy almost uniquely well placed for interesting data mining experiments, experiments that hint at what the rest of academia could do if they followed astronomy’s lead. The fact that the discipline’s literature has been scanned, archived, indexed and catalogued, and placed behind a RESTful API makes it a treasure trove, both for hypothesis generation and sociological research.

For example, the .Astronomy series of conferences is a small workshop that brings together the best and the brightest of the technical community: researchers, developers, educators and communicators. Billed as “20% time for astronomers,” it gives these people space to think about how new technologies affect both how research is done and how it is communicated to their peers and to the public.

[Disclosure: I'm a member of the advisory board to the .Astronomy conference, and I previously served as a member of the programme organising committee for the conference series.]

It should perhaps come as little surprise that one of the more interesting projects to come out of a hack day held as part of this year’s .Astronomy meeting in Heidelberg was work by Robert Simpson, Karen Masters and Sarah Kendrew that focused on data mining the astronomical literature.

The team grabbed and processed the titles and abstracts of all the papers from the Astrophysical Journal (ApJ), Astronomy & Astrophysics (A&A), and the Monthly Notices of the Royal Astronomical Society (MNRAS) since each of those journals started publication — and that’s 1827 in the case of MNRAS.

By the end of the day, they’d found some interesting results showing how various terms have trended over time. The results were similar to what’s found in Google Books’ Ngram Viewer.
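The shape of that Ngram-style computation can be sketched as follows. The corpus here is a toy (the real project worked from the full ApJ, A&A, and MNRAS title and abstract dump): count, per year, the fraction of abstracts mentioning a term.

```python
from collections import Counter

# Toy Ngram-style trend: fraction of abstracts per year mentioning a term.
# The corpus is invented; the real analysis used every ApJ, A&A, and MNRAS
# title and abstract back to 1827.
corpus = [
    (1995, "Rotation curves imply dark matter halos"),
    (1995, "Spectroscopy of a cataclysmic variable"),
    (2005, "Dark matter substructure in N-body simulations"),
    (2005, "Weak lensing constraints on dark matter"),
    (2005, "Stellar populations in the Galactic bulge"),
]

def term_trend(corpus, term):
    """Per-year fraction of abstracts containing `term` (case-insensitive)."""
    totals, hits = Counter(), Counter()
    for year, abstract in corpus:
        totals[year] += 1
        if term.lower() in abstract.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

print(term_trend(corpus, "dark matter"))  # 1995: 0.5, 2005: ~0.67
```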

The relative popularity of the names of telescopes in the literature. Hubble, Chandra and Spitzer seem to have taken turns in hogging the limelight, much as COBE, WMAP and Planck have each contributed to our knowledge of the cosmic microwave background in successive decades. References to Planck are still on the rise. CREDIT: Robert Simpson.

After the meeting, however, Robert continued exploring the astronomical literature through his new corpus of data. He’s explored various visualisations of the data, including word matrices for related terms and for various astro-chemistry terms.
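A word matrix of this kind can be sketched as abstract-level co-occurrence counts. This is a simplification over invented data; the real project presumably used its own weighting and a much larger term list.

```python
from collections import Counter
from itertools import combinations

# Abstract-level co-occurrence counts for a small term list. The abstracts
# are invented; a real term matrix would be built over the full corpus.
terms = ["agn", "black hole", "accretion", "quasar"]
abstracts = [
    "agn feedback from black hole accretion",
    "quasar luminosity and black hole mass",
    "accretion disk spectra of agn",
]

def cooccurrence(abstracts, terms):
    """Count how often each pair of terms appears in the same abstract."""
    counts = Counter()
    for text in abstracts:
        present = sorted(t for t in terms if t in text)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts

matrix = cooccurrence(abstracts, terms)
print(matrix[("accretion", "agn")])  # 2: the pair shares two abstracts
```

The opacity of each square in a plot like the one below would then be proportional to the pair's count (or some normalised correlation derived from it).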

Correlation between terms related to Active Galactic Nuclei (AGN). The opacity of each square represents the strength of the correlation between the terms. CREDIT: Robert Simpson.

He’s also taken a look at authorship in astronomy and is starting to find some interesting trends.

Fraction of astronomical papers published with one, two, three, four or more authors. CREDIT: Robert Simpson

You can see that single-author papers dominated for most of the 20th century. Around 1960, we see the decline begin, as two- and three-author papers begin to become a significant chunk of the whole. In 1978, multi-author papers became more prevalent than single-author papers.

Compare the number of “active” research astronomers to the number of papers published each year (across all the major journals). CREDIT: Robert Simpson.

Here we see that people begin to outpace papers in the 1960s. This may reflect the fact that as we get more technical as a field, and more specialised, it takes more people to write the same number of papers, which is a sort of interesting result all by itself.

Interview with Robert Simpson: Behind the project and what lies ahead

I recently talked with Rob about the work he, Karen Masters, and Sarah Kendrew did at the meeting, and the work he’s been doing since with the newly gathered data.

What made you think about data mining the ADS?

Robert Simpson: At the .Astronomy 4 Hack Day in July, Sarah Kendrew had the idea to try to do an astronomy version of BrainSCANr, a project that generates new hypotheses in the neuroscience literature. I’ve had a go at mining ADS and arXiv before, so it seemed like a great excuse to dive back in.

Do you think there might be actual science that could be done here?

Robert Simpson: Yes, in the form of finding questions that were unexpected. With such large volumes of peer-reviewed papers being produced daily in astronomy, there is a lot being said. Most researchers can only try to keep up with it all — my daily RSS feed from arXiv is next to useless, it’s so bloated. In amongst all that text, there must be connections and relationships that are being missed by the community at large, hidden in the chatter. Maybe we can develop simple techniques to highlight potential missed links, i.e. generate new hypotheses from the mass of words and data.

Are the results coming out of the work useful for auditing academics?

Robert Simpson: Well, perhaps, but that would be tricky territory in my opinion. I’ve only just begun to explore the data around authorship in astronomy. One thing that is clear is that we can see a big trend toward collaborative work. In 2012, only 6% of papers were single-author efforts, compared with 70+% in the 1950s.

The above plot shows the average number of authors per paper since 1827. CREDIT: Robert Simpson.

We can measure how large groups are becoming, and who is part of which groups. In that sense, we can audit research groups, and maybe individual people. The big issue is keeping track of people through variations in their names and affiliations. Identifying authors is probably a solved problem if we look at ORCID.
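The name-variation problem can be sketched with a crude normalisation, surname plus first initial. Real disambiguation (which is what ORCID is for) has to handle affiliations, transliterations, and genuinely shared names; this invented example only shows why some collapsing step is needed at all.

```python
# Crude author-name key: surname + first initial. Invented examples; real
# disambiguation (ORCID, affiliation matching) is much harder than this.
def name_key(author: str) -> str:
    surname, _, given = author.partition(",")
    initial = given.strip()[:1].upper()
    return f"{surname.strip().lower()}_{initial}"

variants = ["Smith, John", "Smith, J.", "smith, j", "Jones, Karen"]
keys = {name_key(a) for a in variants}
print(keys)  # two distinct keys: the three Smith variants collapse to one
```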

What about citations? Can you draw any comparisons with h-index data?

Robert Simpson: I haven’t looked at h-index stuff specifically, at least not yet, but citations are fun. I looked at the trends surrounding the term “dark matter” and saw something interesting. Mentions of dark matter rise steadily after it first appears in the late ’70s.

Compare the term “dark matter” with a few other related terms: “cosmology,” “big bang,” “dark energy,” and “wmap.” You can see cosmology has been getting more popular since the 1990s, and dark energy is a recent addition. CREDIT: Robert Simpson.

In the data, astronomy becomes more and more obsessed with dark matter — the term appears in 1% of all papers by the end of the ’80s and 6% today.

Looking at citations changes the picture. The community is writing papers about dark matter more and more each year, but they are getting fewer citations than they used to (the peak for this was in the late ’90s). These trends are normalised, so the only recency effect I can think of is that dark matter papers take more than 10 years to become citable. Either that, or dark matter studies are currently in a trough for impact.

Can you see where work is dropped by parts of the community and picked up again?

Robert Simpson: Not yet, but I see what you mean. I need to build a better picture of the community and its components.

Can you build a social graph of astronomers out of this data? What about (academic) family trees?

Robert Simpson: Identifying unique authors is my next step, followed by creating fingerprints of individuals at a given point in time. When do people create their first-author papers, when do they have the most impact in their careers, stuff like that.

What tools did you use? In hindsight, would you do it differently?

Robert Simpson: I’m using Ruby and Perl to grab the data, MySQL to store and query it, and JavaScript to display it (Google Charts and D3.js). I may still move the database part to MongoDB because it was designed to store documents. Similarly, I may switch from ADS to arXiv as the data source. Using arXiv would allow me to grab the full text in many cases, even if it does introduce a peer-review issue.

What’s next?

Robert Simpson: My aim is still to attempt real hypothesis generation. I’ve begun the process by investigating correlations between terms in the literature, but I think the power will be in being able to compare all terms with all terms and looking for the unexpected. Terms may correlate indirectly (via a third term, for example), so the entire corpus needs to be processed and optimised to make it work comprehensively.
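One way to operationalise "correlate indirectly via a third term" is to flag pairs with a weak direct link but a strong shared neighbour, in the BrainSCANr spirit. The co-occurrence scores below are invented; this is a sketch of the search, not the project's actual method.

```python
# Hypothesis-generation sketch: term pairs that rarely co-occur directly
# but share a strong common neighbour. Co-occurrence counts are invented.
cooc = {
    ("dark matter", "rotation curve"): 50,
    ("rotation curve", "modified gravity"): 30,
    ("dark matter", "modified gravity"): 2,  # weak direct link
}

def link(a, b):
    """Symmetric lookup of a pair's co-occurrence count."""
    return cooc.get((a, b)) or cooc.get((b, a)) or 0

def candidate_hypotheses(terms, direct_max=5, via_min=20):
    """Return (a, b, via) where a-b is weak but a-via and b-via are strong."""
    out = []
    for a in terms:
        for b in terms:
            if a < b and link(a, b) <= direct_max:
                for via in terms:
                    if via not in (a, b) and link(a, via) >= via_min and link(b, via) >= via_min:
                        out.append((a, b, via))
    return out

terms = ["dark matter", "rotation curve", "modified gravity"]
print(candidate_hypotheses(terms))
# [('dark matter', 'modified gravity', 'rotation curve')]
```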

Science between the cracks

I’m really looking forward to seeing more results coming out of Robert’s work. This sort of analysis hasn’t really been possible before. It’s showing a lot of promise both from a sociological angle, with the ability to do research into how science is done and how that has changed, but also ultimately as a hypothesis engine — something that can generate new science in and of itself. This is just a hack day experiment. Imagine what could be done if the literature were more open and this sort of analysis could be done across fields?

Right now, a lot of the most interesting science is being done in the cracks between disciplines, but the hardest part of that sort of work is often trying to understand the literature of the discipline that isn’t your own. Robert’s project offers a lot of hope that this may soon become easier.

August 09 2012

The risks and rewards of a health data commons

As I wrote earlier this year in an ebook on data for the public good, while the idea of data as a currency is still in its infancy, it’s important to think about where the future is taking us and our personal data.

If the Obama administration’s smart disclosure initiatives gather steam, more citizens will be able to do more than think about personal data: they’ll be able to access their financial, health, education, or energy data. In the U.S. federal government, the Blue Button initiative, which initially enabled veterans to download personal health data, is now spreading to all federal employees, and it has also been adopted by private institutions like Aetna and Kaiser Permanente. Putting health data to work stands to benefit hundreds of millions of people. The Locker Project, which provides people with the ability to move and store personal data, is another approach to watch.

The promise of more access to personal data, however, is balanced by accompanying risks. Smartphones, tablets, and flash drives, after all, are lost or stolen every day. Given the potential of mhealth, big data, and health care information technology, researchers and policy makers alike are moving forward with their applications. As they do so, conversations and rulemaking about health care privacy will need to take into account not just data collection or retention but context and use.

Put simply, businesses must confront the ethical issues tied to massive aggregation and data analysis. Given that context, Fred Trotter’s post on who owns health data is a crucial read. As Fred highlights, the real issue is not ownership, per se, but “What rights do patients have regarding health care data that refers to them?”

Would, for instance, those rights include the ability to donate personal data to a data commons, much in the same way organs are donated now for research? That question isn’t exactly hypothetical, as the following interview with John Wilbanks highlights.

Wilbanks, a senior fellow at the Kauffman Foundation and director of the Consent to Research Project, has been an advocate for open data and open access for years, including a stint at Creative Commons; a fellowship at the World Wide Web Consortium; and experience in the academic, business, and legislative worlds. Wilbanks will be speaking at the Strata Rx Conference in October.

Our interview, lightly edited for content and clarity, follows.

Where did you start your career? Where has it taken you?

John Wilbanks: I got into all of this, in many ways, because I studied philosophy 20 years ago. What I studied inside of philosophy was semantics. In the ’90s, that was actually sort of pointless because there wasn’t much semantic stuff happening computationally.

In the late ’90s, I started playing around with biotech data, mainly because I was dating a biologist. I was sort of shocked at how the data was being represented. It wasn’t being represented in a way that was very semantic, in my opinion. I started a software company and we ran that for a while, [and then] sold it during the crash.

I went to the World Wide Web Consortium, where I spent a year helping start their Semantic Web for Life Sciences project. While I was there, Creative Commons (CC) asked me to come and start their science project because I had known a lot of those guys. When I started my company, I was at the Berkman Center at Harvard Law School, and that’s where Creative Commons emerged from, so I knew the people. I knew the policy and I had gone off and had this bioinformatics software adventure.

I spent most of the last eight years at CC working on trying to build different commons in science. We looked at open access to scientific literature, which is probably where we had the most success because that’s copyright-centric. We looked at patents. We looked at physical laboratory materials, like stem cells in mice. We looked at different legal regimes to share those things. And we looked at data. We looked at both the technology aspects and legal aspects of sharing data and making it useful.

A couple of times over those years, we almost pivoted from science to health because science is so institutional that it’s really hard for any of the individual players to create sharing systems. It’s not like software, where anyone with a PC and an Internet connection can contribute to free software, or Flickr, where anybody with a digital camera can license something under CC. Most scientists are actually restricted by their institutions. They can’t share, even if they want to.

Health kept being interesting because it was the individual patients who had a motivation to actually create something different than the system did. At the same time, we were watching and seeing the capacity of individuals to capture data about themselves exploding. So, at the same time that the capacity of the system to capture data about you exploded, your own capacity to capture data exploded.

That, to me, started taking on some of the interesting contours that make Creative Commons successful, which was that you didn’t need a large number of people. You didn’t need a very large percentage of Wikipedia users to create Wikipedia. You didn’t need a large percentage of free software users to create free software. If this capacity to generate data about your health was exploding, you didn’t need a very large percentage of those people to create an awesome data resource: you needed to create the legal and technical systems for the people who did choose to share to make that sharing useful.

Since Creative Commons is really a copyright-centric organization, I left because the power on which you’re going to build a commons of health data is going to be privacy power, not copyright power. What I do now is work on informed consent, which is the legal system you need to work with instead of copyright licenses, as well as the technologies that then store, clean, and forward user-generated data to computational health and computational disease research.

What are the major barriers to people being able to donate their data in the same way they might donate their organs?

John Wilbanks: Right now, it looks an awful lot like getting onto the Internet before there was the web. The big ISPs kind of dominated the early adopters of computer technologies. You had AOL. You had CompuServe. You had Prodigy. And they didn’t communicate with each other. You couldn’t send email from AOL to CompuServe.

What you have now depends on the kind of data. If the data that interests you is your genotype, you’re probably a 23andMe customer and you’ve got a bunch of your data at 23andMe. If you are the kind of person who has a chronic illness and likes to share information about that illness, you’re probably a customer at PatientsLikeMe. But those two systems don’t interoperate. You can’t send data from one to the other very effectively or really at all.

On top of that, the system has data about you. Your insurance company has your billing records. Your physician has your medical records. Your pharmacy has your pharmacy records. And if you do quantified self, you’ve got your own set of data streams. You’ve got your Fitbit, the data coming off of your smartphone, and your meal data.

Almost all of these are basically populating different silos. In some cases, you have the right to download certain pieces of the data. For the most part, you don’t. It’s really hard for you, as an individual, to build your own, multidimensional picture of your data, whereas it’s actually fairly easy for all of those companies to sell your data to one another. There’s not a lot of technology that lets you share.

What are some of the early signals we’re seeing about data usage moving into actual regulatory language?

John Wilbanks: The regulatory language actually makes it fairly hard to do contextual privacy waiving, in a Creative Commons sense. It’s hard to do granular permissions around privacy in the way you can do granular conditional copyright grants because you don’t have intellectual property. The only legal tool you have is a contract, and the contracts don’t have a lot of teeth.

It’s pretty hard to do anything beyond a gift. It’s more like organ donation, where you don’t get to decide where the organs go. What I’m working on is basically a donation, not a conditional gift. The regulatory environment makes it quite hard to do anything besides that.

There was a public comment period that just finished. It’s an announcement of proposed rulemaking on what’s called the Common Rule, which is the Department of Health and Human Services’ privacy language. It was looking to re-examine the rules around letting de-identified data or anonymized data out for widespread use. They got a bunch of comments.

There’s controversy as to how de-identified such data can actually be and still be useful. There is going to be, probably, a three-to-five year process where they rewrite the Common Rule and it’ll be more modern. No one knows how modern, but it will be at least more modern when that finishes.

Then there’s another piece in the US — HIPAA — which creates a totally separate regime. In some ways, it is the same as the Common Rule, but not always. I don’t think that’s going to get opened up. The way HIPAA works is that they have 17 direct identifiers that are labeled as identifying information. If you strip those out, it’s considered de-identified.

There’s an 18th bucket, which is anything else that can reasonably identify people. It’s really hard to hit. Right now, your genome is not considered to fall under that. I would be willing to bet within a year or two, it will be.

From a regulatory perspective, you’ve got these overlapping regimes that don’t quite fit and both of them are moving targets. That creates a lot of uncertainty from an investment perspective or from an analytics perspective.

How are you thinking about a “health data commons,” in terms of weighing potential risks against potential social good?

John Wilbanks: I think that that’s a personal judgment as to the risk-benefit decision. Part of the difficulty is that the regulations are very syntactic — “This is what re-identification is” — whereas the concept of harm, benefit, or risk is actually something that’s deeply personal. If you are sick, if you have cancer or a rare disease, you have a very different idea of what risk is compared to somebody who thinks of him or herself as healthy.

What we see — and this is borne out in the Framingham Heart Study and all sorts of other longitudinal surveys — is that people’s attitudes toward risk and benefit change depending on their circumstances. Their own context really affects what they think is risky and what they think isn’t risky.

I believe that the early data donors are likely to be people for whom there isn’t a lot of risk perceived because the health system already knows that they’re sick. The health system is already denying them coverage, denying their requests for PET scans, denying their requests for access to care. That’s based on actuarial tables, not on their personal data. It’s based on their medical history.

If you’re in that group of people, then the perceived risk is actually pretty low compared to the idea that your data might actually get used or to the idea that you’re no longer passive. Even if it’s just a donation, you’re doing something outside of the system that’s accelerating the odds of getting something discovered. I think that’s the natural group.

If you think back to the numbers of users who are required to create free software or Wikipedia, to create a cultural commons, a very low percentage is needed to create a useful resource.

Depending on who you talk to, somewhere between 5-10% of all Americans either have a rare disease, have it in their first order family, or have a friend with a rare disease. Each individual disease might not have very many people suffering from it, but if you net them all up, it’s a lot of people. Getting several hundred thousand to a few million people enrolled is not an outrageous idea.

When you look at the existing examples of where such commons have come together, what have been the most important concrete positive outcomes for society?

John Wilbanks: I don’t think we have really even started to see them because most people don’t have computable data about themselves. Most people, if they have any data about themselves, have scans of their medical records.

What we really know is that there’s an opportunity cost to not trying, which is that the existing system is really inefficient, very bad at discovering drugs, and very bad at getting those drugs to market in a timely basis.

That’s one of the reasons we’re doing this as an experiment. We would like to see exactly how effective big computational approaches are on health data. The problem is that there are two ways to get there.

One is through a set of monopoly companies coming together and working together. That’s how semiconductors work. The other is through an open network approach. There’s not a lot of evidence that things besides these two approaches work. Government intervention is probably not going to work.

Obviously, I come down on the open network side. But there’s an implicit belief, I think, both in the people who are pushing the cooperating monopolies approach and the people who are pushing the open networks approach, that there’s enormous power in the big-data-driven approach. We’re just leaving that on the table right now by not having enough data aggregated.

The health benefits will come from the ability, by looking at a multidimensional picture of a person, to predict with some confidence whether or not a drug will work, whether they’re going to get sick, how sick they’re going to get, or what lifestyle changes they can make to mitigate an illness. Right now, we really don’t know very much.
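The kind of prediction described above can be sketched as a toy supervised-learning problem. Everything here is synthetic and assumed: the features stand in for a "multidimensional picture of a person" (labs, genomics, lifestyle), the label for whether a drug worked, and the model is a plain logistic regression fit by gradient descent. None of this reflects Wilbanks' actual project.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2000 synthetic "patients", each described by 6 features.
n, d = 2000, 6
X = rng.normal(size=(n, d))

# Hidden ground truth: some features matter for drug response, some don't.
true_w = np.array([1.5, -2.0, 0.8, 0.0, 0.0, 0.5])
p = 1 / (1 + np.exp(-(X @ true_w)))  # probability the drug "works"
y = rng.random(n) < p                # observed binary responses

# Fit logistic regression by gradient descent on the logistic loss.
w = np.zeros(d)
for _ in range(500):
    pred = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (pred - y)) / n

# With enough aggregated data, the model recovers the signal and predicts
# well above the ~50% baseline of guessing.
acc = np.mean((1 / (1 + np.exp(-(X @ w))) > 0.5) == y)
print(f"accuracy on the synthetic cohort: {acc:.2f}")
```

The point of the sketch is the interview's argument, not the model: predictions like this only become possible once enough multidimensional data is aggregated in one place.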

Pretty Simple Data Privacy

John Wilbanks discussed “Pretty Simple Data Privacy” during a Strata Online Conference in January 2012. His presentation begins at the 7:18 mark in the following video:

Strata Rx — Strata Rx, being held Oct. 16-17 in San Francisco, is the first conference to bring data science to the urgent issues confronting health care.



October 07 2011

Copyright Law and Its Interpretation Produce Strange Results

A new ruling by the Landgericht Stuttgart (Stuttgart Regional Court) on § 52a UrhG has prompted Prof. Rainer Kuhlen to call for civil disobedience.

What happened? The Landgericht Stuttgart (judgment of September 27, 2011, case no. 17 O 671/10) ordered the Fernuniversität Hagen to cease and desist, among other things because it had made excerpts from a textbook available to students as a PDF file within a closed user group.

Under copyright law, this can in principle be permissible pursuant to § 52a(1) no. 1 UrhG — which the LG Stuttgart concedes — but, in the court’s view, not in the form of a PDF file that students can also save to their own computers. According to the court, with § 52a UrhG the legislature only intended to permit uses comparable to analog use. Storing the file on students’ computers, however, constitutes a qualitatively higher-value form of reproduction than analog use, because the saved work can be copied straight into a word processor. A different file format should therefore have been chosen.

In my view, this reasoning is wrong and entirely divorced from practice, because it is supported neither by the wording nor by the purpose of the provision. One inevitably asks which form of use would then actually correspond to analog use. A PDF file is one of the implementations that comes closest to an analog copy. That files can in principle be saved lies in the nature of things. The Fernuniversität Hagen has apparently since switched from PDF files to Flash-based solutions.

Regardless of whether or not the Landgericht Stuttgart applied the current law correctly, the case also shows that the legislature has so far failed to create an education-friendly copyright law in the public interest.

Instead, copyright law is continually being tightened in favor of rights holders (publishers, record labels, film distributors). The rights holders’ lobbying machinery has a firm grip on policymakers in Berlin and Brussels, at the expense of the common good. A provision like § 52a UrhG in particular — only inserted into the Copyright Act in this form in 2003 — which provides for a statutory limitation of authors’ rights in favor of teaching and research, still falls far short.

Unfortunately, no real improvement is in sight here as long as citizens do not take to the barricades. Kuhlen’s outrage is therefore very understandable. A knowledge and information society cannot afford such legislative timidity in the long run.

March 17 2011

Draft Bill on the Secondary Publication of Scientific Works

The SPD parliamentary group has presented a draft bill to create a secondary publication right for scientific works produced in the course of teaching and research activities financed predominantly with public funds. To this end, a new provision, § 38a UrhG, would be created, permitting the author a non-commercial secondary publication of the work after 6 months for periodicals and 12 months for collected works. This secondary publication is, however, limited to making the work publicly accessible, which is primarily intended to permit online publication.

The practical problem here will likely include determining, in each individual case, when a work was produced in the course of an activity financed at least half with public funds.

