September 29 2011

Four short links: 29 September 2011

  1. Princeton Open Access Report (PDF) -- academics will need written permission to assign copyright of a paper to a journal. Of course, the faculty already had exclusive rights in the scholarly articles they write; the main effect of this new policy is to prevent them from giving away all their rights when they publish in a journal. (via CC Huang)
  2. Good Faith Collaboration -- a book on Wikipedia's culture, from MIT Press. Distributed, appropriately, under a Creative Commons Non-Commercial Share-Alike license.
  3. The Local-Global Flip -- an EDGE conversation (or monologue) by Jaron Lanier that contains more thought-provocation per column-inch than anything else you'll read this week. "[I]ncreasing efficiency by itself doesn't employ people. There is a difference between saving and making money when you're unemployed. Once you're already rich, saving money and making money is the same thing, but for people who are on the bottom or even in the middle classes, saving money doesn't help you if you don't have the money to save in the first place." and "The beauty of money is it creates a system of people leaving each other alone by mutual agreement. It's the only invention that does that that I'm aware of. In a world of finite limits where you don't have an infinite West you can expand into, money is the thing that gives you a little bit of peace and quiet, where you can say, 'It's my money, I'm spending it'." and "I'm astonished at how readily a great many people I know, young people, have accepted a reduced economic prospect and limited freedoms in any substantial sense, and basically traded them for being able to screw around online. There are just a lot of people who feel that being able to get their video or their tweet seen by somebody once in a while gets them enough ego gratification that it's okay with them to still be living with their parents in their 30s, and that's such a strange tradeoff. And if you project that forward, obviously it does become a problem." are things I'm still chewing on, many days after first reading.
  4. Trolled by Gerry Sussman (Bryan O'Sullivan) -- Bryan gave a tutorial on Haskell to a conference on leading-edge programming languages and distributed systems. At one point, Gerry had a pretty amusing epigram to offer. "Haskell is the best of the obsolete programming languages!" he pronounced, with a mischievous look. Now, I know when I’m being trolled, so I said nothing and waited a moment, whereupon he continued, "but don’t take it the wrong way—I think they’re all obsolete!"

August 16 2011

Data science is a pipeline between academic disciplines

We talk a lot about the ways in which data science affects various businesses, organizations, and professions, but how are we actually preparing future data scientists? What training, if any, do university students get in this area? The answer may be obvious if students focus on math, statistics or hard science majors, but what about other disciplines?

I recently spoke with Drew Conway (@drewconway) about data science and academia, particularly with regard to the social sciences. Conway, a PhD candidate in political science at New York University, will expand on some of these topics during a session at next month's Strata Conference in New York.

Our interview follows.

How has the work of academia — particularly political science — been affected by technology, open data, and open source?

Drew Conway: There are fundamentally two separate questions in here, so I will try to address both of them. First is the question of how academic research has changed as a result of these technologies. And for my part, I can only really speak for how they have affected social science research. The open data movement has impacted research most notably in compressing the amount of time it takes a researcher to go from the moment of inception ("hmm, that would be interesting to look at!") to actually looking at data and searching for interesting patterns. This is especially true of the open data movement happening at the local, state and federal government levels.

Only a few years ago, the task of identifying, collecting, and normalizing these data would have taken months, if not years. This meant that a researcher could have spent all of that time and effort only to find out that their hypothesis was wrong and that — in fact — there was nothing to be found in a given dataset. The richness of data made available through open data allows for a much more rapid research cycle, and hopefully a greater breadth of topics being researched.

Open source has also had a tremendous impact on how academics do research. First, open source tools for performing statistical analysis, such as R and Python, have robust communities around them. Academics can develop and share code within their niche research area, and as a result the entire community benefits from their effort. Moreover, the philosophy of open source has started to enter into the framework of research. That is, academics are becoming much more open to the idea of sharing data and code at early stages of a research project. Also, many journals in the social sciences are now requiring that authors provide replication code and data.

The second piece of the question is how these technologies affect the dissemination of research. In this case blogs have become the de facto source for early access to new research, or scientific debate. In my own discipline, The Monkey Cage is most political scientists' first source for new research. What is fantastic about The Monkey Cage, and other academic blogs, is that they are not only read by other academics. Journalists, policy makers, and engaged citizens can also interact with academics in this way — something that was not possible before these academic blogs became mainstream.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science -- from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code STN11RAD

Let's sidestep the history of the discipline and debates about what constitutes a hard or soft science. But as its name suggests, "political science" has long been interested in models, statistics, quantifiable data and so on. Has the discipline been affected by the rise of data science and big data?

Drew Conway: The impact of big data has been slow, but there are a few champions who are doing really interesting work. Political science, at its core, is most interested in understanding how people collectively make decisions, and as researchers we attempt to build models and collect data to that end. As such, the massive data on social interactions being generated by social media services like Facebook and Twitter present unprecedented opportunities for research.

While some academics have been able to leverage this data for interesting work, there seems to be a clash between these services' terms of service and the desire of scientists to collect data and generate reproducible findings from it. I wrote about my own experience using Twitter data for research, but there are many other researchers from all disciplines who have run into similar problems.

With respect to how academics have been impacted by data science, I think the impact has mostly flowed in the other direction. One major component of data science is the ability to extract insight from data using tools from math, statistics and computer science. Most of this is informed by the work of academics, and not the other way around. That said, as more academic researchers become interested in examining large-scale datasets (on the order of Twitter or Facebook), many of the technical skills of data science will have to be acquired by academics.

How does data science change the work of the grad student — in terms of necessary skills but also in terms of access to information/informants?

Drew Conway: Unfortunately, having sophisticated technical skills, i.e., those of a data scientist, is still undervalued in academia. Being involved in open-source projects, or producing statistical software, is not something that will help a graduate student land a high-profile academic job, or help a young faculty member get tenure. Publications are still the currency of success, and that — as I mentioned — clashes with the data-sharing policies of many large social media services.

Graduate students and faculty do themselves a disservice by not actively staying technically relevant. As so much more data gets pushed into the open, I believe basic data hacking skills — scraping, cleaning, and visualization — will be prerequisites to any academic research project. But, then again, I've always been a weird academic, double majoring in computer science and political science as an undergrad.

How does the rise of data science and its spread beyond the realm of math and statistics change the world of technology, either from an academic or entrepreneurial perspective?

Drew Conway: From an entrepreneurial perspective I think it has dramatically changed the way new businesses think about building a team. Whether it is at Strata, or any of the other conferences in the same vein, you will see a glut of job openings or panels on how to "build a data team." At present, people who have the blend of skills I associate with data science — hacking, math/stats, and substantive expertise — are a rare commodity. This dearth of talent, however, will be short-lived.

I see in my undergrads many more students who grew up with data and computing as ubiquitous parts of their lives. They're interested in pursuing routes of study that provide them with data science skills, both in terms of technical competence, and also in creative outlets such as interactive design.

How does "human subjects compliance" work when you're talking about "data" versus "people" — that's an odd distinction, of course, and an inaccurate one at that. But I'm curious if some of the rules and regulations that govern research on humans account for research on humans' data.

Drew Conway: I think it is an excellent question, and one that academe is still struggling to deal with. In some sense, mining social data that is freely available on the Internet provides researchers a way to sidestep traditional IRB regulation. I don't think there's anything ethically questionable about recording observations that are freely made public. That's akin to observing the meanderings of people in a park.

Where things get interesting is when researchers use crowdsourcing technology, like Mechanical Turk, as a survey mechanism. This is much more of a gray area. I suppose, technically, the Amazon terms of service cover researchers, but ethically this is something that would seem to me to fall within the scope of an IRB. Unfortunately, the likely outcome is that institutions won't attempt to understand the difference until some problem arises.

This interview was edited and condensed.

July 21 2011

Strata Week: When does data access become data theft?

Here are a few of the data stories that caught my eye this week.

Aaron Swartz and the politics of violating a database's TOS

Aaron Swartz, best known as an early Reddit-er and the founder of the progressive non-profit Demand Progress, was charged on Tuesday with multiple felony counts for the illegal download of some 4 million academic journal articles from the MIT Library.

The indictment against Swartz (a full copy is here) details the steps he took to procure a laptop and register it on the MIT network, all in the name of securing access to JSTOR. JSTOR is an online database of academic journals, providing full text search and access to library patrons at both academic and public universities.

Swartz accessed the JSTOR database via MIT and proceeded to devise a mechanism to download a massive number of documents. It isn't clear what his intentions were for these — Swartz has been involved previously with open data efforts. Was he planning to liberate the JSTOR database? Or, as others have suggested, was he in the middle of an academic project that required a massive dataset?

The government has made it clear this is "stealing." JSTOR, the library, and the university are less willing to comment or condemn.

Kevin Webb asks an important question in a post reprinted by Reuters. What's the difference between what Swartz did and what Google does?

What's missing from the news articles about Swartz's arrest is a realization that the methods of collection and analysis he's used are exactly what makes companies like Google valuable to its shareholders and its users. The difference is that Google can throw the weight of its name behind its scrapers ...

Although Swartz did allegedly download data from JSTOR in such quantities that it violates a Terms of Service agreement, many questions remain: Why does this constitute stealing? How much data does one need to take to be at risk of accusations of theft and fraud? For data scientists, not just for activists, these are very real questions.

Update: GigaOm's Janko Roettgers reports that a torrent with 18,592 scientific publications — all of them apparently from JSTOR — was uploaded to The Pirate Bay.


Microsoft releases its big data toolkit for scientists

Although we're all creating massive amounts of data, for scholars and scientists that data creation and analysis can quickly run afoul of the limitations of university computing centers. To that end, Microsoft Research this week unveiled Daytona, a tool designed to help scientists with big data computation.

Created by the eXtreme Computing Group, the tool lets scholars and scientists use Microsoft's Azure platform to work with large datasets. According to Roger Barga, an architect in the eXtreme Computing Group:

Daytona has a very simple, easy-to-use programming interface for developers to write machine-learning and data-analytics algorithms. They don't have to know too much about distributed computing or how they're going to spread the computation out, and they don't need to know the specifics of Windows Azure.

Daytona is meant to be an alternative to Hadoop or MapReduce (although it does utilize the latter), but with an emphasis on ease-of-use. Daytona comes with code samples and programming guides to get people up and running.

The eXtreme Computing Group has also built Excel Datascope, which as the name suggests is a tool that offers data analytics from Excel.

While making it easier for academics to perform big data analysis is an honorable goal, I can't help but ask (as a recovering academic myself): when will the academy realize that the skills needed to work with these datasets warrant formal attention? Scholars need to be trained to manage this information. That way, it isn't just a matter of making it "easier," but of making these tools better.

The state of open data in Canada

Code for America program director David Eaves has taken a look at the state of open data licenses in Canada in order to assess what works, what doesn't work, and where to go from here.

Eaves examines how the Canadian government (provincial and otherwise) has made strides toward opening up data to its citizens, developers, and others. But as Eaves makes clear in his post, it isn't as simple as just "opening" data as a gesture, but rather making sure data is readily accessible and usable.

"Licenses matter because they determine how you are able to use government data — a public asset," he writes. "As I outlined in the three laws of open data, data is only open if it can be found, be played with and be shared." Eaves contends that licensing is particularly important, as this can limit what sorts of restrictions are put on the sharing of data and, in turn, on the sorts of apps one can build using it.

What do we want then? Eaves lists these attributes:

  • Open: there should be maximum freedom for reuse
  • Secure: it offers governments appropriate protections for privacy and security
  • Simplicity: to keep down legal costs, and make it easier for everyone to understand
  • Standardized: so my work is accessible across jurisdictions
  • Stable: so I know that the government won't change the rules on me

When it comes to the "where do we go from here" aspect, Eaves isn't optimistic. He notes that while some municipalities may have opened their datasets, the federal government — in Canada and elsewhere — seems unprepared to fully engage with the developer and open data communities.

Got data news?

Feel free to email me.





March 09 2011

One foot in college, one foot in business

In a recent interview, Joe Hellerstein, a professor in the UC Berkeley computer science department, talked about the disconnect between open source innovation and development. The problem, he said, doesn't lie with funding, but with engineering and professional development:

As I was coming up as a student, really interesting open source was coming out of universities. I'm thinking of things like the Ingres and Postgres database projects at Berkeley and the Mach operating system at Carnegie Mellon. These are things that today are parts of commercial products, but they began as blue-sky research. What has changed now is there's more professionally done open source. It's professional, but it's further disconnected from research.

A lot of the open source that's very important is really "me-too" software — so Linux was a clone of Unix, and Hadoop is a clone of Google's MapReduce. There's a bit of a disconnect between the innovation side, which the universities are good at, and the professionalism of open source that we expect today, which the companies are good at. The question is, can we put those back together through some sort of industrial-academic partnership? I'm hopeful that can be done, but we need to change our way of business.

Hellerstein pointed to the MADlib project being conducted between his group at Berkeley and the project sponsor EMC Greenplum as an example of a new partnership model that could close the gap between innovation and development.

Our sponsor would have been happy to donate money to my research funds, but I said, "You know, what I really need is engineering time."

The thing I cannot do on campus is run a professional engineering shop. There are no career incentives for people to be programmers at the university. But a company has processes and expertise, and they can hire really good people who have a career path in the company. Can we find an arrangement where those people are working on open source code in collaboration with the people at the university?

It's a different way of doing research funding. The company's contributions are not financial. The contributions are in engineering sweat. It's an interesting experiment, and it's going well so far.

In the interview Hellerstein also discusses MAD data analysis and where we are in the industrial revolution of data. The full interview is available in the following video:







April 27 2009

The Global Financial Crisis: the implications for city and regional planning
LBJ Page Turners Series: Nadine Eckhardt

April 24 2009

Imagining India - The Idea of a Renewed Nation
Nuclear Weapons: Getting Past Russia and the NPT and on to Zero

April 23 2009

Challenges in Civil Liberties - on Uchannel


Topical Background
In reaction to the 9/11 attacks, the Bush administration enacted a series of strong counter-terrorism measures. These policies included aggressive detention procedures, extraordinary rendition of prisoners to various countries, harsh interrogation tactics, and a sweeping domestic and international surveillance policy. While these anti-terrorist policies were all pursued in the name of protecting the country, some contended that they represented a serious threat to civil liberties. The American Civil Liberties Union (ACLU), the nation's oldest and largest civil liberties organization, vigorously opposed these policies from their inception, fighting them in courtrooms and legislative bodies, with varying levels of success.

Both supporters and opponents of former President Bush are closely watching the Obama Administration to see what policies he will pursue in the ongoing war on terrorism. President Obama has already made significant changes, such as his executive order closing the U.S. military prison at Guantanamo Bay within a year and his order prohibiting the C.I.A. from using coercive interrogation methods. Will Obama's policies in the war on terrorism be consistent with civil liberties? Can the new administration adequately protect the country from future terrorist attacks without infringing upon traditional civil liberties?

April 22 2009

April 21 2009

UN Secretary-General Keynote: "The Imperative for a New Multilateralism"

April 20 2009

Defending Human Rights in Times of Terror
U.S. v. Hamdan: Military Commissions Sixty-Six Years after Quirin

April 17 2009

The Public Domain: enclosing the commons of the mind

April 16 2009

The Tyranny of Oil: The World's Most Powerful Industry, and What We Must Do to Stop It

Antonia Juhasz, associate fellow with the Institute for Policy Studies, a fellow with Oil Change International, and a senior analyst for Foreign Policy In Focus

(Nov 20, 2008 at the University of Chicago. Courtesy of CHIASMOS)

The author of The Bush Agenda: Invading the World, One Economy at a Time (2006), Juhasz has also written extensively on various aspects of globalization. Her articles and commentary on politics and policy have appeared in New York Times, International Herald Tribune, Los Angeles Times, Miami Herald, Petroleum Review Magazine, In These Times, and Washington Post, among other sources.

From the World Beyond the Headlines Series.

© 2008, The University of Chicago

--------------------------------------
by @uchannel: permalink

For more information, go to Antonia Juhasz's website. She has given many interviews, e.g. to Democracy Now, The REAL News Network, etc.

April 15 2009
