May 22 2012

Data journalism research at Columbia aims to close data science skills gap

Successfully applying data science to the practice of journalism requires more than providing context and finding clarity in vast amounts of unstructured data: it will require media organizations to think differently about how they work and who they venerate. It will mean evolving towards a multidisciplinary approach to delivering stories, where reporters, videographers, news application developers, interactive designers, editors and community moderators collaborate on storytelling, instead of being segregated by departments or buildings.

The role models for this emerging practice of data journalism won't be found on broadcast television or on the lists of the top journalists over the past century. They're drawn from the increasing pool of people who are building new breeds of newsrooms and extending the practice of computational journalism. They see the reporting that provisions their journalism as data, a body of work that can itself be collected, analyzed, shared and used to create longitudinal insights about the ways that society, industry or government are changing. (Or not, as the case may be.)

In a recent interview, Emily Bell (@EmilyBell), director of the Tow Center for Digital Journalism at the Columbia University School of Journalism, offered her perspective about what's needed to train the data journalists of the future and the changes that still need to occur in media organizations to maximize their potential. In this context, while the roles of institutions and journalism education are themselves evolving, both will still fundamentally matter for "what's next," as practitioners adapt to changing newsonomics.

Our discussion took place in the context of a notable investment in the future of data journalism: a $2 million research grant to Columbia University from the Knight Foundation to research and distribute best practices for digital reportage, data visualizations and measuring impact. On the Knight Foundation's blog, Bell explained more about how the research effort will help newsrooms determine what's next:

The knowledge gap that exists between the cutting edge of data science, how information spreads, its effects on people who consume information and the average newsroom is wide. We want to encourage those with the skills in these fields and an interest and knowledge in journalism to produce research projects and ideas that will both help explain this world and also provide guidance for journalism in the tricky area of ‘what next’. It is an aim to produce work which is widely accessible and immediately relevant to both those producing journalism and also those learning the skills of journalism.

We are focusing on funding research projects which relate to the transparency of public information and its intersection with journalism, research into what might broadly be termed data journalism, and the third area of ‘impact’ or, more simply put, what works and what doesn’t.

Our interview, lightly edited for content and clarity, follows.

What did you do before you became director of the Tow Center for Digital Journalism?

I spent ten years where I was editor-in-chief of The Guardian website. During the last four of those, I was also overall director of digital content for all The Guardian properties. That included things like mobile applications, et cetera, but from the editorial side.

Over the course of that decade, you saw one or two things change online, in terms of what journalists could do, the tools available to them and the news consumption habits of people. You also saw the media industry change, in terms of the business models and institutions that support journalism as we think of it. What are the biggest challenges and opportunities for the future of journalism?

For newspapers, there was an early warning system: that newspaper circulation has not really consistently risen since the early 1980s. We had a long trajectory of increased production and actually, an overall systemic decline which has been masked by a very, very healthy advertising market, which really went on an incredible bull run with a more static pictures, and just "widen the pipe," which I think fooled a lot of journalism outlets and publishers into thinking that that was the real disruption.

And, of course, it wasn’t.

The real disruption was the ability of anybody anywhere to upload multimedia content and share it with anybody else who was on a connected device. That was the thing that really hit hard, when you look at 2004 onwards.

What journalism has to do is reinvent its processes, its business models and its skillsets to function in a world where human capital does not scale well, in terms of sifting, presenting and explaining all of this information. That’s really the key to it.

The skills that journalists need to do that -- including identifying a story, knowing why something is important and putting it in context -- are incredibly important. But how you do that, which particular elements you now use to tell that story are changing.

Those now include the skills of understanding the platform that you’re operating on and the technologies which are shaping your audiences’ behaviors and the world of data.

By data, I don’t just mean large caches of numbers you might be given or might be released by institutions: I mean that the data thrown off by all of our activity, all the time, is simply transforming the speed and the scope of what can be explained and reported on and identified as stories at a really astonishing speed. If you don’t have the fundamental tools to understand why that change is important and you don’t have the tools to help you interpret and get those stories out to a wide public, then you’re going to struggle to be a sustainable journalist.

The challenge for sustainable journalism going forward is not so different from what exists in other industries: there's a skills gap. Data scientists and data journalists use almost the exact same tools. What are the tools and skills that are needed to make sense of all of this data that you talked about? What will you do to catalog and educate students about them?

It's interesting when you say that the skills of these clients are very similar, which is absolutely right. First of all, you have a basic level of numeracy needed - and maybe not just a basic level, but a more sophisticated understanding of statistical analysis. That’s not something which is routinely taught in journalism schools but that I think will increasingly have to be.

The second thing is having some coding skills or some computer science understanding to help with identifying the best, most efficient tools and the various ways that data is manipulated.

The third thing is that when you’re talking about 'data scientists,' it’s really a combination of those skills. Adding data doesn’t mean you don't have to have other journalism skills which do not change: understanding context, understanding what the story might be, and knowing how to derive that from the data that you’re given or the data that exists. If it’s straightforward, how do you collect it? How do you analyze it? How do you interpret it and present it?

It’s easy to say, but it’s difficult to do. It’s particularly difficult to reorient the skillsets of an industry which have very much resided around the idea of a written story and an ability with editing. Even in the places where I would say there’s sophisticated use of data in journalism, it’s still a minority sport.

I’ve talked to several heads of data in large news organizations and they’ve said, “We have this huge skills gap because we can find plenty of people who can do the math; we can find plenty of people who are data scientists; we can’t find enough people who have those skills but also have a passion or an interest in telling stories in a journalistic context and making those relatable.”

You need a mindset which is about putting this in the context of the story and spotting stories, as well as having creative and interesting ideas about how you can actually collect this material for your own stories. It’s not a passive kind of processing function if you’re a data journalist: it’s an active speaking, inquiring and discovery process. I think that that’s something which is actually available to all journalists.

Think about just local information and how local reporters go out and speak to people every day on the beat, collect information, et cetera. At the moment, most don’t structure the information they get from those entities in a way that will help them find patterns and build new stories in the future.

This is not just about an amazing graphic that the New York Times does with census data over the past 150 years. This is about almost every story. Almost every story has some component of reusability or a component where you can collect the data in a way that helps your reporting in the future.

To do that requires a level of knowledge about the tools that you’re using, like coding, Google Refine or Fusion Tables. There are lots of freely available tools out there that are making this easier. But, if you don’t have the mindset that approaches, understands and knows why this is going to help you and make you a better reporter, then it’s sometimes hard to motivate journalists to see why they might want to grab on.

The other thing to say, which is really important, is there is currently a lack of both jobs and role models for people to point to and say, “I want to be that person.”

I think the final thing I would say to the industry is we’re getting a lot of smart journalists now. We are one of the schools where all of our digital concentrations from students this year include a basic grounding in data journalism. Every single one of them. We have an advanced course taught by Susan McGregor in data visualization. But we’re producing people from the school now, who are being hired to do these jobs, and the people who are hiring them are saying, “Write your own job description because we know we want you to do something, we just don’t quite know what it is. Can you tell us?”

You can’t cookie-cutter these people out of schools and drop them into existing roles in news trends because those are still developing. What we’re seeing are some very smart reporters with data-centric mindsets and also the ability to do these stories -- but they want to be out reporting. They don’t want to be confined to a desk and a spreadsheet. Some editors usually find that very hard to understand, “Well, what does that job look like?”

I think that this is where working with the industry, we can start to figure some of these things out, produce some experimental work or stories, and do some of the thinking in the classroom that helps people figure out what this whole new world is going to look like.

What do journalism schools need to do to close this 'skills gap?' How do they need to respond to changing business models? What combination of education, training and hands-on experience must they provide?

One of the first things they need to do is identify the problem clearly and be honest about it. I like to think that we’ve done that at Columbia, although I’m not a data journalist. I don’t have a background in it. I’m a writer. I am, if you like, completely the old school.

But one of the things I did do at The Guardian was help people who early on said to me, “Some of this transformation means that we have to think about data as being a core part of what we do.” Because of the political context and the position I was in, I was able to recognize that that was an important thing that they were saying and we could push through changes and adoption in those areas of the newsroom.

That’s how The Guardian became interested in data. It’s the same in journalism school. One of the early things that we talked about [at Columbia] was how we needed to shift some of what the school did on its axis and acknowledge that this was going to be a key part of what we do in the future. Once we acknowledged that that is something we had to work towards, [we hired] Susan McGregor from the Wall Street Journal’s Interactive Team. She’s an expert in data journalism and has an MA in technology in education.

If you say to me, “Well, what’s the ground vision here?” I would say the same thing I would say to anybody: over time, and hopefully not too long a course of time, we want to attract a type of student that is interested and capable in this approach. That means getting out and motivating and talking to people. It means producing attractive examples which high school children and undergraduate programs think about [in their studies]. It means talking to the CS [computer science] programs -- and, in fact, more about talking to those programs and math majors than you would be talking to the liberal arts professors or the historians or the lawyers or the people who have traditionally been involved.

I think that has an effect: it starts to show people who are oriented towards storytelling but have capabilities which align more with data science skill sets that there’s a real task for them. We can’t message that early enough as an industry. We can’t message it early enough as an educator to get people into those tracks. We have to really make sure that the teaching is high quality and that we’re not just carried away with the idea of the new thing; we need to think pretty deeply about how we get those skills.

What sort of basic statistical teaching do you need? What are the skills you need for data visualization? How do you need to introduce design as well as computer science skills into the classroom, in a way which makes sense for stories? How do you tier that understanding?

You're always going to produce superstars. Hopefully, we’ll be producing superstars in this arena soon as well.

We need to take the mission seriously. Then we need to build resources around it. And that’s difficult for educational organizations because it takes time to introduce new courses. It takes time to signal that this is something you think is important.

I think we’ve done a reasonable job of that so far at Columbia, but we’ve got a lot further to go. It's important that institutions like Columbia do take the lead and demonstrate that we think this is something that has to be a core curriculum component.

That’s hard, because journalism schools are known for producing writers. They’re known for different types of narratives. They are not necessarily lauded for producing math or computer science majors. That has to change.


November 07 2011

Three game characteristics that can be applied to education

In a related post, I talked about what the notion of gamification as applied to education might mean on three levels. In particular, I described the lessons that might be learned by the field of education from the different types of gaming encountered in World of Warcraft and Minecraft — two very different online multiplayer games. In this post, I look at the technology roadmap that can support these three levels of application in real schools.

Level 1: Leveling up and questing

The first level is one where leveling, questing, and leaderboards can help motivate students to engage more with their schoolwork. Like a gamer who chooses his or her own path and pace to "level up," a student will choose his or her own path and pace to learn a standard curriculum and be able to prove advancement to that next level through performance on tests.

The technology to be successful at this level exists today — the obstacle is cost, and the payoff is more students demonstrating success on state tests, closing the achievement gap. To work, this model calls for a mobile device with plenty of bandwidth for every student and software that lets the student level up at his or her own pace. The software can be an online course or something more sophisticated and engaging. The idea is that with software support to allow personalization for each student, teachers will have more time to spend with individual students and small groups to help them succeed with whatever unique challenges they are working on that day.

Despite the numerous challenges to achieve this level in reality, this is actually the easiest of the three levels.

First, this level is easy because objective standards of "better" exist — higher scores on standardized state tests. A school can try various online classes or drills, or adaptive software with its particular students, and standardized test scores will provide the data regarding what worked best for them.

Second, this level is easy because the technology infrastructure degrades gracefully — it still works even if students don't have a device of their own. The first gains will come from just allowing students to work at their own pace on shared school computers. Since real schools are likely to have an uneven and years-long transition from the shared computer labs that most schools have today to ubiquitous computing environments, schools can make every penny count by creating an IT roadmap that supports self-paced leveling. In short, this will involve transitioning to cloud-based services as quickly as possible and increasing computer-to-student ratios and bandwidth as budgets allow.

Third, this level is easy because there are already processes in place for evolving the definition of "better." For more than 40 states, current standards are being replaced by the Common Core Standards developed through an initiative by the Council of Chief State School Officers and the National Governors' Association. The Common Core Assessments that are being developed to support these standards not only raise the bar for existing basic skills, but create assessments for higher-order thinking skills. By following their IT roadmaps, schools will be able to swap out current online tests for more sophisticated online tests over time, with no new technology architecture needed to participate in that continual improvement. If they have chosen cloud-based software that is easy to opt into and out of, they can experiment with new applications at will to see which ones best help their students perform on these increasingly sophisticated tests.

Level 2: Group collaboration

The second level is more like the World of Warcraft gameplay called "raiding" — group collaboration to achieve a shared goal. In Warcraft, that could involve downing a boss while in school it could be a collaboration on a book about local ecology. To the degree that work (or play) happens digitally, leaders (or teachers) can get rich insight into everyone's contributions and participation.

This level is hard. First of all, there is no agreement on what these collaboration and communication skills should look like. Second, there are, consequently, no assessments for these skills available. Third, there is no software developed to interpret collaboration based on the digital tracks left by students working together online. Fourth, there are no standards for how to balance student privacy with such data collection.

For all these reasons, the full burden falls on the teacher to create shared goals for students; create collaboration environments; and observe, analyze, and measure their skills. Fortunately, the same technology architecture that supported the comparatively easy first level of personalizing learning (above), can support the teacher in these tasks. By using project management tools and shared authoring tools, such as Google Docs and wikis that generate histories as students edit their shared work, a teacher can get pretty good first-order information on the timeline and magnitude and quality of each student's digital contributions. That's a big improvement over trying to be everywhere at once to observe each group's work.
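As a rough illustration of the "first-order information" described above, here is a minimal Python sketch that tallies each student's edits from a revision history. The record layout, the student names, and the `summarize` helper are all hypothetical: they assume the teacher has exported or transcribed the history into simple tuples, and they do not correspond to any particular tool's API. The sketch captures only timeline and magnitude; judging the quality of a contribution still falls to the teacher.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical revision records, as a teacher might transcribe them from a
# shared document's history: (student, ISO timestamp, characters added).
REVISIONS = [
    ("ana", "2011-11-01T09:00", 420),
    ("ben", "2011-11-01T09:30", 80),
    ("ana", "2011-11-02T14:15", 300),
    ("cy", "2011-11-03T10:05", 650),
]


def summarize(revisions):
    """Return per-student edit count, characters added, and first/last edit time."""
    summary = defaultdict(
        lambda: {"edits": 0, "chars": 0, "first": None, "last": None}
    )
    for student, stamp, chars in revisions:
        when = datetime.fromisoformat(stamp)
        row = summary[student]
        row["edits"] += 1
        row["chars"] += chars
        # Track the earliest and latest edit to sketch each student's timeline.
        row["first"] = when if row["first"] is None else min(row["first"], when)
        row["last"] = when if row["last"] is None else max(row["last"], when)
    return dict(summary)


if __name__ == "__main__":
    for student, row in sorted(summarize(REVISIONS).items()):
        print(student, row["edits"], row["chars"],
              row["first"].date(), row["last"].date())
```

A summary like this can only flag who contributed, how much, and when; it is a starting point for conversations with each group, not an assessment in itself.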

Also, the same assessment groups that are working toward improved digital assessments for basic skills and higher-order thinking are also targeting 21st-century skills. If structured carefully, these digital assessments will also flow seamlessly into an IT roadmap for schools that is moving toward a ubiquitous computing environment.

Level 3: Play

The third level is less like traditional gamification and more about play. Rather than using Warcraft dynamics, it focuses on open-ended exploration — more like the game Minecraft. It already shows up in education through inquiry and the arts, and is more focused on developing questions than finding answers.

This is the expert level. This level confounds traditional approaches of measuring success — how do you measure the value of a question, or a journey, or artistic expression? If there are no outcomes that we know how to measure, then is the activity even a valid one for schools?

Still, teachers, critics and experts evaluate art all the time. Perhaps the artistic tradition of portfolios will serve the role of capturing open-ended student work that isn't readily reduced to performance on a test. The student work itself, including student reflections on the journey of creating that work, may in its entirety be interpreted and understood by an audience of teachers, college admissions arbiters, employers, friends, family, experts and critics.

I've written previously about the notion of a student digital backpack wherein students and families own their data and which can include everything from test scores to rich digital portfolios. Although the need for standard privacy and data-sharing policies is as yet unmet, and the structure of such backpacks may not yet be fully conceived, the good news, once again, is that the technology degrades gracefully. An IT roadmap that includes cloud-based, student-controlled portfolios today will support a migration to systems that provide privacy management and evolving mechanisms for demonstrating achievements, performance, and student work in the future.

It is a fairly small technical shift, though a potentially significant conceptual leap, for schools to change from the current kind of planning, which tends to include lots of locally maintained servers and fixed computer labs, to planning for mobile devices and cloud computing provided as a service to schools. Regardless of the hardware, software, and bandwidth a school currently has available, planning for this emergent infrastructure will provide critically needed flexibility over the next decade.

There are many examples that highlight this need, but the lens of gaming and gamification makes a point that can be overlooked when discussing the use of technology in education: we learn best by doing, we learn best in authentic situations, we learn best socially, and we learn best playfully. These elements can be seen in the best classrooms, regardless of whether technology is involved — from gold stars for recognizing achievements, to students collaborating on a meaningful community project, to young people engaging in open-ended inquiry. The risk is that as we move to more digitally supported and mediated teaching and learning, these best traditions and practices might be lost. Thoughtful roadmapping of technology that supports both Warcraft-like and Minecraft-like student work can help keep these practices central.


November 04 2011

The maker movement's potential for education, jobs and innovation is growing

Dale Dougherty (@dalepd), one of the co-founders of O'Reilly Media, was honored at the White House yesterday as a "Champion of Change." This White House initiative profiles Americans who are helping their fellow citizens "meet the challenges of the 21st century." The recognition came as part of what the White House is calling "Make it in America," which convenes people from around the country to discuss American manufacturing and jobs.

"This is so completely deserved," wrote Tim O'Reilly on Google+. "When you see kids at Maker Faire suddenly turned on to science and math because they want to make things, when you see them dragging their parents around with eyes shining, you realize just how dull our education system has made some of the most exciting and interesting stuff in the world. Dale has taken a huge step towards changing that. I'm honored to have worked with Dale now for more than 25 years, making big ideas happen. He's a genius."

The event was streamed online at WhiteHouse.gov/live. Video of the event is up on YouTube, where you can watch Dougherty's comments, beginning at 58:18. Most of the other speakers focused on energy, transportation or other economic issues. Dougherty went in a different direction. "You're sort of the anti-Washington message, in that you guys just hang out and do great stuff," said U.S. CTO Aneesh Chopra when introducing Dougherty.

"I started this magazine called 'MAKE'," Dougherty said. "It's sort of a 21st-century 'Popular Mechanics,' and it really meant to describe how to make things for fun and play. [We] started an event called MakerFaire, just bringing people together to see what they make in their basements, their garages, and what they're doing with technology. It really kind of came from the technology side into what you might call manufacturing, but people are building robots, people are building new forms of lighting, people are building … new forms of things that are just in their heads," he said.

"You mentioned tinkering," said Dougherty, responding to an earlier comment by Chopra. "Tinkering was once a solid middle-class skill. It was how you made your life better. You got a better home, you fixed your car, you did a lot of things. We've kind of lost some of that, and tinkering is on the fringe instead of in the middle today."

The software community is influencing manufacturing today, said Dougherty, including new ways of thinking about it. "It's a culture. I think when you look at 'MAKE' and MakerFaire, this is a new culture, and it is a way to kind of redefine what this means." It's about seeing manufacturing as a "creative enterprise," not something "where you're told to do something but where you're invited to solve a problem or figure things out."

This emergent culture is one in which makers create because of passion and personal interest. "People are building robots because they want to," Dougherty said. "It's an expression of who they are and what they love to do. When you get these people together, they really turn each other on, and they turn on other people."

I caught up with Dougherty and talked with him about the White House event and what's happening more broadly in the maker space. Our interview follows.

What does this recognition mean to you?

Dale Dougherty: I see it as a recognition for the maker movement and the can-do spirit of makers. I'm proud of what makers are doing, so I appreciated the opportunity to tell this story to business and government leaders. Makers are the champions of change.

How fast is the maker community growing?

Dale Dougherty: It's hard to put a number on the spread of an idea. The key thing is that it continues to spread and more people are getting connected. I know that the maker audience is getting younger every year, which is a good sign. That means we've involved more families and young people.

What's particularly exciting to you in the maker movement right now?

Dale Dougherty: Kits. We just wrapped up a special issue of "MAKE" on kits. Kits are a very interesting alternative to packaged consumer products. They provide parts and instructions for you to make something yourself. There's such a broad range of kits available that I wanted to bring them together in one issue. We have a great lead article by MIT researcher and economist, Michael Schrage, on how kits drive innovation. I didn't know, for example, that the first steam engine was sold as a kit. So were the first personal computers. Today we're looking at 3-D printers such as the Makerbot. We're also looking at the RallyFighter, a kit car from Local Motors, which you can build in their new microfactory in Arizona. Also, Jose Gomez-Marquez of MIT writes about DIY medical devices and how they can be hacked by medical practitioners in third-world countries to produce custom solutions.

What does making mean for education?

Dale Dougherty: Making is learning. Remember John Dewey's phrase "learn by doing." It's a hundred-year-old educational philosophy based on experiential learning that seems forgotten, if not forbidden, today. I see a huge opportunity to change the nature of our educational system.

How is the maker movement currently influencing government?

Dale Dougherty: The DIY mindset seems essential for a democratic society, especially one that is undergoing constant change. Think of Ralph Waldo Emerson's famous essay, "Self-Reliance." Taking responsibility for yourself and your community is critical. You can't have a democracy without participation. Everything we can do for ourselves we should do and not wait or expect others to do it for us. If you want things to change, step up and make it happen.

The theme of the Washington meeting was "Make It in America." America is the leading manufacturing economy, but that lead is shrinking. As one speaker said, we have to refute the idea that manufacturing is "dirty, dangerous and disappearing."

Do we want to remain a country that makes things? There are obvious reasons many would like that answer to be 'yes,' but the biggest reason is that manufacturing has historically been a source of middle class jobs.

Some folks asked how to influence people so that they value manufacturing in America and how to get young kids interested in careers in manufacturing. One answer I have is that you have to get more people participating, to think of manufacturing as something that we all do, not just a few. We want to get people to see themselves as makers. This is the broad democratic invitation of the maker movement.

Flipping this a bit, how should the maker movement influence government?

Dale Dougherty: I see four things that the maker movement can bring:

  1. Openness — Once you get started doing something, you find others doing similar things. This creates opportunities for sharing and learning together. Collaboration just seems baked into the maker movement. Let's work together.
  2. Willingness to take risks — Let's not avoid risks. Let's not fear failure. Let's move ahead and learn from what experiences we have. The most important thing is iterating, making things better, learning new ways of doing things.
  3. Creativity — What excites many people is the opportunity to do creative work. If we can't define work as creative, maybe it won't get done.
  4. Personal — Technology has become personal. It's something we can use and shape to our own goals. Making is personal; what you make is an expression of who you are. It means something and that meaning can be shared in public.

What lies ahead in the space? DIY solar, bioreactors, hacking cars?

Dale Dougherty: That's what we'd all like to know. I don't spend too much time thinking about the future. There's so much going on right now.
