
January 17 2014

Zambian Police Go After ‘Watchdog’ for Publishing Draft Constitution

Lusaka at dusk. Photo by Mike Lee via Flickr (CC BY-NC-ND 2.0)

Zambian police forces say they will employ “international legal provisions” to take into custody the operators of citizen news websites that authorities claim are threatening the security of the state.

The terse statement was issued a few hours after the independent news site Zambian Watchdog published a draft constitution that the government had written but not released to the public. Withholding the draft violates the Terms of Reference of the Constitution Technical Committee, which was appointed shortly after President Michael Sata’s Patriotic Front (PF) won the 2011 elections.

The statement issued by the police public relations unit and reported by Zambia Reports reads:

Unfortunately, some unscrupulous people have taken advantage of the cyberspace to commit crimes on the internet through defamatory comments and remarks posted on websites especially through the electronic media in the name of press freedom which end up infringing a number of state security provisions.

As such, the Police shall employ local, regional and international legal provisions to pursue the authors and publishers of such criminal, libelous, defamatory, treasonous and seditious statements and bring them to book.

Echoing recent words of Minister of Information and Broadcasting Mwansa Kapeya, who spoke of the government unmasking the identities of people behind certain citizen news websites, the police statement added:

So far, other investigations into the identities of the perpetrators of such crimes are underway and we shall expose all the people involved in these malicious and borderline treacherous activities hiding behind the anonymity of the internet.

Kapeya, a former broadcaster himself, was quoted saying:

We are concerned about some of the news that is being published by online publication most of it amounts to abuse of the social media. A lot of things are said about government officials and the President without [them being] given a chance to respond.

Another minister, Yamfwa Mukanga, in charge of communication, recently said the government was working on a law to regulate online media and hold “them” (websites and services) accountable for their actions:

We have to find a way of controlling them because they are tarnishing the image of our country. Of late, we have seen a lot of things published by online media that are [e]very negative because they publish anything.

The Zambian Watchdog reported that the government was secretly working on a law that would criminalize the act of reading the Zambian Watchdog and other similar sites. Quoting a government source, the Watchdog reported:

The Watchdog is just too advanced for the PF and because of the huge costs involved in blocking it, government now wants to pass a law in the next parliament to criminalise whoever accesses or contributes to the site because by then all data of the sim card will already have been captured. They want the attorney general to complain on behalf of the government and then later it will go to cabinet.

Commenting under the story, Watchdog reader Czar said [Watchdog comments do not have permalinks]:

That “law” is meant to scare semi illiterates. Will Sata and his gang manage to monitor every device that is used to browse the Internet. Don’t they know that you can browse anonymously using a proxy? If China has failed to do this, how will Sata and his gang succeed. Don’t they have better things to do?

Observers widely suspect that the Zambian government has been trying to shut down critical news websites such as the Zambian Watchdog and Zambia Reports for over a year. This isn't the first time government officials have spoken dismissively of the Watchdog — in July, Vice President Guy Scott said he would “celebrate” if the Watchdog were shut down. In separate statements, the government has also threatened to close down social media sites such as Facebook and Twitter.

January 13 2014

Hong Kong 2013: A Burgeoning New Media Sector and a Backward Government

Edward Snowden supporters rally in Hong Kong. Photo by Voice of America. Released to public domain.

Written by Michelle Fong and translated by Sharon Loh, the original version of this article was published in Chinese.

Many new media initiatives, both commercial and citizen, have blossomed in Hong Kong over the past two years. These newly founded online media outlets have strong potential to transform not only the professional media sector, but also political processes in Hong Kong, as grassroots voices gain more attention both from the public and from political leaders. Below is an incomplete list:

Burgeoning New Media Initiatives

Hong Kong Dash – a collective blog operated by student activists, established after the anti-national education campaign in Hong Kong in 2012

The House News – a commercial news portal, modeled on The Huffington Post, curating news and offering commentary to readers

Pentoy – the online version of the commentary page of local newspaper Ming Pao

Urban Diarist – an online magazine to record oral history in Hong Kong, sponsored by an architecture firm as a corporate social responsibility project

Post 852 – a newly launched “breaking views” platform formed by a group of media workers who collectively resigned from a local newspaper, Hong Kong Economic Journal

Bastille Post – an online news portal partially funded by the media corporation Sing Tao News Corporation Limited. The corporation's founder and chairman, Charles Ho Tsu Kwok, is also a member of the Standing Committee of the Chinese People's Political Consultative Conference.

Hong Kong SOW – a social enterprise with an online platform that showcases the practice of “solutions” journalism. The social enterprise was founded by Vincent Wong, director of Strategic Planning of HK Commercial Broadcasting.

Some groups are also making use of Facebook pages to distribute topical news:

Tai Kung Pao: a distributor of labor news.

United Social Press: a page run by social activists, reporting and distributing news related to local social movements.

Online news outlets sidelined by government

While the new media sector is clearly increasing in strength and numbers, the Hong Kong government has been unable to keep up with the changing landscape. Many independent media projects have faced limitations on their work, particularly when seeking to cover government events — obtaining press passes has been a constant challenge.

Last year, contributing reporters for citizen news portals were kicked out of several press events by government civil servants. These included the second public forum on population policy and the 2013 summit on district administration. In another incident, Home Affairs Department staff barred House News reporters from entering a public consultation where Chief Executive Leung Chun-ying was present. The staff claimed that the venue had limited space and was open only to mainstream media. The Information Services Department, the authority responsible for handling government press conferences and news releases, has routinely refused to send press invitations to online news outlets, as they are not recognized as proper media institutions.

In response to this outdated approach, Hong Kong In-Media, an independent and citizen media advocacy group, issued several statements demanding that the Information Services Department review its policies with an eye toward the changing media landscape, paying particular attention to the definitions of the terms “media” and “news organization”. The agency has thus far refused to make any changes to its terms.

Technological innovation has repeatedly introduced new media forms, from newspapers to radio and TV to the Internet, which is now an essential part of people's everyday lives. If we defined “mainstream media” by audience size, many online news outlets would outrank print papers such as the pro-Beijing Wen Wei Po and Ta Kung Pao. It is backward and ridiculous for the government to limit its definition of “media” to print.

Malicious hacking a persistent threat

Although government restrictions are a substantial barrier for these new groups, online media's biggest enemy is hackers. Last year, a number of online news platforms weathered malicious attacks. One online news platform suffered Distributed Denial of Service (DDoS) attacks in May 2013, with a large number of HTTP requests coming from China. A few months later, in September, The House News became the next DDoS victim. Amnesty International Hong Kong's website was hacked around the same time; the hackers replaced some images on the site with pornographic photos. SocREC, a social movement documentary video team, had its YouTube account stolen in October. Hackers deleted over one thousand videos published under the account.
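The mechanics of a DDoS are simple volume. As a hedged illustration of the defender's side only, this sketch counts requests per source within a time window and flags any source that floods; the threshold, addresses, and window are all invented for the example:

```python
# Toy rate counter: a DDoS overwhelms a server with sheer request volume,
# so defenders often start by flagging sources whose per-window request
# count crosses a threshold. All numbers here are illustrative.
from collections import Counter

REQUESTS_PER_WINDOW_LIMIT = 100

def flag_flooders(request_log: list[str]) -> list[str]:
    # request_log holds one source address per request seen in the window.
    counts = Counter(request_log)
    return [addr for addr, n in counts.items() if n > REQUESTS_PER_WINDOW_LIMIT]

log = ["198.51.100.7"] * 150 + ["203.0.113.9"] * 3
print(flag_flooders(log))  # ['198.51.100.7']
```

Real mitigation is far more involved (distributed sources, spoofing, upstream filtering), but the asymmetry the attacks exploit is exactly this: requests are cheap to send and expensive to serve.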

Internet freedom and privacy in HK and around the world

Government plans to pass the controversial Copyright (Amendment) Bill failed in 2012. To address public concern over the potential criminalization of parody, the government put forward a public consultation on the exemption of legal liability for parody in the Copyright (Amendment) Bill in October 2013. So far, major copyright holders and concerned citizens are divided in their opinions on the issue. But civil society has managed to put together a counter proposal calling for the exemption of legal liability on all non-profit user generated content.

Last but not least, the most significant event of 2013 concerning Internet freedom was the series of documents leaked by Edward Snowden that revealed the massive online surveillance practices of the US National Security Agency. As Hong Kong was the first stop in Snowden's escape route, Hong Kong In-Media quickly assumed a coordinator role in the organization of local support including producing a public statement and organizing a rally to condemn US spying activities.

Building public awareness about online privacy

Last August, the Journalism and Media Studies Centre of the University of Hong Kong and Google Inc. worked together to launch the Hong Kong Transparency Report. The report showed that between 2010 and 2013, various government departments had made more than ten thousand requests for users’ personal data and more than seven thousand content deletion requests to local Internet service providers (ISPs) without a court order. A majority of the requests, 86 percent, came from the Hong Kong Police.

The Chief Executive's political reform package, slated to include universal suffrage in Hong Kong beginning in 2017, will be announced in 2014. As civil society prepares to exercise mass civil actions and independent press coverage to promote a fair candidate nomination process, conventional mainstream media are facing substantial political pressure to censor and tailor their content. In the coming years, we believe Internet-based independent and citizen media will play a crucial role in the democratization process.

January 08 2014

WikiLeaks Supporters Shocked by Visit With Syria's Assad

Wikileaks Party
Photo: Courtesy Takver (Flickr CC BY-SA 2.0)

Many WikiLeaks supporters were caught unawares when members of the Wikileaks Party met with Syrian President Bashar al-Assad in late December.

The small Australian delegation to Syria included party chairman John Shipton, father of party founder Julian Assange, along with representatives from the Sydney-based lobby group, Hands off Syria. Journalist Chris Ray, who was in the room for the 45-minute meeting, reported that the two groups “reject foreign military support for Syrian rebels and advocate a political solution to the crisis.”

The WikiLeaks platform was quick to distance itself from the initiative on Twitter.

Major Australian political parties have condemned the meeting. In a somewhat curious response, Shipton threatened to sue Australian Prime Minister Tony Abbott and Foreign Minister Julie Bishop for defamation over criticisms they made to the national press concerning the Syria visit. The Australian Broadcasting Corporation reported Bishop as saying,

It's an extraordinarily reckless thing for an organisation registered as a political party in Australia to try and insert itself in the appalling conflict in Syria for their own political ends.

When news of the meeting first hit the Internet, it became clear that many supporters of the WikiLeaks transparency platform knew little about the party to begin with. The WikiLeaks party, although institutionally separate from the platform, was created in 2013 to support Julian Assange's candidacy for the Australian Senate. In the September Federal elections, Assange led a group of New South Wales Senate candidates, with a number of Wikileaks Party members standing in other states. Assange and his party endured a dismal electoral failure, gaining less than 1.0% of the Senate vote.

During the campaign, with its figurehead still stuck in the Ecuadorian embassy in London, the party suffered from broad cleavages among members. One candidate and other party members resigned over allocation of voting preferences to right-wing parties. Australian technology website Delimiter commented in August:

Is the party purely a vehicle for WikiLeaks founder Julian Assange to get elected to the Federal Senate, and thus earn himself a ticket out of the Ecuadorian Embassy in London? Or is it a legitimate new political movement in Australia, which will achieve legitimacy beyond Assange personally?

Perhaps the journey to Syria was an attempt to broaden the party’s political profile. According to the latest Wikipedia entry for the Wikileaks Party:

Shipton subsequently stated that the meeting with al-Assad was “just a matter of good manners”, and that the delegation had also met with members of the Syrian opposition.

Despite John Shipton and Wikipedia indicating that the delegation also met with the Syrian opposition, no details have become available. Accompanying journalist Chris Ray did not mention the meetings in his post. Responding to WikiLeaks’ initial tweet, Wikileaks Party National Council member Kellie Tranter said that she too had no prior knowledge of the meeting.

This brought more questions about the party’s future:

Antony Loewenstein, a well-known commentator on the Middle East, expressed his disappointment in the party, to which he had given his support:

The tweet and accompanying link brought several contrary views. Loewenstein further explained his concerns on his blog:

As a Wikileaks supporter since 2006, right from the beginning (and I remain a public backer of the organisation), it’s tragic to see the Wikileaks Party in Australia, after a disastrous 2013 election campaign, descend into political grandstanding.

The Wikileaks Support Forum has been a centre of debate. Journalist Jess Hill was especially active in taking the party to task. The conversation became heated:

This tweet should act as a warning to all in the twitterverse:

Doubtless, Shipton and other delegation members will face many questions when they return to Australia.

December 10 2013

The public front of the free software campaign: part I

At a recent meeting of the MIT Open Source Planning Tools Group, I had the pleasure of hosting Zak Rogoff — campaigns manager at the Free Software Foundation — for an open-ended discussion on the potential for free and open tools for urban planners, community development organizations, and citizen activists. The conversation ranged over broad terrain in an “exploratory mode,” perhaps uncovering more questions than answers, but we did succeed in identifying some of the more common software (and other) tools needed by planners, designers, developers, and advocates, and shared some thoughts on the current state of FOSS options and their relative levels of adoption.

Included were the usual suspects — LibreOffice for documents, spreadsheets, and presentations; QGIS and OpenStreetMap for mapping; and (my favorite) R for statistical analysis — but we began to explore other areas as well, trying to get a sense of what more advanced tools (and data) planners use for, say, regional economic forecasts, climate change modeling, or real-time transportation management. (Since the event took place in the Department of Urban Studies & Planning at MIT, we mostly centered on planning-related tasks, but we also touched on some tangential non-planning needs of public agencies, and the potential for FOSS solutions there: assessor’s databases, 911 systems, library catalogs, educational software, health care exchanges, and so on.)

Importantly, we agreed from the start that to deliver on the promise of free software, planners must also secure free and open data — and free, fair, and open standards: without access to data — the raw material of the act of planning — our tools become useless, full of empty promise.

Emerging from the discussion, moreover, was a realization of what seemed to be a natural fit between the philosophy of the free and open source software movement and the overall goals of government and nonprofit planning groups, most notably along the following lines:

  • The ideal (and requirement) of thrift: Despite what you might hear on the street, most government agencies do not exist to waste taxpayer money; in fact, even well-funded agencies generally do not have enough funds to meet all the demands we place on them, and budgets are typically stretched pretty thin. On the “community” side, we see similar budgetary constraints for planners and advocates working in NGOs and community-based organizations, where every dollar that goes into purchasing (or upgrading) proprietary software, subscribing to private datasets, and renewing licenses means one less dollar to spend on program activities on the ground. Added to this, ever since the Progressive Era, governments have been required by law to seek the lowest-cost option when spending the public’s money, and we have created an entire bureaucracy of regulations, procurement procedures, and oversight authorities to enforce these requirements. (Yes, yes, I know: the same people who complain about government waste often want to eliminate “red tape” like this…) When FOSS options meet the specifications of government contracts, it’s hard to see why they wouldn’t in fact be required under these procurement standards; of course, they often fail to meet the one part of the procurement specification that names a particular program; in essence, such practices “rig” bids in favor of proprietary software. (One future avenue worth exploring might be to argue for performance-based bid specifications in government procurement.)
  • The concomitant goal of empowerment: Beyond simply saving money, planning and development organizations often want to actually do something; they exist to protect what we have (breathable air and clean drinking water, historic and cultural resources, property values), fix what is broken (vacant lots and buildings, outmoded and failing infrastructure, unsafe neighborhoods), and develop what we need (affordable housing, healthy food networks, good jobs, effective public services). Importantly, as part of the process, planners generally seek to empower the communities they are working in (at least since the 1970s); to extend Marshall McLuhan by paraphrase, “the process is the purpose,” and there is little point in working “in the public interest” while simultaneously robbing that same public of its voice, its community power, and its rights of democratic participation. So, where’s the tie-in to FOSS? The key here is to avoid the problem Marx diagnosed as “alienation of the workers from the means of production.” (Recent world events notwithstanding, Marx was still sometimes correct, and he really put his finger on it with this one.) When software code is provided in a free and open format, users and coders can become partners in the development cycle; better still, “open-source” can also become “open-ended,” as different groups are empowered to modify and enhance the programs they use. Without permanent, reliable, affordable — and, some would argue, customizable — access to tools and data, planners and citizens (the “workers,” in this case) become alienated from the means of producing plans for their future.
  • The value of transparency and openness: A third area of philosophical alignment between free software and public planners relates to the importance both groups place on transparency. To some extent — at least in the context of government planners — this aspect seems to combine elements of the previous two: just as government agencies are required under procurement laws to be cost-conscious, they are required under public records and open meeting laws to be transparent. Similarly, in the same way that community empowerment requires access to the tools of planning, it also requires access to the information of planning: in order for democratic participation to be meaningful, the public must have access to information about what decisions are being made, when, by whom, and why (based on what rationale?). Transparency — not just the privilege of “being informed,” but rather the right to examine and audit all the files — is the only way to ensure this access. In short, even if it is not free, we expect our government to be open source.
  • The virtuous efficiency of cooperation and sharing: With a few misguided exceptions (for example, when engaging in “tragedy of the commons” battles over shared resources, or manipulated into “race-to-the-bottom” regional bidding wars to attract sports teams or industrial development), governments and community-based organizations generally do not exist in the same competitive environment as private companies. If one agency or neighborhood develops a new tool or has a smart idea to solve a persistent problem, there is no harm — and much benefit — to sharing it with other places. In this way, the natural inclination of public and non-profit agencies bears a striking resemblance to the share-and-share-alike ethos of open source software developers. (The crucial difference being that, often, government and community-based agencies are too busy actually working “in the trenches” to develop networks for shared learning and knowledge transfer, but the interest is certainly there.)

Added to all this, recent government software challenges hint at the potential benefit of a FOSS development model. For example, given the botched rollout of the online health care insurance exchanges (which some have blamed on proprietary software models, and/or the difficulty of building the new public system on top of existing locked private code), groups like FSF have been presented with a “teachable moment” about the virtues of free and open solutions. Of course, given the current track record of adoption (spotty at best), the recognition of these lines of natural alignment raises the question, “Given all this potential and all these shared values, why haven’t more public and non-profit groups embraced free and open software to advance their work?” Our conversation began to address this question in a frank and honest way, enumerating deficiencies in the existing tools and gaps in the adoption pipeline, but quickly pivoted to a more positive framing, suggesting new — and, potentially, quite productive — fronts for the campaign for free and open source software, which I will present in part two. Stay tuned.

January 29 2013

Four short links: 29 January 2013

  1. FISA Amendment Hits Non-Citizens: FISAAA essentially makes it lawful for the US to conduct purely political surveillance on foreigners’ data accessible in US Cloud providers. [...] [A] US judiciary subcommittee on FISAAA in 2008 stated that the Fourth Amendment has no relevance to non-US persons. Americans, think about how you’d feel keeping your email, CRM, accounts, and presentations on Russian or Chinese servers given the trust you have in those regimes. That’s how the rest of the world feels about American-provided services. Which jurisdiction isn’t constantly into invasive snooping, yet still has great bandwidth?
  2. Tim Berners-Lee Opposes Government Snooping: “The whole thing seems to me fraught with massive dangers and I don’t think it’s a good idea,” he said in reply to a question about the Australian government’s data retention plan.
  3. Google’s Approach to Government Requests for Information (Google Blog) — they’ve raised the dialogue about civil liberties by being so open about the requests for information they receive. Telcos and banks still regard these requests as a dirty secret that can’t be talked about, whereas Google gets headlines in NPR and CBS for it.
  4. Open Internet Tools Project: supports and incubates a collection of free and open source projects that enable anonymous, secure, reliable, and unrestricted communication on the Internet. Its goal is to enable people to talk directly to each other without being censored, surveilled or restricted.

January 28 2013

Four short links: 28 January 2013

  1. Aaron’s Army — powerful words from Carl Malamud. Aaron was part of an army of citizens that believes democracy only works when the citizenry are informed, when we know about our rights—and our obligations. An army that believes we must make justice and knowledge available to all—not just the well born or those that have grabbed the reins of power—so that we may govern ourselves more wisely.
  2. Vaurien, the Chaos TCP Monkey: a tool inspired by the Chaos Monkey, a project at Netflix to test infrastructure fault tolerance. The Chaos Monkey randomly shuts down servers or blocks network connections, and the system is supposed to survive these events. It’s a way to verify the high availability and fault tolerance of the system. (via Pete Warden)
  3. Foto Forensics — tool which uses image processing algorithms to help you identify doctoring in images. The creator’s deconstruction of Victoria’s Secret catalogue model photos is impressive. (via Nelson Minar)
  4. All Trials Registered — Ben Goldacre steps up his campaign to ensure trial data is reported and used accurately. I’m astonished that there are people who would withhold data, obfuscate results, or opt out of the system entirely, let alone that those people would vigorously assert that they are, in fact, professional scientists.

October 19 2012

Thin walls and traffic cameras

2008 06 11 - 3313b - Silver Spring - 16th St Circle Traffic Camera by thisisbossi, on Flickr

A couple of years ago, I spoke with a European Union diplomat who shall remain nameless about the governing body’s attitude toward privacy.

“Do you know why the French hate traffic cameras?” he asked me. “It’s because it makes it hard for them to cheat on their spouses.”

He contended that while it was possible for a couple to overlook subtle signs of infidelity — a brush of lipstick on a collar, a stray hair, or the smell of a man’s cologne — the hard proof of a speeding ticket given on the way to an afternoon tryst couldn’t be ignored.

Humans live in these grey areas. A 65 mph speed limit is really a suggestion; it’s up to the officers to enforce that limit. That allows for context: a reckless teen might get pulled over for going 70, but a careful driver can go 75 without incident.

But a computer that’s programmed to issue tickets to speeders doesn’t have that ambiguity. And its accusations are hard to ignore because they’re factual, rooted in hard data and numbers.
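That absence of ambiguity can be made concrete. This is a minimal sketch of the point, assuming a purely hypothetical ticketing rule rather than any real camera's logic:

```python
# Hypothetical illustration: an automated enforcement rule is a bare
# threshold. Unlike an officer, it has no notion of context, only data.
SPEED_LIMIT_MPH = 65

def camera_verdict(speed_mph: float) -> str:
    # The computer applies the rule literally: any reading over the
    # limit produces a ticket, careful driver or not.
    return "ticket" if speed_mph > SPEED_LIMIT_MPH else "no ticket"

print(camera_verdict(66))  # ticket
print(camera_verdict(64))  # no ticket
```

One mile per hour either side of the line flips the outcome; there is no room for the discretion a human officer exercises.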

Did big data kill privacy?

With the rise of a data-driven society, it’s tempting to pronounce privacy dead. Each time we connect to a new service or network, we’re agreeing to leave a digital breadcrumb trail behind us. And increasingly, not connecting makes us social pariahs, leaving others to wonder what we have to hide.

But maybe privacy is a fiction. For millennia — before the rise of city-states — we lived in villages. Gossip, hearsay, and whisperings heard through thin-walled huts were the norm.

Shared moral values and social pressure helped groups to compete better against other groups, helping to evolve the societies and religions that dominate the world today. Humans thrive in part because of our groupish nature — which is why moral psychologist Jonathan Haidt says we’re 90% chimp and 10% bee. We might have evolved as selfish individuals, but we conquered the Earth as selfish teams.

In other words, being private is relatively new, perhaps only transient, and gossip helped us get here.

Prediction isn’t invasion

Much of what we see as technology’s invasion of privacy is really just prediction. As we connect the world’s databases — tying together smartphones, loyalty programs, medical records, and the other constellations in the galaxy of our online lives — we’re doing something that looks a lot like invading privacy. But it’s not.

Big data doesn’t peer into your browser history or look through your bedside table to figure out what porn you like; rather, it infers your taste in smut from the kind of music you like. Big data doesn’t administer a pregnancy test; instead, it guesses you’re pregnant because of what you buy. Many of big data’s predictions are a boon, helping us to fight disease, devote resources to the right problems, and pinpoint ways to help the disadvantaged.
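To make the distinction between observation and inference concrete, here is a toy sketch loosely echoing the retail pregnancy-prediction story; the signal items and weights are invented for illustration and resemble no real retailer's model:

```python
# Hypothetical inference model: the sensitive fact is never observed,
# only guessed from correlated, innocuous purchases. Items and weights
# below are invented for this example.
PREGNANCY_SIGNALS = {
    "unscented lotion": 0.3,
    "prenatal vitamins": 0.6,
    "cotton balls": 0.1,
}

def pregnancy_score(purchases: list[str]) -> float:
    # Sum the weights of any signal items in the basket, capped at 1.0.
    return min(1.0, round(sum(PREGNANCY_SIGNALS.get(item, 0.0) for item in purchases), 2))

basket = ["unscented lotion", "prenatal vitamins", "bread"]
print(pregnancy_score(basket))  # 0.9: a guess inferred from shopping, not an observation
```

The score is a probability-like guess assembled from side channels, which is exactly why prediction feels like invasion even when no private record was ever read.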

Is prediction an invasion of privacy? Not really. Companies will compete based on their ability to guess what’s going to happen. We’re simply taking the inefficiency out of the way we’ve dealt with risk in the past. Algorithms can be wrong. Prediction is only a problem when we cross the moral Rubicon of prejudice: treating you differently because of those predictions, changing the starting conditions for unfair reasons.

Unfortunately, big data’s predictions are often frighteningly accurate, so the temptation to treat them as fact is almost overwhelming. Treat predictions as fact, and policing starts to look like thoughtcrime. Tomorrow, a just society will be a skeptical one.

We’re leakier than we know

Picture by Michael Vroegop (vrogy) on Flickr

Long before the Internet, we left a breadcrumb trail of personal details behind us: call history, credit-card receipts, car mileage, bank records, music purchases, library check-outs, and so on.

But until big data, baking the breadcrumbs back into a loaf was hard. Paper records were messy, and physical copies were hard to collect. Unless you were being pursued by an army of investigators, the patterns of your life remained hidden in plain sight. We weren’t really private — we just felt like we were, and it was too hard for others to prove otherwise without a lot of work.

No more. Big data represents a radical drop in the cost of tying together vast amounts of disparate data quickly. Digital records are clean, easy to analyze, and trivial to copy. That means the illusion of personal privacy is vanishing — but we should remember that it’s always been an illusion.
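The "radical drop in cost" is easy to demonstrate. In this sketch, every record is invented; the point is that tying together three separate digital sources is three dictionary lookups, where a paper trail once took an army of investigators:

```python
# Invented records from three separate "sources", all keyed by the same
# person, standing in for call logs, receipts, and location pings.
calls = {"alice": ["2012-10-01 08:55 call to clinic"]}
purchases = {"alice": ["2012-10-01 pharmacy, $23.10"]}
locations = {"alice": ["2012-10-01 09:10 near 16th St"]}

def profile(person: str) -> dict:
    # One cheap lookup per source: the "loaf" assembles itself in
    # microseconds once the breadcrumbs are digital and clean.
    return {
        "calls": calls.get(person, []),
        "purchases": purchases.get(person, []),
        "locations": locations.get(person, []),
    }

print(profile("alice"))
```

Nothing in the joined profile is new information; the drop in the cost of joining it is what changed.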

Our digital lives make this even more true. We’re probably not aware of what’s being collected as we surf the web — but it’s pretty easy to tell where someone’s been through browser trickery, cross-site advertising, and the like. So when a politician calls for your vote, they may know more about you than you want. But let’s not confound promiscuous surfing behavior — leaving more breadcrumbs — with an improved ability to bake those crumbs back into a loaf.

Big data didn’t force us to overshare; it’s just better at noticing when we do and deriving meaning from it. And because of this, it’s back to thin-walled huts and gossip. Only this time, because it’s digital and machine-driven, there are a couple of important twists to consider.

This ain’t your ancestors’ privacy

There are two key differences, however, between our ancestors’ gossip-filled, thin-walled villages and today’s global digital village.

First, consider the two-way flow of gossip. A thousand years ago, word-of-mouth worked both ways. Someone who told tales too often risked ostracism. We could confront our accusers. Social mores were a careful balance of shame and approval, with checks and balances.

That balance is gone. We can’t confront our digital accusers. If we’re denied a loan, we lack the tools to understand why. Often, we aren’t even aware that we’ve been painted with a digital scarlet letter. As one Oxford professor put it, “nobody knows the offer they didn’t receive.”

Big data is whispering things about us — both inferred predictions and assembled truths — and we don’t even know it.

Second, everyone knew gossip was imperfect. We’ve all played “broken telephone” and seen how easily many mouths distort a message. We’re skeptical of a single truth. We’ve learned to forgive, to question.

The same studies that show groups should ostracize those who don’t chip in also suggest that the best strategy of all is to forgive occasionally — just in case the initial failure was an honest mistake. In other words, when dealing with whispered truths, we lived life in a grey area.

Unfortunately, digital accusations — like those made by traffic cameras — leave little room for mercy and tolerance because they lack that grey area in which much of human interaction thrives. If we’re going to build data-driven systems, then those systems need grey areas.
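The "forgive occasionally" finding can be illustrated with a small simulation (a toy model, not taken from the studies themselves): two tit-for-tat players whose moves are occasionally flipped by noise, an honest mistake. Without a grey area, a single mistake echoes back and forth; a modest forgiveness probability lets cooperation recover:

```python
import random

def play(forgiveness, rounds=2000, noise=0.05, seed=1):
    """Two tit-for-tat players with noisy execution. With probability
    `forgiveness`, a player overlooks the opponent's last defection."""
    rng = random.Random(seed)
    a_last = b_last = "C"
    coop = 0
    for _ in range(rounds):
        # Each player copies the other's last move, sometimes forgiving a defection.
        a = "C" if b_last == "C" or rng.random() < forgiveness else "D"
        b = "C" if a_last == "C" or rng.random() < forgiveness else "D"
        # Noise: an intended move occasionally comes out wrong (an honest mistake).
        if rng.random() < noise:
            a = "D" if a == "C" else "C"
        if rng.random() < noise:
            b = "D" if b == "C" else "C"
        coop += (a == "C") + (b == "C")
        a_last, b_last = a, b
    return coop / (2 * rounds)

print(play(forgiveness=0.0))  # strict tit-for-tat: mistakes echo indefinitely
print(play(forgiveness=0.2))  # generous tit-for-tat: cooperation mostly recovers
```

The generous strategy sustains a much higher cooperation rate than the strict one under identical noise, which is the grey area a purely mechanical accuser lacks.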

New rules for the new transparency

In the timeline of human history, privacy is relatively recent. It may even be that privacy was an anomaly, that our social natures rely on leakage to thrive, and that we’re nearing the end of a transient time where the walls between us gave us the illusion of secrecy.

But now that technology is tearing down those walls, we need checks and balances to ensure that we don’t let predictions become prejudices. Even when those predictions are based in fact, we must build both context and mercy into the data-driven decisions that govern our quantified future.

This post originally appeared on Solve for Interesting. This version has been lightly edited.

Photos: Traffic camera, Silver Spring – 16th St Circle Traffic Camera by thisisbossi, on Flickr; leaky cup, The cup that can only be half-full. by vrogy, on Flickr.


October 16 2012

Four short links: 16 October 2012

  1. — news app for iPhone, which lets you track updates and further news on a given story. (via Andy Baio)
  2. DataWrangler (Stanford) — an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data. From the Stanford Visualization Group.
  3. Responsivator — see how websites look at different screen sizes.
  4. Accountable Algorithms (Ed Felten) — When we talk about making an algorithmic public process open, we mean two separate things. First, we want transparency: the public knows what the algorithm is. Second, we want the execution of the algorithm to be accountable: the public can check to make sure that the algorithm was executed correctly in a particular case. Transparency is addressed by traditional open government principles; but accountability is different.
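Felten's accountability requirement (item 4) can be sketched with a toy cryptographic commitment. The scenario, rule text, and names below are all hypothetical; the point is only that publishing a hash of the rule and random seed before a draw lets anyone verify afterwards that the algorithm was executed as claimed:

```python
import hashlib
import random

def commit(rule_src: str, seed: str) -> str:
    """Commitment to the decision rule and randomness, published in advance."""
    return hashlib.sha256(f"{rule_src}|{seed}".encode()).hexdigest()

def run_lottery(applicants, seed):
    """The public process itself: a seeded, hence reproducible, random draw."""
    rng = random.Random(seed)
    return rng.choice(sorted(applicants))

applicants = ["alice", "bob", "carol"]
rule = "pick one applicant uniformly at random"
seed = "2012-10-16-draw"

published = commit(rule, seed)          # announced before the draw
winner = run_lottery(applicants, seed)  # the official outcome

# Later, the agency reveals (rule, seed); any citizen can now check both
# transparency (what the algorithm was) and accountability (how it ran):
assert commit(rule, seed) == published
assert run_lottery(applicants, seed) == winner
```

Transparency alone would only publish `rule`; the commitment-and-replay step is what makes the particular execution checkable.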

September 27 2012

Four short links: 27 September 2012

  1. Paying for Developers is a Bad Idea (Charlie Kindel) — The companies that make the most profit are those who build virtuous platform cycles. There are no proof points in history of virtuous platform cycles being created when the platform provider incents developers to target the platform by paying them. Paying developers to target your platform is a sign of desperation. Doing so means developers have no skin in the game. A platform where developers do not have skin in the game is artificially propped up and will not succeed in the long run. A thesis illustrated with his experience at Microsoft.
  2. Learnable Programming (Bret Victor) — deconstructs Khan Academy’s coding learning environment, and explains Victor’s take on learning to program. A good system is designed to encourage particular ways of thinking, with all features carefully and cohesively designed around that purpose. This essay will present many features! The trick is to see through them — to see the underlying design principles that they represent, and understand how these principles enable the programmer to think. (via Layton Duncan)
  3. Tablet as External Display for Android Smartphones — new app, in beta, letting you remote-control via a tablet. (via Tab Times)
  4. Clay Shirky: How The Internet Will (One Day) Transform Government (TED Talk) — There’s no democracy worth the name that doesn’t have a transparency move, but transparency is openness in only one direction, and being given a dashboard without a steering wheel has never been the core promise a democracy makes to its citizens.

September 13 2012

Four short links: 13 September 2012

  1. Patterns for Research in Machine Learning — every single piece of advice should be tattooed under the eyelids of every beginning programmer, regardless of the field.
  2. Milton Friedman’s ThermostatEverybody knows that if you press down on the gas pedal the car goes faster, other things equal, right? And everybody knows that if a car is going uphill the car goes slower, other things equal, right? But suppose you were someone who didn’t know those two things. And you were a passenger in a car watching the driver trying to keep a constant speed on a hilly road. You would see the gas pedal going up and down. You would see the car going downhill and uphill. But if the driver were skilled, and the car powerful enough, you would see the speed stay constant. So, if you were simply looking at this particular “data generating process”, you could easily conclude: “Look! The position of the gas pedal has no effect on the speed!”; and “Look! Whether the car is going uphill or downhill has no effect on the speed!”; and “All you guys who think that gas pedals and hills affect speed are wrong!” (via Dr Data’s Blog)
  3. Transparency Doesn’t Kill Kittens (O’Reilly Radar) — Atul Gawande says, cystic fibrosis … had data for 40 years on the performance of the centers around the country that take care of kids with cystic fibrosis. They shared the data privately [...] They just told you where you stood relative to everybody else and they didn’t make that information public. About four or five years ago, they began making that information public. It’s now available on the Internet. You can see the rating of every center in the country for cystic fibrosis. Several of the centers had said, “We’re going to pull out because this isn’t fair.” Nobody ended up pulling out. They did not lose patients in hordes and go bankrupt unfairly. They were able to see from one another who was doing well and then go visit and learn from one another.
  4. 3D Printing: The Coolest Way to Visualize Sound — just what it says. (via Infovore)
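Friedman's thermostat (item 2) is easy to reproduce numerically. In this toy simulation, a perfect driver's pedal exactly offsets the hills, so the observed speed never varies and naive correlation analysis finds no trace of the causal links:

```python
import random

rng = random.Random(0)
hills = [rng.uniform(-5, 5) for _ in range(1000)]  # road grade at each moment

# A skilled driver: the pedal exactly offsets each hill (a toy control law),
# so the speed never deviates from the 60 mph target.
pedal = [50 + 4 * grade for grade in hills]
speed = [60.0] * len(hills)

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# The naive observer: pedal and hills both vary, speed doesn't budge, so
# neither appears to "affect" speed. The control loop hides the causation.
print(max(speed) - min(speed))       # 0.0: speed carries no signal to correlate
print(round(corr(pedal, hills), 3))  # 1.0: the pedal perfectly tracks the hills
```

Because the controller absorbs all the variation, the "data generating process" shows zero covariance between cause and effect, exactly the trap the passage describes.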

August 17 2012

Four short links: 17 August 2012

  1. What Twitter’s API Anouncement Could Have Said (Anil Dash) — read this and learn. Anil shows how powerful it is to communicate from the perspective of the reader. People don’t care about your business model or platform changes except as it applies to them. Focus on what you’re doing for the user, because that’s why you make every change–right? Your average “we’ve changed things” message focuses on the platform not the user: “*we* changed things for *our* reasons” and the implicit message is because *we* have all the power”. Anil’s is “you just got this Christmas present, because we are always striving to make things better for you!”. If it’s deceitful bullshit smeared over an offensive money grab, the reader will smell it. But if you’re living life right, you’re telling the truth. And they can smell that, too.
  2. Goodbye, Everyblock — Adrian Holovaty is moving on and ready, once more, to make something awesome.
  3. Turkopticon — transparency about crappy microemployers for people who work on Mechanical Turk. (via Beta Knowledge)
  4. Digital Natives, 10 Years After (PDF) — we need to move away from this fetish of insisting in naming this generation the Digital/Net/Google Generation because those terms don’t describe them, and have the potential of keeping this group of students from realizing personal growth by assuming that they’ve already grown in areas that they so clearly have not.

May 22 2012

Data journalism research at Columbia aims to close data science skills gap

Successfully applying data science to the practice of journalism requires more than providing context and finding clarity in vast amounts of unstructured data: it will require media organizations to think differently about how they work and who they venerate. It will mean evolving towards a multidisciplinary approach to delivering stories, where reporters, videographers, news application developers, interactive designers, editors and community moderators collaborate on storytelling, instead of being segregated by departments or buildings.

The role models for this emerging practice of data journalism won't be found on broadcast television or on the lists of the top journalists over the past century. They're drawn from the increasing pool of people who are building new breeds of newsrooms and extending the practice of computational journalism. They see the reporting that provisions their journalism as data, a body of work that can itself be collected, analyzed, shared and used to create longitudinal insights about the ways that society, industry or government are changing. (Or not, as the case may be.)

In a recent interview, Emily Bell (@EmilyBell), director of the Tow Center for Digital Journalism at the Columbia University School of Journalism, offered her perspective about what's needed to train the data journalists of the future and the changes that still need to occur in media organizations to maximize their potential. In this context, while the roles of institutions and journalism education are themselves evolving, both will still fundamentally matter for "what's next," as practitioners adapt to changing newsonomics.

Our discussion took place in the context of a notable investment in the future of data journalism: a $2 million research grant to Columbia University from the Knight Foundation to research and distribute best practices for digital reportage, data visualizations and measuring impact. Bell explained how the research effort will help newsrooms determine what's next on the Knight Foundation's blog:

The knowledge gap that exists between the cutting edge of data science, how information spreads, its effects on people who consume information and the average newsroom is wide. We want to encourage those with the skills in these fields and an interest and knowledge in journalism to produce research projects and ideas that will both help explain this world and also provide guidance for journalism in the tricky area of ‘what next’. It is an aim to produce work which is widely accessible and immediately relevant to both those producing journalism and also those learning the skills of journalism.

We are focusing on funding research projects which relate to the transparency of public information and its intersection with journalism, research into what might broadly be termed data journalism, and the third area of ‘impact’ or, more simply put, what works and what doesn’t.

Our interview, lightly edited for content and clarity, follows.

What did you do before you became director of the Tow Center for Digital Journalism?

I spent ten years as editor-in-chief of The Guardian website. During the last four of those, I was also overall director of digital content for all The Guardian properties. That included things like mobile applications, et cetera, but from the editorial side.

Over the course of that decade, you saw one or two things change online, in terms of what journalists could do, the tools available to them and the news consumption habits of people. You also saw the media industry change, in terms of the business models and institutions that support journalism as we think of it. What are the biggest challenges and opportunities for the future of journalism?

For newspapers, there was an early warning system: newspaper circulation has not risen consistently since the early 1980s. We had a long trajectory of increased production and, actually, an overall systemic decline, which was masked by a very, very healthy advertising market that went on an incredible bull run. The early web offered more static pictures and simply "widened the pipe," which I think fooled a lot of journalism outlets and publishers into thinking that that was the real disruption.

And, of course, it wasn’t.

The real disruption was the ability of anybody anywhere to upload multimedia content and share it with anybody else who was on a connected device. That was the thing that really hit hard, when you look at 2004 onwards.

What journalism has to do is reinvent its processes, its business models and its skillsets to function in a world where human capital does not scale well, in terms of sifting, presenting and explaining all of this information. That’s really the key to it.

The skills that journalists need to do that -- including identifying a story, knowing why something is important and putting it in context -- are incredibly important. But how you do that, which particular elements you now use to tell that story are changing.

Those now include the skills of understanding the platform that you’re operating on and the technologies which are shaping your audiences’ behaviors and the world of data.

By data, I don’t just mean large caches of numbers you might be given or might be released by institutions: I mean that the data thrown off by all of our activity, all the time, is simply transforming the speed and the scope of what can be explained and reported on and identified as stories at a really astonishing speed. If you don’t have the fundamental tools to understand why that change is important and you don’t have the tools to help you interpret and get those stories out to a wide public, then you’re going to struggle to be a sustainable journalist.

The challenge for sustainable journalism going forward is not so different from what exists in other industries: there's a skills gap. Data scientists and data journalists use almost the exact same tools. What are the tools and skills that are needed to make sense of all of this data that you talked about? What will you do to catalog and educate students about them?

It's interesting when you say that the skills of these disciplines are very similar, which is absolutely right. First of all, you need a basic level of numeracy -- and maybe not just a basic level, but a more sophisticated understanding of statistical analysis. That's not something which is routinely taught in journalism schools, but I think it increasingly will have to be.

The second thing is having some coding skills or some computer science understanding to help with identifying the best, most efficient tools and the various ways that data is manipulated.

The third thing is that when you’re talking about 'data scientists,' it’s really a combination of those skills. Adding data doesn’t mean you can do without the other journalism skills, which do not change: understanding context, understanding what the story might be, and knowing how to derive that from the data that you’re given or the data that exists. If it’s straightforward, how do you collect it? How do you analyze it? How do you interpret and present it?

It’s easy to say, but it’s difficult to do. It’s particularly difficult to reorient the skillsets of an industry which have very much revolved around the idea of a written story and an ability with editing. Even in the places where I would say there’s sophisticated use of data in journalism, it’s still a minority sport.

I’ve talked to several heads of data in large news organizations and they’ve said, “We have this huge skills gap because we can find plenty of people who can do the math; we can find plenty of people who are data scientists; we can’t find enough people who have those skills but also have a passion or an interest in telling stories in a journalistic context and making those relatable.”

You need a mindset which is about putting this in the context of the story and spotting stories, as well as having creative and interesting ideas about how you can actually collect this material for your own stories. It’s not a passive kind of processing function if you’re a data journalist: it’s an active seeking, inquiring and discovery process. I think that that’s something which is actually available to all journalists.

Think about just local information and how local reporters go out and speak to people every day on the beat, collect information, et cetera. At the moment, most reporters don't structure the information they get from those entities in a way that will help them find patterns and build new stories in the future.

This is not just about an amazing graphic that the New York Times does with census data over the past 150 years. This is about almost every story. Almost every story has some component of reusability or a component where you can collect the data in a way that helps your reporting in the future.

To do that requires a level of knowledge about the tools that you’re using, like coding, Google Refine or Fusion Tables. There are lots of freely available tools out there that are making this easier. But, if you don’t have the mindset that approaches, understands and knows why this is going to help you and make you a better reporter, then it’s sometimes hard to motivate journalists to see why they might want to grab on.
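The kind of cleanup that tools like Google Refine automate can be sketched in a few lines: normalizing inconsistently entered values (the records below are invented) so the same entity can be counted and compared across stories:

```python
# Toy sketch of data cleanup: collapsing variant spellings of one entity.
# The raw values and the alias table are invented for illustration.
raw = ["N.Y.P.D.", "NYPD", "nypd ", "New York Police Dept."]

aliases = {
    "n.y.p.d.": "NYPD",
    "nypd": "NYPD",
    "new york police dept.": "NYPD",
}

def normalize(value):
    """Trim and lowercase for lookup; fall back to the trimmed original."""
    v = value.strip().lower()
    return aliases.get(v, value.strip())

cleaned = [normalize(v) for v in raw]
print(cleaned)  # ['NYPD', 'NYPD', 'NYPD', 'NYPD']
```

Until the variants are collapsed, a reporter counting mentions per agency would see four entities instead of one, which is why the cleaning mindset matters as much as the tool.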

The other thing to say, which is really important, is there is currently a lack of both jobs and role models for people to point to and say, “I want to be that person.”

I think the final thing I would say to the industry is we’re getting a lot of smart journalists now. We are one of the schools where all of our digital concentrations from students this year include a basic grounding in data journalism. Every single one of them. We have an advanced course taught by Susan McGregor in data visualization. But we’re producing people from the school now, who are being hired to do these jobs, and the people who are hiring them are saying, “Write your own job description because we know we want you to do something, we just don’t quite know what it is. Can you tell us?”

You can’t cookie-cutter these people out of schools and drop them into existing roles in newsrooms, because those roles are still developing. What we’re seeing are some very smart reporters with data-centric mindsets and also the ability to do these stories -- but they want to be out reporting. They don’t want to be confined to a desk and a spreadsheet. Editors often find that very hard to understand: “Well, what does that job look like?”

I think that this is where working with the industry, we can start to figure some of these things out, produce some experimental work or stories, and do some of the thinking in the classroom that helps people figure out what this whole new world is going to look like.

What do journalism schools need to do to close this 'skills gap?' How do they need to respond to changing business models? What combination of education, training and hands-on experience must they provide?

One of the first things they need to do is identify the problem clearly and be honest about it. I like to think that we’ve done that at Columbia, although I’m not a data journalist. I don’t have a background in it. I’m a writer. I am, if you like, completely the old school.

But one of the things I did do at The Guardian was to help people who early on said to me, “Some of this transformation means that we have to think about data as being a core part of what we do.” Because of the political context and the position I was in, I was able to recognize that that was an important thing that they were saying and we could push through changes and adoption in those areas of the newsroom.

That’s how The Guardian became interested in data. It’s the same in journalism school. One of the early things that we talked about [at Columbia] was how we needed to shift some of what the school did on its axis and acknowledge that this was going to be key part of what we do in the future. Once we acknowledged that that is something we had to work towards, [we hired] Susan McGregor from the Wall Street Journal’s Interactive Team. She’s an expert in data journalism and has an MA in technology in education.

If you say to me, “Well, what’s the grand vision here?” I would say the same thing I would say to anybody: over time, and hopefully not too long a course of time, we want to attract a type of student that is interested and capable in this approach. That means getting out and motivating and talking to people. It means producing attractive examples which high school children and undergraduate programs think about [in their studies]. It means talking to the CS [computer science] programs -- and, in fact, talking more to those programs and math majors than to the liberal arts professors or the historians or the lawyers or the people who have traditionally been involved.

I think that has an effect: it starts to show people who are oriented towards storytelling but have capabilities that align more with data science skill sets that there’s a real track for them. We can’t message that early enough as an industry. We can’t message it early enough as an educator to get people into those tracks. We have to really make sure that the teaching is high quality and that we don’t just get carried away with the idea of the new thing; we need to think pretty deeply about how we get those skills.

What sort of basic sort of statistical teaching do you need? What are the skills you need for data visualization? How do you need to introduce design as well as computer science skills into the classroom, in a way which makes sense for stories? How do you tier that understanding?

You're always going to produce superstars. Hopefully, we’ll be producing superstars in this arena soon as well.

We need to take the mission seriously. Then we need to build resources around it. And that’s difficult for educational organizations because it takes time to introduce new courses. It takes time to signal that this is something you think is important.

I think we’ve done a reasonable job of that so far at Columbia, but we’ve got a lot further to go. It's important that institutions like Columbia do take the lead and demonstrate that we think this is something that has to be a core curriculum component.

That’s hard, because journalism schools are known for producing writers. They’re known for different types of narratives. They are not necessarily lauded for producing math or computer science majors. That has to change.


April 17 2012

Four short links: 17 April 2012

  1. Penguins Counted From Space (Reuters) -- I love the unintended flow-on effects of technological progress. Nobody funded satellites because they'd help us get an accurate picture of wildlife in the Antarctic, and yet here we are. The street finds a use ...
  2. What Makes a Super-Spreader? -- A super-spreader is a person who transmits an infection to a significantly greater number of other people than the average infected person. The occurrence of a super spreader early in an outbreak can be the difference between a local outbreak that fizzles out and a regional epidemic. Cory, Waxy, Gruber, Ms BrainPickings Popova: I'm looking at you. (via BoingBoing)
  3. The Internet Did Not Kill Reading Books (The Atlantic) -- reading probably hasn't declined to the horrific levels of the 1950s.
  4. Data Transparency Hacks -- projects that came from the WSJ Data Transparency Codeathon.

November 04 2011

Top Stories: October 31-November 4, 2011

Here's a look at the top stories published across O'Reilly sites this week.

How I automated my writing career
You scale content businesses by increasing the number of people who create the content ... or so conventional wisdom says. Learn how a former author is using software to simulate and expand human-quality writing.

What does privacy mean in an age of big data?
Ironclad digital privacy isn't realistic, argues "Privacy and Big Data" co-author Terence Craig. What we need instead are laws and commitments founded on transparency.

If your data practices were made public, would you be nervous?
Solon Barocas, a doctoral student at New York University, discusses consumer perceptions of data mining and how companies and data scientists can shape data mining's reputation.

Five ways to improve publishing conferences
Keynotes and panel discussions may not be the best way to program conferences. What if organizers instead structured events more like a great curriculum?

Anthropology extracts the true nature of tech
Genevieve Bell, director of interaction and experience research at Intel, talks about how anthropology can inform business decisions and product design.

Tools of Change for Publishing, being held February 13-15 in New York, is where the publishing and tech industries converge. Register to attend TOC 2012.

November 02 2011

What does privacy mean in an age of big data?

As we do more online — shop, browse, chat, check in, "like" — it's clear that we're leaving behind an immense trail of data about ourselves. Safeguards offer some level of protection, but technology can always be cracked and the goals of data aggregators can shift. So if digital data is and always will be a moving target, how does that shape our expectations for privacy? Terence Craig (@terencecraig), co-author of "Privacy and Big Data," examines this question and related issues in the following interview.

Your book argues that by focusing on how advertisers are using our data, we might be missing some of the bigger picture. What are we missing, specifically?

Terence Craig: One of the things I tell people is I really don't care if companies get more efficient at selling me soap. What I do care about is the amount of information that is being aggregated to sell me soap and what uses that data might be put toward in the future.

One of the points that co-author Mary Ludloff and I tried to make in the book is that the reasons behind data collection have nothing to do with how that data will eventually be used. There's way too much attention being paid to "intrusions of privacy" as opposed to the problem that once data is out there, it's out there. And potentially, it's out there as long as electronic civilization exists. How that data will be used is anybody's guess.

What's your take on the promise of anonymity often associated with data collection?

Terence Craig: It's fundamentally irresponsible for anyone who collects data to claim they can anonymize that data. We've seen the Netflix de-anonymization, the AOL search release, and others. There have been several cases where medical data has been released for laudable goals, but that data has been de-anonymized rather quickly. For example, the Electronic Frontier Foundation has a piece that explains how a researcher was able to connect an anonymized medical record to former Massachusetts governor William Weld. And in relation to that, a Harvard genome project tries to make sure people understand the privacy risks of participating.
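The Weld re-identification worked by joining an "anonymized" table to a public voter roll on quasi-identifiers (ZIP code, birth date, sex). A toy sketch with entirely invented records shows the mechanics:

```python
# All records below are invented for illustration.
medical = [  # released "anonymized": names removed, quasi-identifiers kept
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "diagnosis A"},
    {"zip": "02139", "dob": "1962-01-15", "sex": "F", "diagnosis": "diagnosis B"},
]
voters = [  # public voter roll: the same quasi-identifiers, plus names
    {"name": "W. Weld", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    {"name": "J. Doe", "zip": "02139", "dob": "1980-03-02", "sex": "F"},
]

key = lambda r: (r["zip"], r["dob"], r["sex"])
roll = {key(v): v["name"] for v in voters}

# Join the two tables on the quasi-identifier triple.
reidentified = [(roll[key(m)], m["diagnosis"]) for m in medical if key(m) in roll]
print(reidentified)  # the first "anonymous" record now carries a name
```

Removing names was not enough: any record whose quasi-identifier triple is unique in a public dataset is one dictionary lookup away from a name.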

If we assume that companies have good will toward their consumers' data — and I'll assume that most large corporations do — these companies can still be hacked. They can be taken advantage of by bad employees. They can be required by governments to provide backdoors into their systems. Ultimately, all of this is risky for consumers.

Assuming that data can't be anonymized and companies don't have malicious plans for our personal data, what expectations can we have for privacy?

Terence Craig: We've moved back to our evolutionary default for privacy, which is essentially none. Hunter-gatherers didn't have privacy. In small rural villages with shared huts between multi-generational families, privacy just wasn't really available.

The question is how do we address a society that mirrors our beginnings, but comes with one big difference? Before, the people who knew the intimate details of our lives were people we had met physically, and they were often related to us. But now the geographical boundary has been erased by the Internet, so what does that mean? And how are we as a society going to evolve to deal with that?

With that in mind, I've given up on the idea of digital privacy as a goal. I think you have to if you want to reap the rewards of being a full participant in a digitized society. What's important is for us to make sure we have transparency from the large institutions that are aggregating data. We need these institutions to understand what they're doing with data and to share that with people so we, in aggregate, can agree whether or not this is a legitimate use of our data. We need transparency so that we — consumers, citizens — can start to control the process. Transparency is what's important. The idea that we can keep the data hidden or private, well ... that horse has left the stable.

What's the role of governments here, both in terms of the data they keep but also the laws they pass about data?

Terence Craig: Basically anything the government collects, I believe should be made available. After all, governments are some of the largest aggregators of data from all sorts of people. They either purchase it or they demand it for security needs from primary collectors like Google, Facebook, and the cell phone companies — the millions of requests law enforcement agencies sent to Sprint in 2008-2009 were a big story we mentioned in the book. So, it's important that governments reveal what they're doing with this information.

Obviously, there's got to be a balance between transparency and operational security needs. What I want is to have a general idea of: "Here's what we — the government — are doing with all of the data. Here's all of the data we've collected through various means. Here's what we're doing with it. Is that okay?" That's the sort of legislation I would like, but you don't see that anywhere at this point.

This interview was edited and condensed.

Privacy and Big Data — This book introduces you to the players in the personal data game, and explains the stark differences in how the U.S., Europe, and the rest of the world approach the privacy issue.



October 27 2011

Strata Week: IBM puts Hadoop in the cloud

Here are a few of the data stories that caught my attention this week.

IBM's cloud-based Hadoop offering looks to make data analytics easier

At its conference in Las Vegas this week, IBM made a number of major big-data announcements, including making its Hadoop-based product InfoSphere BigInsights available immediately via the company's SmartCloud platform. InfoSphere BigInsights was unveiled earlier this year, and it is hardly the first offering that Big Blue is making to help its customers handle big data. The last few weeks have seen other major players also move toward Hadoop offerings — namely Oracle and Microsoft — but IBM is offering its service in the cloud, something that those other companies aren't yet doing. (For its part, Microsoft does say that a Hadoop service will come to Azure by the end of the year.)

IBM joins Amazon Web Services as the only other company currently offering Hadoop in the cloud, notes GigaOm's Derrick Harris. "Big data — and Hadoop, in particular — has largely been relegated to on-premise deployments because of the sheer amount of data involved," he writes, "but the cloud will be a more natural home for those workloads as companies begin analyzing more data that originates on the web."

Harris also points out that IBM's Hadoop offering is "fairly unique" insofar as it targets businesses rather than programmers. IBM itself contends that "bringing big data analytics to the cloud means clients can capture and analyze any data without the need for Hadoop skills, or having to install, run, or maintain hardware and software."

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Cleaning up location data with Factual Resolve

The data platform Factual launched a new API for developers this week that tackles one of the more frustrating problems with location data: incomplete records. Called Factual Resolve, the new offering is, according to a company blog post, an "entity resolution API that can complete partial records, match one entity against another, and aid in de-duping and normalizing datasets."

Developers using Resolve tell it what they know about an entity (say, a venue name) and the API can return the rest of the information that Factual knows based on its database of U.S. places — address, category, latitude and longitude, and so on.
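Factual Resolve itself is a hosted service, but the underlying idea — match a partial record against a reference dataset and return the completed entity — can be sketched locally. Everything below (the tiny places table, the exact-field matching rule, the `resolve` function name) is invented for illustration and is far cruder than Factual's actual matching, which would also handle fuzzy names, abbreviations, and geocoding:

```python
# Toy entity resolution: complete a partial record by matching it
# against a reference dataset. The places below are made up, and the
# exact-field matching is much simpler than a production resolver.

PLACES = [
    {"name": "Joe's Pizza", "address": "7 Carmine St", "city": "New York",
     "latitude": 40.7306, "longitude": -74.0021},
    {"name": "Blue Bottle Coffee", "address": "66 Mint St",
     "city": "San Francisco", "latitude": 37.7823, "longitude": -122.4071},
]

def resolve(partial):
    """Return a completed record for the first place matching every
    field given in `partial`, or None if nothing matches."""
    for place in PLACES:
        if all(place.get(field) == value for field, value in partial.items()):
            return dict(place)
    return None

record = resolve({"name": "Joe's Pizza"})
```

Given only the venue name, `resolve` fills in the address, city, and coordinates — the same shape of interaction the Resolve API offers against Factual's database of U.S. places.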

Tyler Bell, Factual's director of product, discussed the intersection of location and big data at this year's Where 2.0 conference. The full interview is contained in the following video:

Google and governments' data requests

As part of its efforts toward better transparency, Google has updated its Government Requests tool this week with information about the number of requests the company has received for user data since the beginning of 2011.

This is the first time that Google is disclosing not just the number of requests, but the number of user accounts specified as well. It's also made the raw data available so that interested developers and researchers can study and visualize the information.

According to Google, requests from U.S. government officials for content removal were up 70% in this reporting period (January-June 2011) versus the previous six months, and user data requests were up 29% over the same span. Google also says it received requests from local law enforcement agencies to take down various YouTube videos — one showing police brutality, another allegedly defamatory — and that it did not comply. Of the 5,950 user data requests (affecting some 11,000 user accounts) submitted between January and June 2011, however, Google complied, fully or partially, with 93%.

The U.S. was hardly the only government making an increased number of requests to Google. Spain, South Korea, and the U.K., for example, also made more requests. Several countries, including Sri Lanka and the Cook Islands, made their first requests.

Got data news?

Feel free to email me.


September 29 2011

GERMANY / WHAT in the WORLD ? - YouTube


// oAnth:

The statement is well worth watching: in my opinion, a successful attempt at a brief summary of the international pro-democracy, anti-austerity, and anti-corruption movement and its possible new political implications, as shown by the German Pirate Party's recent results in Berlin's city parliament (Berlin has the status of a German state, or Land), where the party jumped suddenly from zero to almost nine percent of the vote.

September 20 2011

Cooking the data

At this week's Strata Conference in New York, there's a lot of discussion about data transparency. As masses of easily available, quickly analyzed data transform businesses, that data can also change how we regulate and legislate the world.

Data transparency holds promise. It should, in theory, weed out corruption and level the playing field. Rather than regulating what a company can do, for example, we can regulate what it must share with the world — and then let the world deal with the consequences, whether by boycott, activism, or class-action lawsuit. It's something the Leading Edge Forum's Michael Nelson described as a form of digital libertarianism: pacts of transparency between businesses and consumers, or between governments and citizens. He calls it "Mutually Assured Disclosure."

It's certainly encouraging to think that corruption and shenanigans wither under the harsh light of data. With information out in the open, it should be easy for interested parties to review the numbers — using cheap clouds and intuitive visualizations — and spot the cheaters.

Does data really blow its own whistle?

The first problem open data advocates run into is getting real information. Look at Greece: 324 Athenians reported having swimming pools on their taxes. When the government used Google Maps to try to count how many there really were, it found 16,974 of them — despite efforts by citizens to camouflage their pools under green tarpaulins. So even if activists can use widely available data to create change, that data may be wrong.

One way around this is to get your own data. The barriers to data collection have vanished with the advent of social networks, ubiquitous computing, and other innovations. Just as Greek tax officials can use Google Earth to understand tax evasion, so organizations like Asthmapolis can crowdsource data — in this case, by attaching GPS receivers to asthma inhalers — and use the information to shape public policy.

Can we tell when the data is wrong?

Once the data is in hand, it needs to be properly analyzed. That's not as easy as it sounds.

With software development, it's easy to see the results: if the coder's work isn't effective, the finished product is buggy, unusable, slow, or incompatible. A careless data scientist, on the other hand, produces wrong results that may not be obvious to anyone. Detecting fraud or error in data sets can be tough. At the Strata Summit, LinkedIn's Monica Rogati highlighted a number of common errors that analysts make when interpreting and reporting their research; as more and more people start to work with numbers, more and more of them make mistakes. Statistics is often counter-intuitive. (Want a good example? Try the Monty Hall problem.)
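The Monty Hall problem is a good illustration of how a simulation can settle a counter-intuitive statistical argument. The sketch below is my own minimal version (door numbering and trial count are arbitrary choices): over many trials, switching doors wins about two-thirds of the time, while staying wins only one-third.

```python
import random

def monty_hall_win_rate(trials=100_000, switch=True):
    """Estimate the win probability of the switch/stay strategy."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's initial choice
        # The host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d not in (pick, opened))
        wins += pick == car
    return wins / trials
```

Running both strategies side by side makes the 2/3 vs. 1/3 split hard to argue with, even for readers who distrust the algebraic explanation.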

Will we know if we've got bad data, whether from malice, omission, or accident? It's possible to detect fraud in some cases. Modeling datasets often reveals problems with the data, and statisticians have tricks that can help. Benford's Law, for example, says that "for naturally occurring data, you get more ones than twos, more twos than threes, and so on, all the way down to nine." Point the law at certain kinds of datasets, and you can estimate how likely it is that the contents are fabricated.

Will we act on it?

Open data is no good unless it leads to action. Most proponents of transparency believe that change logically flows from proof. In government, at least, current public policy suggests otherwise. On critical global issues like climate science and evolution, despite overwhelming, peer-reviewed data, we're still deadlocked on whether to teach creationism, or whether climate change is real. Don't like the numbers you're getting? Call them corrupt. Threaten to take away funding. If the infographic is the new stump speech, then questioning the data is the new rebuttal.

Simply having transparency doesn't lead to change. Without effective checks and balances, and without real punishments, shining the harsh light of data won't do anything. This makes class action lawyers and hacktivists unlikely allies: Lawsuits, social media campaigns, and boycotts are often the only way to induce change from otherwise unregulated industries.

Data transparency is an arms race. In a world of full disclosure, cooking the data is the new cooking the books. How many of today's data scientists will become tomorrow's forensic accountants, locked in a war with the fraudulent and the ignorant? Open data and transparency aren't enough: we need True Data, not Big Data, as well as regulators and lawmakers willing to act on it.
