
October 11 2013

Safe Harbour violations, tracking at mail providers, DIY cloud storage

Data protection violations under "Safe Harbour", tracking at mail providers, streaming from the musician's perspective, Gauck on data protection, a data leak at Adobe, and DIY cloud storage. The cloud links of the week:

Safe Harbour: Many violations of data protection self-regulation

Many US companies violate the self-imposed obligations of the Safe Harbour agreement, which governs the export of user data from Europe to the USA. That is the conclusion of a report (PDF) by Christopher Connolly, head of the data protection consultancy Galexia, presented at a hearing of the EU Parliament's home affairs committee. Among other things, the report lists 427 violations by US companies in the current year, just under a third more than in 2010. Connolly's finding is not entirely new, but the Safe Harbour agreement has recently come under increased criticism. Jan Schallaböck explains in more detail at iRights what "Safe Harbour" is about.

German mail providers allow tracking

The magazine c't examined which mail providers allow senders to watch users as they read. In technical terms: which providers permit tracking pixels. Senders, mostly commercial ones, embed these small image files to check when, with what software, and where a mail is read. According to the test, loading them is enabled by default at T-Online, GMX, Web.de, Freenet and 1und1, although at least 1und1's webmailer lets users switch it off. Yahoo and Google come off well here, since the option is disabled by default with them. Common mail programs were also tested. Heise Security summarizes the results.
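A tracking pixel only works if the mail client fetches a remote image when the message is displayed. As a rough sketch of what a cautious client could check, the following Python snippet lists every image in an HTML mail body that would trigger such a network request. The class and function names and the sample mail are invented for illustration; this is not the test method used by c't.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collects <img> sources that would cause a network request when rendered."""

    def __init__(self):
        super().__init__()
        self.remote_sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # An attribute without a value is reported as None, hence the fallback.
        src = dict(attrs).get("src") or ""
        if src.lower().startswith(("http://", "https://", "//")):
            self.remote_sources.append(src)


def find_remote_images(html_body):
    """Return the remote image URLs found in an HTML mail body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_sources


# Hypothetical mail: one embedded logo (cid:) and one remote 1x1 image.
sample_mail = (
    '<p>Hello!</p>'
    '<img src="cid:logo@example.invalid">'
    '<img src="https://tracker.example.invalid/open.gif?id=abc123" width="1" height="1">'
)
print(find_remote_images(sample_mail))
```

Images referenced with `cid:` travel inside the mail itself and reveal nothing to the sender; only the remote URL, which here carries a per-recipient identifier, tells the sender's server that the mail was opened.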

Streaming services and artists: Providers sit on the data

The music industry researcher Peter Tschmuck has examined streaming services such as Spotify, Amazon Cloud Drive and Rhapsody as a source of income for artists and evaluated statistics. His conclusion: "Realistically, musicians cannot regard streaming as a relevant source of income. Nevertheless, these platforms should be seen as an important promotional tool for distributing one's own works." In the longer term, however, the data collected by streaming services would become especially important for musicians. As a rule, though, the platforms keep that data to themselves.

Federal President Gauck: Data protection as important as environmental protection

In a speech on the Day of German Unity, Federal President Joachim Gauck also addressed data protection. "Data protection should become as important for preserving privacy as environmental protection is for preserving the basis of life," Gauck said. The former Federal Commissioner for the Stasi Records also referred to the surveillance and espionage affair and called for "laws, conventions and social agreements" that take the digital transformation into account.

Adobe: Leak of customer data and source code

As first reported by security researcher Brian Krebs, attackers who broke into Adobe's corporate network obtained user data such as login information, credit card data and encrypted passwords, as well as program code. Users of the program Coldfusion and accounts for Revel and Creative Cloud are apparently affected. Adobe stated that users face no increased risk and that affected users would be notified.

Podcast: Why build your own cloud?

Marcus Richter has built himself a DIY cloud storage system with the Raspberry Pi microcomputer and the Owncloud software and has compiled a guide. In the Monoxyd podcast he talks for a good hour with erdgeist of the Chaos Computer Club about the reasons and his experiences. More background on the DIY cloud trend is also available at iRights.

September 27 2013

Criticism of the online advertising industry's self-regulation

Internet users are supposed to be able to control the appearance and activities of "usage-based online advertising" by objecting to it. Data protection officials and consumer associations criticize the opt-out procedure used by the advertising industry as consumer-unfriendly and as not compliant with EU directives.

"Usage-based advertising" has been around for a long time. It draws on data about the surfing behaviour of broad groups of users, collected and analysed on a massive scale, matches it against the interest profile of the individual user, which is likewise derived from search queries and website visits, and then delivers presumably fitting advertising to that user's browser or app. Someone who frequently searches for or visits pages about Hollywood action films, for example, might sooner or later see a banner or pop-up ad in the browser for the next cinema premiere or DVD release in that genre. This is quite useful, sometimes very welcome, but certainly also to be handled with care; the keywords are cookies and tracking.

What is comparatively new is that Internet users are now supposed to be able to regulate and control for themselves whether usage-based online advertising appears in the browser and what it does behind the scenes. To this end, the online advertising industry set up a centralized "preference management system" on the website "Your Online Choices". There, users can forbid or allow individual advertising providers to deliver usage-based, or "behavioural", advertising to the browser.

This self-regulation system has been in place in the USA and Europe for some time. But neither the trademarked pictogram that is supposed to label usage-based online advertising ("Online Behavioural Advertising", OBA) nor the objection procedure tied to it is well known here. In early August, the Deutscher Datenschutzrat Online-Werbung (German Data Protection Council for Online Advertising, DDOW for short), which is run by the Zentralverband der deutschen Werbewirtschaft (ZAW), launched an eight-week awareness and motivation campaign.

Cookies in focus

To collect and pass on the data, online advertisers use small text files known as cookies, which settle in behind the scenes of the browser and continuously feed usage-based advertising from there. Many users find this covert data collection unsettling or unwelcome, which gives data protection and consumer advocates good reason to wrestle with advertising associations and politicians over laws and rules. This happens even at the European level, for example with the directive of November 2009 known as the "EU Cookie Directive". Among other things, it regulates the use and behaviour of cookies for behavioural advertising and tracking.

Note: Today, users of the common browsers can deny access to cookies from so-called "third parties" across the board, locking out these digital loiterers wholesale. And that, in turn, does not suit the online advertising industry. It regards the personalization of advertising on the basis of (extrapolated) interest profiles as beneficial for everyone involved.

"Online advertising is an indispensable financial basis for diverse services and content on the Internet, especially free ones. Because of the specific communication conditions of the Internet, the placement of advertising in online media depends in a particular way on taking target group preferences into account," explains Matthias Wahl, spokesman of the Deutscher Datenschutzrat Online-Werbung (DDOW). Out of an understandable business interest, the advertising associations are working to restore attention to, and the damaged trust in, usage-based advertising.

Their representatives like to point out that usage-based advertising is often misunderstood. Contrary to what the public supposedly believes, this form of advertising uses only anonymous or pseudonymous usage data about visited web pages, not data about the actual users. Nevertheless, the advertising industry appears to be accommodating many consumers' mistrust of profile-driven OBA advertising, and it developed the "preference management" that it describes as a self-regulation measure of the industry.

Point of contention: the opt-out procedure

What data protection officials and consumer associations criticize about this procedure is that it is an opt-out procedure. "Opt-out" means that consumers must actively visit the relevant objection page in order to object to usage-based advertising provider by provider, in effect banning each advertising provider from their (browser) premises.

A further criticism is that the objection system works via a cookie, which then enforces the ban. For that, however, users must allow the cookie of the third party DDOW in the first place. Florian Glatzner, data protection policy officer at the Verbraucherzentrale Bundesverband (VZBV), the federation of German consumer organisations, considers this a consumer-unfriendly concept:

"It really ought to be the case that advertising providers are obliged to obtain the user's permission each time before they may present usage-based advertising," he argues. "Besides, under this procedure I, as a consumer, have to check for myself whether new advertising providers have joined the objection system, and I would then have to object explicitly to those as well."

The biggest problem is the blanket permission, says Glatzner. Many consumers are not aware that they already consent to the use of their data for behavioural advertising simply by remaining inactive and not having an opt-out cookie set.
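The difference between the two models can be made concrete with a small sketch of the decision an ad server might take for each request. The cookie names `oba_optout` and `oba_optin` and the function are hypothetical and are not taken from the DDOW system; the point is only what happens when no cookie is present at all.

```python
from http.cookies import SimpleCookie


def choose_ad_mode(cookie_header, opt_in_required=False):
    """Decide between 'behavioral' and 'generic' ads from a Cookie header.

    The cookie names are invented for this sketch.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)

    # An explicit objection always wins.
    if "oba_optout" in cookies and cookies["oba_optout"].value == "1":
        return "generic"

    if opt_in_required:
        # Opt-in: silence counts as "no".
        consent = cookies.get("oba_optin")
        if consent is not None and consent.value == "1":
            return "behavioral"
        return "generic"

    # Opt-out: silence counts as "yes".
    return "behavioral"


print(choose_ad_mode(""))                        # inactive user, opt-out model
print(choose_ad_mode("", opt_in_required=True))  # inactive user, opt-in model
print(choose_ad_mode("oba_optout=1"))            # user who objected
```

Under the opt-out model, a user who never visits the objection page, or whose browser rejects or deletes the opt-out cookie, lands in the "behavioral" branch, which is exactly the blanket permission Glatzner criticizes.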

Alexander Dix, data protection commissioner of the state of Berlin, is no less clear: "The DDOW procedure is clearly an opt-out procedure. It therefore does not satisfy the EU Cookie Directive, which clearly provides for an opt-in procedure." Bernd Nauen, legal counsel at the DDOW, does not dispute this at all: "Yes, in technical terms it is an opt-out principle."

However, according to Nauen, users can declare their opt-out to several providers affiliated with the OBA procedure at once. "Our system enables them to decide in a more targeted and granular way. And that is what many users wanted as well; most of them welcome target-group-specific advertising." The DDOW procedure is a practicable middle way for users to "exercise their informational self-determination", according to the website of the ZAW, the umbrella association of the German advertising industry.

Education about cookies, by cookie

Despite this display of certainty about the popularity of usage-based advertising, the DDOW launched its web campaign in early August. Its distinguishing mark is the OBA pictogram, which is placed on banners and ad creatives.

However, this icon, like the entire OBA mechanism, is based on a third-party cookie, and many users block those from their browsers across the board. Neither the online and advertising associations nor data protection or consumer advocates can provide figures on how many users do so, although all of them would like to have such surveys.

According to Alexander Dix, however, sensitivity towards cookies and online advertising in general has risen in recent years. This was shown most recently by the lively debate about the ad blocker "Adblock Plus" and its targeted data collection practice. A relevant share of cookie blockers can therefore be assumed. But if these users do not see the OBA icon at all, says Dix, that would mean "that this group of people is excluded from the education effort. Presumably the advertising industry wants to win these people back, because their settings indirectly come at the expense of the advertising industry."

Small symbol, with what effect?

According to VZBV officer Glatzner, the OBA pictogram is generally displayed too small. According to a US study he cites, only about half a percent of all Internet users there have clicked the icon in recent years, even though the OBA mark and objection procedure have been in circulation longer in the USA. "That is also because it is mostly integrated into the banner artwork and hard to spot there. In addition, people are reluctant to click into a banner, because it is not clear that they are activating meta information decoupled from the advertisement and not clicking the advertisement itself, which they may not want to do at all," says Glatzner.

An evaluation commissioned by the DDOW is supposed to show how many consumers made use of the educational material or the objection procedure during the eight weeks of the campaign. Regardless of that, data protection commissioner Dix is determined to keep pressing the advertising industry.

Dix says: "The Article 29 Working Party of European data protection commissioners has commented on this several times and made clear that it expects an opt-in solution. There were talks with the managing director of the ZAW in which it was discussed that the advertising industry would develop codes of conduct that we could then review. But these codes have not been presented to us so far."

Self-protection measures in the browser

Regardless of whether data protection officials and the advertising industry eventually agree on a procedure acceptable to both sides, Internet users can protect themselves against third-party data collectors through browser settings. In addition to the blocking of third-party cookies mentioned above and ad blockers such as Adblock, there has for some time also been the option of refusing tracking in general.

Usually located in the "Privacy" section of the settings and labelled "Do not track", the option tells every website visited that the user does not want clicks and input to be tracked. Websites can technically ignore this request, and it is not clearly settled in law whether website operators and advertising providers violate data protection laws if they do not comply. Nevertheless, users can signal in this way whether they refuse or permit tracking in general, at present in the current versions of Firefox, Internet Explorer, Safari, Chrome and Opera.
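Technically, the "Do not track" setting makes the browser add the HTTP request header `DNT: 1` to every request; whether anything happens as a result is up to the server. A minimal sketch of a site that chooses to honour the header could look like this (the function name and the sample request are invented for illustration):

```python
def tracking_allowed(headers):
    """Return False when the request carries the header "DNT: 1".

    `headers` is a plain dict of HTTP request headers. Header names are
    case-insensitive, so they are normalized before the lookup.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


# Hypothetical request from a browser with "Do not track" switched on.
request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}
if tracking_allowed(request_headers):
    print("visitor did not object: tracking code may be included")
else:
    print("visitor sent DNT: 1, leave out the tracking code")
```

The sketch also shows why the setting is only a request: nothing in the protocol prevents a server from skipping this check.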

Anyone interested in which parties a visited web page exchanges data with in the background can install browser extensions or special programs. These make every single request visible and allow fine-grained control over how it is handled. Among the better known are Ghostery and the extension "Collusion" developed by Mozilla. However, Ghostery is suspected of (secretly) collecting and selling the blocking decisions and settings that users make with it, similar to Adblock Plus.

April 12 2013

Who protects us from the data protection officials?

Today I was told a nice little anecdote about data protection in Schilda, sorry, Germany. The data protection commissioner of Lower Saxony has, as is well known, always been good for a data protection farce, and he is delivering again now.

For the state data protection commissioner himself, like the entire server niedersachsen.de, violates data protection rules.

The website of the state data protection commissioner contains the following tracking code of the tool Piwik:

[Image: Piwik tracking code]

The commissioner's privacy policy says nothing about this. Elsewhere on the commissioner's site, however, one can find the following statement:

From a data protection perspective, many of the well-known web tracking services in use are unlawful

I would not put it in such sweeping terms. But it is unlawful in any case if the users of the site are not informed about the tracking. Because of § 13 (1) TMG, the use of the tracking tool belongs at the very least in the privacy policy.

Who actually oversees data protection authorities that do not themselves comply with the legal requirements?

May 22 2012

Quantified me

For some reason I have an aversion to the quantified self terminology. I guess I'm suspicious of excessive overt tracking of stuff that I hope to make into unconscious habit. It probably goes back to when I used to be a runner. I ran a couple of marathons and I would of course log every run and used upcoming races to motivate my training. I ran with a pulse monitor and used the real-time feedback to adjust my pace to the intention of each training session.

I was incredibly disciplined about my training right up until I stopped improving. Once I plateaued I just couldn't stick with it. I experienced a similar pattern with biking, rowing, yoga, and everything else I tried. Train hard, track everything, plateau, quit.

Then a few years ago I read about a study that looked at motivation and it made the point that sometimes leaving things open ended actually improves our ability to stick with it. I've been looking for that study for two years but can't find it again. It has stuck in my head though and fundamentally changed how I think about things. It's made me much more skeptical of the value of competitions and other goals in achieving long-term fitness. And something is different for me now because I've been doing CrossFit for three years without quitting. Of course, it might just be that I haven't plateaued yet. But I also think nurturing an open-ended mindset has helped.

Having plateaued and quit so many times I guess I'm just skeptical of the value of tracking the minutiae of my exercise life. I wouldn't have known I plateaued if I hadn't tracked the data after all.

So not too long ago when Sara Winge forwarded me a link to an article on the "datasexual" with the subject line "You've been memed" I was taken aback. "Me? I don't track stuff. I don't own a Fitbit. In fact, I'm a huge skeptic of the value of all this stuff. To me it seems too much like putting the cart of technology before the horse of just doing the work." But then I thought about it honestly and I had to admit it. Who am I kidding? I'm an obsessive tracker.

I track every Crossfit workout on Beyond The Whiteboard. I started a paleo / ancestral health diet in December and I use a kitchen scale to measure portions. I kept a journal of every meal for three months and when that got cumbersome I started taking a picture of them with my phone. I do it to encourage consciousness of what I'm eating and to make sure I'm keeping my macronutrient balance where it should be. I weigh myself at least three times each week and log weight, waist, and neck measurements each time to estimate body fat.

[Image: Quantified data]

Not too long ago after I rowed what felt like a fast 2k during a crossfit workout I dug up my old logs from the '90s to see how it compared to the twenty-something me (slower of course, but not awful). I still had those logs and knew where to find them.

From there it gets more obsessive. Once I changed my eating habits I started getting a full lipid panel and other tests every three months to assess the impact of my new high fat / low carb diet (I get over 2/3 of calories from fats now). The next time around I plan to add tests for inflammation markers and a few other things.

I wasn't happy with my doctor only being able to order fasting blood sugar though, so I bought a glucometer and started monitoring my own real-time blood sugar. I measure fasting and +1, +2, and +3 hour postprandial glucose levels after various meals to evaluate my insulin response and to better tune my diet. I also occasionally measure pre- and post-workout glucose levels to optimize when to work out relative to mealtime.

Periodic at-home A1c tests verify that my long-term glucose levels are in keeping with what I'm measuring in real time, both as a correlation to verify test accuracy and to help me interpret the short-term results. Oh, and I ordered a 23andMe test kit to see (among other things) if I have any genetic disposition to diabetes.
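One way to line up an A1c result with meter readings is the commonly used estimated average glucose (eAG) conversion, eAG in mg/dL = 28.7 × A1c − 46.7. The post does not say which conversion, if any, is used, so treat the sketch below as an assumption; the function names and sample readings are made up.

```python
from statistics import mean


def estimated_average_glucose(a1c_percent):
    """Convert an A1c value (in percent) to estimated average glucose in mg/dL."""
    return 28.7 * a1c_percent - 46.7


def meter_vs_a1c(readings_mg_dl, a1c_percent):
    """Return (mean of meter readings, eAG from A1c, difference) in mg/dL."""
    meter_mean = mean(readings_mg_dl)
    eag = estimated_average_glucose(a1c_percent)
    return meter_mean, eag, meter_mean - eag


# Hypothetical readings: fasting, then +1, +2 and +3 hours after a meal.
readings = [85, 120, 100, 90]
meter_mean, eag, gap = meter_vs_a1c(readings, a1c_percent=5.0)
print(f"meter mean {meter_mean:.1f} mg/dL, eAG {eag:.1f} mg/dL, gap {gap:+.1f}")
```

A handful of spot readings is not a true time-weighted average, so a small gap is only a rough plausibility check; a large gap would suggest either an inaccurate meter or readings taken at unrepresentative times.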

So, I guess I have to admit it. Quantifying the self isn't just something other people do, it's something I do. Yet I remain a skeptic.

The line I'm trying to walk is between obsessive tracking that results in post-plateau burnout and using tracking to maintain awareness and intention while trying to remain open ended. "Maybe I'll work out today." "Maybe I'll lose a few pounds, or maybe I'll gain a few." But at the same time I want to take advantage of the awareness that comes from tracking. More importantly, I want to know what the data says about how healthy I am. A degradation in insulin response wouldn't just be a problem with a plateauing exercise program after all, it would have major long-term health impact.


January 11 2012

The rise of programmable self

Programmable self is a riff on the Quantified Self (QS). It is a simple concept:

Quantify what you want to change about yourself + motivational hacks = personal change success.

There are several potential "motivation hacks" that people regularly employ. The simplest of these is peer pressure. You could tell all of your co-workers every morning whether you kept your diet last night, for instance. Lots of research has shown that sort of thing is an effective motivator for change. Of course, you can make peer pressure digital by doing the same thing on Facebook/Twitter/Google+/whatever. Peer pressure has two components: shame and praise. It's motivating to avoid shame and to get praise. Do it because of a tweet and voilà, you have digital peer pressure motivation.

Several books have recently popularized using money, in one form or another, as a motivational tool. There is some evidence, for instance, that people feel worse about losing $10 than they feel good about earning $10. This is called loss aversion, and it can easily be turned into a motivational hack. Having trouble finishing that book? Give 10 envelopes with $100 each to your best friend. Instruct them to mail one envelope to your favorite (or most hated) charity for each month that you do not finish a chapter. Essentially, you've made your friend a "referee" of your motivational hack.

So, is there any potential to automate this process? To use software to hack your own motivation? One of the coolest applications that does just that is Stickk.com, which is designed to electronically manage contracts you make with yourself.

But that, by itself, is not programmable self.

Programmable self is the combination of a digital motivation hack, like Stickk, with a digital system that tracks behavior, like Fitbit (that's the Quantified Self part). You have to have both. Recently, for example, Stickk started supporting the use of the Withings Scale to support weight entries. Withings is a Wi-Fi-enabled scale that broadcasts your weight automagically to the Withings servers. From there, Withings will send your weight generally wherever you want: HealthVault, other personal health record (PHR) systems, or over to Stickk.com. With that feature, Stickk became a programmable-self platform.

Stickk is pretty old, and Lose it or Lose It, which is focused specifically on losing weight, is also ancient in Internet time. It launched in 2009. The site requires you to take a picture of a weekly weigh in (you actually photograph the scale) and send it in. That counts as digital tracking, but I wonder if it supports Withings (or if it will).

In October 2011, Beeminder launched, billing itself as a direct Stickk competitor, but "for data geeks." Indeed, it is a little geeky: Beeminder is focused on weight change and other goals that are numerically similar to weight gain. The notion is that there is a proper path for the improvement of certain numbers — as well as a little "data jitter" to eliminate — in order to improve. Beeminder also refers to the classical term for the lack of self discipline: akrasia — so bonus points for that.

Last November, The Eatery launched from Massive Health. Massive Health is a massively funded dream team, and their first app is a classic programmable-self experiment. You simply take pictures of your food with your camera (digital tracking = photos) and let others rate your food choices (motivation hack = praise/shame). It's a good idea, and you can expect lots more from Massive Health that qualifies as programmable self.

Recently, GymPact made a big splash, even ending up in a New York Times blog post. GymPact is an iOS (soon Android) app that lets you check in at the gym. If you fail to check in, you get charged a fee. If you do keep your commitment to go to the gym, then you also earn some of the money from all of the people who failed to go to the gym.
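The incentive structure is easy to sketch. The snippet below assumes a flat fee per member who missed the commitment and an equal split of the resulting pool among those who kept it; GymPact's real rules (per-day fees, the operator's cut, payout thresholds) are not described here, so every name and number is hypothetical.

```python
def settle_week(checked_in, fee_per_miss):
    """Return each member's balance change for one settlement period.

    `checked_in` maps a member name to True if the commitment was kept.
    Assumes a flat fee and an equal split of the pool (both invented).
    """
    missed = [member for member, kept in checked_in.items() if not kept]
    kept = [member for member, ok in checked_in.items() if ok]

    pool = fee_per_miss * len(missed)
    payout = pool / len(kept) if kept else 0.0

    balances = {member: -fee_per_miss for member in missed}
    balances.update({member: payout for member in kept})
    return balances


week = {"alice": True, "bob": True, "carol": False}
print(settle_week(week, fee_per_miss=10.0))
```

In programmable-self terms, the check-in is the digital tracking and the fee plus payout is the motivation hack: loss aversion for those who skip, a small reward for those who show up.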

Finally, Buster Benson and Jen S. McCabe are working on Bud.ge, which might be the first of the programmable-self platform plays.

All of these count as programmable self. I seriously doubt that any of these companies were aware of my original interview about programmable self or would even be comfortable with the term, which sounds pretty geeky and devious. (Which is, of course, why I love it.)

Other friends of mine in the serious games/games for health/gamification movement would probably count as programmable self, too. But some of them seem convinced that "fun" can have a deeper component in motivation than some of the more aggressive techniques that I, and other programmable self people, seem to favor. I should also mention that I am hardly the only one in the QS movement stumbling in this direction.

I will be writing about programmable self on Radar occasionally, but there is a lot more going on than I can track here. That's why I've also made a Tumblr about the subject and filled it with all of the "software for behavior change" goodness that anyone can take. My @fredtrotter Twitter account is mostly focused on programmable self as well.

Most importantly, I want to hear about what you have tried to do with your own personal change hacks, especially those that impact your health in one way or another. For that, I have set up a Programmable Self Google Group. Please join us. Some of the top minds in behavior change are already subscribers.

The Quantified Self movement is not primarily about the "tool creators" who make stuff for people to use, but a movement of users who defy the boundaries of tools and manage to create innovative quantification tools on their own. Many of these efforts also count as programmable-self approaches. No discussion of programmable self can ignore the work of individuals, so here is a decidedly non-exhaustive list of people innovating in this space:

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

December 08 2011

Strata Week: The looming data science talent shortage

Here are a few of the big-data stories that caught my attention this week.

Data scientists in demand

This week, EMC released (pdf) the findings of its recent survey of the data science community. Calling it the largest ever survey of its kind, the EMC Data Science Study included responses from more than 500 data scientists, information analysts, and data specialists from the U.S., U.K., France, Germany, India and China.

The majority of respondents (83%) said they believed that new technologies would increase the need for data scientists. But 64% also felt as though this new demand for data scientists would outstrip the supply (31% said demand would "significantly outpace" supply). Just 12% felt as though future data science jobs would be filled by current business intelligence professionals.

The source for future talent? College students, not surprisingly: 34% said future data science jobs would go to computer science grads; 24% said these jobs would go to those from other disciplines. And in the case of data scientists, those may well be college students with master's degrees or PhDs: some 40% of data scientists have an advanced degree, and nearly one in 10 have a doctorate. In comparison, less than 1% of business intelligence professionals have a PhD.

But the problems that the data science community faces aren't simply a future talent shortage. Just a third of respondents said they were confident in their company's ability to make data-driven business decisions. Again, respondents pointed to a shortage of employees with the right training or skills (32%). Budget shortages were also an issue (32%).

Another problem uncovered by the survey: data accessibility. Just 12% of business intelligence analysts and 22% of data scientists say they "strongly believe" that employees have the access they need to run experiments on data.


Carrier IQ and big data

The mobile intelligence company Carrier IQ has gone from obscurity to infamy following the discovery by Android developer Trevor Eckhart that Carrier IQ's rootkit software could record all sorts of user data — texts, web browsing, keystrokes, and even phone calls.

The software is on an estimated 100 million phones — Android and iOS alike — and the news of it has prompted calls for an FTC investigation, questions from a Senator, and class-action lawsuits.

Carrier IQ issued a statement, explaining that "Our software makes your phone better by delivering intelligence on the performance of mobile devices and networks to help the operators provide optimal service efficiency."

But at GigaOm, Kevin Fitchard called Carrier IQ's relationships to handset makers and carriers a "bizarre big-data triangle":

This is big data for the mobile world — massive databases of consumer behavior delving into when, how and in what manner we use our devices. By Carrier IQ's own admission, its software is embedded in more than 150 million handsets. There are plenty of companies that would find that information enormously useful. The problem is Carrier IQ never got permission from all these smartphone users to collect that data, never told them it was gathering it, and never provided a way of opting out.

DataSift will soon offer access to historical tweets

It was April of last year when Twitter announced it was donating its entire archive to the Library of Congress, and since then, researchers have been waiting to get their hands on this older Twitter data.

As it currently stands, you can only search Twitter back as far as a week. And while you can get access to the Twitter firehose, that's little help at looking at the historical record.

But starting soon, developers and researchers will have access to a bit more of that record when DataSift begins offering historical data. DataSift's alpha version will offer access to 60 days' worth of the Twitter feed, and when the service formally launches next year, DataSift promises more data.

It's not quite the Library of Congress, which, as we noted earlier this year, is working on the technology infrastructure to make the historical Tweets indexable and accessible. The Library of Congress does have access to the Twitter firehose (via the other stream provider, Gnip), so it looks like that's where the complete record will, for now at least, reside.

Got data news?

Feel free to email me.


November 18 2011

[...]

  • Facebook doesn’t track everybody the same way. It uses different methods for members who have signed in and are using their accounts, members who are logged-off and non-members.
  • The first time you arrive at any Facebook.com page, the company inserts cookies in your browser. If you sign up for an account, it inserts two types of cookies. If you don’t set up an account, it only inserts one of the two types.
  • These cookies record every time you visit another website that uses a Facebook Like button or other Facebook plugin — which work together with the cookies to note the time, date and website being visited. Unique characteristics that identify your computer are also recorded.
  • Facebook keeps logs that record your past 90 days of activity. It deletes entries older than 90 days.
  • If you are logged into a Facebook account, your name, email address, friends and all of the other data in your Facebook profile is also recorded.

  • [...]
    Facebook Reveals its User-Tracking Secrets | mashable.com 2011-11-17
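
The 90-day retention rule in the summary above can be illustrated with a small sketch. This is not Facebook's implementation; the entry layout (`site`, `visited_at`) and the function name `prune_log` are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # retention window as described in the summary above


def prune_log(entries, now, retention_days=RETENTION_DAYS):
    """Return only the entries whose timestamp falls inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [entry for entry in entries if entry["visited_at"] >= cutoff]


# Hypothetical log entries; the field names are illustrative only.
now = datetime(2011, 11, 17)
log = [
    {"site": "news.example", "visited_at": datetime(2011, 11, 1)},
    {"site": "shop.example", "visited_at": datetime(2011, 7, 1)},
]
kept = prune_log(log, now)
print([entry["site"] for entry in kept])
```

Only the November visit survives here, because the July visit is older than the 90-day cutoff.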

    November 10 2011

    Strata Week: The social graph that isn't

    Here are a few of the data stories that caught my attention this week:

    Not social. Not a graph.

    It's hardly surprising that the founder of a "bookmarking site for introverts" would have something to say about the "social graph." But what Pinboard's Maciej Ceglowski has penned in a blog post titled "The Social Graph Is Neither" is arguably the must-read article of the week.

    The social graph is neither a graph, nor is it social, Ceglowski posits. He argues that today's social networks have failed to capture the complexities and intricacies of our social relationships (there's no graph) and have become something that's at best contrived and at worst icky (actually, that's not the "worst," but it's the adjective Ceglowski uses).

    From his post:

    Imagine the U.S. Census as conducted by direct marketers — that's the social graph. Social networks exist to sell you crap. The icky feeling you get when your friend starts to talk to you about Amway or when you spot someone passing out business cards at a birthday party, is the entire driving force behind a site like Facebook. Because their collection methods are kind of primitive, these sites have to coax you into doing as much of your social interaction as possible while logged in, so they can see it.

    But if today's social networks are troublesome, they're also doomed, Ceglowski contends, much as the CompuServes and the Prodigys of an earlier era were undone. It's not so much that they were out-innovated as that they were out-democratized: as the global network spread, mass marketing gave way to grassroots efforts.

    "My hope," Ceglowski writes, "is that whatever replaces Facebook and Google+ will look equally inevitable and that our kids will think we were complete rubes for ever having thrown a sheep or clicked a +1 button. It's just a matter of waiting things out and leaving ourselves enough freedom to find some interesting, organic, and human ways to bring our social lives online."

    Cloudera raises $40 million

    The Hadoop-based startup Cloudera announced this week that it has raised another $40 million in funding, led by Ignition Partners, Greylock, Accel, Meritech Capital Partners, and In-Q-Tel. This brings the total investment in the company to some $76 million, a solid endorsement of not just Cloudera but of the Hadoop big data solution.

    Hadoop is a trend that we've covered almost weekly here as part of the Strata Week news roundup. And GigaOm's Derrick Harris has run some estimates on the numbers of the Hadoop ecosystem at large, finding that: "Hadoop-based startups have raised $104.5 million since May. The same set of companies has raised $159.7 million since 2009 when Cloudera closed its first round."

    While it's easy to label Hadoop as one of the buzzwords of 2011, the amount of investor interest, as well as the amount of adoption, is an indication that many people see this as a cornerstone of a big data strategy as well as a good source of revenue for the coming years.

    Kaggle raises $11 million to crowdsource big data

    It's a much smaller round of investment than Cloudera's, to be sure, but Kaggle's $11 million Series A round announced this week is still noteworthy. Kaggle provides a platform for running big data competitions. "We're making data science a sport," its tagline reads.

    But it's more than that. There remains a gulf between data scientists and those who have data problems to solve. Kaggle helps bridge this gap by letting companies outsource their big data problems to third-party data scientists and software developers, with prizes going to the best solutions. Kaggle claims it has a community of more than 17,000 PhD-level data scientists, ready to take on and resolve companies' data problems.

    Kaggle has thus far enabled several important breakthroughs, including a competition that helped identify new ways to map dark matter in the universe. That's a project that had been worked on for several decades by traditional methods, but those in the Kaggle community tackled it in a couple of weeks.

    The Supreme Court looks at GPS data tracking

    The U.S. Supreme Court heard oral arguments this week in United States v. Jones, a case that could have major implications for mobile data, GPS and privacy. At issue is whether police need a warrant in order to attach a tracking device to a car to monitor a suspect's movements.

    Surveillance via technology is clearly much easier and more efficient than traditional surveillance methods. Why follow a suspect around all day, for example, when you can attach a device to his or her car and just watch the data transmission? But the data you get from a GPS device is far more detailed than what human surveillance yields, which raises all sorts of questions about what constitutes a reasonable search. And while you needn't get a warrant to shadow someone's car, attaching that GPS tracking device might just violate the Fourth Amendment and the protection against unreasonable search and seizure.

    But what's at stake is much larger than just sticking a tracking device to the underbelly of a criminal suspect's vehicle. After all, every cell phone owner gives off an incredible amount of mobile location data, something that the government could conceivably tap into and monitor.

    During oral arguments, Supreme Court justices seemed skeptical about the government's power to use technology in this way.

    Got data news?

    Feel free to email me.

    Photo: Graph Paper by Calsidyrose, on Flickr

    November 02 2011

    What does privacy mean in an age of big data?

    As we do more online — shop, browse, chat, check in, "like" — it's clear that we're leaving behind an immense trail of data about ourselves. Safeguards offer some level of protection, but technology can always be cracked and the goals of data aggregators can shift. So if digital data is and always will be a moving target, how does that shape our expectations for privacy? Terence Craig (@terencecraig), co-author of "Privacy and Big Data," examines this question and related issues in the following interview.

    Your book argues that by focusing on how advertisers are using our data, we might be missing some of the bigger picture. What are we missing, specifically?

    Terence Craig: One of the things I tell people is I really don't care if companies get more efficient at selling me soap. What I do care about is the amount of information that is being aggregated to sell me soap and what uses that data might be put toward in the future.

    One of the points that co-author Mary Ludloff and I tried to make in the book is that the reasons behind data collection have nothing to do with how that data will eventually be used. There's way too much attention being paid to "intrusions of privacy" as opposed to the problem that once data is out there, it's out there. And potentially, it's out there as long as electronic civilization exists. How that data will be used is anybody's guess.

    What's your take on the promise of anonymity often associated with data collection?

    Terence Craig: It's fundamentally irresponsible for anyone who collects data to claim they can anonymize that data. We've seen the Netflix de-anonymization, the AOL search release, and others. There have been several cases where medical data was released for laudable goals, but that data was de-anonymized rather quickly. For example, the Electronic Frontier Foundation has a piece that explains how a researcher was able to connect an anonymized medical record to former Massachusetts governor William Weld. And in relation to that, a Harvard genome project tries to make sure people understand the privacy risks of participating.

    If we assume that companies have good will toward their consumers' data — and I'll assume that most large corporations do — these companies can still be hacked. They can be taken advantage of by bad employees. They can be required by governments to provide backdoors into their systems. Ultimately, all of this is risky for consumers.

    Assuming that data can't be anonymized and companies don't have malicious plans for our personal data, what expectations can we have for privacy?

    Terence Craig: We've moved back to our evolutionary default for privacy, which is essentially none. Hunter-gatherers didn't have privacy. In small rural villages with shared huts between multi-generational families, privacy just wasn't really available there.

    The question is how do we address a society that mirrors our beginnings, but comes with one big difference? Before, anyone who knew the intimate details of our lives were people we had met physically, and they were often related to us. But now the geographical boundary has been erased by the Internet, so what does that mean? And how are we as a society going to evolve to deal with that?

    With that in mind, I've given up on the idea of digital privacy as a goal. I think you have to if you want to reap the rewards of being a full participant in a digitized society. What's important is for us to make sure we have transparency from the large institutions that are aggregating data. We need these institutions to understand what they're doing with data and to share that with people so we, in aggregate, can agree whether or not this is a legitimate use of our data. We need transparency so that we — consumers, citizens — can start to control the process. Transparency is what's important. The idea that we can keep the data hidden or private, well ... that horse has left the stable.

    What's the role of governments here, both in terms of the data they keep but also the laws they pass about data?

    Terence Craig: Basically anything the government collects, I believe should be made available. After all, governments are some of the largest aggregators of data from all sorts of people. They either purchase it or they demand it for security needs from primary collectors like Google, Facebook, and the cell phone companies — the millions of requests law enforcement agencies sent to Sprint in 2008-2009 was a big story we mentioned in the book. So, it's important that governments reveal what they're doing with this information.

    Obviously, there's got to be a balance between transparency and operational security needs. What I want is to have a general idea of: "Here's what we — the government — are doing with all of the data. Here's all of the data we've collected through various means. Here's what we're doing with it. Is that okay?" That's the sort of legislation I would like, but you don't see that anywhere at this point.

    This interview was edited and condensed.

    Privacy and Big Data — This book introduces you to the players in the personal data game, and explains the stark differences in how the U.S., Europe, and the rest of the world approach the privacy issue.

    October 03 2011

    02mydafsoup-01
    via oAnth (reposted) at Diaspora*

    For those who are looking for a decentralized social network platform which, in my opinion, has realistic potential to develop into a central base for the international protest movement in the coming years.


    -------------------------------------------------

    oAnth:

    this entry is part of the OccupyWallStreet compilation 2011-09/10, here.

    September 14 2011

    Social data: A better way to track TV

    Nielsen families, viewer diaries, and TV meters just won't cut it anymore. Divergent forms of television viewership require new audience measurement tools. Jodee Rich (@WingDude), CEO and founder of PeopleBrowsr, says social data is the key to new toolsets because it reveals both viewing behavior and sentiment.

    Rich explores the connection between social data and television analytics in the following interview. He'll expand on these ideas during a presentation at next week's Strata Summit in New York.

    Nielsen has been measuring audience response since the era of radio, yet the title of your Strata talk is "Move over, Nielsen." What is Nielsen's methodology, and why does it no longer suffice?

    Jodee Rich: Nielsen data is sampled across the United States from approximately 20,000 households. Data is aggregated every night, sent back to Nielsen, and broken out by real-time viewings and same-day viewings.

    There are two flaws in Nielsen's rating system that we can address with social analytics:

    1. Nielsen's method for classifying shows as "watched" — The Nielsen system does not demonstrate a show's popularity as much as it showcases which commercials viewers tune in for. If a person switches the channel to avoid commercials, the time spent watching that show is not tallied. The show is only counted as watched in full when the viewer is present for commercials.
    2. Nielsen ratings don't measure mediums other than television — The system does not take into account many of the common ways people now access shows, including Hulu, Netflix, on-demand, and iTunes.

    How does social data provide more accurate ways of measuring audience response?

    Jodee Rich: Social media offers opportunities to measure sentiment like never before. The volume of data available through social media outlets simply dwarfs Nielsen's sample base of 20,000 households. Millions of people form the social media user base, and naturally that base is more representative of the dynamics of an evolving demographic.

    It's not just the volume, however. Social media values real-time engagement over passive participation. We can see not just what people are watching, but also monitor what they say about it. By observing actively engaged people, we can better discern who the viewers are, what they value, what they discuss, how often they talk about these things, and most importantly, how they feel about it. This knowledge allows brands to tailor messages with very high relevance.

    How will these new measurement tools benefit viewers?

    Jodee Rich: With social data, the television experience will be better catered to viewers. Broadcasters will enrich the viewing experience by creating flexible, responsive services that are sensitive to real people's tastes and conversations. We believe that ultimately this will make for more engaging entertainment and prolong the lives of the shows people love.

    This interview was edited and condensed.

    Photo: Solid State by skippyjon, on Flickr

    August 19 2011

    02mydafsoup-01

    Ghostery | Detect - Learn - Control


    Ghostery sees the invisible web - tags, web bugs, pixels and beacons. Ghostery tracks the trackers and gives you a roll-call of the ad networks, behavioral data providers, web publishers, and other companies interested in your activity.

    ---------------------------

    // oAnth  (added 2011-08-22)

    Before you install Ghostery, there are some aspects worth considering.
    'Reviews for Ghostery'
    - https://addons.mozilla.org/en-US/firefox/addon/ghostery/reviews/?page=8

    I see more advantages than risks here, so I installed it.

    Once installed, you can configure the application according to your individual privacy needs.

    August 11 2011

    IVW tracking tool not privacy-compliant after all?

    Three days ago I reported that the Hamburg Data Protection Commissioner had announced in a press release that the tracking tool from INFOnline GmbH, which is used by the IVW, among others, for its audience reach measurement, is now compliant with data protection law.

    Kris Köhntopp has examined this claim from a technical perspective and clearly explains why the level of data protection has in fact not improved. The IVW's tracking relies on cookies, and in Köhntopp's view more data is collected in this particular case than the IVW's statistical purposes require.

    On Google+, Jürgen Kuri of c't had previously pointed out that the IVW counting method is apparently being made privacy-compliant by introducing an opt-out option, which in turn sets a permanent cookie. That cookie is how the IVW recognizes that a user does not want to be counted.
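
The opt-out mechanism criticized here can be sketched in a few lines. This is a minimal illustration, not the INFOnline or IVW implementation: the cookie name `optout`, the ten-year lifetime, and both function names are hypothetical.

```python
from http.cookies import SimpleCookie

OPT_OUT_COOKIE = "optout"  # hypothetical cookie name, not taken from the actual tool
TEN_YEARS_IN_SECONDS = 10 * 365 * 24 * 3600


def should_count(cookie_header):
    """Count the visit unless the browser sent the opt-out cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    return OPT_OUT_COOKIE not in cookies


def opt_out_header():
    """Build the Set-Cookie header that stores the opt-out decision in the browser."""
    cookie = SimpleCookie()
    cookie[OPT_OUT_COOKIE] = "1"
    # The decision only persists if the cookie is long-lived.
    cookie[OPT_OUT_COOKIE]["max-age"] = TEN_YEARS_IN_SECONDS
    return cookie.output(header="Set-Cookie:")


print(should_count("session=abc"))            # visitor without opt-out cookie
print(should_count("session=abc; optout=1"))  # visitor who opted out
print(opt_out_header())
```

The sketch makes the paradox visible: the only way for the server to remember "do not count me" is to store a long-lived cookie with the very user who objected to being tracked, and the preference is lost as soon as cookies are cleared.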

    From a data protection standpoint this is utterly contradictory, not to say absurd, and it shows once again that data protection in its current form works poorly or not at all on the Internet. And a state data protection commissioner celebrates the additional use of cookies as a breakthrough for data protection.

    This also raises the question of how the providers of such tools will react if this legislative proposal really does spell the end of cookies without consent. What then, Mr. Caspar?

    When will data protection law get its long-overdue reality check?

    August 08 2011

    Hamburg Data Protection Commissioner: IVW tracking tool now privacy-compliant

    Earlier this year the Hamburg Data Protection Commissioner turned out to be throwing stones from inside a glass house when he had to admit that his own website, at the time part of "hamburg.de", did not meet his own data protection criteria. A central point of criticism was the tracking tool from INFOnline GmbH, which is used by the IVW, among others, for its audience reach measurement.

    This tracking tool is now said to be designed in a privacy-compliant way, as the Hamburg Data Protection Commissioner explains in a press release, among other reasons because the recorded IP addresses are reportedly truncated by their last octet.
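
Truncating the last octet of an IPv4 address can be sketched as follows. This is an illustration of the general technique only, not the tool's actual code; the function name `mask_last_octet` is an assumption, and IPv6 is deliberately left out.

```python
import ipaddress


def mask_last_octet(ip):
    """Zero the final octet of an IPv4 address, e.g. 192.0.2.123 -> 192.0.2.0."""
    address = ipaddress.IPv4Address(ip)
    masked = int(address) & 0xFFFFFF00  # keep the first three octets only
    return str(ipaddress.IPv4Address(masked))


print(mask_last_octet("192.0.2.123"))
print(mask_last_octet("192.0.2.7"))
```

Both example addresses collapse to the same value, so up to 256 hosts become indistinguishable by IP alone. Whether that is enough depends on what else is stored alongside the truncated address, such as cookies.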

    But that would also mean that Google Analytics must be privacy-compliant too, provided the so-called IP masking offered by Google is used. I am curious whether the Hamburg Data Protection Commissioner, who is also responsible for Google, will draw the same conclusion.

    Parts of hamburg.de's privacy policy remain open to debate, however, particularly the passage on social plug-ins (section 6).

    June 16 2011

    Strata Week: The effort to digitize Palin's email archive

    Here are a few of the data stories that caught my attention this week:

    Sarah Palin's Inbox

    Last Friday, in response to a years-old public records request, the state of Alaska finally released some 24,000 pages of emails sent by former governor Sarah Palin. And "pages" really is the operative word here. Palin's emails were all printed out — about 250 pounds of paper all told — at a printing cost of $725 per set. At least initially, the documents were only available to those who picked them up in Juneau — or to those willing to pay the high cost of having the six boxes mailed elsewhere.

    Various organizations worked quickly to digitize the documents, but the task was so daunting that many news agencies, including The New York Times, called for crowdsourcing the review of the emails.

    The Sunlight Foundation, an open government advocacy group, unveiled Sarah's Inbox this week, a site that makes it easier for people to search and examine Palin's emails.

    The project echoes a similar one undertaken by the Sunlight Foundation last year when the group made a searchable interface for then Supreme Court nominee Elena Kagan's emails.

    One of Sarah Palin's many email messages archived at Sarah's Inbox.

    As the Sunlight Foundation notes:

    Like Elena's Inbox, Sarah's Inbox faced staggering issues of data quality because government officials continue to release digital files as hideous printouts requiring a laborious and error-ridden optical character recognition (OCR) pass over. You will notice that many of the emails are garbled, incomplete or contain odd characters — please keep in mind that we did the best with what we had and are not responsible for the content. Due to the programmatic nature of the tools used to build this site, we recommend checking any research effort against the source files.

    Legal limits on location data

    Roughly two months after the iOS location story broke here on Radar, U.S. lawmakers have taken steps to limit how both the government and private companies can use location data.

    Two bills were introduced this week — one in the House and one in the Senate. The latter was proposed by Senators Al Franken and Richard Blumenthal and would require companies to obtain users' consent before sharing information about the location of a mobile device. The other bill, proposed by Representative Jason Chaffetz and Senator Ron Wyden, would require law enforcement agencies to obtain a warrant in order to track someone's location via their mobile phone.

    The proposals are part of a larger effort to update digital privacy laws, as legislators seem to grow increasingly concerned about consumer protections and data security.

    LexisNexis open sources its Hadoop alternative

    Research company LexisNexis announced this week that it will open source its big data processing tools. LexisNexis is positioning its High Performance Computing Cluster (HPCC) Systems as an alternative to Hadoop, boasting that it can "process, analyze, and find links and associations in high volumes of complex data significantly faster and more accurately than current technology systems."

    LexisNexis has a long history of working with big datasets and it began developing HPCC Systems internally in its Risk Solutions unit a decade ago. Risk Solutions CEO James Peck says the company has opted to open source HPCC in order to leverage the "innovation of the open source community to further the development of the platform for the benefit of our customers and the community."

    HPCC Systems comprises a data-centric programming language and two processing platforms: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster.

    We've been watching the Hadoop competition heat up over the last few months, and the entry by LexisNexis makes the development of big data technologies and the big data market even more interesting.

    Got data news?

    Feel free to email me.



    April 27 2011

    The iPhone tracking story, one week later

    By Alasdair Allan and Pete Warden

    It's now been a week since we published the iPhone tracking story, so it seemed a good time to cover what we've learned.

    The fix

    Apple has just released a Q&A covering this problem and they will be fixing the issues we spotted with a software update. "The reason the iPhone stores so much data is a bug we uncovered," Apple notes in the statement.

    Apple explains that nearby locations are pulled down from an Apple database and stored on the phone. These locations are from a "crowd-sourced database of Wi-Fi hotspot and cell tower data." This matches the picture that was emerging from research. It explains why there are lots of locations that don't match towers, and also why the accuracy is within a few hundred meters, since we've learned that "micro-cells" in urban areas are clustered closely together.

    The Q&A explains the technical workings behind the log and reassures us that only anonymous data is sent back. Our conclusions still apply.

    Apple doesn't address our claim that this reveals sensitive information about your travels. At this point we're just relieved to get an explanation and a fix, but people can examine their own data and decide for themselves how happy they would be sharing it with strangers.

    Forensics

    What Does Your iPhone Know About You? More Than You Think — Alexis Madrigal has written a fascinating follow-up piece covering the data that professionals can read from your phone. Using forensics tools like the Lantern program that Alex Levinson helped build, anyone with physical access to the device can construct a picture of the user's life. It's eye-opening what the "law enforcement, government, and corporate examiners" who purchase the system can uncover about your behavior.

    The Tell-all telephone visualization also makes for thoughtful viewing. It's built from details that a German politician forced his cell phone provider to share after it was caught storing six months of location data on its subscribers. I think one of the reasons that the iPhone Tracker application has had so much use is that it shows people their own data in an understandable way. Unfortunately, that means that similar information that's harder to access behind a company's firewall may not get the same scrutiny, just because it's harder to show in a way that connects with people.

    Uses for good

    I've long been a fan of Geoloqi's opt-in service for recording and sharing your travels, but several other projects in the same area have appeared in my inbox over the last few days. Maria Scileppi has created the Living Brushstroke project to capture people's movements at events, and turn the data into art. Intriguing and beautiful patterns emerge as people cross paths. It's a very fresh way to look at our lives.



    April 24 2011

    Additional iPhone tracking research

    By Alasdair Allan and Pete Warden

    Here are the latest developments on iPhone tracking.

    Android records a short log

    The Guardian has a good overview of Android's equivalent to consolidated.db. It records the last 50 cell locations, and the last 200 Wi-Fi networks, but older entries are overwritten. As we mentioned in our original video, this was what we expected on the iPhone when we found the file, and it was the sheer scale and duration of the recording that floored us, along with how easy it was to access on your computer. Android doesn't appear to copy the file over when you sync, so you'd need physical access to the phone to read it.
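
The difference between a short, self-overwriting log and an ever-growing one can be shown with a bounded queue. This is only a sketch of the behavior described above, not Android's actual storage code; the entry layout and the name `make_location_log` are assumptions.

```python
from collections import deque

MAX_CELL_ENTRIES = 50   # limits as reported in the paragraph above
MAX_WIFI_ENTRIES = 200


def make_location_log(max_entries):
    """Create a log that silently drops its oldest entry once it is full."""
    return deque(maxlen=max_entries)


cell_log = make_location_log(MAX_CELL_ENTRIES)
for fix_number in range(60):
    # Hypothetical entry; a real log would hold a timestamp and coordinates.
    cell_log.append({"fix": fix_number})

print(len(cell_log))        # never exceeds the configured maximum
print(cell_log[0]["fix"])   # oldest surviving entry
```

After 60 appends the log still holds 50 entries and the first ten fixes are gone, which is why such a file reveals far less history than a log that is never trimmed.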

    Phoning home your location

    In the Wall Street Journal there's a good story covering how phones often send your location back to servers at both Apple and Google. We've known that cell companies are gathering this kind of data, because they need it for their basic operations, but the most interesting question for me is how it's actually stored by these software companies. If it's truly just for improving their location services, it could be anonymized so that it would be hard to figure out an individual's movements if you had the data. Even if it's not, the data is somewhat protected when it's on a company's internal network, since that keeps it further out of reach than a file that's held on your machine.

    Better for tracking travel than home or office locations

    Sean Gorman and my friend Peter Batty have done some impressive work digging into the details of the location data. Their conclusion is that it's hard to spot locations where you spend a lot of time in the same place, like your house or place of work. It's almost as if re-visiting the same spot overwrites a lot of the older data for that place, which would fit with a lot of what we've seen. They also try to quantify the accuracy of the location, pointing out how many outliers appear.

    Even just showing where you've been traveling to is pretty concerning, but it's good to rule out some malicious uses. The work they've done tells us a lot more about the characteristics of the data, and I'm looking forward to seeing more of this kind of analysis.

    Intriguingly, their work also has some support for Will Clarke's idea that the locations are associated with cell towers. Peter's data shows a cluster around Mile High Stadium, which he hasn't visited recently but which does have a lot of cell infrastructure. Sean has another map that overlays actual tower locations with his points, and it's clear they don't coincide, but could well be triangulated from multiple towers. Sean's observation fits with our initial hypothesis that the locations are the result of sometimes-inaccurate triangulation from towers, but Peter's is evidence that there's a bias in the data to clustering around tower positions.

    Peter is investigating the WiFiLocation table. This typically contains a lot more points than the cell version, with 219,000 entries in Alasdair's data versus only 29,000 cell points. We didn't visualize this in the application because the derived lat/long points are a lot noisier, but that may be an issue with the quality of the location-lookup tables Apple are using since they switched away from SkyHook. It appears to record the ID of many of the WiFi networks you've come into range of, so I'll be interested to see what Peter and others discover about this data.
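
Comparing the sizes of the two tables is a simple row count if the file is an SQLite database. The sketch below runs against a small in-memory stand-in rather than a real file: the `CellLocation` table name, the column names, and the sample rows are all assumptions for illustration, and only the `WiFiLocation` name comes from the text above.

```python
import sqlite3


def count_rows(connection, table_names):
    """Return a {table: row count} mapping for the given tables."""
    counts = {}
    for name in table_names:
        # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unexpected table name: {name!r}")
        cursor = connection.execute(f'SELECT COUNT(*) FROM "{name}"')
        counts[name] = cursor.fetchone()[0]
    return counts


# In-memory stand-in with a hypothetical schema and made-up sample rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE CellLocation (Timestamp REAL, Latitude REAL, Longitude REAL)")
connection.execute("CREATE TABLE WiFiLocation (Timestamp REAL, Latitude REAL, Longitude REAL)")
connection.executemany(
    "INSERT INTO CellLocation VALUES (?, ?, ?)",
    [(1.0, 39.74, -105.02)],
)
connection.executemany(
    "INSERT INTO WiFiLocation VALUES (?, ?, ?)",
    [(1.0, 39.74, -105.02), (2.0, 39.75, -105.01), (3.0, 39.76, -105.00)],
)

counts = count_rows(connection, ["CellLocation", "WiFiLocation"])
print(counts)
```

Pointing `sqlite3.connect` at a copy of an actual database file instead of `:memory:` would give the real counts, provided the table names match.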



    April 23 2011

    Search Notes: Search and privacy and writing robots

    This week, we continue looking at search privacy issues and at the ongoing battle between Google, Bing, and Yahoo. Oh, and writing robots — we'll look at those, too.

    Privacy and tracking issues

    Searchers don't often think about privacy, but governments certainly do, and over time, search engines have had to balance gathering as much data as possible to improve search results against concerns about privacy. In 2008, Yahoo was very vocal about their policy of only retaining data for 90 days. Now, they've changed that policy. They'll keep raw search log data for 18 months and "have gone back to the drawing board" regarding other log file data.

    Microsoft and Google keep search logs for 18 months and Yahoo may have found that keeping this data for a shorter period of time put them at a competitive disadvantage. In the new book "In the Plex," Steven Levy talks about how important Google found search data to be early on.

    The search behavior of users, captured and encapsulated in the logs that could be analyzed and mined, would make Google the ultimate learning machine ... Over the years, Google would make the data in its logs the key to evolving its search engine. It would also use those data on virtually every other product the company would develop.

    Perhaps that's why Google hasn't added the new "do not track" header to Chrome. The data is too valuable to provide encouragement for users to opt out.

    Firefox 4 includes a "do not track" option. Whether sites choose to honor it is another matter.

    Although, as security researcher Christopher Soghoian said to Wired:

    "The opt-out cookies and their plug-in are not aimed at consumers. They are aimed at policy makers. Their purpose is to give them something to talk about when they get called in front of Congress. No one is using this plug-in and they don't expect anyone to use it."

    And as the Wired article notes, the header doesn't mean much at the moment as companies aren't using it and legislation doesn't require them to.
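
    For the curious, the signal itself is tiny: a browser with the option enabled sends the request header `DNT: 1`. Here is a minimal sketch of how a site could check for it, assuming the request headers arrive as a plain dictionary. The function name `wants_no_tracking` is made up for illustration, and nothing here reflects how any particular company handles the header.

```python
def wants_no_tracking(headers):
    """Return True if the request headers carry a Do Not Track opt-out (DNT: 1)."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "dnt":
            return value.strip() == "1"
    # No DNT header at all: the user has expressed no preference.
    return False
```

    What a site does after that check is exactly the open question: skipping the analytics call is voluntary.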

    Bing continues to gain search share

    Last week, I noted that Bing was slowly gaining search share in the United States. This week, the Bing UK blog said that they are gaining share in the UK as well. Of course, the gain between February and March of 2011 was only 0.28% and Google is still at 90% share, but hey, Bing will take what they can get.

    Yahoo reports revenue declines

    On Search Engine Land, Danny Sullivan has a great article digging into the details of Yahoo's second quarter earnings. Yahoo is blaming the revenue decline on the new partnership with Microsoft, but the article points out that the explanation isn't as easy as that, and in fact, revenue began declining long before the switch was made.

    Can robots write better content than humans?

    In recent weeks, Google has been in the news for tweaking its algorithms to better rank sites with unique, high-quality content rather than pages from "content farms." But in some cases, can machines write higher-quality stories than people? A recent NPR story recounts a journalism face-off between a robot journalist and a human journalist ... and the robot won. Certainly, algorithms are great at data extraction and in some cases, at presenting that data. But we probably don't want machines to take over the analysis, do we?





    April 22 2011

    iPhone tracking: The day after

    By Alasdair Allan and Pete Warden

    I don't think either of us was expecting to see this story strike such a nerve. There's been some amazing detective work from researchers across the web, so here's a selection of the most interesting immediate reactions.

    Alex Levinson — Right from launch, we had an FAQ pointing to articles by people like Ryan Neal and Paul Courbis who had found this file (consolidated.db) before, but hadn't understood or been able to communicate its significance. The main reason we went public with this was exactly because it already seemed to be an open secret among people who make their living doing forensic phone analysis, but not among the general public — even pretty geeky people like Alasdair and me. We were freaked out by the implications of this data and how unprotected it was, but most of the forensics community seemed to miss quite how creepy ordinary people would find it.

    I do appreciate how frustrating this must be for Alex though, and would like to apologize personally to him that we didn't include his article among the prior research we cited. Unlike the others, it didn't show up in web searches or the books we referenced. It also didn't help that most of the follow-up articles by other people left out the details that we'd tried to make clear about who found it first. We obviously didn't communicate it as well as we thought we had, which is completely our fault.

    My Life According to the iPhone's Secret Tracking Log — Alexis Madrigal has a far more interesting life than me, judging by his map. I especially like the points from a flight with Jim Fallows somewhere over West Virginia. As he says, this data can be incredibly interesting, and as data geeks we were just as fascinated as he is. I actually look forward to a future where we can use this sort of information, but with the user's permission.

    Apple is not “recording your moves” — Both of us have been following Will Clarke's blog for a while and we liked this article. It's good to look skeptically at the accuracy of the data both in space and time. We do disagree about one of the conclusions though: that the points are just the locations of cell towers. That was one of our first thoughts when we saw the data. But the fact that there's thousands of different points scattered across small areas, all in slightly different places, seems like pretty strong evidence that they're not just the locations of cell towers. Another way of putting that is that there's a lot more points than there are towers. There's also lots of points with the same tower ID code that are in different locations. That all led to our conclusion that it was trying to figure out the device's position, even if it wasn't very good at it.

    Until we get a deeper analysis, that's just a provisional conclusion of course. But getting smart folks like Will to dig into this and correct anything we've got wrong is exactly why we open-sourced it. He also picks up on the Las Vegas Anomaly. Multiple people have reported seeing a phantom trip to the city show up, and one theory (other than a lot of lost weekends) is that Apple has an unpacking or testing facility there. Alasdair's phone that was shipped with iOS4 shows this, whereas my older device that originally had iOS3 doesn't, which was suggestive. I wonder if Will's device is a newer one, too?
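
    The tower-ID argument above is easy to check on your own data. If the points were simply tower locations, every tower ID would map to exactly one coordinate pair. A rough sketch, assuming you have already extracted the points as `(tower_id, latitude, longitude)` tuples; the function name and the sample values are invented for illustration:

```python
from collections import defaultdict


def distinct_positions_per_tower(points):
    """Count how many different coordinates are recorded for each tower ID."""
    positions = defaultdict(set)
    for tower_id, lat, lon in points:
        positions[tower_id].add((lat, lon))
    return {tower_id: len(coords) for tower_id, coords in positions.items()}


# Made-up sample: tower 101 shows up at three different positions.
sample = [
    (101, 39.7439, -105.0201),
    (101, 39.7441, -105.0198),
    (101, 39.7445, -105.0210),
    (202, 39.7392, -104.9903),
]
counts = distinct_positions_per_tower(sample)
```

    Any ID with a count above one is hard to square with the idea that each point is a fixed tower position.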

    OpenStreetMap — The application we released relies on this volunteer-run site to render the background map tiles. We ended up tripling their usual load, according to a team member. They actually fired up extra servers to cope, so I made sure to add a link to their donation page from our main site. If you got something out of the application, please do consider giving something to them, or even getting involved. It's a fantastic team and community. How many other organizations would have responded to heavy usage by a free client by paying for more servers themselves? I even messed up their credit text on the initial version of the application, but they were very understanding about that too.





    January 12 2011

    Is the Hamburg data protection commissioner's own website not privacy-compliant?

    A reader of my blog wrote in a comment yesterday that the website of the Hamburg data protection commissioner, reachable at “datenschutz-hamburg.de”, itself uses tracking technology and that visitors are tracked heavily there. The commissioner is currently taking action against Google Analytics.

    This claim was plausible at least insofar as the Firefox plug-in “Counterpixel” shows that the IVW pixel is in use there. That is presumably because the commissioner's site is part of “hamburg.de”, which uses the statistics tool of the Informationsgemeinschaft zur Feststellung der Verbreitung von Werbeträgern (IVW), a tool that is incidentally used by many large German websites.

    The IVW's program is, of course, a tracking tool just like Google Analytics: it collects data about the website's visitors, including their IP addresses, and forwards it to the IVW. And if an article on golem.de is to be believed, the IVW also records and stores IP addresses in full, without anonymization, which is why the same data protection concerns must in principle apply to it as to Analytics.
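
    To make concrete what anonymizing an IP address can mean, here is a minimal sketch of one common approach: cutting the address down to its network prefix before storing it. This is an illustration only, not a description of what the IVW or Google Analytics actually does; the function name and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero out the host part of an IP address, keeping only the network prefix."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

    With the defaults, `anonymize_ip("192.0.2.123")` returns `"192.0.2.0"`, so the stored value no longer identifies a single connection.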

    If data protection authorities are going to argue so assertively that tracking tools are problematic under data protection law, they should at least not use them on their own websites. This also shows, however, that building a privacy-compliant website is not that easy, not even for a state data protection commissioner.

    In any case, the discussion cannot remain limited to Google; it has to cover tracking technologies as a whole.

