December 05 2013

Four short links: 5 December 2013

  1. Deducer -- An R Graphical User Interface (GUI) for Everyone.
  2. Integration of Civil Unmanned Aircraft Systems (UAS) in the National Airspace System (NAS) Roadmap (PDF, FAA) — first pass at regulatory framework for drones. (via Anil Dash)
  3. Bitcoin Stats — $21MM traded, $15MM of electricity spent mining. Goodness. (via Steve Klabnik)
  4. iOS vs Android Numbers (Luke Wroblewski) — roundup comparing Android to iOS in recent commerce writeups. More Android handsets, but less revenue per download/impression/etc.

October 25 2013

Four short links: 25 October 2013

  1. Seagate Kinetic Storage — In the words of Geoff Arnold: The physical interconnect to the disk drive is now Ethernet. The interface is a simple key-value object oriented access scheme, implemented using Google Protocol Buffers. It supports key-based CRUD (create, read, update and delete); it also implements third-party transfers (“transfer the objects with keys X, Y and Z to the drive with IP address 1.2.3.4”). Configuration is based on DHCP, and everything can be authenticated and encrypted. The system supports a variety of key schemas to make it easy for various storage services to shard the data across multiple drives.
  2. Masters of Their Universe (Guardian) — well-written and fascinating story of the creation of the Elite game (one founder of which went on to make the Raspberry Pi). The classic action game of the early 1980s – Defender, Pac Man – was set in a perpetual present tense, a sort of arcade Eden in which there were always enemies to zap or gobble, but nothing ever changed apart from the score. By letting the player tool up with better guns, Bell and Braben were introducing a whole new dimension, the dimension of time.
  3. Micropolar (github) — A tiny polar charts library made with D3.js.
  4. Introduction to R (YouTube) — 21 short videos from Google.
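The Kinetic description above comes down to a key-value store with key-based CRUD, third-party transfers, and keys sharded across several drives. The Python sketch below models that in memory, as an illustration only. It is not Seagate's API: the `Drive` and `ShardedStore` classes, their method names, and the SHA-256-modulo sharding rule are hypothetical choices. There is also no Ethernet, Protocol Buffers, DHCP, or authentication here.

```python
import hashlib


class Drive:
    """Hypothetical stand-in for one key-value drive (not Seagate's actual API)."""

    def __init__(self, address: str) -> None:
        self.address = address  # e.g. an IP address the drive got via DHCP
        self._objects: dict[bytes, bytes] = {}

    # Key-based CRUD: create, read, update, delete.
    def create(self, key: bytes, value: bytes) -> None:
        if key in self._objects:
            raise KeyError(f"{key!r} already exists on {self.address}")
        self._objects[key] = value

    def read(self, key: bytes) -> bytes:
        return self._objects[key]

    def update(self, key: bytes, value: bytes) -> None:
        if key not in self._objects:
            raise KeyError(f"{key!r} not found on {self.address}")
        self._objects[key] = value

    def delete(self, key: bytes) -> None:
        del self._objects[key]

    def transfer(self, keys: list[bytes], target: "Drive") -> None:
        """Third-party transfer: copy the named objects directly to another drive."""
        for key in keys:
            target._objects[key] = self._objects[key]


class ShardedStore:
    """Spreads keys over several drives by hashing each key (one possible schema)."""

    def __init__(self, drives: list[Drive]) -> None:
        self.drives = drives

    def drive_for(self, key: bytes) -> Drive:
        digest = hashlib.sha256(key).digest()
        return self.drives[int.from_bytes(digest[:8], "big") % len(self.drives)]

    def put(self, key: bytes, value: bytes) -> None:
        drive = self.drive_for(key)
        try:
            drive.create(key, value)
        except KeyError:  # key already present on that drive: overwrite it
            drive.update(key, value)

    def get(self, key: bytes) -> bytes:
        return self.drive_for(key).read(key)


if __name__ == "__main__":
    store = ShardedStore([Drive(f"10.0.0.{i}") for i in range(1, 4)])
    store.put(b"user:42", b"alice")
    print(store.get(b"user:42"))  # b'alice'
```

Hashing the key means any client can compute which drive holds an object without asking a central index; the trade-off is that changing the number of drives reshuffles most keys.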

October 04 2013

GDELT: What can we learn from the last 200 million things that happened in the world? | War of Ideas

#GDELT: What can we learn from the last 200 million things that happened in the world? | War of Ideas
http://ideas.foreignpolicy.com/posts/2013/04/10/what_can_we_learn_from_the_last_200_million_things_that_happene

The excitement over Global Data on Events, Location, and Tone (to give its full name) is understandable. The singularly ambitious project could have a transformative effect on how we use data to understand and anticipate political events.

Essentially, GDELT is a massive list of important political events that have happened — more than 200 million and counting — identified by who did what to whom, when and where, drawn from news accounts and assembled entirely by software. Everything from a riot over food prices in Khartoum, to a suicide bombing in Sri Lanka, to a speech by the president of Paraguay goes into the system.

Similar event databases have been built for particular regions, and DARPA has been working along similar lines for the Pentagon with a project known as ICEWS, but as a publicly accessible program (you can download it here, though you’ll need some programming skills to use it), GDELT is unprecedented in its geographic and historical scale. The database updates with new events every night, following the day’s news, and while it currently goes back to 1979, its developers are working on adding events going back as far as 1800, according to lead author Kalev Leetaru, a fellow at the University of Illinois Graduate School of Library and Information Science.
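To make "who did what to whom, when and where" concrete, here is a minimal Python sketch of one event record and a per-location tally. The five-column, tab-separated layout, the `Event` field names, and the sample rows are hypothetical simplifications made up for illustration. The real GDELT files carry many more columns, so check the project's codebook before parsing them.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Hypothetical, simplified layout: the real GDELT files have many more columns.
COLUMNS = ["date", "actor1", "action", "actor2", "location"]


@dataclass
class Event:
    date: str      # when (YYYYMMDD)
    actor1: str    # who
    action: str    # did what (GDELT itself uses CAMEO event codes)
    actor2: str    # to whom
    location: str  # where


def parse_events(text: str) -> list[Event]:
    """Parse tab-separated rows into Event records, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [Event(**dict(zip(COLUMNS, row))) for row in reader if row]


def events_per_location(events: list[Event]) -> Counter:
    """Count how many events happened in each location."""
    return Counter(event.location for event in events)


# Made-up rows loosely echoing the article's examples; not real GDELT data.
SAMPLE = "\n".join([
    "20130401\tprotesters\triot\tgovernment\tSudan",
    "20130402\tinsurgents\tbombing\tcivilians\tSri Lanka",
    "20130403\tpresident\tspeech\tpublic\tParaguay",
    "20130404\tpolice\tarrest\tprotesters\tSudan",
])

if __name__ == "__main__":
    print(events_per_location(parse_events(SAMPLE)).most_common(1))  # [('Sudan', 2)]
```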

#history #politics #conflicts #data via @francoisbriatte

#GDELT
http://eventdata.psu.edu/data.dir/GDELT.html

GDELT package for #R
http://cran.r-project.org/web/packages/GDELTtools

Guardian datablog
http://www.theguardian.com/news/datablog/2013/apr/12/gdelt-global-database-events-location

https://willopines.wordpress.com/2013/04/11/excitement-about-gdelt-and-some-personal-intellectual-history

Quantifying memory
http://quantifyingmemory.blogspot.co.uk/2013/04/big-geo-data-visualisations.html
http://4.bp.blogspot.com/-WIMVlwxpzmU/UW5eF3cMyjI/AAAAAAAAAG8/s7_37GIj28Q/s1600/2000-01-01.gif

Mapping with GDELT
http://nbviewer.ipython.org/urls/raw.github.com/dmasad/GDELT_Intro/master/GDELT_Mapping.ipynb

Mapping Syria’s conflict
http://syria.newscientistapps.com

August 30 2013

Guide du datajournalisme

Guide du #datajournalisme
http://www.datajournalismhandbook.org

http://jplusplus.github.io/guide-du-datajournalisme/img/cover_print_border.jpg

The Guide du datajournalisme is an unfinished work. If you notice anything that is missing or that should be changed, please let us know for the next version. (...)
In adapting the Data Journalism Handbook into French, we gave a voice to the people who are innovating in French-language journalism. They bring local perspectives that show it is possible to do journalism differently.

(#git version: http://jplusplus.github.io/guide-du-datajournalisme)
#visualization #mapping #R #excel #google_refine etc.

August 24 2012

Four short links: 24 August 2012

  1. Speak Like a Pro (iTunes) — practice public speaking, and your phone will rate your performance and give you tips to improve. (via Idealog)
  2. If Hemingway Wrote Javascript — glorious. I swear I marked Andre Breton’s assignments at university. (via BoingBoing)
  3. R Open Sci -- open source R packages that provide programmatic access to a variety of scientific data, full-text of journal articles, and repositories that provide real-time metrics of scholarly impact.
  4. Keeping Your Site Alive (EFF) — guide to surviving DDOS attacks. (via BoingBoing)

July 05 2011

Four short links: 5 July 2011

  1. Conference Organisers Handbook -- accurate guide to running a two-day 300-person conference. Compare Yet Another Perl Conference guidelines.
  2. Twitter Shifting More Code to JVM -- interesting how, at scale, there are some tools and techniques of the scorned Enterprise that the web cool kids must turn to. Some. Business Process Workflow XML Schemas will never find love.
  3. Louis von Ahn on Duolingo -- from the team that gave us "OCR books as you verify you are a human" CAPTCHAs comes "learn a new language as you translate the web". I would love to try this; it sounds great (and is an example of what crowdsourcing can be).
  4. Fully Bayesian Computing (PDF) -- A fully Bayesian computing environment calls for the possibility of defining vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow treating these objects much like any variables or numeric arrays. Working within the statistical package R, we introduce a new object-oriented framework based on a new random variable data type that is implicitly represented by simulations. Perl made text processing easy because strings were first-class objects with a rich set of functions to operate on them; Node.js has a sweet HTTP library; it's interesting to see how much more intuitive an algorithm becomes when random variables are a data type. (via BigData)
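The idea of a random variable as a data type, "implicitly represented by simulations", can be sketched in a few lines of Python. This is not the R framework from the paper; the `RandomVariable` class is a hypothetical illustration in which each random quantity holds a fixed number of draws, arithmetic is applied draw by draw, and deterministic numbers mix in freely. Pairing draws by index is the design choice that matters: it keeps dependence intact, so `x - x` is exactly zero rather than a fresh random quantity.

```python
import random
import statistics


class RandomVariable:
    """Hypothetical sketch: a random quantity stored as a vector of simulation draws."""

    def __init__(self, draws):
        self.draws = list(draws)

    @classmethod
    def normal(cls, mu, sigma, n=10_000, rng=None):
        rng = rng if rng is not None else random.Random()
        return cls(rng.gauss(mu, sigma) for _ in range(n))

    def _combine(self, other, op):
        # Random (op) random: pair draw i with draw i, preserving dependence.
        if isinstance(other, RandomVariable):
            return RandomVariable(op(a, b) for a, b in zip(self.draws, other.draws))
        # Random (op) deterministic: apply the constant to every draw.
        return RandomVariable(op(a, other) for a in self.draws)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    __radd__ = __add__

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __rsub__(self, other):
        return self._combine(other, lambda a, b: b - a)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __rmul__ = __mul__

    # Summaries are estimated from the draws.
    def mean(self):
        return statistics.fmean(self.draws)

    def sd(self):
        return statistics.stdev(self.draws)

    def prob(self, predicate):
        """Estimated probability that predicate(value) holds."""
        return sum(1 for d in self.draws if predicate(d)) / len(self.draws)


if __name__ == "__main__":
    rng = random.Random(42)
    x = RandomVariable.normal(1.0, 1.0, rng=rng)
    y = RandomVariable.normal(2.0, 1.0, rng=rng)
    z = 2 * x + y + 1  # mixes random and deterministic quantities; mean near 5
    print(z.mean(), z.sd(), z.prob(lambda v: v > 5))
    print((x - x).mean())  # 0.0: draws are paired, so x - x is not random
```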

March 09 2011

Four short links: 9 March 2011

  1. R Studio -- AGPLv3-licensed IDE for R. It brings your R console, source code, plots, help, history, and workspace browser into one cohesive package. We've added some neat productivity features like a searchable endless command history, function/symbol completion, data import dialog with preview, one-click Sweave compile, and more. Source on github. Built as a web app with the Google Web Toolkit (GWT), from Joe Cheng who did Windows Live Writer at Microsoft. (via DeWitt Clinton)
  2. Adventures in Participatory Audience -- Nina Simon helped thirteen students produce three projects to encourage participation in museum audiences: Xavier, Stringing Connections, and Dirty Laundry. My favourite was Dirty Laundry, where people shared secrets connected to works of art. Nina's description of what she learned has some nuggets: friendly faces welcoming people in get a better response than a card with instructions, and I am still flummoxed as to what would make someone admit to an affair or bad parenting in a sterile art gallery, or the devastating one that read, "I avoid the important, difficult conversations with those I love the most." Audience participation in the real world has lessons on what works for those who would build social software.
  3. Why Generic Machine Learning Fails -- Returns for increasing data size come from two sources: (1) the importance of tails and (2) the cost of model innovation. When tails are important, or when model innovation is difficult relative to cost of data capture, then more data is the answer. [...] Machine learning is not undifferentiated heavy lifting, it’s not commoditizable like EC2, and closer to design than coding. The Netflix prize is a good example: the last 10% reduction in RMSE wasn't due to more powerful generic algorithms, but rather due to some very clever thinking about the structure of the problem; observations like "people who rate a whole slew of movies at one time tend to be rating movies they saw a long time ago" from BellKor.
  4. Anatomy of a Crushing -- Maciej Ceglowski describes how pinboard.in survived the flood of Delicious émigrés. It took several rounds of rewrites to get the simple tag cloud script right, and this made me very skittish about touching any other parts of the code over the next few days, even when the fixes were easy and obvious. The part of my brain that knew what to do no longer seemed to be connected directly to my hands.
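RMSE, the Netflix Prize metric in the "Why Generic Machine Learning Fails" link, is simple to compute, and a "10% reduction" refers to a relative drop in it. A minimal Python sketch with made-up toy ratings (not Netflix data); the function names are illustrative:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predictions and observed ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Fractional improvement of one RMSE over another (0.10 means 10% lower)."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    actual = [4, 3, 5, 2]                   # toy ratings
    baseline = rmse([3, 3, 4, 3], actual)   # sqrt(3/4), about 0.866
    improved = rmse([4, 3, 4, 2], actual)   # sqrt(1/4) = 0.5
    print(f"{relative_reduction(baseline, improved):.1%}")  # 42.3%
```

Because errors are squared before averaging, RMSE punishes a few large misses more than many small ones, which is one reason clever modelling of unusual rating behaviour can move it.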

September 01 2010

Four short links: 1 September 2010

  1. R Library for Chernoff Faces -- faces() represents the rows of a data matrix as faces; plot.faces() plots faces into a scatterplot. An interesting, emotional way to visualize data, which was used to good effect (though not with this library) by BERG in Schooloscope. (via the tutorial at Flowing Data)
  2. Piwik -- GPLed web analytics package.
  3. Pomegranate -- a data store for billions of tiny files. (via the High Scalability blog interview with the creator of Pomegranate)
  4. New Backpack Makes 3D Maps of Buildings -- a backpack-mounted, indoor equivalent of the Google Maps cars, from Berkeley researchers.

January 01 2010

Four short links: 1 January 2010

  1. Measuring Type -- clever way to measure which font uses more ink.
  2. Vowpal Wabbit -- fast learning software from Yahoo! Research and Hunch. Code available in git. (via zecharia on Delicious)
  3. Literature Review on Indexing Time-Series Data -- a graduate student's research work included this literature review of papers on indexing time-series data. (via jpatanooga on Delicious)
  4. igraph -- programming library for manipulating graph data, with the usual algorithms (minimum spanning tree, network flow, cliques, etc.) available in R, Python, and C.
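igraph ships its own implementations of algorithms like these. As an illustration of what one of them, the minimum spanning tree, computes, here is a plain-Python sketch of Kruskal's algorithm with a union-find. It is not igraph's API; the function name and the `(weight, u, v)` edge format are assumptions made for the example.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm. Vertices are 0..n-1; edges are (weight, u, v) tuples.

    Returns the chosen edges as (u, v, weight), cheapest first.
    """
    parent = list(range(n))  # union-find forest: each vertex starts alone

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # consider the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # skip edges that would close a cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


if __name__ == "__main__":
    edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (6, 2, 3), (3, 1, 3)]
    print(minimum_spanning_tree(4, edges))  # [(0, 1, 1), (1, 2, 2), (1, 3, 3)]
```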

November 05 2009

Four short links: 5 November 2009

  1. Heat Maps in R -- We used financial data here because it's easier to access than the airline data, but it's actually a pretty interesting way of looking at a financial time series. Weekend and holiday effects are a bit more obvious, and it's a bit like being able to see the daily, weekly, monthly and yearly closes all at once (by scanning your eye over the calendar in different directions). Includes source code. (via migurski on Delicious)
  2. BlackHat and EC2 -- Theft of resources is the red-headed step-child of attack classes and doesn't get much attention, but on cloud platforms where resources are shared amongst many users these attacks can have a very real impact. With this in mind, we wanted to show how EC2 was vulnerable to a number of resource theft attacks and the videos below demonstrate three separate attacks against EC2 that permit an attacker to boot up massive numbers of machines, steal computing time/bandwidth from other users and steal paid-for AMIs. (via straup on Delicious)
  3. Funny Characters in Unicode -- I never get tired of the wacky stuff in Unicode. I love the thought of a Unicode committee somewhere arguing passionately about the number of buttons on the snowman .... (via Hacker News)
  4. Statistics to English Translation -- The terms sensitivity and specificity generally refer to diagnostic or screening procedures, such as HIV or allergy tests. The sensitivity of a test is its true positive rate; the specificity is its true negative rate, although it can be more intuitive to think of specificity as the complement of the false positive rate. This matters. Bandying around numbers with misleading labels, or misinterpreting numbers that have a precise and defined meaning, does not further understanding. (Said 78.4% of statisticians, with a 20% confidence factor probability of false positives)
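The definitions in the last link translate directly into arithmetic on a test's confusion counts. A minimal Python sketch; the `ConfusionCounts` name and the example counts are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # has the condition, test positive
    fn: int  # has the condition, test negative
    tn: int  # no condition, test negative
    fp: int  # no condition, test positive

    @property
    def sensitivity(self) -> float:
        """True positive rate: share of people with the condition the test catches."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """True negative rate: share of people without the condition the test clears."""
        return self.tn / (self.tn + self.fp)

    @property
    def false_positive_rate(self) -> float:
        """Share of people without the condition who wrongly test positive."""
        return self.fp / (self.fp + self.tn)


if __name__ == "__main__":
    test = ConfusionCounts(tp=90, fn=10, tn=950, fp=50)
    print(test.sensitivity, test.specificity)  # 0.9 0.95
    print(1 - test.false_positive_rate)        # specificity again, up to rounding
```

Both rates are conditioned on the true state of the person tested, which is why neither one on its own says how likely a positive result is to be correct.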
