
January 31 2014

Four short links: 31 January 2014

  1. Bolts — Facebook’s library of small, low-level utility classes in iOS and Android.
  2. Python Idioms (PDF) — useful cheatsheet.
  3. Michael Abrash’s Graphics Programming Black Book — Markdown source on GitHub. Notable for elegance and instructive for those learning to optimise. Coder soul food.
  4. About Link Bait (Anil Dash) — excellent consideration of Upworthy’s distinctive click-provoking headlines, but my eye was caught by “we often don’t sound like 2012 Upworthy anymore. Because those tricks are starting to dilute click rates.” from Upworthy’s editor-at-large. Attention is a scarce resource, and our brains are very good at filtering.

December 10 2013

Four short links: 10 December 2013

  1. ArangoDB — open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
  2. Google’s Seven Robotics Companies (IEEE) — “The seven companies are capable of creating technologies needed to build a mobile, dexterous robot. Mr. Rubin said he was pursuing additional acquisitions.” Rundown of those seven companies.
  3. Hebel (Github) — GPU-Accelerated Deep Learning Library in Python.
  4. What We Learned Open Sourcing — my eye was caught by the way they offered APIs to closed source code, found and solved performance problems, then open sourced the fixed code.

November 21 2013

Four short links: 21 November 2013

  1. Network Connectivity Optional (Luke Wroblewski) — we need progressive enhancement: assume people are offline, then enhance if they are actually online.
  2. Whoosh — fast, featureful full-text indexing and searching library implemented in pure Python.
  3. Flanker (GitHub) — open source address and MIME parsing library in Python. (via Mailgun Blog)
  4. Stream Adventure (Github) — interactive exercises to help you understand node streams.

November 15 2013

November 11 2013

Four short links: 12 Nov 2013

  1. Quantitative Reliability of Programs That Execute on Unreliable Hardware (MIT) — As MIT’s press release put it: “Rely simply steps through the intermediate representation, folding the probability that each instruction will yield the right answer into an estimation of the overall variability of the program’s output.” (via Pete Warden)
  2. AirBNB’s Javascript Style Guide (Github) — A mostly reasonable approach to JavaScript.
  3. Category Theory for Scientists (MIT Courseware) — Scooby snacks for rationalists.
  4. Textblob — Python open source text processing library with sentiment analysis, PoS tagging, term extraction, and more.
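
The Rely item above describes folding per-instruction success probabilities into one overall estimate. Here is a minimal sketch of that idea, assuming every instruction succeeds independently with a known probability. The instruction names and numbers are invented for illustration, and this is a simplification, not Rely's actual analysis:

```python
import math

# Hypothetical straight-line program: (instruction, probability it yields the right result).
# Names and numbers are illustrative assumptions, not taken from the Rely paper.
PROGRAM = [
    ("load x", 0.9999),
    ("load y", 0.9999),
    ("mul x, y", 0.999),
    ("store z", 0.9999),
]


def overall_reliability(program):
    """Fold each instruction's success probability into one estimate.

    Assumes failures are independent, so the chance that every
    instruction is correct is the product of the individual chances.
    """
    return math.prod(probability for _, probability in program)


if __name__ == "__main__":
    print(f"estimated reliability: {overall_reliability(PROGRAM):.4f}")
```

A real analysis also has to handle branches, loops, and data that flows between instructions; the product above only covers the simplest straight-line case.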

October 22 2013

Mining the social web, again

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever.

While we’re seeing more and more cynicism about the value of data, and particularly “big data,” that cynicism isn’t shared by most people who actually work with data. Data has undoubtedly been overhyped and oversold, but the best way to arm yourself against the hype machine is to start working with data yourself, to find out what you can and can’t learn. And there’s no shortage of data around. Everything we do leaves a cloud of data behind it: Twitter, Facebook, Google+ — to say nothing of the thousands of other social sites out there, such as Pinterest, Yelp, Foursquare, you name it. Google is doing a great job of mining your data for value. Why shouldn’t you?

There are few better ways to learn about mining social data than by starting with Twitter; Twitter is really a ready-made laboratory for the new data scientist. And this book is without a doubt the best and most thorough approach to mining Twitter data out there. But that’s only a starting point. We hear a lot in the press about sentiment analysis and mining unstructured text data; this book shows you how to do it. If you need to mine the data in web pages or email archives, this book shows you how. And if you want to understand how people collaborate on projects, Mining the Social Web is the only place I’ve seen that analyzes GitHub data.

All of the examples in the book are available on GitHub. In addition to the example code, which is bundled into IPython notebooks, Matthew has provided a VirtualBox VM that installs Python, all the libraries you need to run the examples, the examples themselves, and an IPython server. Checking out the examples is as simple as installing VirtualBox, installing Vagrant, cloning the 2nd edition’s GitHub archive, and typing “vagrant up.” (This quick start guide summarizes all of that.) You can execute the examples for yourself in the virtual machine; modify them; and use the virtual machine for your own projects, since it’s a fully functional Linux system with Python, Java, MongoDB, and other necessities pre-installed. You can view this as a book with accompanying examples in a particularly nice package, or you can view the book as “premium support” for an open source project that consists of the examples and the VM.

If you want to engage with the data that’s surrounding you, Mining the Social Web is the best place to start. Use it to learn, to experiment, and to build your own data projects.

October 07 2013

Four short links: 8 October 2013

  1. Lightworks — open source non-linear video editing software, with quite a history.
  2. Puzzlescript — open source puzzle game engine for HTML5.
  3. pudb — full-screen (text-mode) Python debugger.
  4. Freelan — free, open-source, multi-platform, highly-configurable and peer-to-peer VPN software.

August 29 2013

Four short links: 30 August 2013

  1. intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
  2. Introducing Brick: Minimal-markup Web Components for Faster App Development (Mozilla) — a cross-browser library that provides new custom HTML tags to abstract away common user interface patterns into easy-to-use, flexible, and semantic Web Components. Built on Mozilla’s x-tags library, Brick allows you to plug simple HTML tags into your markup to implement widgets like sliders or datepickers, speeding up development by saving you from having to initially think about the under-the-hood HTML/CSS/JavaScript.
  3. F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
  4. Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)

April 24 2013

Four short links: 24 April 2013

  1. Solar Energy: This is What a Disruptive Technology Looks Like (Brian McConnell) — In 1977, solar cells cost upwards of $70 per Watt of capacity. In 2013, that cost has dropped to $0.74 per Watt, a 100:1 improvement (source: The Economist). On average, solar power improves 14% per year in terms of energy production per dollar invested.
  2. Process Managers — overview of the tools that keep your software running.
  3. Bittorrent Sync — Dropbox-like features, BitTorrent under the hood.
  4. Brython — Python interpreter written in Javascript, suitable for embedding in webpages. (via Nelson Minar)
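
The solar figures in item 1 can be checked with a few lines of arithmetic. The sketch below assumes the quoted $70 and $0.74 per Watt are exact endpoints (the item says "upwards of $70") and that the improvement compounded at a constant rate over the whole period. That is a simplification, not the method behind the quoted numbers:

```python
# Figures quoted in the solar item: cost per Watt in 1977 and in 2013.
COST_1977 = 70.00
COST_2013 = 0.74
YEARS = 2013 - 1977

# Overall improvement ratio (the item rounds this to 100:1).
ratio = COST_1977 / COST_2013

# Constant yearly gain in Watts per dollar that would compound to that ratio.
annual_gain = ratio ** (1 / YEARS) - 1

print(f"improvement ratio: {ratio:.1f}:1")
print(f"implied annual gain in Watts per dollar: {annual_gain:.1%}")
```

Under those assumptions the ratio comes out a little under 100:1 and the implied yearly gain lands between 13% and 14%, in line with the quoted figures.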

April 17 2013

Four short links: 17 April 2013

  1. Computer Software Archive (Jason Scott) — The Internet Archive is the largest collection of historical software online in the world. Find me someone bigger. Through these terabytes (!) of software, the whole of the software landscape of the last 50 years is settling in. (And documentation and magazines and …). Wow.
  2. 7 in 10 Doctors Have a Self-Tracking Patient — the most common ways of sharing data with a doctor, according to the physicians, were writing it out by hand or giving the doctor a paper printout. (via Richard MacManus)
  3. opsmezzo — open-sourced provisioning tools from the Nodejitsu team. (via Nuno Job)
  4. Hacking Secret Ciphers with Python — teaches complete beginners how to program in the Python programming language. The book features the source code to several ciphers and hacking programs for these ciphers. The programs include the Caesar cipher, transposition cipher, simple substitution cipher, multiplicative & affine ciphers, Vigenere cipher, and hacking programs for each of these ciphers. The final chapters cover the modern RSA cipher and public key cryptography.

November 05 2012

Four short links: 5 November 2012

  1. The Psychology of Everything (YouTube) — illustrating some of the most fundamental elements of human nature through case studies about compassion, racism, and sex. (via Mind Hacks)
  2. Reports of Exempt Organizations (Public Resource) — This service provides bulk access to 6,461,326 filings of exempt organizations to the Internal Revenue Service. Each month, we process DVDs from the IRS for Private Foundations (Type PF), Exempt Organizations (Type EO), and filings by both of those kinds of organizations detailing unrelated business income (Type T). The IRS should be making this publicly available on the Internet, but instead it has fallen to Carl Malamud to make it happen. (via BoingBoing)
  3. Chris Anderson Leaves for Drone Co (Venturebeat) — Editor-in-chief of Wired leaves to run his UAV/robotics company 3D Robotics.
  4. pysqli (GitHub) — Python SQL injection framework; it provides dedicated bricks that can be used to build advanced exploits or easily extended/improved to fit the case.

October 04 2012

Checking in on Python

Guido van Rossum is the creator of Python. I recently had the opportunity to talk with him about the state of the language.

You probably don’t realize it, but Python’s capabilities are pushed every time you use YouTube and Dropbox. During our interview, Van Rossum said both of these services are at the forefront of Python’s development.

“Whenever someone clicks on a [YouTube] video, they will see HTML that was generated from Python,” he said. “That’s definitely pushing the limits.” [Discussed 27 seconds in — you can see the scalability presentation that Van Rossum mentions during this segment here.]

On the Dropbox side, Van Rossum said the service’s clients for Linux, Windows and Mac are all implemented in Python. You’re also downloading a miniature version of the Python runtime when you’re using Dropbox. [Noted at 1:20.]

Van Rossum also spoke about the lengthy transition Python has undergone from Python 2 to Python 3. “If you want improvements to your Python … now is the time to start trying out Python 3.” Why? While the changes to the language are actually quite small, with the exception of Unicode handling being completely overhauled, Python 3 is a better, faster version of Python. In addition, many third parties like Django are coming online with libraries and frameworks for Python 3. [Discussed at the 7:01 mark.]
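
As an illustration of the Unicode overhaul mentioned above (this example is not from the interview), Python 3 keeps text and raw bytes as separate types, str and bytes, and expects an explicit encode or decode to move between them. A minimal sketch:

```python
# Python 3: str holds Unicode text, bytes holds raw encoded data.
text = "caf\u00e9"            # the word "café", written with an escape for the accented letter
data = text.encode("utf-8")   # explicit conversion from text to bytes

print(type(text).__name__, len(text))   # length counted in characters
print(type(data).__name__, len(data))   # length counted in encoded bytes

# Mixing the two is an error in Python 3 rather than an implicit conversion.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Decoding turns the bytes back into the original text.
assert data.decode("utf-8") == text
```

The accented letter is one character in the text but two bytes once encoded as UTF-8, which is why the two lengths differ.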

You can view the full discussion in the following video.

July 27 2012

Four short links: 27 July 2012

  1. Social Media in China (Fast Company) — fascinating interview with Tricia Wang. We often don’t think we have a lot to learn from tech companies outside of the U.S., but Twitter should look to Weibo for inspiration for what can be done. It’s like a mashup of Tumblr, Zynga, Facebook, and Twitter. It’s very picture-based, whereas Twitter is still very text-based. In Weibo, the pictures are right under each post, so you don’t have to make an extra click to view them. And people are using this in subversive ways. Whether you’re using algorithms to search text or actual people–and China has the largest cyber police force in the world—it’s much easier to censor text than images. So people are very subversive in hiding messages in pictures. These pictures are sometimes very different than what people are texting, or will often say a lot more than the actual text itself. (via Tricia Wang)
  2. A Treatise on Font Rasterisation With an Emphasis on Free Software (Freddie Witherden) — far more than you ever thought you wanted to know about how fonts are rendered. (via Thomas Fuchs)
  3. Softwear Automation — robots to make clothes, something which is surprisingly rare. (via Andrew McAfee)
  4. A Guide to Analyzing Python Performance — finding speed and memory problems in your Python code. With pretty pictures! (via Ian Kallen)

May 31 2012

Using Python for Computer Vision

Python is a tremendous asset when you're trying to classify images, track changes in scenes, search for items within images, implement augmented reality, or do the myriad other things that fall under the umbrella of Computer Vision. In this interview, Jan Erik Solem, author of the upcoming book "Programming Computer Vision with Python," describes the uses for some common operations and the choices programmers have.

Highlights from the full video interview include:

  • The value of Python in computer vision [Discussed at the 0:24 mark]
  • Searching for images within images [Discussed at the 2:13 mark]
  • Clustering or grouping images [Discussed at the 3:22 mark]
  • Constructing a 3D scene from images [Discussed at the 6:11 mark]
  • Modeling and calibrating a camera [Discussed at the 7:22 mark]

You can view the entire conversation in the following video:

February 16 2012

Developer Week in Review: NASA says goodbye to big iron

It looks like I'm going to have a life-changing decision to make in the next few weeks, one that will be shared by millions of people around the world. At risk, the balance in my bank account.

I refer, of course, to whether I'll pony up the cash to upgrade my iPad 2 to a 3, once Apple actually tells us what the iPad 3 will have in it. Unless it cooks gourmet dinners and transports you to other planets, my best guess is that I won't. For one thing, we're also facing the release of the iPhone 5 later in the year, and I make it a policy only to do one Apple fan-boy "upgrade the expensive toy you just bought last year" purchase a year. For another, it looks like the 3 is going to be a faster version of the 2 with a Retina display, and I just can't see it being enough of a delta in features to make it worth the cost.

If I'm going to upgrade either device, I need cash in the bank, so time to earn my keep with this week's news.

HAL is crestfallen ...

We arrive at a bit of a milestone this week, as NASA says goodbye to the last piece of big iron left in its data processing infrastructure. With the retirement of the last IBM Z9, NASA finishes its mission to boldly go where most of the rest of the high tech world had already gone years ago. I especially liked the shout-out to old-school programmers in JCL at the end of NASA's blog post marking the occasion.

NASA, like many organizations running life-critical applications, has to take a very conservative approach to hardware upgrades, because failure is not an option. The computers installed into NASA space vehicles and probes are notorious for being generations behind the current state of the art, because of the long lead times to get them spec'd out and installed. Obviously, no mainframe flies into space, for reasons of weight and space if nothing else. You can see the same kind of excruciatingly slow hardware progress at agencies like the FAA, which can take a human generation to upgrade to a new air traffic control system.

For now, let us bid farewell to the brave Z9, last of its kind at NASA. It would be nice to fantasize that it was responsible for some intricate detail of manned space flight, but the reality is that it evidently ran business applications. Even so, if you don't pay the engineers and vendors, they don't work, so it did play its own sort of role in the exploration of the universe.

Monty Redmond's Visual Python

Visual Studio, like Eclipse and Xcode, provides IDE support for a huge swath of the developer community. While it's still common to find old-schoolers who use Emacs or vi to grind out code, most programmers these days end up using an IDE to take advantage of the debugging and integrated documentation features they provide.

Eclipse is well-known for the wide variety of languages and platforms it supports, but it's easy to forget that Microsoft is making a concerted effort to open up Visual Studio to a wider developer audience as well. One sign of this is the version 1.1 release of Python Tools for Visual Studio, which has just come out. This toolkit is notable for another reason, too: it's one of the projects coming out of Microsoft's Codeplex open source initiative.

I know I'm not alone in having been skeptical of Microsoft's recent warming to open source. It's easy to see it as yet another "embrace, extend and extinguish" play. But at a certain point, you have to say that if it walks and talks like a mule, it may in fact be a mule after all. While I don't expect to see the Windows XP source code being donated to Apache anytime soon, it does appear that Microsoft is making an honest effort to leverage the power of the open source model where it makes sense. That's a huge change from the company's previous "open source is communism" stance. As with most things, time will tell if this is the real deal.

I guess we'll find out what happens when you cross the streams ...

Open source developers have a reputation for bringing a passion, sometimes at an obsessive level, to the projects they work on. But even they would find themselves challenged to keep up with the frenzied level of creative mania displayed by bronies, adult fans of the new My Little Pony reboot. So what happens when you combine the two forces of open source and the brony herd? Wonder Twin developer powers activate!

"PonyKart" is a "Mario Kart"-style game set in the "My Little Pony: Friendship is Magic" universe. It's being developed by a group of brony developers over on SourceForge. It's still in the early days, but the initial videos they've released are impressive.

There's a reason you don't see a lot of open source games with this level of complexity; it's a fairly massive undertaking and is usually only within the resources of major game houses. There is a very capable Linux "Mario Kart" clone out there, but consider that the "PonyKart" folks have only been in operation since July of last year, compared to the six years of development that have gone into "Supertuxkart" so far, and you can get a feel for the awesome power that can be brought to bear when two committed movements overlap. To be fair, there are more tools available now, such as physics engines, than when "Supertuxkart" started development, but the "PonyKart" effort is still striking. Imagine what could happen if we could get the Gleeks interested in video editing software ...

Tying in another theme often harped upon in these pages, the reason PonyKart can happen at all is that Hasbro has gone out of its way to apply a light hand as far as their intellectual property is concerned. Rather than wrapping a death-grip around the My Little Pony characters, Hasbro has let fans pretty much run wild with them (including the inevitable Rule 34 stuff). The company has wisely decided to let the fans churn up a meme-storm, while it sits back and counts the profits from toy sales. Are you listening, RIAA and MPAA? You could do much better by cooperating with your fan base, rather than persecuting them.

Of course, "PonyKart" could still lose momentum and die. There's a big difference between a long-term effort and horsing around for a few months (see what I did there?). But given the evidence to date, I wouldn't count this nag out of the race yet.

(Obligatory full disclosure: Your humble chronicler is a member of the herd, although not involved in the "PonyKart" project.)

Got news?

Please send tips and leads here.

December 26 2011

Four short links: 26 December 2011

  1. Pattern -- a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
  2. Factorie (Google Code) -- Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
  3. Playtomic -- analytics as a service for gaming companies to learn what players actually do in their games. There aren't many fields untouched by analytics.
  4. Write or Die -- iPad app for writers where, if you don't keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.

December 22 2011

Four short links: 22 December 2011

  1. Fuzzy String Matching in Python (Streamhacker) -- useful if you're to have a hope against the swelling dark forces powered by illiteracy and touchscreen keyboards.
  2. The Business of Illegal Data (Strata Conference) -- fascinating presentation on criminal use of big data. "The more data you produce, the happier criminals are to receive and use it. Big data is big business for organized crime, which represents 15% of GDP."
  3. Isarithmic Maps -- an alternative to choropleths for geodata visualization.
  4. Server-Side Javascript Injection (PDF) -- a Blackhat talk about exploiting backend vulnerabilities with techniques learned from attacking Javascript frontends. Both this paper and the accompanying talk will discuss security vulnerabilities that can arise when software developers create applications or modules for use with JavaScript-based server applications such as NoSQL database engines or Node.js web servers. In the worst-case scenario, an attacker can exploit these vulnerabilities to upload and execute arbitrary binary files on the server machine, effectively granting him full control over the server.

September 19 2011

Four short links: 19 September 2011

  1. 1996 vs 2011 Infographic from Online University (Evolving Newsroom) -- "AOL and Yahoo! may be the butt of jokes for young people, but both are stronger than ever in the Internet's Top 10". Plus ça change, plus c'est la même chose.
  2. Pandas -- open source Python package for data analysis, fast and powerful. (via Joshua Schachter)
  3. The Society of Mind -- MIT open courseware for the classic Marvin Minsky theory that explains the mind as a collection of simpler processes. The subject treats such aspects of thinking as vision, language, learning, reasoning, memory, consciousness, ideals, emotions, and personality. Ideas incorporate psychology, artificial intelligence, and computer science to resolve theoretical issues such as whole vs. parts, structural vs. functional descriptions, declarative vs. procedural representations, symbolic vs. connectionist models, and logical vs. common-sense theories of learning. (via Maria Popova)
  4. Gamers Solve Problem in AIDS Research That Puzzled Scientists for Years (Ed Yong) -- researchers put a key protein from an HIV-related virus onto the Foldit game. If we knew where the halves joined together, we could create drugs that prevented them from uniting. But until now, scientists have only been able to discern the structure of the two halves together. They have spent more than ten years trying to solve structure of a single isolated half, without any success. The Foldit players had no such problems. They came up with several answers, one of which was almost close to perfect. In a few days, Khatib had refined their solution to deduce the protein’s final structure, and he has already spotted features that could make attractive targets for new drugs. Foldit is a game where players compete to find the best shape for a protein, but it's capable of being played by anyone--barely an eighth of players work in science.

August 24 2011

Four short links: 24 August 2011

  1. STM in PyPy -- a proposal to add software transactional memory to the all-Python Python interpreter as a way of simplifying concurrent programming. I first learned about STM from Haskell's Simon Peyton Jones at OSCON. (via Nelson Minar)
  2. Werner Vogels' Static Web Site on S3 -- nice writeup of the toolchain to publish a web site to static files served from S3.
  3. China Inadvertently Reveals State-Sponsored Hacking -- if UK, US, France, Israel, or Chinese citizens believe their government doesn't have malware and penetration teams working on extracting information from foreign governments, they're dreaming.
  4. MyChinese360 -- virtual foreign language instruction in Mandarin, including "virtual visits" to Chinese landmarks. The ability to get native speakers virtually into the classroom makes the Internet a huge asset for rural schools. (via Lucy Gray)

August 17 2011

Four short links: 17 August 2011

  1. Tablib -- MIT-licensed open source library for manipulating tabular data. Reputed to have a great API. (via Tim McNamara)
  2. Stanford Education Everywhere -- courses in CS, machine learning, math, and engineering that are open for all to take. Over 58,000 have already signed up for the introduction to machine learning taught by Peter Norvig, Google's Director of Research.
  3. Wearable LED Television -- 160x120 RGBs powered by a 12v battery, built for Burning Man (natch). (via Bridget McKendry)
  4. Temporary Tattoo Biosensors (Science News) -- early work putting flexible sensors into temporary tattoos. (via BoingBoing)
