
February 17 2012

The stories behind a few O'Reilly "classics"

This post originally appeared in Tim O'Reilly's Google+ feed.

It's amazing to me how books I first published more than 20 years ago are still creating value for readers. O'Reilly Media is running an ebook sale for some of our "classics."

"Vi and Vim" is an updated edition of a book we first published in 1986! Linda Lamb was the original author; I was the editor, and added quite a bit of material of my own. (In those days, being the "editor" for us really meant being ghostwriter and closet co-author.) I still use and love vi/vim.

"DNS and BIND" has an interesting backstory too. In the late '80s or early '90s, I was looking for an author for a book on smail, a new competitor to sendmail that seemed to me to have some promise. I found Cricket Liu, and he said, "What I really want to write a book about is BIND and the Domain Name System. Trust me, it's more important than smail." The Internet was just exploding beyond its academic roots (we were still using UUCP!), but I did trust him. We published the first edition in 1992, and it's been a bestseller ever since.

"Unix in a Nutshell" was arguably our very first book. I created the first edition in 1984 for a long-defunct workstation company called Masscomp; we then licensed it to other companies, adapting it for their variants of Unix. In 1986, we published a public edition in two versions: System V and BSD. The original editions were inspired by the huge man page documentation sets that vendors were shipping at the time: I wanted something handy for looking up command-line options, shell syntax, regular expression syntax, sed and awk command syntax, and even things like the ASCII character set.

The books were moderately successful until I tried a price drop from the original $19.50 to $9.95 as an experiment, with the marketing headline "Man bites dog." I told people we'd try the new price for six months, and if it doubled sales, we'd keep it. Instead, the enormous value proposition increased sales literally by an order of magnitude. At the book's peak, we were selling tens of thousands of copies a month.

Every other "in a nutshell" book we published derived from this one, a product line that collectively sold millions of copies, and helped put O'Reilly on the map.

"Essential System Administration" is another book that dates back to our early days as a documentation consulting company. I wrote the first edition of this book for Masscomp in 1984; it might well be the first Unix system administration book ever written. I had just written a graphics programming manual for Masscomp, and was looking for another project. I said, "When any of us have any problems with our machines, we go to Tom Texeira. Where are our customers going to go?" So I interviewed Tom, and wrote down what he knew. (That was the origin of so many of our early books — and the origin of the notion of "capturing the knowledge of innovators.")

I acquired the rights back from Masscomp, and licensed the book to a company called Multiflow, where Mike Loukides ran the documentation department. Mike updated the book. Æleen Frisch, who was working for Mike, did yet another edition for Multiflow, and when the company went belly up, I acquired back the improved version (and hired Mike as our first editor besides me and Dale). He signed Æleen to develop it as a much more comprehensive book, which has been in print ever since.

"Sed and Awk" has a funny backstory too. It was one of the titles that inspired the original animal cover designs. Edie Freedman thought Unix program names sounded like weird animals, and this was one of the titles she chose to make a cover for, even though the book didn't exist yet. For years we'd hear from people who knew the book existed — they'd seen it. Dale Dougherty eventually sat down and wrote it, mostly because he loved awk, but also to satisfy those customers who just knew it existed.

(Here's a brief history of how Edie came up with the idea for the animal book covers.)

And then there's "Unix Power Tools." In the late '80s, Dale had discovered hypertext via HyperCard, and when he discovered Viola and the World Wide Web, that became his focus. We had written a book called "Unix Text Processing" together, and I was hoping to lure him back to write another book that exercised the hypertext style of the web, but in print. Dale was working on GNN by that time and couldn't be lured onto the project, but I was having so much fun that I kept going.

I recruited Jerry Peek and Mike Loukides to the project. It was a remarkable book both in being crowdsourced — we collected material from existing O'Reilly books, from saved Usenet posts, and from tips submitted by customers — and in being cross-linked like the web. Jerry built some great tools that allowed us to assign each article a unique ID, which we could cross-reference by ID in the text. As I rearranged the outline, the cross-references would automatically be updated. (It was all done with shell scripts, sed, and awk.)

Lots more in this trip down memory lane. But the fact is we've kept the books alive, kept updating them, and they are still selling, and still helping people do their jobs, decades later. It's something that makes me proud.

See comments and join the conversation about this topic at Google+.

December 06 2010

Strata Gems: The timeless utility of sed and awk

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: Where to find data.

Edison famously said that genius is 1% inspiration and 99% perspiration. Much the same can be said for data analysis. The business of obtaining, cleaning and loading the data often takes the lion's share of the effort.

Now over 30 years old, the UNIX command line utilities sed and awk are useful tools for cleaning up and manipulating data. In their Taxonomy of Data Science, Hilary Mason and Chris Wiggins note that when cleaning data, "Sed, awk, grep are enough for most small tasks, and using either Perl or Python should be good enough for the rest." A little aptitude with command line tools can go a long way.

sed is a stream editor: it operates on data in a serial fashion as it reads it. You can think of sed as a way to batch up a bunch of search and replace operations that you might perform in a text editor. For instance, this command will replace all instances of "foo" with "bar" within a file:

sed -e 's/foo/bar/g' myfile.txt
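Because sed reads standard input when no file is given, several such operations can be batched into a single pass by repeating the -e flag. A minimal sketch using inline sample data (the substitutions here are illustrative, not from the original post):

```shell
# Two edits in one pass: replace "foo" with "bar", then delete blank lines.
# With no filename, sed reads stdin; here we feed it a here-document.
sed -e 's/foo/bar/g' -e '/^$/d' <<'EOF'
foo one

foo two
EOF
# Prints:
# bar one
# bar two
```

Each -e expression is applied in order to every line of the stream, so the whole script still makes only one pass over the data.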

Anybody who has used regular expressions in a text editor or programming language will find sed easy to grasp. Awk takes a little more getting used to: it is record-oriented, which makes it the right choice when your data contains delimited fields that you want to manipulate.

Consider this list of names, which we'll imagine lives in the file presidents.txt.

George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe

To extract just the first names, we can use the following command:

$ awk '{ print $1 }' presidents.txt
George
John
Thomas
James
James

Or, to just find those records with "James" as the first name:

$ awk '$1 ~ /James/ { print }' presidents.txt
James Madison
James Monroe

Awk can do a lot more, and features programming concepts such as variables, conditionals and loops. But just a basic grasp of how to match and extract fields will get you far.
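As a small taste of those programming features, this hypothetical one-liner (not from the original post) tallies how many times each first name appears in the presidents.txt file above, using an associative array and an END-block loop:

```shell
# Count occurrences of each first name ($1), then loop over the
# tallies once the whole file has been read. awk's for-in loop
# visits keys in no guaranteed order, so output order may vary.
awk '{ count[$1]++ } END { for (name in count) print name, count[name] }' presidents.txt
```

On the sample data this reports James 2 and the other names 1 each, all without an explicit loop over the input: awk runs the pattern-action block once per record for you.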

For more information, attend the Strata Data Bootcamp, where Hilary Mason is an instructor, or read sed & awk.
