
April 02 2012

State of the Computer Book Market, part 2: The Categories

In this second installment (the first post can be found here), we look at computer book sales in specific technology categories.

Remember that we've organized the data into six "Category Families" — Systems and Programming, Web Design and Development, Business Applications, Digital Media Applications, Consumer Operating Systems and Devices, and Computer Topics.

Within each of these Families are category group, super-category, category, and atomic category, in a five-level hierarchy. For example, Systems and Programming includes the category groups programming languages, databases, software engineering, general programming, security, and so on.

In the rest of this post, we will contrast 2011 with 2010.

As a refresher, here are two treemaps of the Category Families, with their sub-areas for the final quarters of 2011 compared to 2010. The map on the left shows the growth of the count of titles in each area and the map on the right shows the growth in units for each area.

[Figure: treemaps of the Category Families, Count of Titles (left) and Units (right)]

The Treemap on the left shows the number of new titles entering the Top 3000 in 2011. Security General (upper-left center), Data Analysis (left-bottom center), iPad-consumer (middle-bottom center), MacOSX (middle-bottom center) and HTML5 (upper-right corner) were the brightest green growth areas in 2011.

The Treemap on the right shows the top growing areas from a units perspective. The same areas are the top performers, but they have moved around a bit and are larger in some cases, which reflects their market share. Again, this is comparing the last quarter of 2011 with the last quarter of 2010. This time period reflects the holiday shopping season and is usually the best for consumer topics, though not necessarily for the more technical titles, which peak early in the new year.

In the next two images, you can see how our Category Families stack up. The image on the left shows the number of titles that made the Top 3000 in a given year. Contrast that with the image on the right, which shows the number of units sold in each year. What you will notice is that the number of titles in Systems and Programming went up in 2011 to its highest level since we began tracking, yet the units sold for the category have been going down each year. Consumer Operating Systems and Devices and Computer Topics are the two areas that went slightly up in both the number of titles and units sold in 2011. Systems and Programming is still the largest category and is a chief indicator for the health of the computer book market, and it's been in a consistent decline (for print books). You'll see some more positive indicators in my upcoming post on digital distribution.

[Figure: Category Families by year, Count of Titles (left) and Units (right)]

The table below shows each Category Family's compared growth between 2010 and 2011 (YoY Growth), 2010 and 2011 ranking (10Rank/11Rank) and 2010 and 2011 percent of market share (10Share/11Share).

Category Families            YoY Growth   10Rank   11Rank   10Share   11Share
Business Applications            -0.45%   2nd      2nd      21.00%    20.60%
Computer Topics / Other          15.78%   6th      6th       3.15%     4.11%
Consumer Operating Systems        4.22%   3rd      3rd      15.44%    17.27%
Digital Media                     9.29%   5th      5th      17.27%    18.58%
Systems and Programming          -0.64%   1st      1st      34.62%    35.02%
Web Design and Development       -2.58%   4th      4th      14.32%    13.72%
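The growth and share columns in a table like this come from simple arithmetic on unit counts. Here is a minimal Python sketch of that arithmetic. The unit totals and family names ("A", "B") are hypothetical, since the underlying unit counts are not published in this post, and the function names are made up for illustration.

```python
def yoy_growth(prev_units, curr_units):
    """Percent change in units from the prior year to the current year."""
    return (curr_units - prev_units) / prev_units * 100.0


def market_share(units_by_family):
    """Each family's percent share of the total units sold."""
    total = sum(units_by_family.values())
    return {name: units / total * 100.0 for name, units in units_by_family.items()}


# Hypothetical unit totals, for illustration only (not Bookscan data).
units_2010 = {"A": 1000, "B": 3000}
units_2011 = {"A": 1100, "B": 2900}

growth = {name: yoy_growth(units_2010[name], units_2011[name]) for name in units_2010}
shares_2011 = market_share(units_2011)
```

Note that share is relative: a family can sell fewer units than the year before and still gain share if the rest of the market falls faster.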

Before we look into categories further, let's first take a look at the words that make up all the computer titles for 2011. It's an interesting view of the words that the publishing industry puts on the front of books, in online searches, and anywhere there is metadata about content. A note about this data: I threw away the stop-words like "the," "and," "it," "with," etc. I also disregarded "Microsoft," since it is a descriptor used for various products and is redundant. Here is the "title" view of the market. What obviously pops out to me is Programming and Development, but Data came from nowhere to become a discernible word on the image [located @ 10:00 on a clock].
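For readers who want to try a similar "title" view on their own catalog, here is a rough Python sketch of the word count described above: lowercase the titles, split them into words, drop stop-words and "Microsoft," and tally what remains. The stop-word list, the tokenizing pattern, and the sample titles are assumptions for illustration, not the actual lists or data behind the image.

```python
import re
from collections import Counter

# Assumed stop-word list; the post names only "the," "and," "it," "with," etc.
STOP_WORDS = {"the", "and", "it", "with", "a", "an", "of", "for", "to", "in"}
# Descriptor words dropped on purpose, as described in the post.
IGNORED = {"microsoft"}


def title_word_counts(titles):
    """Count the words in book titles, skipping stop-words and ignored words."""
    counts = Counter()
    for title in titles:
        # Keep letters, digits, '#' and '+' so tokens like "c#" or "c++" survive.
        words = re.findall(r"[a-z0-9#+]+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS and w not in IGNORED)
    return counts


# Hypothetical titles, for illustration only.
sample_titles = [
    "Programming the Web with Data",
    "Microsoft Data Programming",
    "Data Analysis and Development",
]
counts = title_word_counts(sample_titles)
top_words = counts.most_common(2)
```

The resulting counts can be fed to any word-cloud tool, where the size of each word is driven by its count.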


As the market keeps declining, the response of many publishers is to increase the number of titles published, in an attempt to gain market share. Immediately below are two bar graphs showing how many titles made it into the Bookscan dataset in a given year and the average units sold across all titles. So this is the non-obvious point here: There are not necessarily more titles being published, but more titles making it into the dataset. This could be attributed to a lower threshold to get in. In other words, some weeks the threshold to make the Top 3000 list can be as low as 1 unit sold. It is a relative measure. The last couple of years have had lower thresholds, and thus more titles made the list but with worse average units. When the market is healthy, the threshold moves up and only the solid-performing titles make it into the Top 3000. The lower threshold is resulting in a significant decrease in the average units per title for all publishers.

[Figure: bar graphs by year, Number of Titles (left) and Average Units (right)]

When we drill into the category families a bit, we see that seven of our 10 top categories (known as super-categories) sold fewer units in 2010 than in 2009, for a net loss of 244,936 units for just the top 10 areas. In other words, our bigger and typically more stable areas were selling significantly fewer units in 2010. In the first half of 2010, there were 49 super-category areas that were ahead in sales over the first half of 2009, yet six of the 49 slowed down and ended up losing enough ground to show a year-over-year decrease in units. We ended up with 43 super-categories producing more units in 2010 than they did in 2009.

The biggest winners in growth order are: Tablet, Mobile Programming, Windows Consumer, Security Topics, Hardware Topics, Social Web, Computers and Society, Cloud Computing, Information Technology, and Data Topics. The Tablet super-category went from roughly 15,000 units in the first half of 2010 to an additional 100,000 units in the second half of the year. An increase in titles fueled this growth — output tripled from 7 titles in the first half of 2010 to 22 titles by the year's end.

The areas with the largest drop in units were, in descending order: Web Page Creation, Digital Photography, Mac OS, Flash, Web Programming, Web Design Tools, Personal Computers, Linux, Software Project Management, and Personal Database. The category that surprises me the most is Web Programming. Sixteen fewer titles in the Web Programming area made the list in 2010, and only 7% of the titles sold more than 1,000 units, as compared to 11% in 2009.

The table below provides a view of the market's erosion. The Average Min value represents the "low threshold" weekly average during a given year. The Average Max is the high-range weekly average for a given year. Number of Titles is self-explanatory. You will notice that the years with the highest min had fewer overall titles represented in the data. The bottom line is that as the market erodes, it appears as though we are seeing a watering-down — more titles producing fewer units on average.

Year   Average Min   Average Max   Number of Titles
2004   9.2           1,133         7,451
2005   9.6           1,099         7,123
2006   9.6           1,315         6,881
2007   9.4           1,348         7,092
2008   8.2           1,534         7,310
2009   7.3           1,057         7,557
2010   6.7           1,112         7,792
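One way to read the Average Min and Average Max columns is as the yearly mean of each week's lowest and highest unit count among the titles on the list. That reading is an assumption, as the post does not spell out the exact calculation. Under that assumption, a minimal Python sketch with hypothetical weekly data looks like this:

```python
from statistics import mean


def yearly_thresholds(weekly_units):
    """Return (average weekly minimum, average weekly maximum) of units sold.

    weekly_units holds one list per week, each with the units sold by every
    title that made the list that week.
    """
    weekly_min = [min(week) for week in weekly_units]
    weekly_max = [max(week) for week in weekly_units]
    return mean(weekly_min), mean(weekly_max)


# Hypothetical weekly unit counts, for illustration only (not Bookscan data).
weeks = [
    [1200, 300, 12, 9],
    [1000, 250, 10, 7],
]
avg_min, avg_max = yearly_thresholds(weeks)
```

Because the list has a fixed size, the weekly minimum is set by the weakest title that still made the cut, which is why it works as a rough gauge of how hard it was to get in.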

So it could be said that we've been in a bit of a tech innovation slump. But in my opinion we are in a distribution slump or holding pattern. By that I mean that we have print books and digital versions of the same thing, yet have you seen any really innovative format for a tech book hit the market lately, something like what Khan Academy has done with other parts of education? They certainly have a long way to go to build out a Computer Science curriculum. I think before publishers say we are in a tech slump, we need to look inside our own walls first and realize that we may be in a publishing slump as our consumers want different educational experiences.

Now let's look at the categories that comprise each category family. Below are some individual trend charts from our dashboard showing the 24-month period from January 2010 to December 31, 2011 for the major categories. By looking at a 24-month pattern, you get more insight into whether or not a particular area seems to be hit by seasonal factors, and whether there is a steady decline or increase for the category. It is important to look at the scale on these charts because it visually shows you the relative market size. Another way to think about it: if the trend line is high in the individual box, the category is big, and if it is low, it is a smaller category. What is interesting to note is that Consumer Operating Systems and Devices, Digital Media, and Business Applications all have a January spike, which is likely due to individuals buying "how to" books for their new computers, devices, and operating systems. This is a consistent seasonal pattern.

[Figure: 24-month trend charts for Systems and Programming, Business Apps, Consumer Ops and Devices, Web Development & Design, Digital Media, and Computer Topics]

The Categories (24-month rolling, January 2010 — December 2011)

Clicking on the charts below will produce a larger view. When viewing the charts below, keep the reference charts above in mind. Viewing these jointly provides more context on the size of market and seasonal patterns.

Category_Family: Consumer Operating Systems and Devices

Here are the trend lines for the five main categories (cat_family) that make up Consumer Operating Systems and Devices.


This category is a medium-sized area and was one of three Category Families to show growth year-over-year. This category's growth is driven by the iPad, the iPhone and the Nook in the Portable Devices sub-category.

The consumer operating systems and devices market shows ups and downs each year and pretty closely reflects what is going on in the whole market. If you compare the growth of Mac OS X with Microsoft Windows, the Windows books had an increase in 2010 but both declined in 2011. The chart below shows how these two stack up against each other. Foreshadowing the 2012 results, I believe that the Windows category will be up because Windows 8 will ship and be a significant upgrade for most. I believe that Apple will continue to decline as they roll out $29 upgrades that are minimal. The iPad, iPhone, and Android devices will continue to soar.


Category_Family: Business/Office Applications

When comparing the Business Apps area for 2010 and 2011, there were 12 super_cats (one level below cat_family) that performed ahead of the prior year and 21 that underperformed compared to the prior year. The 21 underperforming super_cats only lost 2,090 more units than the 12 positive areas had gained, for an overall -0.44% growth rate.

The three healthiest super categories were Office Suites at 7.49% growth, Collaboration Technologies at 12.43% growth and Social Network (Facebook) at 11.73% growth, while Presentation Topics at -11.88%, Accounting at -8.21%, and Search at -15.30% saw the biggest drop in units for this Business category. It is interesting to see that Spreadsheets is pretty much the same as the market. It had a very slight uptick in growth, 88 more units in 2011 than 2010, and is still a large super category in rank. Spreadsheets trails only Digital Photography and Tablets for the top spot as the biggest super category.

Here are the trend lines for the eight categories that make up Business/Office Applications.




Notice how much bigger of a category "office" is than the other two ("gen bus app" & "design"). But the news in this category is that Office titles have slightly stabilized, having gone from a 4.66% decrease last year to a 0.48% decrease this year. This decline mirrors the overall market. The category has been dominated by entry-level user books. This sort of entry-level book is driven by series that have consistent promises, and Dummies and Microsoft Press each held four spots in the top 10 best sellers list for this category. This does make sense when you think about it. I said last year that it looked like Dummies had a bit of a book dynasty, so to speak, but in 2011 Microsoft Press rocketed into this space. The Web Apps category chart is mostly dominated by books on Facebook. Who would have thought you'd need a book on how to use Facebook? These are not books on programming the Facebook APIs, but rather on how to use the social network. Foreshadowing 2012, I expect that this Category Family will continue to do well as Windows 8 will undoubtedly create more demand for Office books in 2012.

Category_Family: Web Design and Development

Web Design and Development is down 4.36% from 2010 to 2011. Another 37,438 fewer units were sold in this category in 2011 than in 2010. And remember, 2010 was one of the worst years we've seen in a while for this category. There were eight sub areas that showed growth in this category, led in 2011 by HTML5 at 74.60% growth, JavaScript at 17.32% growth, and Social Web at 9.36% growth. If we combine HTML5 and JavaScript because they are very closely related, the combined growth rate is a healthy 41.39%, with 45,559 more units sold in 2011. O'Reilly has three of the top five books in this area, with Learning PHP, MySQL, and JavaScript leading the category in unit sales for two years in a row. Head First HTML with CSS & XHTML and JavaScript: The Good Parts also cracked the top five for us.
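A combined growth rate like the HTML5 plus JavaScript figure above is not the plain average of the two rates. It is computed from the summed units, so the larger category pulls the result toward its own rate. Here is a minimal Python sketch with hypothetical unit counts (the per-category unit totals are not given in this post, and the function name is made up for illustration):

```python
def combined_growth(segments):
    """Percent growth of several categories taken together.

    segments is a list of (prior_year_units, current_year_units) pairs.
    """
    prev_total = sum(prev for prev, _ in segments)
    curr_total = sum(curr for _, curr in segments)
    return (curr_total - prev_total) / prev_total * 100.0


# Hypothetical unit counts, for illustration only (not Bookscan data).
small_fast = (1000, 1750)   # +75% on a small base
large_slow = (3000, 3510)   # +17% on a larger base

rate = combined_growth([small_fast, large_slow])
simple_average = (75.0 + 17.0) / 2
```

In this made-up case the combined rate is 31.5%, well below the 46% simple average, because the slower category has three times the base.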

The areas that surprised me the most, though, were Web Programming, which saw ~25,152 fewer units sold in 2011 than in 2010, or -23.61% growth; Web Design Tools, which sold 16,534 fewer units for -23.67% growth; and Web Development, which sold 11,684 fewer units for -28.54% growth. Yet HTML5 and JavaScript are growing. This is a bit perplexing but could be attributed to developers wanting more specific topics rather than broad-reaching topics and tools. In Web Design Tools, it is mostly Dreamweaver's fall that pulls this category down. In Web Development, it is the website-creation type of books for "beginners" and "dummies" that have fallen the most.

Here are the trend lines for the eight categories that make up Web Design and Development.




Obviously the big sub categories here are "web design" and "web development." It is dominated by titles that talk about performance, scalability, reliability, and tuning. Similar to what you will find at our Velocity Conference. Foreshadowing for 2012, the area to watch is JavaScript. Doesn't everyone need to know and learn JavaScript?

Category_Family: Systems and Programming

This is the largest of our top-level category families. It is the place where most of the programming language, database, and software development titles reside. There are now 73 super_cat subcategories (super category) in this area, and in 2011, 46 of the areas were negative year-over-year and only 27 areas had growth. There were 68,0295 fewer units sold in these areas during 2011. This is only a 3.14% decline, so this large family of titles actually performed slightly worse than the overall market. Mobile Programming and Data Analysis were the two biggest growing areas. Mobile Programming produced 30,636 more units for a 38.84% growth rate, while Data Analysis produced 22,925 more units for a 22.42% growth rate in 2011.

The top five performing categories, in order, were Mobile Programming, Data Analysis, Security Topics [+9,648 units / 5.53% growth], Java [+7,316 units / 7.33% growth], and Python [+4,886 units / 10.55% growth]. The categories with the worst performance, in order, were IT Certification [-19,078 units / -31.50% growth], Windows Administration [-14,852 units / -14.13% growth], Microsoft Programming [-13,491 units / -27.70% growth], C# [-12,993 units / -20.26% growth], and Network General [-11,234 units / -19.04% growth].

In the top-performing area of Mobile Programming, iOS was nine times as large as Android in 2009 and roughly 2.5 times as large in 2010, and today iOS books sell only 1.2 times as many copies as Android books to developers. Again, these are developer books, not consumer-oriented titles. As for how the mobile developer market is shaping up, it looks like a two-horse race between iOS and Android. Windows Mobile is a blip, along with cross-platform solutions like PhoneGap, as you can see in the image directly below.


This chart shows the number of units (sum of Unit in blue bars) and the Average units per title (AvgUnitsTitle in red line) for the mobile area. Android has a higher unit average whereas iOS has more units sold because more titles made the list. This all makes me wonder about the Windows Mobile blip and whether Microsoft should just jump into the Android space too or continue to make more from licensing it than their own platform.

Here are the trend lines for the 12 categories that make up Systems and Programming.





Next up, Post 3 will be about the publishers, winners and losers. Post 4 will contain more analysis of programming languages. And Post 5 will look at digital sales.

February 24 2012

Practical applications of data in publishing

At TOC, you're as likely to run into media professionals, entrepreneurs and innovators as you are publishers, booksellers and others working in traditional publishing. This, in turn, makes the underlying themes as varying and diverse as the attendees. This is the second in a series, taking a look at five themes that permeated interviews, sessions and/or keynotes at this year's show. The complete series will be posted here.

As the world — and publishing — becomes more and more digital, more and more data is produced and, ideally, collected. Knowing what kinds of data can be useful and how data analytics can be applied to inform publishing decisions is on the minds of many publishing professionals. Data was one of the overriding themes at this year's Tools of Change for Publishing conference, including discussions on how publishers can benefit from real-time data, practical applications of data and analytics, and how data can not only inform publishing decisions, but can actually aid in content creation.

In a keynote address, Roger Magoulas, director of market research at O'Reilly Media, talked about data research and the view of the data space at O'Reilly. He offered practical suggestions on how to incorporate data and addressed some of the reasons behind the buzz going on in the data space:


Machine learning and natural language processing, for instance, have become mainstream tools. Magoulas said the tools for making use of big data have kept pace with the increasing amounts of data produced, allowing a small team like his — just three people — to do everything.

When incorporating data to inform business decisions or to analyze business scenarios, Magoulas said data alone isn't enough — the data needs a narrative; the numbers alone won't tell the story. He addressed the area of data science from a functional viewpoint:

"On the one side, you manage data — you've got to acquire it; you might have to clean it up; you've got to organize it. On the other side, you're trying to make sense of it; you're trying to gather insights."

Magoulas said those are the two key parts, but that the most important part probably is having or cultivating a culture that can accommodate the data: "People need to understand the message that you're giving ... and how to value the input ... People need to be able to think in an experimental way and to stay curious."

When offering practical suggestions on incorporating data into a business, Magoulas stressed that becoming data savvy is important; "you can't just go buy big data and expect to know what you're doing." He also said keeping the data close to the analysis is important:

"You want to be agile, and if you separate it out and have a data group, an analytics group, and a design group, everyone is going to be waiting for someone else. Integration is really important."

You can view Magoulas' keynote in the following video (and you can find his slides here):

The data discussion turned real-time and academic in the "Mendeley Case Study: How The World's Largest Crowdsourced Academic Database Is Changing Academic Publishing" session, hosted by Jan Reichelt, director and co-founder of Mendeley Ltd. Reichelt shared some lessons learned at Mendeley and talked about how real-time data on content usage provides important insights into how academics interact with research. He stressed the increasing importance of social and community-collaborated content:



In addition to insights gleaned from the data around content usage, data around content production also was telling. Similar to other areas of the publishing industry — journalism, self-publishing — Reichelt highlighted the blurring lines between types of content producers and the types of content produced in academic publishing:


Reichelt's presentation slides can be found here.

Peter Collingridge (@gunzalis), co-founder of Enhanced Editions, talked about how publishers can benefit from real-time data and analytics in terms of marketing. In an interview, he said data can inform answers to vital questions:

"When you're in a much faster-paced world, with the industry moving toward being consumer- rather than trade-facing, and with a fragmented retail and media landscape, you need to make decisions based on fact: What is the ROI on a £50,000 marketing campaign? Where do my banner ads have the best CTR? Who are the key influencers here — are they bloggers, mainstream media, or somewhere else? How many of our Twitter followers actually engage? When should we publish, in what format, and at what price?

Data should absolutely inform the answers to these questions ... Over time, you build up a picture of which tactics work best and which don't. And immediate feedback allows you to hone your activities in real-time to what works best (particularly if you are A/B testing different approaches), or from a more strategic perspective, to plan out campaigns that have historically worked best for comparable titles."

As the data deluge grows in the digital age, it not only is useful for analysis and informing decisions, it also can be used to create content. In a video interview, Robbie Allen, founder and CEO of Automated Insights, a company that produces narrative content from raw data, addressed this topic. He said for now, quantitative content created from structured data — think sports stories, financial reports — is best suited for automation, but that creating content from unstructured data isn't out of the question:

"In the unstructured world, we still can access what I call 'consistent unstructured data.' If there's patterns to data, we can still pull out data from that and make it structured. So, ultimately, we start with structured, then we go to consistent unstructured, and eventually, we'll even be able to pull data out of completely unstructured."

Allen's full interview can be viewed in the following video:

If you couldn't make it to TOC, or you missed a session you wanted to see, sign up for the TOC 2012 Complete Video Compilation and check out our archive of free keynotes and interviews.


February 15 2012

Book marketing is broken. Big data can fix it

Peter Collingridge (@gunzalis), cofounder of Enhanced Editions, says digital books require a new style of data-driven marketing and promotion that publishers aren't yet implementing. He also says that book marketing is broken and big data is the solution.

In the following interview, Collingridge talks about how real-time data and analytics can help publishers and he shares insights from the beta period of Bookseer, a market intelligence service for books his company is developing.

What are some key findings from the Bookseer beta?

Peter Collingridge: I think despite the increasing awareness of data as being a critical tool for publishers to compete, it's genuinely hard for people to look at data as a natural addition to the work they are doing, whether that's in PR, marketing, acquisition, or pricing.

Publishing has operated in a well-defined way for a long time, where experience and intuition have dominated decision making and change is hard. What has been really exciting is that when people have the data in front of them, clearly showing the immediate impact of something they did — a link between cause and effect that they couldn't see before — they get really excited. We've had people talking about being "obsessed" and "addicted" to the data.

Some of the most surprising findings: That on some titles, big price changes aren't as relevant to volume as everyone thinks; that big-name glowing reviews of literary fiction don't have anywhere near the impact on sales to merit the effort; and that social media buzz almost never translates into sales.

For me, the key observations so far are around marketing. First, big budget media spending and ostentatious banner ads might impress authors and bookshops, but they deliver very poor return on investment (ROI) for sales. Secondly, the super-smart publishers are behaving like startups and doing tiny little pieces of very focused and cheap marketing — and watching the results like hawks before iterating in direct response to the data. Bookseer is designed to disclose the former and to aid the latter — and that is probably our biggest finding: it works!

Find out more about Bookseer in the following video from the If Book Then conference earlier this year in Milan.

What kinds of data are most important for publishers to track?

Peter Collingridge: Before we built Bookseer, we spoke with 25 people across the industry, including authors big, small and unpublished; editors and publishers; managing directors; digital directors; sales, marketing and PR directors; and literary agents. We asked exactly that question.

For most people, the data they had was pretty basic: Nielsen (which obviously only goes to the granularity of one week) plus the F5 button to manically refresh an Amazon web page for changes in sales rank. Neither of these is particularly helpful in determining the impact of an activity.

Of course, there are loads of data points, but we began with the lowest-hanging fruit. Aggregated sales (print and digital) across multiple sources; Amazon sales rank; price; best-seller charts; social media mentions; buzz; review coverage in mainstream and new media, and on social reading sites; and other factors such as promotion (advertising and other) and merchandising.

We think the most important thing to do is aggregate activity and data points across as many sources as possible, building a picture of what's going on for one title or across a whole retailer, and allowing publishers to draw their own conclusions.

What does real-time data let publishers do?

Peter Collingridge: Publishing has been B2B, about supplying books into bookshops, for forever — combined with working with media to support that. And for that world, weekly aggregated retail sales work, I guess. But when you're in a much faster-paced world, with the industry moving toward being consumer- rather than trade-facing, and with a fragmented retail and media landscape, you need to make decisions based on fact: What is the ROI on a £50,000 marketing campaign? Where do my banner ads have the best CTR? Who are the key influencers here — are they bloggers, mainstream media, or somewhere else? How many of our Twitter followers actually engage? When should we publish, in what format, and at what price?

Data should absolutely inform the answers to these questions. Furthermore, with a disciplined approach to promotion, where activities are separated from each other by a day or a few hours, real-time measurement can identify what works and what doesn't. We can identify the difference between Al Gore tweeting about a book and Tim O'Reilly doing the same; the difference between a Time review and a piece on CNN; the impact of a price drop against an email sent to 200,000 subscribers; and measure the exact ROI on a £300 campaign against a £30,000 one.

Over time, you build up a picture of which tactics work best and which don't. And immediate feedback allows you to hone your activities in real-time to what works best (particularly if you are A/B testing different approaches), or from a more strategic perspective, to plan out campaigns that have historically worked best for comparable titles.

How would you describe the relationship between sales and social media?

Peter Collingridge: Right now, sales drives social — not the other way round. However, I believe there will come a point when that's not the case, and we will be able to identify that.

This interview was edited and condensed.


