
September 20 2011

BuzzData: Come for the data, stay for the community

As the data deluge created by the activities of global industries accelerates, the need for decision makers to find a signal in the noise will only grow more important. Therein lies the promise of data science, from data visualizations to dashboards to predictive algorithms that filter the exaflood and produce meaning for those who need it most. Data consumers and data producers, however, are both challenged by "dirty data" and limited access to the expertise and insight they need. To put it another way, as Alistair Croll has observed here at Radar, if you can't derive value, there's no such thing as big data.

BuzzData, based in Toronto, Canada, is one of several startups looking to help bridge that gap. BuzzData launched this spring with a combination of online community and social networking that is reminiscent of what GitHub provides for code. The thinking here is that every dataset will have a community of interest around the topic it describes, no matter how niche it might be. Once uploaded, each dataset has tabs for tracking versions, visualizations, related articles, attachments and comments. BuzzData users can "follow" datasets, just as they would a user on Twitter or a page on Facebook.

"User experience is key to building a community around data, and that's what BuzzData seems to be set on doing," said Marshall Kirkpatrick, lead writer at ReadWriteWeb, in an interview. "Right now it's a little rough around the edges to use, but it's very pretty, and that's going to open a lot of doors. Hopefully a lot of creative minds will walk through those doors and do things with the data they find there that no single person would have thought of or been capable of doing on their own."

The value proposition that BuzzData offers will depend upon many more users showing up and engaging with one another and, most importantly, the data itself. For now, the site remains in limited beta with hundreds of users, including at least one government entity, the City of Vancouver.

"Right now, people email an Excel spreadsheet around or spend time clobbering a shared file on a network," said Mark Opauszky, the startup's CEO, in an interview late this summer. "Our behind-the-scenes energy is focused on interfaces so that you can talk through BuzzData instead. We're working to bring the same powerful tools that programmers have for source code into the world of data. Ultimately, you're not adding and removing lines of code — you're adding and removing columns of data."
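Opauszky's analogy of diffing columns of data rather than lines of code can be sketched in a few lines of Python. This is an illustrative sketch, not BuzzData's actual implementation; the function name and sample data are hypothetical:

```python
def column_diff(old_rows, new_rows):
    """Report columns added or removed between two versions of a
    tabular dataset, by header name. Analogous to a line-based code
    diff, but operating over columns of data rather than lines of code."""
    old_cols = set(old_rows[0]) if old_rows else set()
    new_cols = set(new_rows[0]) if new_rows else set()
    return new_cols - old_cols, old_cols - new_cols

# Version 2 of the dataset gains an "area_km2" column.
v1 = [["city", "population"], ["Toronto", "2615060"]]
v2 = [["city", "population", "area_km2"], ["Toronto", "2615060", "630"]]
added, removed = column_diff(v1, v2)
```

A real system would of course also track cell-level changes within shared columns, but the column set is the natural top-level unit of a data diff.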

Opauszky said that BuzzData is actively talking with data publishers about the potential of the platform: "What BuzzData will ultimately offer when we move beyond a minimum viable product is for organizations to have their own territory in that data. There is a 'brandability' to that option. We've found it very easy to make this case to corporations, as they're already spending dollars, usually on social networks, to try to understand this."

That corporate constituency may well be where BuzzData finds its business model, though the executive team was careful to caution that they're remaining flexible. It's "absolutely a freemium model," said Opauszky. "It's a fundamentally free system, but people can pay a nominal fee on an individual basis for some enhanced features — primarily the ability to privatize data projects, which by default are open. Once in a while, people will find that they're on to something and want a smaller context. They may want to share files, commercialize a data product, or want to designate where data is stored geographically."

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code ORM30


Open data communities

"We're starting to see analysis happen, where people tell 'data stories' that are evolving in ways they didn't necessarily expect when they posted data on BuzzData," said Opauszky. "Once data is uploaded, we see people use it, fork it, and evolve data stories in all sorts of directions that the original data publishers didn't perceive."

For instance, a dataset of open data hubs worldwide has attracted a community that improved the original upload considerably. BuzzData featured the work of James McKinney, a civic hacker from Montreal, Canada, in making it so, including a Google Map mashing up the hub locations.


The hope is that communities of developers, policy wonks, media, and designers will self-aggregate around datasets on the site and collectively improve them. Hints of that future are already present, as open government advocate David Eaves highlighted in his post on open source data journalism at BuzzData. As Eaves pointed out, it isn't just media companies that should be paying attention to the trends around open data journalism:

For years I argued that governments — and especially politicians — interested in open data have an unhealthy appetite for applications. They like the idea of sexy apps on smart phones enabling citizens to do cool things. To be clear, I think apps are cool, too. I hope in cities and jurisdictions with open data we see more of them. But open data isn't just about apps. It's about the analysis.

Imagine a city's budget up on BuzzData. Imagine the flow rates of the water or sewage system. Or the inventory of trees. Think of how a community of interested and engaged "followers" could supplement that data, analyze it, and visualize it. Maybe they would be able to explain it to others better, to find savings or potential problems, or develop new forms of risk assessment.

Open data journalism

"It's an interesting service that's cutting down barriers to open data crunching," said Craig Saila, director of digital products at the Globe and Mail, Canada's national newspaper, in an interview. He said that the Globe and Mail has started to open up the data that it's collecting, like forest fire data, at the Globe and Mail BuzzData account.

"We're a traditional paper with a strong digital component that will be a huge driver in the future," said Saila. "We're putting data out there and letting our audiences play with it. The licensing provides us with a neutral source that we can use to share data. We're working with data suppliers to release the data that we have or are collecting, exposing the Globe's journalism to more people. In a lot of ways, it's beneficial to the Globe to share census information, press releases and statistics."

The Globe and Mail is not, however, hosting any information there that's sensitive. "In terms of confidential information, I'm not sure if we're ready as a news organization to put that in the cloud," said Saila. "We're just starting to explore open data as a thing to share, following the Guardian model."

Saila said that he's found the private collaboration model useful. "We're working on a big data project where we need to combine all of the sources, and we're trying to munge them all together in a safe place," he said. "It's a great space for journalists to connect and normalize public data."

The BuzzData team emphasized that they're not trying to be another data marketplace, like Infochimps, or replace Excel. "We made an early decision not to reinvent the wheel," said Opauszky, "but instead to try to be a water cooler, in the same way that people go to Vimeo to share their work. People don't go to Flickr to edit photos or YouTube to edit videos. The value is to be the connective tissue of what's happening."

If that question about "what's happening?" sounds familiar to Twitter users, it's because that kind of stream is part of BuzzData's vision for the future of open data communities.

"One of the things that will become more apparent is that everything in the interface is real time," said Opauszky. "We think that topics will ultimately become one of the most popular features on the site. People will come from the Guardian or the Economist for the data and stay for the conversation. Those topics are hives for peers and collaborators. We think that BuzzData can provide an even 'closer to the feed' source of information for people's interests, similar to the way that journalists monitor feeds in Tweetdeck."


August 25 2011

The Daily Dot wants to tell the web's story with social data journalism

If the Internet is the public square of the 21st century, the Daily Dot wants to be its town crier. The newly launched online media startup is trying an experiment in community journalism, where the community is the web. It's an interesting vision, and one that looks to capitalize on the amount of time people are spending online.

The Daily Dot wants to tell stories through a mix of data journalism and old-fashioned reporting, where its journalists pick up the phone and chase down the who, what, when, where, how and why of a video, image or story that's burning up the social web. The site's beat writers, who are members of the communities they cover, watch what's happening on Twitter, Facebook, Reddit, YouTube, Tumblr and Etsy, and then cover the issues and people that matter to them.

Daily Dot screenshot

Even if the newspaper metaphor has some flaws, this focus on original reporting could help distinguish the Daily Dot in a media landscape where attention and quality are both fleeting. In the hurly-burly of the tech and new media blogosphere, picking up the phone to chase down a story is too often neglected.

There's something significant about that approach. Former VentureBeat editor Owen Thomas (@OwenThomas), the founding editor of the Daily Dot, has emphasized this angle in interviews with AdWeek and Forbes. Instead of mocking what people do online, as many mainstream media outlets have been doing for decades, the Daily Dot will tell their stories in the same way that a local newspaper might cover a country fair or concert. While Thomas was a well-known master of snark and satire during his tenure at Valleywag, in this context he's changed his style.

Where's the social data?

Whether or not this approach gains traction within the communities the Daily Dot covers remains to be seen. The Daily Dot was co-founded by Nova Spivack, former newspaper executive Nicholas White, and PR consultant Josh Jones-Dilworth, with a reported investment of some $600,000 from friends and family. White has written that he gave up the newspaper to save newspapering. Simply put, the Daily Dot is experimenting with covering the Internet in a way that most newspapers have failed to do.

"I trust that if we keep following people into the places where they gather to trade gossip, argue the issues, seek inspiration, and share lives, then we will also find communities in need of quality journalism," wrote White. "We will be carrying the tradition of local community-based journalism into the digital world, a professional coverage, practice and ethics coupled with the kind of local interaction and engagement required of a relevant and meaningful news source. Yet local to us means the digital communities that are today every bit as vibrant as those geographically defined localities."

To do that, they'll be tapping into an area that Spivack, a long-time technology entrepreneur, has been investing in and writing about for years: data. Specifically, applying data journalism to mining and analyzing the social data from two of the web's most vibrant platforms: Tumblr and Reddit.

White himself is unequivocal about the necessity of data journalism in the new digital landscape, whether at the Daily Dot or beyond:

The Daily Dot may be going in this direction now because of our unique coverage area, but if this industry is to flourish in the 21st century, programming journalists should not remain unique. Data, just like the views of experts, men on the street, polls and participants, is a perspective on the world. And in the age of ATMs, automatic doors and customer loyalty cards, it's become just as ubiquitous. But the media isn't so good with data, with actual mathematics. Our stock-in-trade is the anecdote. Despite a complete lack of solid evidence, we've been telling people their cell phones will give them cancer. Our society ping-pongs between eating and not eating carbs, drinking too much coffee and not enough water, getting more Omega-3s — all on the basis of epidemiological research that is far, far, far from definitive. Most reporters do not know how to evaluate research studies, and so they report the authors' conclusions without any critical evaluation — and studies need critical evaluation.


Marshall Kirkpatrick, a proponent and practitioner of data journalism, dug deep into how data journalism happens at the Daily Dot. While he's similarly unsure of whether the publication will be interesting to a large enough audience to sustain an advertising venture, the way that the Daily Dot is going about hunting down digital stories is notable. Kirkpatrick shared the details over at ReadWriteWeb:

In order to capture and analyze that data from sites like Twitter, YouTube, Reddit, Etsy and more (the team says it's indexing a new community about every six weeks), the Dot has partnered with the mathematicians at Ravel Data. Ravel uses 80Legs for unblockable crawling, then Hadoop, its own open source framework called GoldenOrb and then an Eigenvector centrality algorithm (similar to Pagerank) to index, analyze, rank and discover connections between millions of users across these social networks.
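As a rough illustration of the ranking step, here is a simplified PageRank-style power iteration in Python. The graph, node names, and damping factor are hypothetical; Ravel Data's GoldenOrb pipeline operates at a vastly larger scale and its internals are not public:

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Rank nodes of a directed follower graph with a simplified
    PageRank-style power iteration. graph maps each node to the list
    of nodes it points at (i.e. the accounts it follows)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        if max(abs(new[v] - rank[v]) for v in nodes) < tol:
            rank = new
            break
        rank = new
    return rank

# Toy graph: carol and dave both follow alice, who follows bob.
graph = {"alice": ["bob"], "bob": [], "carol": ["alice"], "dave": ["alice"]}
ranks = pagerank(graph)
most_influential = max(ranks, key=ranks.get)
```

Eigenvector-centrality measures of this family reward nodes that are pointed at by other well-connected nodes, which is what makes them useful for surfacing influential community members rather than merely popular ones.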

There are a couple of aspects of data journalism to consider here. One is supplementing the traditional "nose for news" that Daily Dot writers apply to finding stories. "The data really begins to serve as our editorial prosthetics of sorts, telling us where to look, with whom to speak, and giving us the basic groundwork of the communities that we can continue to prod in interesting ways and ask questions of," explained Doug Freeman, an associate at Daily Dot investor Josh Jones-Dilworth's PR firm, in an interview. In other words, the editors of the Daily Dot analyze social data to identify the community's best sources for stories and share them on a "Leaderboard" that — in beta — shows a ranked list of members of Tumblr and Reddit.

Another open question is how social data could help with the startup's revenue down the road. "Our data business is a way of creating and funding new value in this regard; we instigated structured crawls of all of the communities we will cover and will continue to do so as we expand into new places," said Freeman. "We started with Reddit (for data and editorial both) because it is small and has a lot of complex properties — a good test balloon. We've now completed data work with Tumblr and YouTube and are continuing." For each community, data provides a view of members, behaviors, and influence dynamics.

That data also relates to how the Daily Dot approaches marketing, branding and advertising. "It's essentially a to-do list of people we need to get reading the Dot, and a list of their behaviors," said Freeman. "From a brand [point of view], it's market and audience intelligence that we can leverage, with services alongside it. From an advertiser [point of view], this data gives resolution and insight that few other outlets can provide. It will get even more exciting over time as we start to tie Leaderboard data to user accounts and instigate CPA-based campaigns with bonuses and bounties for highly influential clicks."

Taken as a whole, what the Daily Dot is doing with social data and digital journalism feels new, or at least like a new evolution. We've seen Facebook and Twitter integration into major media sites, but not Reddit and Tumblr. It could be that the communities of these sites acting as "curation layers" for the web will produce excellent results in terms of popular content, though relevance could still be at issue. Whether this venture in data journalism is successful or not will depend upon it retaining the interest and loyalty of the communities it covers. What is clear, for now, is that the experiment will be fun to watch — cute LOL cats and all.





August 10 2011

T-Mobile challenges churn with data

For T-Mobile USA, Inc., big data is federated and multi-dimensional. The company has overcome challenges from a disparate IT infrastructure to enable regional marketing campaigns, more advanced churn management, and an integrated single-screen "Quick View" for customer care. Using its data integration architecture, T-Mobile USA can begin to manage "data zones" that are virtualized from the physical storage and network infrastructure.

With 33.63 million customers at the end of the first quarter of 2011 and US$4.63 billion in service revenues that quarter, T-Mobile USA manages a complex data architecture that has been cobbled together through the combination of VoiceStream Wireless (created in 1994), Omnipoint Communications (acquired in 2000) and Powertel (merged with VoiceStream Wireless in 2001 by new parent company Deutsche Telekom AG).

The recently announced AT&T agreement to acquire T-Mobile USA kicked off a regulatory review process that is expected to last approximately 12 months. If completed, the acquisition would create the largest wireless carrier in the United States, with nearly 130 million customers. Until then, AT&T and T-Mobile USA remain separate companies and continue to operate independently.

Information management architecture

As T-Mobile USA awaits the next stage of its corporate history, integration architecture manager Sean Hickey and his colleagues manage data flows across a federated, disparate infrastructure. To enable T-Mobile's more than 33 million U.S. customers to "stick together," as the company says in its marketing tagline, a lot of subscriber and network data has to come together among multiple databases and source systems.

T-Mobile Information Management Architecture and Source Systems

Previously, many IT systems were point-specific, stove-piped and not scalable. Some systems began as start-up projects and are still running seven or eight years later, long after they stopped meeting a reasonable return on investment (ROI) standard. The staff who knew the original data models and schemas no longer work there.

To integrate data across its disparate federated architecture, T-Mobile USA uses Informatica PowerCenter. (Disclosure: Informatica is a client of my company, Zettaforce.) T-Mobile runs PowerCenter version 8.6.1, is a 9.1 beta customer, and plans to upgrade to version 9.1 in the fourth quarter of this year. Data modeling tools include CA ERwin and Embarcadero ER/Studio. To identify data relationships in its complex IT environment, T-Mobile USA uses Informatica PowerCenter Data Profiling and IBM Exeros Discovery Data Architecture (now part of IBM InfoSphere Discovery).

This data integration layer powers multiple key business drivers, including regional marketing campaigns, churn management and customer care. Longer term projects — such as adoption of self-service BI and automatically provisioned virtual data marts for business analysts — are on hold pending the acquisition.

Virtual data zones

Backed by this data integration layer, the T-Mobile USA architecture team introduced the concept of virtual "data zones". Each data zone comprises data subjects, and is tied to one or more business objectives. These zones virtualize data applications from the physical data storage and network. From a data architecture perspective, the data zone approach helps pinpoint where there are complex systems to maintain, shadow IT, redundant feeds, differences in data definitions or incompatible data. This approach also helps highlight where business rules are embedded all over, leading to duplicate or inconsistent business rules, versus more centralized rule management.
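A data-zone map of this kind can be represented very simply. The sketch below is hypothetical (the zone and subject names are invented, not T-Mobile's actual taxonomy), but it shows how such a registry surfaces data subjects that are fed redundantly into multiple zones:

```python
# Hypothetical registry: each zone lists its data subjects and the
# business objectives it supports.
DATA_ZONES = {
    "subscriber": {"subjects": ["billing", "demographics"],
                   "objectives": ["churn management", "customer care"]},
    "network":    {"subjects": ["call_detail", "tower_events"],
                   "objectives": ["churn management"]},
    "marketing":  {"subjects": ["campaigns", "demographics"],
                   "objectives": ["regional marketing campaigns"]},
}

def redundant_subjects(zones):
    """Flag data subjects that appear in more than one zone, i.e. the
    redundant feeds and potentially inconsistent definitions that the
    zone map is meant to expose."""
    seen = {}
    dupes = set()
    for zone, info in zones.items():
        for subject in info["subjects"]:
            if subject in seen and seen[subject] != zone:
                dupes.add(subject)
            seen.setdefault(subject, zone)
    return dupes
```

Even a trivial registry like this makes duplicated feeds a query rather than an archaeology project.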

T-Mobile Data Zones

T-Mobile USA adopted SAP BusinessObjects Strategic Workforce Planning, the first SAP application to use SAP HANA in-memory computing to provide real-time insights and simulation capabilities. According to Sean Hickey, T-Mobile USA has been very pleased so far with pilot tests of the HANA-enabled in-database analytics.

Legacy systems do present constraints on the management of specific data subjects. For example, T-Mobile USA would like to archive historical subscriber records that are more than seven years old, the cut-off for regulatory-required retention. However, because the company's data architecture grew bottom-up, it is difficult to carve out old data: the call date was not necessarily part of the partition key. Given how the data is segmented, T-Mobile USA continues to store subscriber records and other information dating back to 1999.
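Because the call date is not in the partition key, archiving cannot proceed by simply dropping whole partitions; each row has to be inspected. A sketch of that row-by-row split, with an invented two-column schema rather than T-Mobile's actual one:

```python
from datetime import date, timedelta

RETENTION_DAYS = 7 * 365  # approximate seven-year regulatory cutoff

def split_for_archive(records, today):
    """Partition call records into (keep, archive) lists by call date.
    records are (subscriber_id, call_date) tuples. When the date is in
    the partition key, this is a metadata operation; here it is a full
    scan, which is exactly the constraint described above."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    keep, archive = [], []
    for subscriber_id, call_date in records:
        (archive if call_date < cutoff else keep).append(
            (subscriber_id, call_date))
    return keep, archive

records = [("A1", date(2002, 3, 14)), ("B2", date(2010, 6, 1))]
keep, archive = split_for_archive(records, today=date(2011, 8, 10))
```

The contrast with partition-keyed storage is the point: a scan over billions of rows is why "carving out old data" is hard in a bottom-up architecture.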

T-Mobile Data Subjects

Regional marketing campaigns

Each data zone is associated with one or more strategic business objectives. On the marketing side, a couple of years ago T-Mobile undertook a fairly aggressive U.S. reorganization to become a more regionally oriented company. T-Mobile used to run national marketing campaigns but has moved to a decentralized model that uses geography, demographics and call usage patterns to run cross-sell and upsell campaigns by region, with assistance from third-party marketing partners for outsourced analytics. T-Mobile now has more than 20 regional districts across the United States, each with a local head who is responsible for sales, marketing and operations in that district.

Northern California VP and GM Rich Garwood added about 30 staff in new regional jobs to take over functions previously handled by T-Mobile USA headquarters in Bellevue, Wash., and will for the first time make a concerted effort to market to small business owners in Northern California. "It's exciting for us as employees. We really have local ownership of what the results are," Garwood told the San Francisco Business Times.

T-Mobile Business Objectives Associated with Each Data Zone

SAS Marketing Automation gathers 300 attributes, including campaigns, take rates and dispositions. Before, T-Mobile did national campaigns, with a kind of "shoot and see what sticks" approach. Now, T-Mobile's regions can run targeted campaigns specific to customer demographics and customer segmentation. This requires pulling in more than 20 different sources of data. Deep data mining operations cover billions of rows a day.

For analytics reporting, T-Mobile USA uses SAP Business Objects including Crystal Reports. Finance and accounting department staff still tend to download data into Excel spreadsheets. As part of the company's data security enforcement, every employee and contractor is required to use a T-Mobile-supplied computer with hard drive encryption. Power users can access the Teradata system directly with Teradata SQL for data mining.

Churn management

T-Mobile USA has begun using a "tribe" calling circle model — with multi-graphs akin to social network analysis — to predict propensity of churns and mitigate the potential impact of "tribe leaders" who have high influence in large, well-connected groups of fellow subscribers. An influential tribe "leader" who switches to a competitor's service can kick off "contagious churn," where that leader's friends, family or co-workers also switch.

In the past, wireless service providers calculated net present value (NPV) by estimating a subscriber's lifetime spend on services and products. Now, part of the NPV calculation measures the level of influence and size of a subscriber's tribe.
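A toy version of such a blended NPV calculation might look like the following. The weighting formula and parameter values are purely hypothetical; the article does not disclose how T-Mobile actually combines lifetime spend with tribe influence:

```python
def adjusted_npv(lifetime_spend, tribe_size, influence_score,
                 tribe_weight=0.10):
    """Blend a classic lifetime-spend NPV with a tribe-influence
    premium. tribe_size and influence_score would come from the
    calling-circle analysis; tribe_weight is an invented tuning knob."""
    influence_premium = (tribe_weight * lifetime_spend
                         * influence_score * tribe_size)
    return lifetime_spend + influence_premium

# A well-connected "tribe leader" is worth more than raw spend suggests,
# because losing them risks contagious churn across the whole tribe.
leader = adjusted_npv(lifetime_spend=2000.0, tribe_size=12,
                      influence_score=0.8)
loner = adjusted_npv(lifetime_spend=2000.0, tribe_size=1,
                     influence_score=0.1)
```

The design point is that two subscribers with identical bills can carry very different retention value once network position is priced in.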

As noted by Ken King, marketing manager for the communications industry at SAS: "In North America, we increasingly work with service providers that are keen to examine not only segmentation, churn and customer lifetime value but new things like social networking impact on their brands, or the relationships between customers so they can recognize group leaders and their influence on others in terms of buying products or switching to competitors."

Churn management at T-Mobile USA begins with an Amdocs subscriber billing system and financial data stored in a Teradata enterprise data warehouse (EDW). "The heart of the company is the billing system," said Sean Hickey.

However, some key data for churn management is not captured in the billing system. Non-billable events can be very important for marketing. Raw call data gathered from cell towers and switches, supplied by Ericsson and other system vendors, can show the number of dropped calls for each subscriber and the percent of a subscriber's total calls that drop. T-Mobile USA loads call data into IBM Netezza systems from a series of flat files.

T-Mobile engineering uses this data for dropped-call analysis. Engineers can look at dropped calls for specific phone numbers. For example, if a T-Mobile customer moves to a new home in a location where cell towers provide only limited coverage, T-Mobile marketing can proactively offer the subscriber a new cell phone that could improve reception, or a free femtocell that connects to the subscriber's home broadband network. Customer demographic data, however, is not stored in the Netezza systems — that's stored by T-Mobile IT in its Teradata enterprise data warehouse (TED).
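The per-subscriber dropped-call aggregation described above can be sketched as follows. The column names and file format are assumptions for illustration; the actual layout of the Ericsson feeds is not public:

```python
import csv
import io
from collections import defaultdict

def drop_rates(flat_file):
    """Compute each subscriber's dropped-call percentage from a raw
    call-detail flat file. Assumed columns: subscriber_id, dropped
    (0 or 1 per call record)."""
    totals = defaultdict(int)
    drops = defaultdict(int)
    for row in csv.DictReader(flat_file):
        sid = row["subscriber_id"]
        totals[sid] += 1
        drops[sid] += int(row["dropped"])
    return {sid: 100.0 * drops[sid] / totals[sid] for sid in totals}

# Stand-in for one of the flat files loaded into the analytics system.
sample = io.StringIO(
    "subscriber_id,dropped\n"
    "555-0100,0\n"
    "555-0100,1\n"
    "555-0100,1\n"
    "555-0101,0\n"
)
rates = drop_rates(sample)
```

A subscriber whose drop rate suddenly spikes after a move is exactly the signal that lets marketing offer a better handset or a femtocell before the customer churns.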


The Teradata EDW sends out extracts to T-Mobile USA's SQL servers and Oracle servers. The Teradata, Oracle and Microsoft SQL Server databases are fed by dozens of source systems, including Siebel, the Billing Portal, Epiphany, Sales Operations and Cash Applications. "Shadow IT" data warehouses include revenue assurance, cost benefits analysis (CBA), business operations and sales credits.

The T-Mobile USA information management team has targeted multiple data marts and shadow IT warehouses to incorporate into the Teradata enterprise data warehouse, pending funded projects to add those to the EDW. In this respect, T-Mobile USA is similar to many other Fortune 500 organizations, which balance an EDW vision with the constraints of budgeting, legacy systems and acquisition integration, and therefore manage a hybrid information management architecture combining an EDW and data federation.

Data delivery times run across the board. It takes a day for information to be batch loaded from retail stores and web sales, and analysis used to take a second day. The combination of Informatica PowerCenter and SAP BusinessObjects Explorer enables the T-Mobile USA channel management team to run reports within seconds rather than an hour or a day. "It's a pretty cool platform," said Hickey. Future steps may target speeding up data acquisition.

T-Mobile USA continues to innovate for churn management. To better identify the multi-faceted reasons behind customer turnover, T-Mobile USA ran a proof of concept (PoC) with EMC Greenplum, with a storage capacity of roughly 1 petabyte, including data from cell towers, call records, clickstreams and social networks. Following the PoC, T-Mobile USA decided to work with an outsourced service provider, which uses Apache Hadoop to store and process multi-dimensional data. Sentiment analysis predicts triggers and indicators of what customer actions are going to be, which helps T-Mobile proactively respond.

Informatica's newly announced PowerCenter version 9.1 includes connectivity for Hadoop Distributed File System (HDFS), to load or extract data, as explained by Informatica solution evangelist Julianna DeLua. Customers can use Informatica data quality and other transformation tools either pre- or post-writing the data into HDFS.

Single-screen Quick View for customer care

Backed by this data integration architecture, T-Mobile USA just rolled out Quick View as part of an upgrade of its customer care system. With Quick View, agents and retail store associates can view multiple key indicators including the customer segmentation value on one screen. Before, call center agents and retail store associates had to look at multiple screens, which is problematic while talking live with a customer.

Quick View pops up with offers specific to that customer, such as a new phone or a new service plan. Subscribers with a high value may be sent automatically to care agents specially trained on handling high-value customers. T-Mobile USA plans to extend Quick View to third-party retailer partners such as Best Buy that sell T-Mobile phones and services in their retail stores.

More integration

In addition to empowering innovations in regional marketing campaigns, churn management and customer care, data integration will take on even more significance if the AT&T acquisition of T-Mobile USA is approved next year. An approved acquisition would kick off a host of new integration initiatives between the two companies.


