
May 10 2012

Strata Week: Big data boom and big data gaps

Here are a few of the data stories that caught my attention this week.

Big data booming

The call for speakers for Strata New York has closed, but as Edd Dumbill notes, the number of proposals is a solid indication of the booming interest in big data. The first Strata conference, held in California in 2011, elicited 255 proposals. The following event in New York elicited 230. The most recent Strata, held in California again, drew 415. And the number received for Strata's fall event in New York? That came in at 635.

Edd writes:

"That's some pretty amazing growth. I can thus expect two things from Strata New York. My job in putting the schedule together is going to be hard. And we're going to have the very best content around."

The increased popularity of the Strata conference is just one data point from the week that highlights a big data boom. Here's another: According to a recent report by IDC, the "worldwide ecosystem for Hadoop-MapReduce software is expected to grow at a compound annual rate of 60.2 percent, from $77 million in revenue in 2011 to $812.8 million in 2016."

"Hadoop and MapReduce are taking the software world by storm," says IDC's Carl Olofson. Or as GigaOm's Derrick Harris puts it: "All aboard the Hadoop money train."

A big data gap?

Another report released this week reins in some of the exuberance about big data. This report comes from the government IT network MeriTalk, and it points to a "big data gap" in the government, that is, a gap between the promise of big data and the federal government's capability to make use of it. That gap is notable in light of the Obama administration's recent $200 million commitment to a federal agency big data initiative.

Among the MeriTalk report's findings: just 60% of government IT professionals say their agency analyzes the data it collects, and less than half (40%) use data to make strategic decisions. Survey respondents estimated it would take an average of three years before their agencies were ready to take full advantage of big data.

Prismatic and data-mining the news

The largest-ever healthcare fraud scheme was uncovered this past week. Arrests were made in seven cities — some 107 doctors, nurses and social workers were charged, with fraudulent Medicare claims totaling about $452 million. The discoveries about the fraudulent behavior were made thanks in part to data-mining — looking for anomalies in the Medicare filings made by various health care providers.

Prismatic penned a post in which it makes the case for more open data so that there's "less friction" in accessing the sort of information that led to this sting operation.

"Both the recent sting and the Prime case show that you need real journalists and investigators working with technology and data to achieve good results. The challenge now is to scale this recipe and force transparency on a larger scale.

"We need to get more technically sophisticated and start analysing the data sets up front to discover the right questions to ask, not just the answer the questions we already know to ask based on up-front human investigation. If we have to discover each fraud ring or singleton abuse as a one-off case, we'll never be able to wipe out fraud on a large enough scale to matter."

Indeed, despite this being the largest bust ever, it's really just a fraction of the estimated $20 to $100 billion a year in Medicare fraud.

Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.

Save 20% on registration with the code RADAR20

Got data news?

Feel free to email me.


March 02 2012

Visualization of the Week: Visualizing the Strata Conference

The Strata Conference wrapped up yesterday. Who was there and where did they come from? That's what The Guardian and Information Lab looked to discover in the following visualization.

The Guardian's visualization of the Strata Conference

You can view the entire visualization here.

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop a line if there's a visualization you think we should know about.

Strata Santa Clara 2012 Complete Video Compilation
The Strata video compilation includes workshops, sessions and keynotes from the 2012 Strata Conference in Santa Clara, Calif. Learn more and order here.


March 01 2012

Commerce Weekly: Small banks lagging in mobile

Here are some of the commerce stories that caught my attention this week.

Smaller banks lagging in mobile channel

Smaller financial institutions, which depend on a higher level of customer service to compete with the giants, are falling behind in the increasingly important mobile channel, according to a report by Javelin Strategy & Research. Javelin says about 37% of customers at big banks use mobile banking, compared with only 21% at regional and community banks and only 15% at credit unions. Javelin's report suggests two reasons for this. First, community bank customers tend to be older, less well off, and less tech-savvy than customers at big banks. Second, big banks can invest more in online and mobile development and marketing, resulting in a better banking experience through those channels. (That's certainly been my experience: my attempts to switch to a smaller bank were thwarted by a virtually unusable online banking system, which drove me back into the warm and fuzzy interface of a cold financial giant.)

Some smaller financial institutions say they benefitted from the anti-big-bank sentiment of the past year, epitomized by Bank Transfer Day on Nov. 5, 2011. Redwood Credit Union in Santa Rosa, for example, says its new membership was three times the normal rate last fall. But to keep that momentum going, Javelin suggests, financial institutions like Redwood will need to funnel some of their new income into development of these channels.

The report also found that mobile usage is beginning to surpass non-mobile online usage, even though most customers still reach their bank's mobile site through the browser on their phone rather than a dedicated app. At the largest banks, however, which tend to offer a "triple play" of browser, app and SMS access, more customers use apps and text messaging than the browser.

X.commerce harnesses the technologies of eBay, PayPal and Magento to create the first end-to-end multi-channel commerce technology platform. Our vision is to enable merchants of every size, service providers and developers to thrive in a marketplace where in-store, online, mobile and social selling are all mission critical to business success. Learn more at x.com.

How Netflix improves its recommendations

One of the interesting presentations at O'Reilly's Strata conference this week was about how Netflix looks at its data to present recommendations of other shows members might like. Netflix streams 30 million shows a day. It has 5 billion ratings on those shows and collects another 4 million every day. Data scientist Xavier Amatriain discussed how Netflix uses the data from those ratings and other, more implicit data (including what people watch, which listings they mouse over to read, whether or not they finish programs) to offer recommendations that members will like enough to keep their accounts active, month after month.

Netflix gained a lot of attention a few years back with a broad open innovation initiative: it offered $1 million to anyone who could improve the Netflix recommendation engine by at least 10%. Amatriain said two teams tied for the prize with approaches that improved the probability that Netflix could recommend shows members would like based on their previous activity (though, he added, the cost of integrating those new recommendation engines into Netflix's system may have exceeded their value). Even so, since 75% of shows watched on Netflix's streaming service are based on recommendations, it's more important than ever to offer something that will draw viewers' interest.

Netflix queue example

The clues from all this data allow Netflix to present an array of recommendations to its members. First, there's a row of "top ten" most likely shows. Of course, as Amatriain pointed out, these recommendations are based on viewing history and clues of the entire membership household, not just one viewer. For example, when I log on, along with the thrillers and comedies that Netflix recommends to me, there's a fair amount of "Pretty Little Liars" and other teen dramas that my daughters might like. I used to wonder if this bizarre mix confused Netflix, but Amatriain's talk has reassured me that the company understands what's going on. Then, at a finer-grained level, there are "hyper genres" that Netflix can offer based on your track record: not just Kids Shows, but Goofy Kids Shows; not just Family Movies but Feel-good Father-Daughter Movies. Slicing the offerings narrowly improves the chances of a hit, and it's no accident that the single most likely recommendation is the first one in each row.
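
Amatriain didn't share Netflix's actual formulas, but the general idea of blending explicit ratings with implicit signals and slicing the results into genre rows can be sketched in a few lines of Python. Everything below (the weights, the signals, the titles) is hypothetical and purely illustrative, not Netflix's method:

```python
# Illustrative sketch only, not Netflix's algorithm: blend an explicit
# predicted rating with implicit engagement signals, group titles into
# genre "rows", and put the most likely title first in each row.
from collections import defaultdict

titles = [
    # (title, genre, predicted_rating 0-5, watch_probability, hover_rate)
    ("Feel-Good Father-Daughter Movie", "Family",   4.1, 0.62, 0.30),
    ("Goofy Kids Show",                 "Kids",     3.8, 0.71, 0.25),
    ("Gritty Crime Thriller",           "Thriller", 4.4, 0.55, 0.40),
    ("Teen Drama",                      "Drama",    3.2, 0.80, 0.10),
]

def score(rating, watch_probability, hover_rate,
          w_explicit=0.5, w_watch=0.35, w_hover=0.15):
    """Blend explicit and implicit signals into one ranking score."""
    return (w_explicit * rating / 5.0
            + w_watch * watch_probability
            + w_hover * hover_rate)

rows = defaultdict(list)
for title, genre, rating, watch_p, hover in titles:
    rows[genre].append((score(rating, watch_p, hover), title))

for genre, entries in sorted(rows.items()):
    entries.sort(reverse=True)  # the safest bet takes the first slot in the row
    print(genre, [title for _, title in entries])
```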

Of course, the main complaint Netflix receives (other than its new price structure, I would imagine) is, "why don't you have the show I want to watch?" Amatriain said the company also looks at implicit data to decide what new content to license. So when you search for a show that Netflix doesn't offer for streaming, it gets noted. I guess if you really want it to show up, keep searching for it.

Opera enters the payment fray, PayPal and Home Depot go nationwide

Mobile World Congress, the humongous European conference on all things mobile, is happening this week and everyone loosely connected to mobile payments seemed to time an announcement around it. Here are some of the more interesting announcements that have come down the PR wire from Barcelona:

  • Opera, whose Opera Mini browser has more than 160 million downloads, launched the Opera Payment Exchange (OPX). Opera says it wants to "democratize" the payment space by building a payment platform that works on more platforms and devices than just Android and iOS smartphones. The OPX platform provides APIs that developers can use to integrate payment systems with the Opera Mini mobile browser.
  • PayPal and Home Depot said they would roll out nationwide the payment program they have been piloting in a handful of Bay Area stores over the past six weeks. The program is a significant step for PayPal, bringing its payment system offline and into the physical retail world. Customers can buy hardware and other stuff on their PayPal account, with a PayPal card or with a mobile number and PIN — no NFC required.
  • Isis, the mobile payments joint venture between AT&T, T-Mobile, and Verizon Wireless, announced more partners in its effort to build a payments ecosystem. Customers of Chase, CapitalOne, and BarclayCard will be able to load their payment information into Isis-compatible phones when they're ready. Isis secured deals with the top four credit card companies (or "payment networks" to use the parlance) last July; now it's making agreements with the banks ("issuers"). Isis is planning two pilots in 2012, in Austin and Salt Lake City, though it's not clear what phones the technology will be in by then.

Tip us off

News tips and suggestions are always welcome, so please send them along.


If you're interested in learning more about the commerce space, check out DevZone on x.com, a collaboration between O'Reilly and X.commerce.


Bank photo: Old Bank in Sunbury Village by Maxwell Hamilton, on Flickr


February 22 2012

Big data in the cloud

Big data and cloud technology go hand-in-hand. Big data needs clusters of servers for processing, which clouds can readily provide. So goes the marketing message, but what does that look like in reality? Both "cloud" and "big data" have broad definitions, obscured by considerable hype. This article breaks down the landscape as simply as possible, highlighting what's practical, and what's to come.

IaaS and private clouds

What is often called "cloud" amounts to virtualized servers: computing resource that presents itself as a regular server, rentable per consumption. This is generally called infrastructure as a service (IaaS), and is offered by platforms such as Rackspace Cloud or Amazon EC2. You buy time on these services, and install and configure your own software, such as a Hadoop cluster or NoSQL database. Most of the solutions I described in my Big Data Market Survey can be deployed on IaaS services.
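
To make that concrete, here is a minimal sketch of renting a few virtual servers with boto3, the AWS SDK for Python, after which you would install Hadoop or a NoSQL database yourself; the AMI ID, key pair name and instance type are placeholders.

```python
# Minimal IaaS sketch: rent a few virtual servers on EC2 with boto3,
# then configure them yourself (e.g. install Hadoop or a NoSQL database).
# The AMI ID, key pair name and instance type below are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5.large",           # placeholder instance type
    KeyName="my-keypair",              # placeholder key pair
    MinCount=3,                        # a small cluster's worth of nodes
    MaxCount=3,
)

for instance in instances:
    instance.wait_until_running()
    instance.reload()
    print(instance.id, instance.public_dns_name)
```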



Using IaaS clouds doesn't mean you must handle all deployment manually: good news for the clusters of machines big data requires. You can use orchestration frameworks, which handle the management of resources, and automated infrastructure tools, which handle server installation and configuration. RightScale offers a commercial multi-cloud management platform that mitigates some of the problems of managing servers in the cloud.



Frameworks such as OpenStack and Eucalyptus aim to present a uniform interface to both private data centers and the public cloud. Attracting a strong flow of cross industry support, OpenStack currently addresses computing resource (akin to Amazon's EC2) and storage (parallels Amazon S3).



The race is on to make private clouds and IaaS services more usable: over the next two years using clouds should become much more straightforward as vendors adopt the nascent standards. There'll be a uniform interface, whether you're using public or private cloud facilities, or a hybrid of the two.



Particular to big data, several configuration tools already target Hadoop explicitly: among them Dell's Crowbar, which aims to make deploying and configuring clusters simple, and Apache Whirr, which is specialized for running Hadoop services and other clustered data processing systems.



Today, using IaaS gives you a broad choice of cloud supplier, the option of using a private cloud, and complete control: but you'll be responsible for deploying, managing and maintaining your clusters.

Microsoft SQL Server is a comprehensive information platform offering enterprise-ready technologies and tools that help businesses derive maximum value from information at the lowest TCO. SQL Server 2012 launches next year, offering a cloud-ready information platform delivering mission-critical confidence, breakthrough insight, and cloud on your terms; find out more at www.microsoft.com/sql.

Platform solutions

Using IaaS only takes you so far with big data applications: these services handle the creation of computing and storage resources, but don't address anything at a higher level. The setup of Hadoop and Hive or a similar solution is down to you.

Beyond IaaS, several cloud services provide application layer support for big data work. Sometimes referred to as managed solutions, or platform as a service (PaaS), these services remove the need to configure or scale things such as databases or MapReduce, reducing your workload and maintenance burden. Additionally, PaaS providers can realize great efficiencies by hosting at the application level, and pass those savings on to the customer.

The general PaaS market is burgeoning, with major players including VMware (Cloud Foundry) and Salesforce (Heroku, force.com). As big data and machine learning requirements percolate through the industry, these players are likely to add their own big-data-specific services. For the purposes of this article, though, I will stick to the vendors who have already implemented big data solutions.

Today's primary providers of such big data platform services are Amazon, Google and Microsoft. You can see their offerings summarized in the table toward the end of this article. Both Amazon Web Services and Microsoft's Azure blur the lines between infrastructure as a service and platform: you can mix and match. By contrast, Google's philosophy is to skip the notion of a server altogether, and focus only on the concept of the application. Among these, only Amazon can lay claim to extensive experience with their product.

Amazon Web Services

Amazon has significant experience in hosting big data processing. Use of Amazon EC2 for Hadoop was a popular and natural move for many early adopters of big data, thanks to Amazon's expandable supply of compute power. Building on this, Amazon launched Elastic Map Reduce in 2009, providing a hosted, scalable Hadoop service.

Applications on Amazon's platform can pick from the best of both the IaaS and PaaS worlds. General purpose EC2 servers host applications that can then access the appropriate special purpose managed solutions provided by Amazon.

As well as Elastic Map Reduce, Amazon offers several other services relevant to big data, such as the Simple Queue Service for coordinating distributed computing, and a hosted relational database service. At the specialist end of big data, Amazon's High Performance Computing solutions are tuned for low-latency cluster computing, of the sort required by scientific and engineering applications.
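
As a small illustration of how the Simple Queue Service mentioned above can coordinate distributed work, here is a sketch with boto3, the AWS SDK for Python; the queue name and message body are placeholders.

```python
# Sketch: using Amazon SQS to hand work items to distributed workers.
# Queue name and message body are placeholders.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.create_queue(QueueName="big-data-work-items")["QueueUrl"]

# A producer enqueues a unit of work (here, an S3 path to process).
sqs.send_message(QueueUrl=queue_url,
                 MessageBody="s3://my-bucket/logs/2012-05-10.gz")

# A worker elsewhere polls for work, processes it, then deletes the message.
response = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=1,
                               WaitTimeSeconds=10)
for message in response.get("Messages", []):
    print("processing", message["Body"])
    sqs.delete_message(QueueUrl=queue_url,
                       ReceiptHandle=message["ReceiptHandle"])
```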


Elastic Map Reduce

Elastic Map Reduce (EMR) can be programmed in the usual Hadoop ways, through Pig, Hive or other programming languages, and uses Amazon's S3 storage service to get data in and out.

Access to Elastic Map Reduce is through Amazon's SDKs and tools, or with GUI analytical and IDE products such as those offered by Karmasphere. In conjunction with these tools, EMR represents a strong option for experimental and analytical work. Amazon's pricing also makes EMR a much more attractive option than configuring EC2 instances yourself to run Hadoop.
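
For a sense of what driving EMR from an SDK looks like, here is a hedged sketch using boto3 that starts a small cluster, runs one Hive script stored in S3, and shuts down when the step completes; the release label, instance sizes, roles and S3 paths are placeholders and will vary with your account setup.

```python
# Sketch: start a small Elastic MapReduce cluster, run one Hive step that
# reads from and writes to S3, then shut the cluster down. Bucket names,
# script location, roles and sizes are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="example-analysis",
    ReleaseLabel="emr-6.15.0",              # placeholder EMR release
    Applications=[{"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,   # terminate after the step
    },
    Steps=[{
        "Name": "run-hive-script",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", "s3://my-bucket/queries/report.hql"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("cluster id:", response["JobFlowId"])
```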

When integrating Hadoop with applications generating structured data, using S3 as the main data source can be unwieldy. This is because, similar to Hadoop's HDFS, S3 works at the level of storing blobs of opaque data. Hadoop's answer to this is HBase, a NoSQL database that integrates with the rest of the Hadoop stack. Unfortunately, Amazon does not currently offer HBase with Elastic Map Reduce.

DynamoDB

Instead of HBase, Amazon provides DynamoDB, its own managed, scalable NoSQL database. As this is a managed solution, it represents a better choice than running your own database on top of EC2, in terms of both performance and economy.

DynamoDB data can be exported to and imported from S3, providing interoperability with EMR.
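
As a quick sketch of what working with DynamoDB looks like from code (again with boto3, against a hypothetical table that uses user_id as its hash key):

```python
# Sketch: write and read an item in a DynamoDB table with boto3.
# The table name, key schema and attributes are placeholders and assume
# the table already exists with "user_id" as its hash key.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("user-activity")          # placeholder table

table.put_item(Item={
    "user_id": "u-1001",
    "last_login": "2012-05-10T12:00:00Z",
    "page_views": 42,
})

item = table.get_item(Key={"user_id": "u-1001"}).get("Item")
print(item)
```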

Google

Google's cloud platform stands out as distinct from its competitors. Rather than offering virtualization, it provides an application container with defined APIs and services. Developers do not need to concern themselves with the concept of machines: applications execute in the cloud, getting access to as much processing power as they need, within defined resource usage limits.

To use Google's platform, you must work within the constraints of its APIs. However, if that fits, you can reap the benefits of the security, tuning and performance improvements inherent to the way Google develops all its services.

AppEngine, Google's cloud application hosting service, offers a MapReduce facility for parallel computation over data, but this is more of a feature for use as part of complex applications rather than for analytical purposes. Instead, BigQuery and the Prediction API form the core of Google's big data offering, respectively offering analysis and machine learning facilities. Both these services are available exclusively via REST APIs, consistent with Google's vision for web-based computing.

BigQuery

BigQuery is an analytical database, suitable for interactive analysis over datasets on the order of 1TB. It works best on a small number of tables with a large number of rows. BigQuery offers a familiar SQL interface to its data. In that, it is comparable to Apache Hive, but its typical performance is faster, making BigQuery a good choice for exploratory data analysis.

Getting data into BigQuery is a matter of directly uploading it, or importing it from Google's Cloud Storage system. This is the aspect of BigQuery with the most room for improvement. Whereas Amazon's S3 lets you mail in disks for import, Google doesn't currently offer that facility. Streaming data into BigQuery isn't viable either, so constantly updated data requires regular imports. Finally, as BigQuery only accepts data formatted as comma-separated value (CSV) files, you will need to clean up the data with external tools beforehand.
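
As an illustration of that workflow, the sketch below uses the google-cloud-bigquery Python client library (a later client than the raw REST API described here) to load a CSV from Cloud Storage and run a SQL query; the project, dataset, table and bucket names are placeholders.

```python
# Sketch: load a CSV file from Google Cloud Storage into BigQuery, then run
# a SQL query against it. Uses the google-cloud-bigquery client library;
# project, dataset, table and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table_id = "my-project.web_logs.requests"

load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,                      # infer the schema from the CSV
)
client.load_table_from_uri(
    "gs://my-bucket/logs/requests.csv", table_id, job_config=load_config
).result()                                # wait for the load job to finish

query = """
    SELECT status_code, COUNT(*) AS hits
    FROM `my-project.web_logs.requests`
    GROUP BY status_code
    ORDER BY hits DESC
"""
for row in client.query(query).result():
    print(row.status_code, row.hits)
```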

Rather than provide end-user interfaces itself, Google wants an ecosystem to grow around BigQuery, with vendors incorporating it into their products, in the same way Elastic Map Reduce has acquired tool integration. BigQuery is currently in a beta test, to which anybody can apply, and is expected to be publicly available during 2012.

Prediction API

Many uses of machine learning are well defined, such as classification, sentiment analysis, or recommendation generation. To meet these needs, Google offers its Prediction API product.

Applications using the Prediction API work by creating and training a model hosted within Google's system. Once trained, this model can be used to make predictions, such as spam detection. Google is working on allowing these models to be shared, optionally for a fee. This will let you take advantage of previously trained models, which in many cases will save you the time and expertise required for training.
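
The workflow can be sketched roughly as follows. The Prediction API is reached over REST; the google-api-python-client calls below are a rough reconstruction of its v1.6 interface and should be treated as an assumption, as should the project, bucket and model identifiers.

```python
# Rough sketch of the Prediction API workflow (train a model on labeled CSV
# data in Cloud Storage, then ask it to classify new text). Method names are
# an assumption based on the v1.6 discovery bindings; project, bucket and
# model IDs are placeholders.
from googleapiclient.discovery import build

service = build("prediction", "v1.6")   # assumes credentials are configured

# Train: the CSV's first column is the label (e.g. "spam"/"ham"),
# the remaining columns are the features.
service.trainedmodels().insert(
    project="my-project",
    body={
        "id": "spam-detector",
        "storageDataLocation": "my-bucket/training/messages.csv",
    },
).execute()

# Predict: classify a new message once training has completed.
result = service.trainedmodels().predict(
    project="my-project",
    id="spam-detector",
    body={"input": {"csvInstance": ["win a free prize, click now"]}},
).execute()
print(result.get("outputLabel"))
```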

Though promising, Google's offerings are in their early days. Further integration between its services is required, as well as time for ecosystem development to make their tools more approachable.

Microsoft

I have written in some detail about Microsoft's big data strategy in Microsoft's plan for Hadoop and big data. By offering its data platforms on Windows Azure in addition to Windows Server, Microsoft's aim is to make either on-premise or cloud-based deployments equally viable with its technology. Azure parallels Amazon's web service offerings in many ways, offering a mix of IaaS services with managed applications such as SQL Server.

Hadoop is the central pillar of Microsoft's big data approach, surrounded by the ecosystem of its own database and business intelligence tools. For organizations already invested in the Microsoft platform, Azure will represent the smoothest route for integrating big data into the operation. Azure itself is pragmatic about language choice, supporting technologies such as Java, PHP and Node.js in addition to Microsoft's own.

As with Google's BigQuery, Microsoft's Hadoop solution is currently in closed beta test, and is expected to be generally available sometime in the middle of 2012.

Big data cloud platforms compared

The following table summarizes the data storage and analysis capabilities of Amazon, Google and Microsoft's cloud platforms. Intentionally excluded are IaaS solutions without dedicated big data offerings.

                     | Amazon                                           | Google                                  | Microsoft
Product(s)           | Amazon Web Services                              | Google Cloud Services                   | Windows Azure
Big data storage     | S3                                               | Cloud Storage                           | HDFS on Azure
Working storage      | Elastic Block Store                              | AppEngine (Datastore, Blobstore)        | Blob, table, queues
NoSQL database       | DynamoDB [1]                                     | AppEngine Datastore                     | Table storage
Relational database  | Relational Database Service (MySQL or Oracle)    | Cloud SQL (MySQL compatible)            | SQL Azure
Application hosting  | EC2                                              | AppEngine                               | Azure Compute
Map/Reduce service   | Elastic MapReduce (Hadoop)                       | AppEngine (limited capacity)            | Hadoop on Azure [2]
Big data analytics   | Elastic MapReduce (Hadoop interface [3])         | BigQuery [2] (TB-scale, SQL interface)  | Hadoop on Azure (Hadoop interface [3])
Machine learning     | Via Hadoop + Mahout on EMR or EC2                | Prediction API                          | Mahout with Hadoop
Streaming processing | Nothing prepackaged: use custom solution on EC2  | Prospective Search API [4]              | StreamInsight [2] ("Project Austin")
Data import          | Network, physically ship drives                  | Network                                 | Network
Data sources         | Public Data Sets                                 | A few sample datasets                   | Windows Azure Marketplace
Availability         | Public production                                | Some services in private beta           | Some services in private beta

Conclusion

Cloud-based big data services offer considerable advantages in removing the overhead of configuring and tuning your own clusters, and in ensuring you pay only for what you use. The biggest issue is always going to be data locality, as it is slow and expensive to ship data. The most effective big data cloud solutions will be the ones where the data is also collected in the cloud. This is an incentive to investigate EC2, Azure or AppEngine as a primary application platform, and an indicator that PaaS competitors such as Cloud Foundry and Heroku will have to address big data as a priority.

It is early days yet for big data in the cloud, with only Amazon offering battle-tested solutions at this point. Cloud services themselves are at an early stage, and we will see both increasing standardization and innovation over the next two years.

However, the twin advantages of not having to worry about infrastructure and economies of scale mean it is well worth investigating cloud services for your big data needs, especially for an experimental or green-field project. Looking to the future, there's no doubt that big data analytical capability will form an essential component of utility computing solutions.

Notes:

[1] In public beta.
[2] In controlled beta test.
[3] Hive and Pig compatible.
[4] Experimental status.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20


January 26 2012

Strata Week: Genome research kicks up a lot of data

Here are a few of the data stories that caught my attention this week.

Genomics data and the cloud

GigaOm's Derrick Harris explores some of the big data obstacles and opportunities surrounding genome research. He notes that:

When the Human Genome Project successfully concluded in 2003, it had taken 13 years to complete its goal of fully sequencing the human genome. Earlier this month, two firms — Life Technologies and Illumina — announced instruments that can do the same thing in a day, one for only $1,000. That's likely going to mean a lot of data.

But as Harris observes, the promise of quick and cheap genomics is leading to other problems, particularly as the data reaches a heady scale. A fully sequenced human genome is about 100GB of raw data. But citing DNAnexus founder Andreas Sundquist, Harris says that:

... volume increases to about 1TB by the time the genome has been analyzed. He [Sundquist] also says we're on pace to have 1 million genomes sequenced within the next two years. If that holds true, there will be approximately 1 million terabytes (or 1,000 petabytes, or 1 exabyte) of genome data floating around by 2014.

That makes the promise of a $1,000 genome sequencing service challenging when it comes to storing and processing petabytes of data. Harris posits that it will be cloud computing to the rescue here, providing the necessary infrastructure to handle all that data.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Stanley Fish versus the digital humanities

Literary critic and New York Times opinionator Stanley Fish has been on a bit of a rampage in recent weeks, taking on the growing field of the "digital humanities." Prior to the annual Modern Language Association meeting, Fish cautioned that alongside the traditional panels and papers on Ezra Pound, William Shakespeare and the like, there was going to be a flood of sessions devoted to:

...'the digital humanities,' an umbrella term for new and fast-moving developments across a range of topics: the organization and administration of libraries, the rethinking of peer review, the study of social networks, the expansion of digital archives, the refining of search engines, the production of scholarly editions, the restructuring of undergraduate instruction, the transformation of scholarly publishing, the re-conception of the doctoral dissertation, the teaching of foreign languages, the proliferation of online journals, the redefinition of what it means to be a text, the changing face of tenure — in short, everything.

That "everything" was narrowed down substantially in Fish's editorial this week, in which he blasted the digital humanities for what he sees as its fixation "with matters of statistical frequency and pattern." In other words: data and computational analysis.

According to Fish, the problem with digital humanities is that this new scholarship relies heavily on the machine — and not the literary critic — for interpretation. Fish contends that digital humanities scholars are all teams of statisticians and positivists, busily digitizing texts so they can data-mine them and systematically and programmatically uncover something of interest — something worthy of interpretation.

University of Illinois, Urbana-Champaign English professor Ted Underwood argues that Fish not only mischaracterizes what digital humanities scholars do, but he misrepresents how his own interpretive tradition works:

... by pretending that the act of interpretation is wholly contained in a single encounter with evidence. On his account, we normally begin with a hypothesis (which seems to have sprung, like Sin, fully-formed from our head), and test it against a single sentence.

One of the most interesting responses to Fish's recent rants about the humanities' digital turn comes from University of North Carolina English professor Daniel Anderson, who demonstrates in a video a far fuller picture of what the creation and interpretation of digital "data" can look like.

Hadoop World merges with O'Reilly's Strata New York conference

Two big data events announced this week that they are merging: Hadoop World will now be part of the Strata Conference in New York this fall.

[Disclosure: The Strata events are run by O'Reilly Media.]

Cloudera first started Hadoop World back in 2009, and as Hadoop itself has seen increasing adoption, Hadoop World, too, has become more popular. Strata is a newer event — its first conference was held in Santa Clara, Calif., in February 2011, and it expanded to New York in September 2011.

With the merger, Hadoop World will be a featured program at Strata New York 2012 (Oct. 23-25).

In other Hadoop-related news this week, Strata chair Edd Dumbill took a close look at Microsoft's Hadoop strategy. Although it might be surprising that Microsoft has opted to adopt an open source technology as the core of its big data plans, Dumbill argues that:

Hadoop, by its sheer popularity, has become the de facto standard for distributed data crunching. By embracing Hadoop, Microsoft allows its customers to access the rapidly-growing Hadoop ecosystem and take advantage of a growing talent pool of Hadoop-savvy developers.

Also, Cloudera data scientist Josh Wills takes a closer look at one aspect of that ecosystem: the work of scientists whose research falls outside of statistics and machine learning. His blog post specifically addresses one use case for Hadoop (seismology, for which there is now Seismic Hadoop), but the post also provides a broad look at what constitutes the practice of data science.

Got data news?

Feel free to email me.

Photo: Bootstrap DNA by Charles Jencks, 2003 by mira66, on Flickr

