The Data Day, A few days: November 10-23, 2015

Cloudera herds Impala and Kudu to the ASF. And more

And that’s the data day, today.

Our 2013 Database survey is now live

451 Research’s 2013 Database survey is now live at http://bit.ly/451db13. It investigates the current use of database technologies, including MySQL, NoSQL and NewSQL, as well as traditional relational and non-relational databases.

The aim of this survey is to identify trends in database usage, as well as changing attitudes to MySQL following its acquisition by Oracle, and the competitive dynamic between MySQL and other databases, including NoSQL and NewSQL technologies.

There are just 15 questions to answer, spread over five pages, and the entire survey should take less than ten minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due during Q2.

The full report will be available to 451 Research clients, while the results of the survey will also be made freely available via a presentation at the Percona Live MySQL Conference and Expo in April.

Last year’s results have been viewed nearly 55,000 times on SlideShare, so we are hoping for a good response to this year’s survey.

One of the most interesting aspects of the 2012 survey results was the extent to which MySQL users were testing and adopting PostgreSQL. Will that trend continue or accelerate in 2013? And what of the adoption of cloud-based database services such as Amazon RDS and Google Cloud SQL?

Are the new breed of NewSQL vendors having any impact on the relational database incumbents such as Oracle, Microsoft and IBM? And how is SAP HANA adoption driving interest in other in-memory databases such as VoltDB and MemSQL?

We will also be interested to see how well NoSQL databases fare in this year’s survey results. Last year MongoDB was the most popular, followed by Apache Cassandra/DataStax and Redis. Are these now making a bigger impact on the wider market, and what of Basho’s Riak, CouchDB, Neo4j, Couchbase et al?

Additionally, we have been tracking attitudes to Oracle’s ownership of MySQL since the deal to acquire Sun was announced. Have MySQL users’ attitudes towards Oracle improved or declined in the last 12 months, and what impact will the formation of the MariaDB Foundation have on MariaDB adoption?

We’re looking forward to analyzing the results and providing answers to these and other questions. Please help us to get the most representative result set by taking part in the survey at http://bit.ly/451db13

The geographic distribution of NoSQL skills – just one more thing

Hidden away amongst the details of our little tour around LinkedIn statistics on NoSQL and Hadoop skills was some interesting information on how many LinkedIn members list the various data management technologies in our sample in their profiles.

Our original post contained the fact that there were 9,079 LinkedIn members with “Hadoop” in their member profiles, for example, compared to 366,084 with “MySQL” in their member profiles.

Later posts showed there were 170 with “Membase” and 1,687 with “HBase”, 787 with “Apache Cassandra” and 376 with “Riak”, 6,048 with “MongoDB” and 2,152 with “Redis”, and finally, 1,844 with “CouchDB” and 268 with “Neo4j”.

This gives us an interesting perspective on the relative adoption of the various NoSQL databases:

If it wasn’t already obvious from the list above, the chart illustrates just how much more prevalent MongoDB skills are compared to the other NoSQL databases, followed by Redis, Apache CouchDB, Apache HBase and Apache Cassandra. The chart also illustrates that while HBase is the second most prevalent NoSQL skill set in the USA, it is only fourth overall given its lower prevalence in the rest of the world.
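As a rough cross-check on the ranking described above, the sketch below sorts the profile counts quoted earlier in this series and expresses each as a share of the combined total. It is a minimal Python illustration, not the method used to produce the chart; the helper name `rank_by_prevalence` is hypothetical, and summing the counts assumes each profile mention is counted separately (a member who lists several databases contributes to several totals).

```python
# Profile counts quoted in the posts above: LinkedIn members whose profiles
# mention each NoSQL database (search results as of December 1).
profile_counts = {
    "MongoDB": 6048,
    "Redis": 2152,
    "CouchDB": 1844,
    "HBase": 1687,
    "Apache Cassandra": 787,
    "Riak": 376,
    "Neo4j": 268,
    "Membase": 170,
}


def rank_by_prevalence(counts):
    """Return (name, count, share_of_total_pct) tuples, most prevalent first.

    Assumption: shares are of total mentions, not of distinct members,
    since one member may list several databases.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, n, round(100 * n / total, 1)) for name, n in ranked]


if __name__ == "__main__":
    for name, n, share in rank_by_prevalence(profile_counts):
        print(f"{name:<18}{n:>6}{share:>7.1f}%")
```

On these figures MongoDB accounts for roughly 45% of all NoSQL mentions in the sample, and the top five match the order given above: MongoDB, Redis, CouchDB, HBase and Apache Cassandra.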

In response, a representative from a certain vendor notes “Some skills are more valued not because they are more prevalent, but because they are harder to achieve.” Make of that what you will.

The geographic distribution of NoSQL skills: Apache Cassandra and Riak

Following last week’s post, which put the geographic distribution of Hadoop skills (based on a search of LinkedIn members) in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

Following yesterday’s look at Membase and HBase, part two examines the geographic spread of Apache Cassandra and Basho Technologies’ Riak.

The statistics showed that 52.2% of the 787 LinkedIn members with “Apache Cassandra” in their member profiles are based in the US (as previously explained, we had to use the ‘Apache’ qualifier with Cassandra to filter out people with the name Cassandra).

A significant proportion (18.0%) of those are in the Bay Area, although this is a smaller proportion than for Hadoop, Membase and HBase. The results also indicate that Canada is a hot-spot for Apache Cassandra skills, with 4.1%, while Apache Cassandra is also making inroads into Europe via France and Spain.

Basho’s Riak is less dependent on the USA for adoption. The statistics showed that less than half (45.5%) of the 376 LinkedIn members with “Riak” in their member profiles are based in the US, with only 13.0% in the Bay Area.

Riak hot-spots include the UK (6.9%) and Australia (4.3%), as well as the Boston area, in keeping with the location of the company’s HQ.

The series will continue later this week with MongoDB, CouchDB, Neo4j, and Redis.

N.B. The size of the boxes is in proportion to the search result. World map image: Owen Blacker

The geographic distribution of Hadoop skills: in context

NC State University’s Institute for Advanced Analytics recently published some interesting statistics on Apache Hadoop adoption based on a search of LinkedIn data.

The statistics graphically illustrate what a lot of people were already pretty sure of: that the geographic distribution of Hadoop skills (and presumably therefore adoption) is heavily weighted in favour of the USA, and in particular the San Francisco Bay Area.

The statistics showed that 64% of the 9,079 LinkedIn members with “Hadoop” in their member profiles (by no means a perfect measure, but an insightful one nonetheless) are based in the US, and that a large proportion of those are in the Bay Area.

The results are what we would expect to see given the relative immaturity of Apache Hadoop adoption, as well as the nature and location of the early Hadoop adopters and Hadoop-related vendors.

The results got me thinking about two things:
– how does the geographic spread compare to a more maturely adopted project?
– how does it compare to the various NoSQL projects?

So I did some searching of LinkedIn to find out.

To answer the first question I performed the same search for MySQL, as an example of a mature, widely-adopted open source project.

The results show that just 32% of the 366,084 LinkedIn members with “MySQL” in their member profiles are based in the US (exactly half the proportion for Hadoop), while only 4.4% are in the Bay Area, compared to 28.2% of the 9,079 LinkedIn members with “Hadoop” in their member profiles.
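To put these percentages into absolute terms, the following minimal Python sketch converts them back into approximate member counts. It assumes (as the post states for the Hadoop Bay Area figure) that each percentage is a share of the total search result for that term; the `searches` dictionary and `approx_members` helper are hypothetical names introduced for illustration, and the outputs are rounded estimates, not LinkedIn figures.

```python
# Figures quoted in the post: total LinkedIn members listing each term,
# plus the percentage based in the US and in the Bay Area.
# Assumption: both percentages are shares of the total search result.
searches = {
    "Hadoop": {"members": 9079, "us_pct": 64.0, "bay_area_pct": 28.2},
    "MySQL": {"members": 366084, "us_pct": 32.0, "bay_area_pct": 4.4},
}


def approx_members(total, pct):
    """Convert a percentage of a search result into an approximate head count."""
    return round(total * pct / 100)


if __name__ == "__main__":
    for name, s in searches.items():
        us = approx_members(s["members"], s["us_pct"])
        bay = approx_members(s["members"], s["bay_area_pct"])
        # Bay Area members as a proportion of US-based members, not of the total.
        bay_share_of_us = 100 * s["bay_area_pct"] / s["us_pct"]
        print(f"{name}: ~{us:,} in US, ~{bay:,} in Bay Area "
              f"({bay_share_of_us:.0f}% of US members)")
```

Under that assumption, roughly 5,800 Hadoop members and 117,000 MySQL members are US-based, and MySQL still has about six times as many Bay Area members in absolute terms (around 16,100 vs 2,560), even though they make up a far smaller share of its total.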

The charts below illustrate the difference in geographic distribution between Hadoop and MySQL. The size of the boxes is in proportion to the search result.

With regard to the second question, I also ran searches for MongoDB, Riak, CouchDB, Apache Cassandra*, Membase*, Neo4j, HBase, and Redis.

I’ll be posting the results for each of those over the next week or so, but in the meantime, the graphic below shows the split between the USA and Rest of the World (ROW) for all ten projects.

It illustrates, as I suspected, that the distribution of skills for NoSQL databases is more geographically dispersed than for Hadoop.

I have some theories as to why that is – but I’d love to hear anyone else’s take on the results.

*I had to use the ‘Apache’ qualifier with Cassandra to filter out anyone called Cassandra, while Membase returned a more statistically relevant result than Couchbase.

World map image: Owen Blacker

Forthcoming webinar: Real Enterprise NoSQL Applications

On Wednesday, December 7, 2011 at 10am PT (6pm GMT) I’ll be taking part in a webinar with DataStax CTO and Apache Cassandra project chair Jonathan Ellis on the subject of Apache Cassandra: Real NoSQL Applications in the Enterprise Today.

The session will shed light on real-world use cases for NoSQL databases by providing case studies from enterprise production users taking advantage of the massively scalable and highly-available architecture of Apache Cassandra.

I’ll be summarising some of the findings from our NoSQL, NewSQL and Beyond research report, and exploring the drivers behind the development and adoption of NoSQL databases – explaining how the failure of existing suppliers to meet the performance, scalability and flexibility needs of large-scale data processing has led to the development and adoption of alternative data management technologies.

Jonathan will provide more detail on Apache Cassandra and DataStax, including a number of real-world projects at Netflix, Backupify, Ooyala and Constant Contact.

You can register for the event here and find more details about our NoSQL, NewSQL and Beyond research report here.