The Data Day: January 25, 2019

The meaning and importance of #automation in data-centric environments. And more.

And that’s the Data Day, today.

The Data Day: October 24, 2018

Blockchain in financial services. And more.

The Data Day, A few days: April 16-22, 2016

MemSQL raises $36m series C. And more.

The Data Day, A few days: March 19-April 1, 2016

Funding for Domo, Cockroach Labs, MapD and data Artisans. And more.

The Data Day, A few days: March 21-30, 2015

Apple acquires FoundationDB (and previously Acunu). And more.

The Data Day, A few days: March 7-13, 2015

Looker raises $30m, Elasticsearch becomes Elastic. And more.

The Data Day, A few days: November 27-December 5, 2014

OpenText to acquire Actuate for $272m.

The Data Day, A few days: March 22-April 4, 2014

Cloudera raises $900m. Pivotal launches Big Data Suite. And more.

7 Hadoop questions. Q5: SQL in Hadoop, SQL on Hadoop, or SQL and Hadoop?

What is your preferred approach to integrating SQL and Hadoop? Until recently that was a straight shoot-out between Hive and Pig, but the options for applying existing SQL skills to data in Hadoop have increased dramatically during 2013. That’s why the choice of approach to SQL in/on/and Hadoop is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.

I write in/on/and because I believe it is a useful way of distinguishing the various approaches and understanding how they compare at this point.

SQL in Hadoop
Hive’s classic approach of converting SQL queries into MapReduce jobs falls into this category, but it lacks the performance some users need for more interactive analysis. Hortonworks has started the Stinger Initiative to align HiveQL more closely with standard SQL, optimize Hive’s query execution plans and introduce a new columnar file format for storing Hive data.
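To illustrate what converting a SQL query into a MapReduce job involves, here is a minimal Python sketch (not Hive’s actual implementation) of how a simple aggregate such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` can be decomposed into map, shuffle and reduce phases. The `page_views` table, its columns and its rows are hypothetical, and everything runs in a single process rather than across a cluster.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: rows of a "page_views" table stored as (page, user) tuples.
ROWS = [
    ("home", "alice"),
    ("docs", "bob"),
    ("home", "carol"),
    ("blog", "alice"),
    ("home", "bob"),
]


def map_phase(rows):
    """Map: emit (key, 1) for each row, keyed on the GROUP BY column."""
    for page, _user in rows:
        yield page, 1


def shuffle(pairs):
    """Shuffle/sort: bring all pairs with the same key together, as the framework would."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))


def reduce_phase(grouped):
    """Reduce: aggregate the values for each key (here, COUNT(*))."""
    for key, pairs in grouped:
        yield key, sum(count for _key, count in pairs)


def select_count_group_by(rows):
    """Single-process equivalent of: SELECT page, COUNT(*) FROM page_views GROUP BY page"""
    return dict(reduce_phase(shuffle(map_phase(rows))))


if __name__ == "__main__":
    print(select_count_group_by(ROWS))  # {'blog': 1, 'docs': 1, 'home': 3}
```

The phases are kept as separate functions because MapReduce runs them as distinct stages, writing intermediate results between them. That per-job overhead is part of why SQL-via-MapReduce can feel slow for interactive analysis.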

SQL on Hadoop
Rather than trying to improve the performance of SQL-via-MapReduce, several projects are building SQL engines that process data in HDFS natively, avoiding MapReduce altogether. Key efforts include Cloudera’s Impala project and its Cloudera Enterprise RTQ product, the MapR-initiated Apache Drill project, Pivotal’s HAWQ and JethroData. IBM’s Big SQL also appears to fit into this category.

SQL and Hadoop
Co-locating relational database technology with Hadoop enables data to be processed on each platform, using SQL in the RDBMS and MapReduce on data in HDFS. Hadapt pioneered this approach. RainStor launched RainStor Big Data Analytics on Hadoop in early 2012, combining its column-based database software with Hadoop, and Microsoft has been previewing PolyBase, which will offer the ability to join tables from SQL Server PDW with data from HDFS to return a combined result. SQL and Hadoop is a broader category, in which we would also include Citus Data, which takes advantage of PostgreSQL’s foreign data wrapper technology to query data in HDFS via local query execution, and Teradata’s SQL-H, which enables SQL analysts to invoke MapReduce and SQL-MapReduce jobs against Hadoop from Teradata’s databases. We would readily concede that there are distinct differences between the approaches in this category.

It is naturally early days for most of these approaches, given that most of them only appeared in 2013 and some are still in development and testing. So far, the responses to our Hadoop survey suggest higher levels of interest in Cloudera Impala, Cloudera RTQ and Apache Drill, followed by IBM Big SQL, Hadapt and Pivotal HAWQ.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, A few days: March 29-April 8, 2013

Tableau preps IPO. Funding for SiSense and Deep. And more.

* For 451 Research clients: Citus Data brings SQL to foreign data environments, starting with Hadoop

* For 451 Research clients: With $20m in series B funding in the bag, Platfora makes its Hadoop-based analysis debut

* Tableau Software Files Registration Statement For Proposed Initial Public Offering.

* SiSense raises $10m series B funding.

* Deep Information Sciences scores $10M for its general-purpose database.

* IBM launches BLU Acceleration, PureData System for Hadoop.

* SAP: Is HANA growth overstated?

* VMware announces the launch of Serengeti 0.8.0.

* MySQL and the forks in the road.