
Other solutions are currently criticized for being slow.

Currently considered faster:

* Apache Drill: like Impala, it provides SQL queries on Hadoop and HBase. Drill supports standard ANSI SQL:2003, including correlated subqueries and analytic functions. It works with any SQL-based tool and provides standard ODBC and JDBC drivers.
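Besides ODBC and JDBC, Drill also accepts SQL over HTTP on its web port, which makes the ANSI SQL support easy to try from a shell. Below is a minimal sketch. It assumes a Drill instance on localhost:8047 and the sample cp.`employee.json` dataset bundled with Drill. The salary and department_id columns are assumed from that sample, and the helper names drill_payload and drill_query are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: send an ANSI SQL query to Apache Drill over its REST API.
# Assumptions: Drill listening on localhost:8047 (its default web port) and
# the sample cp.`employee.json` dataset that ships with Drill.

DRILL_URL="${DRILL_URL:-http://localhost:8047}"

# Wrap a SQL string in the JSON body that POST /query.json expects.
# Newlines become spaces (raw newlines are not valid inside JSON strings);
# no other escaping is done, so the SQL must not contain " or \.
drill_payload() {
  local sql="${1//$'\n'/ }"
  printf '{"queryType": "SQL", "query": "%s"}' "$sql"
}

# POST a query and print Drill's JSON response.
drill_query() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "$(drill_payload "$1")" \
    "${DRILL_URL}/query.json"
}

# An analytic (window) function: rank employees by salary within each department.
RANK_SQL='SELECT full_name, department_id, salary,
       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM cp.`employee.json`
LIMIT 10'
```

With Drill running, drill_query "$RANK_SQL" should print the result as JSON. The payload helper does no real JSON escaping, so it is only meant for quick experiments. A JDBC or ODBC client is the better fit for actual tools.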

Largely driven by team members at MapR, Apache Drill is different from other Big Data SQL engines. Instead of being limited to schema-based, Hive-formatted tables (with some support for HBase data), Drill features a plug-in capable engine that can currently query schema-less files, JSON, Hive, HBase and even MongoDB data. It can also reach files stored locally, in HDFS, and in cloud storage systems from Amazon, Microsoft and Google.
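To illustrate querying schema-less data in place, here is a minimal sketch. It writes a small newline-delimited JSON file and points Drill's dfs storage plugin at it without declaring any table or schema. It assumes Drill in embedded mode on localhost:8047, with the default dfs plugin reading the local file system. The file path, the fields and the send_people_query helper are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: query a schema-less JSON file in place with Apache Drill.
# Assumptions: Drill in embedded mode on localhost:8047, its default `dfs`
# storage plugin reading the local file system, and hypothetical sample data.

DATA_FILE=/tmp/drill_people.json

# Newline-delimited JSON records; nothing declares a schema, and the second
# record carries a "city" field that the first one lacks.
cat > "$DATA_FILE" <<'EOF'
{"name": "alice", "skills": {"lang": "Python"}}
{"name": "bob", "city": "Portland", "skills": {"lang": "C"}}
EOF

# Nested fields are reached through a table alias (p.skills.lang);
# city should come back as null for the record that lacks it.
PEOPLE_SQL="SELECT p.name, p.city, p.skills.lang AS lang FROM dfs.\`${DATA_FILE}\` p"

# POST the query to Drill's REST endpoint (no JSON escaping: the SQL above
# contains no double quotes or newlines).
send_people_query() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "{\"queryType\": \"SQL\", \"query\": \"${PEOPLE_SQL}\"}" \
    "http://localhost:8047/query.json"
}
```

For HDFS or cloud storage, a storage plugin configured for that system would be referenced by its own name in the FROM clause instead of dfs.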

The key takeaway is that Drill is designed to be a distributed SQL query engine for pretty much everything, while Spark is a general computation engine that offers some limited SQL capabilities. If you are considering Spark only for Spark SQL, my suggestion is to reconsider and move in the direction of Apache Drill.

Apache Oozie

Oozie v3 is a server-based Bundle Engine that provides a higher-level Oozie abstraction for batching a set of coordinator applications. The user can start/stop/suspend/resume/rerun a set of coordinator jobs at the bundle level, which results in better and easier operational control.
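The bundle-level operations map onto the oozie command-line client. Below is a minimal sketch. It assumes an Oozie server at the URL shown (11000 is Oozie's usual port) and a bundle.properties file whose oozie.bundle.application.path points at a deployed bundle.xml. The wrapper names bundle_run and bundle_ctl are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: operate an Oozie bundle (and thus all of its coordinators) via the
# oozie CLI. Assumptions: the server URL below, a bundle.properties file with
# oozie.bundle.application.path set, and the hypothetical helper names.

export OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"  # read by the CLI when -oozie is omitted
OOZIE_BIN="${OOZIE_BIN:-oozie}"  # set OOZIE_BIN=echo for a dry run

# Submit and start the bundle; the CLI prints the new bundle job id.
bundle_run() {
  "$OOZIE_BIN" job -config "${1:-bundle.properties}" -run
}

# Bundle-level control: one call acts on every coordinator in the bundle.
bundle_ctl() {
  local action="$1" job_id="$2"
  case "$action" in
    start|suspend|resume|kill)
      "$OOZIE_BIN" job -"$action" "$job_id" ;;
    rerun)
      # Rerun scoped to named coordinators (the CLI also accepts a -date range).
      "$OOZIE_BIN" job -rerun "$job_id" -coordinator "$3" ;;
    *)
      echo "unsupported action: $action" >&2
      return 1 ;;
  esac
}
```

For example, bundle_ctl suspend followed by a bundle job id suspends every coordinator in that bundle with one call. Setting OOZIE_BIN=echo prints the commands instead of running them.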

MapR

Dev Install

On Windows and other platforms, MapR can be installed with Docker.

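The dm.* options are settings of the Docker daemon's devicemapper storage driver, not docker run flags. Anything placed after the image name is passed to the container instead. Set them when starting the daemon, then run the sandbox:

dockerd --storage-driver devicemapper --storage-opt dm.basesize=30G --storage-opt dm.loopdatasize=200G
docker run maprtech/mapr-sandbox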

Cleanup: if necessary, you can remove the files under /var/lib/docker/devicemapper/devicemapper. (Warning: this will delete your existing images and containers.)
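The dm.* size options can also be persisted in Docker's daemon configuration file instead of being passed on the daemon command line. Below is a minimal sketch. It assumes a Linux host that still uses the devicemapper storage driver, which is deprecated in newer Docker releases. It also assumes there is no existing /etc/docker/daemon.json that needs merging. write_daemon_config is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: persist the devicemapper size options in Docker's daemon config.
# Assumptions: a Linux host still on the devicemapper storage driver and no
# existing daemon.json whose settings would need to be merged in.

DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

# Overwrites $DAEMON_JSON; "storage-driver" and "storage-opts" are the
# daemon.json equivalents of dockerd's --storage-driver and --storage-opt.
write_daemon_config() {
  cat > "$DAEMON_JSON" <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.basesize=30G",
    "dm.loopdatasize=200G"
  ]
}
EOF
}
```

Run write_daemon_config as root and restart the Docker daemon (for example with systemctl restart docker on systemd hosts). Then start the sandbox with docker run maprtech/mapr-sandbox.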

Clients

  • Apache Hue
  • DBeaver (for Drill)
 
query_libraries.txt · Last modified: 2017/02/10 04:45 by root
 
© CrosswireDigitialMedia Ltd