How to Write an Explicit Hive Queries and Explicit SQL Passthrough
The purpose of this article is to show the difference between an explicit and implicit Hive query. Writing an EXPLICIT Hive query will yield a FASTER run time for a query.
Explicit vs. Implicit Hive Queries
Implicit query is not recommended because:
- SAS will do the interpretation.
- Not all SAS procedures and functions are supported.
Explicit query is HIGHLY RECOMMENDED because:
- Faster performance time.
- Utilizes Hive rather than SAS Syntax.
%let sql = %str(CREATE TABLE &myschema.va_partd_2 as SELECT * FROM csat_201908_baseline.csat_dbase_201908_vi WHERE '2019-10-01' <= to_date(rx_dos_id) and to_date(rx_dos_id) <= '2019-06-30'); %hive_exec_sql(&sql);
proc sql; create table lib_hive.va_partd as select * from csatbase.csat_dbase_201908_vi where '01Jul2018'd<=datepart(rx_dos_id)<='30Jun2019'd; quit;
It took 30 seconds to run the Explicit query vs. 250 seconds for the Implicit query.
Utilizing Explicit SQL Passthrough
Purpose
This process ensures that code is submitted directly to the database and does not go through the SAS engine. It allows the use of Hive-specific syntax that is not part of SAS SQL. It also ensures all processing is done in the database and only the results are downloaded in SAS.
There have also been instances of SAS procedures and explicit SQL passthrough returning different results. These cases seem to be bugs but are avoided by using explicit SQL passthrough, which has provided correct results in the instances encountered.
Best Practice
There are two methods to submit explicit SQL code:
SELECT:
- Use this syntax if Hive code will return results. This could be rows of data or system generated information
- CREATE TABLE can be specified to save the results to a SAS database
- If only SELECT statement is used, results will be printed in SAS results
PROC SQL; CONNECT TO HADOOP ([connection details]); CREATE TABLE LIB1.DATASET1 AS SELECT * FROM CONNECTION TO HADOOP ( [query text] ); DISCONNECT FROM HADOOP; QUIT;
EXECUTE:
- Use this syntax if the Hive code will not return results
- This is used when creating or dropping Hive Tables
PROC SQL; CONNECT TO HADOOP ([connection details]); EXECUTE( [query text] ) BY HADOOP; QUIT;
Related articles