Page tree



Back to Knowledge Base

How to Write an Explicit Hive Queries and Explicit SQL Passthrough 

The purpose of this article is to show the difference between an explicit and implicit Hive query. Writing an EXPLICIT Hive query will yield a FASTER run time for a query. 

Explicit vs. Implicit Hive Queries

Implicit query is not recommended because:

  1. SAS will do the interpretation.
  2. Not all SAS procedures and functions are supported.

Explicit query is HIGHLY RECOMMENDED because:

  1. Faster performance time.
  2. Utilizes Hive rather than SAS Syntax.
Explicit Code
%let sql = %str(CREATE TABLE &myschema.va_partd_2 as 
SELECT * FROM csat_201908_baseline.csat_dbase_201908_vi
WHERE '2019-10-01' <= to_date(rx_dos_id) and to_date(rx_dos_id) <= '2019-06-30');
%hive_exec_sql(&sql);
Implicit Code
proc sql;
create table lib_hive.va_partd as
select *
from csatbase.csat_dbase_201908_vi
where '01Jul2018'd<=datepart(rx_dos_id)<='30Jun2019'd;
quit;

It took 30 seconds to run the Explicit query vs. 250 seconds for the Implicit query.

Utilizing Explicit SQL Passthrough

Purpose

This process ensures that code is submitted directly to the database and does not go through the SAS engine. It allows the use of Hive-specific syntax that is not part of SAS SQL. It also ensures all processing is done in the database and only the results are downloaded in SAS. 

There have also been instances of SAS procedures and explicit SQL passthrough returning different results. These cases seem to be bugs but are avoided by using explicit SQL passthrough, which has provided correct results in the instances encountered. 

Best Practice

There are two methods to submit explicit SQL code: 

SELECT: 

  • Use this syntax if Hive code will return results. This could be rows of data or system generated information 
  • CREATE TABLE can be specified to save the results to a SAS database 
  • If only SELECT statement is used, results will be printed in SAS results 
PROC SQL;
CONNECT TO HADOOP ([connection details]);
CREATE TABLE LIB1.DATASET1 AS
SELECT * FROM CONNECTION TO HADOOP ( [query text] ); 
DISCONNECT FROM HADOOP;
QUIT; 

EXECUTE:

  • Use this syntax if the Hive code will not return results 
  • This is used when creating or dropping Hive Tables 
PROC SQL;
CONNECT TO HADOOP ([connection details]);
EXECUTE( [query text] ) BY HADOOP;
QUIT;