CCSQ Data & Analytics Communications Call
Date: February 24, 2022 at 1:00 pm ET
Recording


Agenda

  • General Announcements
    • User Collaboration Wiki - Demo
    • CDR HDFS Migration
  • System Updates
    • File Lock Issues
    • Lagging Issues
    • GitHub Issue
    • Maintenance Schedule
  • Data Source and Usage Updates
  • Best Practices Demo
    • Partition Pruning
    • Optimizing Joins
    • Breaking Up Large Queries
  • Q&A  
General Announcements 
User Collaboration Wiki - New! 

What is the User Collaboration Wiki?

  • A collaboration forum through which you can submit your solutions and best practices for others within the CAP & CDR user community.

How can you use the User Collaboration Wiki?

  • Submit best practices or how-to articles
  • Read other users' submissions and provide feedback, comments, or likes
  • Leverage solutions from other users to enhance your analytic work
  • Articles that have been reviewed and approved by D&A SMEs will be posted to the Knowledge Base

Important Notes

  • The forum is currently available to those who have HARP Confluence/Atlassian licenses. Either request a license via HARP, or work with your COR to see whether an Atlassian license can be granted for your organization.
  • After the demo, we will open the mic for any feedback or suggestions. This is an MVP release, so we are open to your feedback to ensure this space is designed for your benefit.
  • Questions from users should continue to be posted to the #ccsq_data_analytics Slack channel
CDR HDFS Migration 
  • The Data & Analytics Team will begin reaching out to CDR users, in phases, to migrate legacy Hive data sets from HDFS to S3
  • After the migration, the Data & Analytics Team will provide guidance and documentation on storing future data sets in S3 instead
  • Benefits:
    • Frees up disk space within CDR Cluster
    • S3 storage provides unlimited capacity, better persistence, and lifecycle capabilities
    • Supports the future transformation of the CDR architecture to EMR
  • Dependency: Data & Analytics will rely on teams to coordinate and validate their migrated data sets
System Updates
SAS Viya Issues
File Lock Issues
  • Issue Description: SAS Viya users have reported file lock errors when reading data from their workbenches during SAS job runs
  • What to Expect: The team has found a solution, which requires re-mounting storage gateway workbenches. To minimize the impact on production workflows, the D&A team is coordinating with each affected organization to re-mount its workbenches at the most convenient time
  • Affected Communities: Select SAS Viya users with affected workbenches
  • Call to Action: If you are facing this issue, please open a ServiceNow ticket so our team can re-mount your workbench
Lagging Issues
  • SAS Viya users have reported intermittent lagging and freezing while utilizing the application.
  • Because the issue is intermittent, some users have found that it has gone away by the time a help desk ticket is created.
  • To capture the details the D&A Team needs to investigate, please follow the instructions in the Known Issues Log
  • After following these instructions, please provide the information via a help desk ticket
GitHub Issue
  • Issue Description: SAS Viya users have reported problems accessing data in Hive, caused by issues with two of the environment's worker nodes.
  • What to Expect: The team continues to work with the SAS vendor to obtain a patch that resolves the issue. Further updates will be shared as more information about the patch becomes available.
  • Affected Communities: SAS Viya Users
  • Call to Action: A new workaround has been published for this issue. Please follow the instructions in the Known Issues Log
Maintenance Schedule
  • Updated dates for scheduled CAP & CDR maintenance events:
    • March 4
    • April 1
    • May 6
    • June 3
  • All events will begin at 8:00 pm ET and end at approximately 11:00 pm ET. A communication will be sent once maintenance is complete. As a reminder, whenever there is maintenance on the environment, make sure all of your code and table changes are saved.

Data Source and Usage Updates

Known Issues

  • BIC MBI Sequence - 8/25/2021
    • Ongoing – TBD
  • QMARS Appeals - 6/18/2021
    • Ongoing – TBD
    • Consider using healthcare_service_qmars_ng if your DUA supports it
  • QMARS - 1/12/2021
    • Ongoing – TBD
    • Consider using healthcare_service_qmars_ng if your DUA supports it
Best Practices Demo

Learning Objectives:

  • Partition Pruning
  • Optimizing Joins
  • Breaking Up Large Queries 
Partition Pruning
  • Partition pruning is the mechanism by which a query skips reading the data files of partitions that its filter criteria exclude
    • Critical for optimization, because it reduces the amount of data a Hive query scans
  • Refer to the recording for an example
  • Best Practice: Always specify both lower and upper bounds on partition columns in your query criteria
  • Hint: You can check whether partition pruning is being applied properly by viewing the execution plan of your Hive query with %hive_explain(<sql>)
  • When viewing the execution plan, watch for the following:
    • Check that your query filters on the partition keys
    • A large number of rows, which may indicate that partitions are not being pruned
  • Refer to the PowerPoint for more examples
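To make the effect of bounding the partition key concrete, the following Python sketch simulates partition pruning in memory. It is a hypothetical illustration, not how Hive implements pruning. The monthly `Partition` layout, the row counts, the `where_clause` helper, and the `partition_month` column name are all assumptions. The sketch shows why the best practice calls for both bounds: a one-sided filter still scans every partition on its open side.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Partition:
    # Hypothetical layout: one partition per month, identified by the
    # first day of that month (the partition key value).
    key: date
    row_count: int


def prune(partitions: List[Partition],
          lower: Optional[date] = None,
          upper: Optional[date] = None) -> List[Partition]:
    """Keep only partitions whose key lies within [lower, upper].

    A bound of None means the query has no predicate on that side,
    so no partition can be skipped on that side.
    """
    kept = []
    for p in partitions:
        if lower is not None and p.key < lower:
            continue  # skipped: key is before the lower bound
        if upper is not None and p.key > upper:
            continue  # skipped: key is after the upper bound
        kept.append(p)
    return kept


def rows_scanned(partitions: List[Partition]) -> int:
    # Only partitions that survive pruning are read.
    return sum(p.row_count for p in partitions)


def where_clause(column: str, lower: date, upper: date) -> str:
    # Hypothetical helper: builds a filter with both bounds on the
    # partition column. Illustration only; prefer your tool's parameter
    # binding over string formatting for untrusted input.
    return (f"WHERE {column} >= '{lower.isoformat()}' "
            f"AND {column} <= '{upper.isoformat()}'")


# Twelve monthly partitions for 2021 with 1,000 rows each (made-up numbers).
table = [Partition(date(2021, m, 1), 1_000) for m in range(1, 13)]

no_filter = prune(table)
lower_only = prune(table, lower=date(2021, 4, 1))
both_bounds = prune(table, lower=date(2021, 4, 1), upper=date(2021, 6, 1))

print(rows_scanned(no_filter))    # 12000
print(rows_scanned(lower_only))   # 9000
print(rows_scanned(both_bounds))  # 3000
print(where_clause("partition_month", date(2021, 4, 1), date(2021, 6, 1)))
# WHERE partition_month >= '2021-04-01' AND partition_month <= '2021-06-01'
```

With both bounds, only 3 of the 12 simulated partitions are read. With only a lower bound, 9 are still scanned. In a real Hive query, the filter has to be on the partition column itself for the engine to prune. Viewing the execution plan is how you confirm that this is happening.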
Optimizing Joins 
Q&A
The questions are currently being answered by the team and will be posted shortly.

