CCSQ Data & Analytics Communications Call
Date: February 24, 2022 at 1:00 pm ET
Recording

Agenda

  • General Announcements
    • User Collaboration Wiki - Demo
    • CDR HDFS Migration
  • System Updates
    • File Lock Issues
    • Lagging Issues
    • GitHub Issue
    • Maintenance Schedule
  • Data Source and Usage Updates
  • Best Practices Demo
    • Partition Pruning
    • Optimizing Joins
    • Breaking up Large Queries
  • Q&A  
General Announcements 
User Collaboration Wiki - New! 

What is the User Collaboration Wiki?

  • A collaboration forum through which you can submit your solutions and best practices for others within the CAP & CDR user community.

How can you use the User Collaboration Wiki?

  • Submit best practices or how-to articles
  • Read other users' submissions and provide feedback, comments, or likes
  • Leverage solutions from other users to enhance your analytic work
  • Articles that have been reviewed and approved by D&A SMEs will be posted to the Knowledge Base

Important Notes

  • The forum is currently available to those who have HARP Confluence/Atlassian licenses. Either request a license via HARP, or work with your COR to see whether Atlassian licenses can be granted for your organization.
  • After the demo, we will open the mic for feedback and suggestions. This is an MVP release, so we welcome your feedback to ensure this space is designed for your benefit.
  • Questions from users should continue to be posted to the #ccsq_data_analytics Slack channel
CDR HDFS Migration 
  • The Data & Analytics Team will start reaching out to CDR users to migrate legacy Hive data sets from HDFS to S3 in phases
  • After migration, the Data & Analytics Team will provide guidance and documentation on storing future data sets in S3 instead
  • Benefits:
    • Frees up disk space within CDR Cluster
    • S3 storage provides unlimited capacity, better persistence, and lifecycle capabilities
    • Supports future transformations of CDR Architecture to EMR
  • Dependency: Data & Analytics will rely on user teams to coordinate and validate migrated data sets
System Updates
SAS Viya Issues
File Lock Issues
  • Issue Description: SAS Viya users have reported file lock errors when reading data from their workbenches during SAS job runs
  • What to Expect: The team has identified a fix, which requires re-mounting storage gateway workbenches. To minimize the impact on production workflows, the D&A team is coordinating with each affected organization to re-mount its workbenches at the most convenient time
  • Affected Communities: Select SAS Viya Users with affected workbenches
  • Call to Action: If you are facing this issue, please open a ServiceNow ticket for our team to re-mount your workbench
Lagging Issues
  • SAS Viya users have reported intermittent lagging and freezing while utilizing the application.
  • Because the issue is intermittent, some users have reported that it has gone away by the time a help desk ticket is created.
  • To capture the details the D&A Team needs to investigate, please follow the instructions found within the Known Issues Log
  • After following these instructions, please provide the information via a help desk ticket
GitHub Issue
  • Issue Description: SAS Viya users have reported problems accessing data within Hive caused by issues with two of the environment’s worker nodes.
  • What to Expect: The team continues to work with the SAS Vendor to obtain a patch that will resolve the issue. More updates will be shared once more information about the patch is available.
  • Affected Communities: SAS Viya Users
  • Call to Action: A new workaround has been published on this issue. Please follow these instructions found within the Known Issues Log
Maintenance Schedule
  • Updated dates for scheduled CAP & CDR maintenance events:
    • March 4
    • April 1
    • May 6
    • June 3
  • All events will begin at 8:00 pm ET and end at approximately 11:00 pm ET. A communication will be sent once maintenance is complete. As a reminder, whenever there is maintenance on the environment, make sure all of your code and table changes are saved.

Data Source and Usage Updates
Known Issues
  • BIC MBI Sequence - 8/25/2021
    • Ongoing – TBD
  • QMARS Appeals - 6/18/2021
    • Ongoing – TBD
    • Consider using healthcare_service_qmars_ng if your DUA supports it
  • QMARS - 1/12/2021
    • Ongoing - TBD
    • Consider using healthcare_service_qmars_ng if your DUA supports it
Best Practices Demo
Learning Objectives:
  • Partition Pruning
  • Optimizing Joins
  • Breaking Up Large Queries 
Partition Pruning
  • Partition pruning is the mechanism by which a query skips reading the data files that belong to partitions excluded by its filter criteria
    • Critical for optimization because it reduces the amount of data a Hive query scans
  • Reference the recording for an example
  • Best Practice: Always add lower and upper bounds on partition columns in your query criteria
  • Hint: You can check whether partition pruning is being applied by viewing the execution plan of your Hive query with %hive_explain(<sql>)
  • When viewing the execution plan, there are a few things to watch out for:
    • Check that your query filters on the partition keys
    • Large numbers of rows being scanned
  • Reference the PowerPoint for more examples
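To make the idea concrete, here is a minimal Python sketch that models a table partitioned by month as a dictionary and counts how many partitions a query must read with and without bounds. It is a simplified illustration, not how Hive stores or prunes data; the `partitions` table, its columns, and the `scan` function are hypothetical.

```python
from datetime import date

# Hypothetical table partitioned by month: each key is a partition value
# (first day of the month) and each value is the list of rows stored in it.
partitions = {
    date(2021, m, 1): [{"claim_id": f"{m}-{i}", "amount": 100 * i} for i in range(1, 4)]
    for m in range(1, 13)
}

def scan(table, lower=None, upper=None):
    """Return (rows, partitions_read), opening only partitions inside [lower, upper].

    With no bounds every partition is read, which is what happens when a
    query omits the partition key from its criteria.
    """
    rows, read = [], 0
    for key, part_rows in table.items():
        if lower is not None and key < lower:
            continue  # pruned: below the lower bound, never opened
        if upper is not None and key > upper:
            continue  # pruned: above the upper bound, never opened
        read += 1
        rows.extend(part_rows)
    return rows, read

full_rows, full_read = scan(partitions)
q1_rows, q1_read = scan(partitions, lower=date(2021, 1, 1), upper=date(2021, 3, 1))
print(full_read, len(full_rows))  # 12 36
print(q1_read, len(q1_rows))      # 3 9
```

Bounding the partition column on both sides is what lets the scan skip partitions outright; a filter on a non-partition column alone would still require opening every partition.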
Optimizing Joins 
  • Optimizing JOINs in hive queries can significantly reduce cluster resource utilization and improve query performance
  • Strategies include:
    • Always include JOIN conditions
    • Choosing the proper JOIN type
      • INNER JOIN: Returns only the rows that satisfy the join condition in both tables.
      • LEFT JOIN: Returns all rows from the table on the left side of the join and the matching rows from the table on the right side; right-side columns are NULL where there is no match.
      • RIGHT JOIN: The mirror image of LEFT JOIN. Returns all rows from the table on the right side of the join and the matching rows from the table on the left side.
  • Best Practice: Use LEFT JOIN and RIGHT JOIN deliberately, only when you need to keep rows that have no match on the other side
  • Reference the recording for examples
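The following Python sketch uses an in-memory SQLite database (standard-library `sqlite3`) as a stand-in for Hive to show how the JOIN type, and a missing join condition, change the number of rows a query produces. The `beneficiary` and `claim` tables and their contents are hypothetical; the SQL is standard, but Hive's execution differs from SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables for illustration only.
conn.executescript("""
    CREATE TABLE beneficiary (bene_id INTEGER, state TEXT);
    CREATE TABLE claim (claim_id INTEGER, bene_id INTEGER, amount REAL);
    INSERT INTO beneficiary VALUES (1, 'MD'), (2, 'VA'), (3, 'DC');
    INSERT INTO claim VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0), (13, 4, 90.0);
""")

def count(sql):
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]

# INNER JOIN: only claims whose bene_id exists in beneficiary (claim 13 is dropped).
inner = count("SELECT COUNT(*) FROM claim c INNER JOIN beneficiary b ON c.bene_id = b.bene_id")
# LEFT JOIN: every claim row; beneficiary columns are NULL for claim 13.
left = count("SELECT COUNT(*) FROM claim c LEFT JOIN beneficiary b ON c.bene_id = b.bene_id")
# No join condition: every claim is paired with every beneficiary (4 x 3 rows).
no_cond = count("SELECT COUNT(*) FROM claim c, beneficiary b")
print(inner, left, no_cond)  # 3 4 12
```

A RIGHT JOIN gives the same result as a LEFT JOIN with the two tables swapped, so it is omitted here (SQLite versions before 3.39 do not support RIGHT JOIN). The last query shows why join conditions matter: on real tables the Cartesian product grows with the product of the table sizes.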
Break Up Large Queries
  • Break up queries spanning large time periods into smaller time windows and merge them later
  • Benefits:
    • Reduces the total amount of data scanned in single query
    • Reduces use of cluster resources in single query
    • De-risks your work by allowing you to restart from the point where a job stopped
  • Reference the recording for an example 
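A minimal Python sketch of this approach, again using an in-memory SQLite table as a stand-in for Hive: it splits a one-year date range into monthly windows, runs the same aggregate over each window, and merges the partial results. The `results` dictionary is a hypothetical checkpoint standing in for wherever you would persist each window's output, so a rerun skips windows that already finished. The `claim` table, `month_windows`, and `run_windows` are hypothetical names.

```python
import sqlite3
from datetime import date

def month_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end), one per calendar month."""
    current = start
    while current < end:
        # First day of the following month (rolls over to January after December).
        following = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
        yield current, min(following, end)
        current = following

# Hypothetical claim table with one row per month of 2021.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (claim_dt TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claim VALUES (?, ?)",
    [(date(2021, m, 15).isoformat(), 10.0 * m) for m in range(1, 13)],
)

# Hypothetical checkpoint: window start -> partial result. In practice you would
# persist each window's output (e.g. to a table) so it survives a failed job.
results = {}

def run_windows(start, end):
    """Run the aggregate once per window, skipping windows already checkpointed."""
    for w_start, w_end in month_windows(start, end):
        if w_start in results:
            continue  # finished on an earlier run
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM claim WHERE claim_dt >= ? AND claim_dt < ?",
            (w_start.isoformat(), w_end.isoformat()),
        ).fetchone()
        results[w_start] = total
    # Merge step: combine the per-window results into the final answer.
    return sum(results[w_start] for w_start, _ in month_windows(start, end))

print(run_windows(date(2021, 1, 1), date(2022, 1, 1)))  # 780.0
```

Each window scans only its own slice of data, and because finished windows are recorded, a job that fails partway can be rerun without repeating completed work.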
Q&A
The questions are currently being answered by the team and will be posted shortly.