2022-2-24 Communications Call


CCSQ Data & Analytics Communications Call
Date	February 24, 2022 at 1:00pm ET
Recording	2.24.2022 Communications Call Recording; 2.24.2022 Communications Call Transcript
Agenda
- General Announcements
  - User Collaboration Wiki - Demo
  - CDR HDFS Migration
- System Updates
  - File Lock Issues
  - Lagging Issues
  - GitHub Issue
  - Maintenance Schedule
- Data Source and Usage Updates
- Best Practices Demo
  - Partition Pruning
  - Optimizing Joins
  - Breaking up Large Queries
- Q&A
General Announcements

User Collaboration Wiki - New!

What is the User Collaboration Wiki?
- A collaboration forum where you can submit your solutions and best practices for others in the CAP & CDR user community.

How can you use the User Collaboration Wiki?
- Submit best practices or how-to articles.
- Read through other users' submissions and provide feedback, comments, or likes.
- Leverage solutions from other users to enhance your analytic work.
- Articles that have been reviewed and approved by D&A SMEs will be posted on the Knowledge Base.

Important Notes
- The forum is currently available to those who have HARP Confluence/Atlassian licenses. Either request a license via HARP, or work with your COR to see if an Atlassian license can be granted for your organization.
- After the demo, we will open the mic for any feedback or suggestions. This is an MVP release, so we welcome your feedback to ensure this space is designed for your benefit.
- Questions from users should continue to be posted to the #ccsq_data_analytics Slack channel.

CDR HDFS Migration
- The Data & Analytics Team will start reaching out to CDR users to migrate legacy Hive data sets from HDFS to S3 in phases.
- After migration, the Data & Analytics Team will provide guidance and documentation on storing future data sets in S3 instead.
- Benefits:
  - Frees up disk space within the CDR cluster.
  - S3 storage provides unlimited capacity, better persistence, and lifecycle capabilities.
  - Supports future transformations of the CDR architecture to EMR.
- Dependency: Data & Analytics will depend on teams for coordination and validation of migrated data sets.
System Updates

SAS Viya Issues

File Lock Issues
- Issue Description: SAS Viya users have reported file lock errors when reading data from their workbenches during SAS job runs.
- What to Expect: The team has found a solution, which requires re-mounting storage gateway workbenches. To limit the impact on production workflows, the D&A team is coordinating with each affected organization to re-mount their workbenches at the most convenient time.
- Affected Communities: Select SAS Viya users with affected workbenches.
- Call to Action: If you are facing this issue, please open a ServiceNow ticket for our team to re-mount your workbench.

Lagging Issues
- SAS Viya users have reported intermittent lagging and freezing while using the application.
- Because the issue is intermittent, some users have reported that it had gone away by the time a help desk ticket was created.
- To capture the details the D&A Team needs to investigate, please follow the instructions found within the Known Issues Log.
- After following these instructions, please provide the information via a help desk ticket.

GitHub Issue
- Issue Description: SAS Viya users have reported issues accessing data within Hive due to issues with two of the environment’s worker nodes.
- What to Expect: The team continues to work with the SAS vendor to obtain a patch that will resolve the issue. More updates will be shared once more information about the patch is available.
- Affected Communities: SAS Viya users.
- Call to Action: A new workaround has been published for this issue. Please follow the instructions found within the Known Issues Log.

Maintenance Schedule
- Updated dates for scheduled CAP & CDR maintenance events:
  - March 4
  - April 1
  - May 6
  - June 3
- All events will begin at 8:00 pm ET and end at approximately 11:00 pm ET. A communication will be sent out once maintenance is complete.
- As a reminder, whenever there is maintenance on the environment, make sure all of your code and table changes are saved beforehand.
Data Source and Usage Updates

Known Issues
- BIC MBI Sequence - 8/25/2021, Ongoing - TBD
- QMARS Appeals - 6/18/2021, Ongoing - TBD
  - Consider using healthcare_service_qmars_ng if your DUA supports it.
- QMARS - 1/12/2021, Ongoing - TBD
  - Consider using healthcare_service_qmars_ng if your DUA supports it.
Best Practices Demo

Learning Objectives:
- Partition Pruning
- Optimizing Joins
- Breaking Up Large Queries

Partition Pruning
- Partition pruning is the mechanism by which a query skips reading the data files that belong to one or more partitions.
- It is critical for reducing the amount of data a Hive query scans.
- Reference the recording for an example.
- Best Practice: Always add lower and upper bounds for partitions in your query criteria.
- Hint: You can check whether partition pruning is being applied by viewing the execution plan of your Hive query with %hive_explain(<sql>).
- When viewing the execution plan, there are a few things to watch out for:
  - Check that you are using partition keys in your query.
  - A large number of rows.
- Reference the PowerPoint for more examples.

Optimizing Joins
- Optimizing JOINs in Hive queries can significantly reduce cluster resource utilization and improve query performance.
- Strategies include:
  - Always include JOIN conditions.
  - Choose the proper JOIN type:
    - INNER JOIN: Returns only the rows from both tables for which the join condition is satisfied.
    - LEFT JOIN: Returns all rows from the table on the left side of the join, plus the matching rows from the table on the right side.
    - RIGHT JOIN: Similar to LEFT JOIN, but returns all rows from the table on the right side of the join, plus the matching rows from the table on the left side.
- Best Practice: Use LEFT JOIN and RIGHT JOIN deliberately.
- Reference the recording for examples.

Break Up Large Queries
- Break up queries spanning large time periods into smaller time windows and merge the results later.
- Benefits:
  - Reduces the total amount of data scanned in a single query.
  - Reduces the use of cluster resources in a single query.
  - De-risks your work and allows you to restart from where a job stopped.
- Reference the recording for an example.
Q&A	The questions are currently being answered by the team and will be posted shortly.
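
The partition pruning best practice from the demo (always bound the partition key on both sides) can be sketched in Python as a small query builder. This is only an illustration, not the example shown in the recording: the table name `claims_db.claims`, the partition column `claim_month`, and the assumption that the partition key is stored as an ISO date string are all hypothetical. The `%hive_explain(<sql>)` wrapper is the macro named in the hint above; its output is not reproduced here.

```python
from datetime import date


def bounded_partition_query(table, partition_col, start, end, columns=("*",)):
    """Build a HiveQL SELECT that bounds the partition column on both sides.

    Values are interpolated directly for illustration only; the table and
    column names passed in are hypothetical placeholders.
    """
    if start > end:
        raise ValueError("start must not be after end")
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE {partition_col} >= '{start.isoformat()}' "
        f"AND {partition_col} <= '{end.isoformat()}'"
    )


# Hypothetical table and partition column, assumed to hold ISO date strings.
query = bounded_partition_query(
    "claims_db.claims",
    "claim_month",
    date(2021, 1, 1),
    date(2021, 3, 31),
    columns=("claim_id", "provider_id"),
)

# Wrap the query in the macro from the hint to inspect the execution plan.
explain_call = f"%hive_explain({query})"

print(query)
print(explain_call)
```

Keeping both bounds in one helper makes it harder to forget the upper bound, which is the case where a query silently scans every partition after the start date.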
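
To make the difference between the JOIN types described in the demo concrete, the sketch below runs an INNER JOIN and a LEFT JOIN over the same two tiny tables. It uses Python's built-in `sqlite3` module as a stand-in, not Hive, so it shows only the row-matching semantics and says nothing about Hive performance. The `providers` and `claims` tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE providers (provider_id INTEGER, name TEXT);
    CREATE TABLE claims (claim_id INTEGER, provider_id INTEGER);
    INSERT INTO providers VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO claims VALUES (10, 1), (11, 1), (12, 2);
    """
)

# INNER JOIN: only providers that have at least one matching claim.
inner_rows = conn.execute(
    """
    SELECT p.provider_id, c.claim_id
    FROM providers p
    INNER JOIN claims c ON p.provider_id = c.provider_id
    ORDER BY p.provider_id, c.claim_id
    """
).fetchall()

# LEFT JOIN: every provider, with NULL (None) where no claim matches.
left_rows = conn.execute(
    """
    SELECT p.provider_id, c.claim_id
    FROM providers p
    LEFT JOIN claims c ON p.provider_id = c.provider_id
    ORDER BY p.provider_id, c.claim_id
    """
).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

Provider 3 has no claims, so it appears only in the LEFT JOIN result. Choosing LEFT JOIN when INNER JOIN would do carries those unmatched rows through the rest of the query.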
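
The "Break Up Large Queries" practice can be sketched as splitting one long date range into calendar-month windows and running each window separately, skipping windows that already finished. This is a minimal sketch under stated assumptions: `monthly_windows` and `run_windows` are hypothetical helper names, and `run_one` stands for whatever function submits a single bounded query in your environment.

```python
from datetime import date, timedelta


def monthly_windows(start, end):
    """Yield inclusive (window_start, window_end) pairs, one per calendar month."""
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Clip the last window so it never runs past `end`.
        yield current, min(next_month - timedelta(days=1), end)
        current = next_month


def run_windows(windows, run_one, done=None):
    """Run `run_one(start, end)` for each window not already in `done`."""
    done = set() if done is None else set(done)
    for window in windows:
        if window in done:
            continue  # finished in an earlier attempt, so skip on restart
        run_one(*window)
        done.add(window)
    return done


windows = list(monthly_windows(date(2021, 11, 15), date(2022, 2, 10)))

# Stand-in runner that only records the windows it was asked to process.
submitted = []
already_done = {windows[0]}  # pretend the first window completed earlier
finished = run_windows(windows, lambda s, e: submitted.append((s, e)), already_done)

for window_start, window_end in windows:
    print(window_start.isoformat(), window_end.isoformat())
```

Each window supplies both a lower and an upper date bound, so the per-window queries also follow the partition pruning best practice, and the set of finished windows is what allows a restart from the point where a job stopped.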
