CCSQ Data & Analytics Communications Call
Date: Thursday, March 24, 2022 at 1:00 pm ET
Recording

Passcode: *Rm*83KM

Agenda
  1. Data & Analytics HCD Research
  2. System Updates
    1. File Lock Issues
    2. Lagging Issues
    3. GitHub Issue
    4. Maintenance Schedule - Updated! 
  3. Data Sources and Usage Updates
    1. Known Issues
    2. Claims Part A Update
    3. BIC Updates
  4. Best Practices Demo
    1. Creating internal, intermediate Hive tables
    2. Converting internal tables to external tables in Hive
    3. Developing workflows with small, test data sets
    4. Restoration process for files and folders
  5. User Collaboration Wiki Article Highlight
  6. Q&A
  7. User Experience Survey

HCD Research

CDR Data Catalog Study (March 2022) 

  • When: March 10th - April 15th; 45 mins (via Zoom)
  • Goal: To understand your current experience consuming and/or contributing to the CDR Data Catalog.
  • Purpose: To inform the evaluation of potential future data catalog tool(s)

System Updates

SAS Viya Issues

File Lock Issues

  • Issue Description: SAS Viya users have reported file lock errors when reading data from their workbenches during SAS job runs
  • What to Expect: The team has found a solution, which requires re-mounting storage gateway workbenches. To mitigate the impact to production workflows, the D&A team is coordinating with each affected organization to re-mount their workbenches at the most convenient time
  • Affected Communities: Select SAS Viya Users with affected workbenches
  • Call to Action: If you are facing this issue, please open a ServiceNow ticket for our team to remount your workbench.

Lagging Issues

  • SAS Viya users have reported intermittent lagging and freezing while utilizing the application.
  • Due to the issue's intermittent nature, some users have reported that the problem has resolved by the time a help desk ticket is created.
  • To capture the details the D&A Team needs to investigate, please follow the instructions found within the Known Issues Log.
  • After following these instructions, please provide the information via a help desk ticket.

GitHub Issue

  • Issue Description: SAS Viya users have reported problems accessing data within Hive caused by issues with two of the environment's worker nodes.
  • What to Expect: The team continues to work with the SAS Vendor to obtain a patch that will resolve the issue. More updates will be shared once more information about the patch is available.
  • Affected Communities: SAS Viya Users
  • Call to Action: A new workaround has been published for this issue. Please follow the instructions found within the Known Issues Log.

Scheduled CAP & CDR Maintenance

  • April 8th - Updated!
  • May 6th
  • June 3rd

All events will begin at 8:00 pm ET and end at approximately 11:00 pm ET. A communication will be sent out once maintenance is complete. As a reminder, whenever there is maintenance on the environment, you will need to make sure all of your code and table changes are saved.

Data Source & Usage Updates

Known Issues

  • BIC MBI Sequence - 8/25/2021
    • Ongoing – TBD
  • QMARS Longer Text Fields - 1/12/2021
    • Ongoing - TBD
    • Consider using healthcare_service_qmars_ng if your DUA supports it

Claims Part A Updates

  • New CR20 fields to be added in mid-April 2022
  • Pricer Version Field (CR12463)
    • Added to display the Prospective Payment System (PPS) Pricer Version on Inpatient/SNF, Outpatient, Home Health, and Hospice claims at the claim level
  • Medicare-Severity Diagnosis Related Group (MS-DRG) Grouper Version field (CR12463)
    • Added to display the MS-DRG Grouper Version on Inpatient/SNF claims at the claim level

BIC Updates

  • New columns added in mid-April 2022:
    • HMO_CONTRACT_NUM field
    • HMO_LKIN_PMT_OPTN_CD field

More information will be provided in the data catalog in the future.

Best Practices Review 

Learning Objectives:

  • Creating internal, intermediate Hive tables
  • Converting internal tables to external tables in Hive
  • Developing workflows with small, test data sets
  • Restoration process for files and folders

Creating Internal, Intermediate Hive Tables

    • When running large workflows with multiple data transformations, it is important to create Hive tables to store intermediate results
    • Intermediate results are materialized and written to persistent data storage (e.g., S3)
    • Benefits:
      • Intermediate tables serve as “checkpoints” to allow for review and validation of the data
      • Intermediate tables improve resiliency and allow users to avoid re-generating data sets when troubleshooting and re-running
    • Best Practice: Ensure that intermediate data sets are stored as internal tables, even if you are not sure whether they need to be persisted later
    • Reference the recording for an example; a minimal sketch follows this list.
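
A minimal HiveQL sketch of the pattern, assuming hypothetical database, table, and column names (the recording contains the actual demo):

    -- Materialize an intermediate result as an internal (managed) Hive table.
    -- Note: dropping an internal table later also deletes its underlying data.
    CREATE TABLE my_org_db.claims_step1 AS
    SELECT bene_id,
           clm_id,
           clm_from_dt
    FROM my_org_db.claims_raw
    WHERE clm_from_dt >= '2021-01-01';

    -- Use the checkpoint to review and validate the data before building
    -- the next transformation on top of it.
    SELECT COUNT(*) FROM my_org_db.claims_step1;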

Converting Internal Tables to External Tables 

    • External tables protect against data loss in case Hive tables are accidentally dropped
    • External tables preserve the underlying data objects even if the table is removed from the Hive metastore
    • Important: External tables should only be used for data that must be persisted and will not need to be deleted later
    • Option #1: Create a new external table under a new name (see the example in the recording and the sketch below)
    • Option #2: Create an external table and preserve the original name (see the example in the recording and the sketch below)
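
A minimal HiveQL sketch of both options, assuming a hypothetical internal table my_org_db.claims_final and an illustrative S3 path (the recording contains the actual demo):

    -- Option #1: copy the data into a new external table under a new name.
    -- LIKE copies the schema; LOCATION points at the data files' new home.
    CREATE EXTERNAL TABLE my_org_db.claims_final_ext
    LIKE my_org_db.claims_final
    LOCATION 's3://example-bucket/claims_final_ext/';

    INSERT INTO my_org_db.claims_final_ext
    SELECT * FROM my_org_db.claims_final;

    -- Option #2: flip the existing table to external in place, preserving
    -- its name. Dropping the table afterwards leaves the data files intact.
    ALTER TABLE my_org_db.claims_final
    SET TBLPROPERTIES ('EXTERNAL'='TRUE');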

Developing Workflows with Small, Test Datasets 

  • When writing new code, running it against large, production-size data sets can be problematic
  • Long-running jobs can consume significant time only to surface a logic issue or other bug at the end
  • Best Practices:
    • Create small, test data sets for your development
    • Update your code to point to production data sets only after validating logic and outputs (a sketch follows this list)
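
A minimal HiveQL sketch, assuming a hypothetical production table my_org_db.claims and hypothetical column names:

    -- Carve out a small development sample from the production table.
    CREATE TABLE my_org_db.claims_dev_sample AS
    SELECT * FROM my_org_db.claims
    LIMIT 1000;

    -- Develop and validate logic against the sample first...
    SELECT prvdr_num, COUNT(*) AS clm_cnt
    FROM my_org_db.claims_dev_sample
    GROUP BY prvdr_num;

    -- ...then repoint the validated query at my_org_db.claims.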

Restoration Process for Files and Folders

  • Missing or deleted files and folders within SAS workbenches are recoverable if they reside in version-enabled S3 buckets
  • However, the D&A team will need very specific information in order to run the recovery
  • Recovery scenarios:
    • Individual file
    • Entire folder
    • Multiple files within a folder
  • Restoration Process for Individual File
    • If an individual file was inadvertently deleted, the user needs to submit a ServiceNow ticket with the following information:
      • Filename 
      • Filepath
      • Approximate deletion date
      • Any details about the file contents that may distinguish the desired version from others 
  • Restoration Process for Entire Folder
    • If an entire folder was inadvertently deleted, the user needs to submit a ServiceNow ticket with the following information:
      • Folder name
      • Folder path
      • Approximate deletion date
      • Any details about the folder contents that may distinguish the desired version from others
  • Restoration Process for Multiple Files within a Folder 
    • If multiple files were inadvertently deleted, the user needs to submit a ServiceNow ticket with the following information:
      • File names - please specify if the file names follow a naming convention
      • File paths (parent folder) - If missing files are within different folders, please provide all file paths
      • Approximate deletion date - If files were deleted at different times, please specify and provide all dates
      • Any details about the files and file contents that may distinguish the desired version from others 
User Collaboration Wiki Article Highlight

Please reference the recording for a walkthrough.

Q&A
  1. Is S3 considered an internal or external data storage location?
    A - Applying the terms internal/external to an S3 bucket is a misnomer because they do not describe the storage location. An external table simply means that Hive leaves the underlying data in place after the table is dropped, and it could point to data stored in HDFS. Similarly, you can make an internal Hive table that is stored in S3; when you drop that table, the underlying data is also removed from the S3 bucket. (See the sketch following this Q&A.)
  2. Why would users want to create external tables versus internal tables?
    A - External tables represent data sets that should be protected against data loss (i.e., a data set that should be kept for future analytical work). In HDFS, there would be a backup of that data in case the external table is accidentally deleted. Internal tables, which are used for intermediate results, do not have this protection.
  3. Are external tables available to QuickSight? If external tables are available for direct consumption into QuickSight, do they need to be stored in Parquet format?
    A - We are exploring a data publishing standard through which generated or aggregated data sets can be shared as external tables in S3 buckets to QuickSight. As we finalize the approach, we will share further guidance and information.
  4. I am getting a permissions error using hive_sql when I try to write to the default database. Are the default and temp databases available to everyone, or do I need to request access?
    A - The temp and default databases are used for demonstration purposes. Intermediate tables should be generated in the Hive schemas and databases that your organization has access to (which depend on your use of the CDR). If you need access to that particular temporary space to create intermediate tables, please create a request ticket. (See the sketch following this Q&A.)
  5. Does this recovery process refer to data files and .sas program files, etc., on a workbench, or just data files?
    A - All files in a workbench are recoverable as long as they are in a version-enabled S3 bucket.
  6. What is the status on access to R from Zeppelin notebooks?
    A - We will be reaching out to Zeppelin users to see if they can leverage other languages. As part of the HCD studies we have been conducting, a pilot for a new notebook solution will be run in which the new tool would execute R code. More information will be made available in the future.
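
Following up on questions 1 and 4 above, a minimal HiveQL sketch (database, table, and column names are hypothetical): DESCRIBE FORMATTED reports whether a table is internal (managed) or external and where its data files live, and intermediate tables belong in a database your organization has access to rather than in default.

    -- Check the table type (MANAGED_TABLE vs. EXTERNAL_TABLE) and the
    -- storage location (e.g., an s3:// path) in the command's output.
    DESCRIBE FORMATTED my_org_db.claims_step1;

    -- Create intermediate tables in a database your organization has
    -- access to, not in the default database.
    CREATE TABLE my_org_db.tmp_results AS
    SELECT bene_id, COUNT(*) AS clm_cnt
    FROM my_org_db.claims_step1
    GROUP BY bene_id;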