CCSQ Data & Analytics Communications Call
Date: Thursday, March 24, 2022 at 1:00 pm ET
Recording

Passcode: *Rm*83KM

Agenda
  1. Data & Analytics HCD Research
  2. System Updates
    1. File Lock Issues
    2. Lagging Issues
    3. GitHub Issue
    4. Maintenance Schedule - Updated! 
  3. Data Sources and Usage Updates
    1. Known Issues
    2. Claims Part A Update
    3. BIC Updates
  4. Best Practices Demo
    1. Creating internal, intermediate Hive tables
    2. Converting internal tables to external tables in Hive
    3. Developing workflows with small, test data sets
    4. Restoration process for files and folders
  5. User Collaboration Wiki Article Highlight
  6. Q&A
  7. User Experience Survey

HCD Research

CDR Data Catalog Study (March 2022) 

  • When: March 10th - April 15th; 45 mins (via Zoom)
  • Goal: To understand your current experience consuming and/or contributing to the CDR Data Catalog.
  • Purpose: To inform the evaluation of potential future data catalog tool(s)

System Updates

SAS Viya Issues

File Lock Issues

  • Issue Description: SAS Viya users have reported file lock errors when reading data from their workbenches during SAS job runs
  • What to Expect: The team has found a solution, which requires re-mounting storage gateway workbenches. To mitigate the impact to production workflows, the D&A team is coordinating with each affected organization to re-mount their workbenches at the most convenient time
  • Affected Communities: Select SAS Viya Users with affected workbenches
  • Call to Action: If you are facing this issue, please open a ServiceNow ticket for our team to remount your workbench.

Lagging Issues

  • SAS Viya users have reported intermittent lagging and freezing while utilizing the application.
  • Due to the issue's intermittent nature, some users have reported that the problem has resolved by the time a help desk ticket is created.
  • To capture the details the D&A Team needs to investigate, please follow the instructions found within the Known Issues Log.
  • After following these instructions, please provide the information via a help desk ticket.

GitHub Issue

  • Issue Description: SAS Viya users have reported problems accessing data within Hive caused by issues with two of the environment's worker nodes.
  • What to Expect: The team continues to work with the SAS Vendor to obtain a patch that will resolve the issue. More updates will be shared once more information about the patch is available.
  • Affected Communities: SAS Viya Users
  • Call to Action: A new workaround has been published for this issue. Please follow the instructions found within the Known Issues Log.

Scheduled CAP & CDR Maintenance

  • April 8th - Updated!
  • May 6th
  • June 3rd

All events will begin at 8:00 pm ET and end at approximately 11:00 pm ET. A communication will be sent out once maintenance is complete. As a reminder, whenever there is maintenance on the environment, you will need to make sure all of your code and table changes are saved.

Data Source & Usage Updates

Known Issues

  • BIC MBI Sequence - 8/25/2021
    • Ongoing – TBD
  • QMARS Longer Text Fields - 1/12/2021
    • Ongoing - TBD
    • Consider using healthcare_service_qmars_ng if your DUA supports it

Claims Part A Updates

  • New CR20 fields to be added in mid-April 2022
  • Pricer Version Field (CR12463)
    • Added to display the Prospective Payment System (PPS) Pricer Version on Inpatient/SNF, Outpatient, Home Health, and Hospice claims at the claim level
  • Medicare-Severity Diagnosis Related Group (MS-DRG) Grouper Version field (CR12463)
    • Added to display the MS-DRG Grouper Version on Inpatient/SNF claims at the claim level

BIC Updates

  • New columns added in mid-April 2022:
    • HMO_CONTRACT_NUM field
    • HMO_LKIN_PMT_OPTN_CD field

More information will be provided in the data catalog in the future.

Best Practices Review 

Learning Objectives:

  • Creating internal, intermediate Hive tables
  • Converting internal tables to external tables in Hive
  • Developing workflows with small, test data sets
  • Restoration process for files and folders

Creating Internal, Intermediate Hive Tables

    • When running large workflows with multiple data transformations, it is important to create Hive tables to store intermediate results
    • Intermediate results are materialized and written to persistent data storage (e.g., S3)
    • Benefits:
      • Intermediate tables serve as “checkpoints” to allow for review and validation of the data
      • Intermediate tables improve resiliency and allow users to avoid re-generating data sets when troubleshooting and re-running
    • Best Practice: Ensure that intermediate data sets are stored as internal tables, even if you are not sure whether they need to be persisted later
    • Reference the recording for an example; a minimal sketch follows this list.
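
A minimal HiveQL sketch of the pattern, assuming hypothetical database, table, and column names (the recording contains the actual demo):

    -- Materialize an intermediate result as an internal (managed) Hive table.
    -- Note: dropping an internal table later also deletes its underlying data.
    CREATE TABLE my_org_db.claims_step1 AS
    SELECT bene_id,
           clm_id,
           clm_from_dt
    FROM my_org_db.claims_raw
    WHERE clm_from_dt >= '2021-01-01';

    -- Use the checkpoint to review and validate the data before building
    -- the next transformation on top of it.
    SELECT COUNT(*) FROM my_org_db.claims_step1;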

Converting Internal Tables to External Tables 

    • External tables protect against data loss in case Hive tables are accidentally dropped
    • External tables preserve the underlying data objects even if the table is removed from the Hive metastore
    • Important: External tables should only be used for data that must be persisted and will not need to be deleted later
    • Option #1: Create a new external table under a new name (see the example in the recording and the sketch below)
    • Option #2: Create an external table and preserve the original name (see the example in the recording and the sketch below)
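
A minimal HiveQL sketch of both options, assuming a hypothetical internal table my_org_db.claims_final and an illustrative S3 path (the recording contains the actual demo):

    -- Option #1: copy the data into a new external table under a new name.
    -- LIKE copies the schema; LOCATION points at the data files' new home.
    CREATE EXTERNAL TABLE my_org_db.claims_final_ext
    LIKE my_org_db.claims_final
    LOCATION 's3://example-bucket/claims_final_ext/';

    INSERT INTO my_org_db.claims_final_ext
    SELECT * FROM my_org_db.claims_final;

    -- Option #2: flip the existing table to external in place, preserving
    -- its name. Dropping the table afterwards leaves the data files intact.
    ALTER TABLE my_org_db.claims_final
    SET TBLPROPERTIES ('EXTERNAL'='TRUE');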

Developing Workflows with Small, Test Datasets 

  • When writing new code, running it against large, production-size data sets can be problematic
  • Long-running jobs can consume significant time only to surface a logic issue or other bug at the end
  • Best Practices:
    • Create small, test data sets for your development
    • Update your code to point to production data sets only after validating logic and outputs (a sketch follows this list)
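
A minimal HiveQL sketch, assuming a hypothetical production table my_org_db.claims and hypothetical column names:

    -- Carve out a small development sample from the production table.
    CREATE TABLE my_org_db.claims_dev_sample AS
    SELECT * FROM my_org_db.claims
    LIMIT 1000;

    -- Develop and validate logic against the sample first...
    SELECT prvdr_num, COUNT(*) AS clm_cnt
    FROM my_org_db.claims_dev_sample
    GROUP BY prvdr_num;

    -- ...then repoint the validated query at my_org_db.claims.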

Restoration Process for Files and Folders

  • Missing or deleted files and folders within SAS workbenches are recoverable if they reside in version-enabled S3 buckets
  • However, the D&A team will need very specific information in order to run the recovery
  • Recovery scenarios:
    • Individual file
    • Entire folder
    • Multiple files within a folder
  • Restoration Process for Individual File
    • If an individual file was inadvertently deleted, the user needs to submit a ServiceNow ticket with the following information:
      • Filename 
      • Filepath
      • Approximate deletion date
      • Any details about the file contents that may distinguish the desired version from others 
  • Restoration Process for Entire Folder
    • If an entire folder was inadvertently deleted, the user needs to submit a ServiceNow ticket with the following information:
      • Folder name
      • Folder path
      • Approximate deletion date
      • Any details about the folder contents that may distinguish the desired version from others
  • Restoration Process for Multiple Files within a Folder 
    • If multiple files were inadvertently deleted, the user needs to submit a ServiceNow ticket with the following information:
      • File names - please specify if the file names follow a naming convention
      • File paths (parent folder) - If missing files are within different folders, please provide all file paths
      • Approximate deletion date - If files were deleted at different times, please specify and provide all dates
      • Any details about the files and file contents that may distinguish the desired version from others 
User Collaboration Wiki Article Highlight

Please reference the recording for a walkthrough.

Q&A
  1. Is S3 considered an internal or external data storage location?
    A - Applying the terms internal/external to an S3 bucket is a misnomer because they do not describe the storage location. An external table simply means that Hive leaves the underlying data in place after the table is dropped, and it could point to data stored in HDFS. Similarly, you can make an internal Hive table that is stored in S3; when you drop that table, the underlying data is also removed from the S3 bucket. (See the sketch following this Q&A.)
  2. Why would users want to create external tables versus internal tables?
    A - External tables represent data sets that should be protected against data loss (i.e., a data set that should be kept for future analytical work). In HDFS, there would be a backup of that data in case the external table is accidentally deleted. Internal tables, which are used for intermediate results, do not have this protection.
  3. Are external tables available to QuickSight? If external tables are available for direct consumption into QuickSight, do they need to be stored in Parquet format?
    A - We are exploring a data publishing standard through which generated or aggregated data sets can be shared as external tables in S3 buckets to QuickSight. As we finalize the approach, we will share further guidance and information.
  4. I am getting a permissions error using hive_sql when I try to write to the default database. Are the default and temp databases available to everyone, or do I need to request access?
    A - The temp and default databases are used for demonstration purposes. Intermediate tables should be generated in the Hive schemas and databases that your organization has access to (which depend on your use of the CDR). If you need access to that particular temporary space to create intermediate tables, please create a request ticket. (See the sketch following this Q&A.)
  5. Does this recovery process refer to data files and .sas program files, etc., on a workbench, or just data files?
    A - All files in a workbench are recoverable as long as they are in a version-enabled S3 bucket.
  6. What is the status on access to R from Zeppelin notebooks?
    A - We will be reaching out to Zeppelin users to see if they can leverage other languages. As part of the HCD studies we have been conducting, a pilot for a new notebook solution will be run in which the new tool would execute R code. More information will be made available in the future.
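
Following up on questions 1 and 4 above, a minimal HiveQL sketch (database, table, and column names are hypothetical): DESCRIBE FORMATTED reports whether a table is internal (managed) or external and where its data files live, and intermediate tables belong in a database your organization has access to rather than in default.

    -- Check the table type (MANAGED_TABLE vs. EXTERNAL_TABLE) and the
    -- storage location (e.g., an s3:// path) in the command's output.
    DESCRIBE FORMATTED my_org_db.claims_step1;

    -- Create intermediate tables in a database your organization has
    -- access to, not in the default database.
    CREATE TABLE my_org_db.tmp_results AS
    SELECT bene_id, COUNT(*) AS clm_cnt
    FROM my_org_db.claims_step1
    GROUP BY bene_id;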