
CCSQ Data & Analytics Townhall

Date

Wednesday, September 27, 2023, at 1:00 pm ET

Recording

September Townhall

Agenda

1. Data Camp This November

2. Monthly Satisfaction Survey Review & Poll

3. Ambari Decommission Update

4. Upcoming Efforts: Hive to Glue Migration and Schema Migration to Lakehouse

5. Q&A

CCSQ Data & Analytics
Fall 2023 Data Camp

November 15th and 16th 

Data Camp Ad Video

Monthly Satisfaction Survey Review
June - August

See the recording (at 00:02:50) for the results of the previous months' polls.

Ambari Decommission and Transition Updates

Timeline Update

  • Decommissioning has been postponed from its original date of September 29th.
  • We are resolving issues with Databricks (DBX) support that must be fixed before we can fully transition.
  • Please watch for the new decommissioning date, which will be communicated as soon as possible.
  • Once the issues are fixed and users have migrated to DBX, we will decommission during our maintenance event.

Actions Users Need to Take

  • Request access through HCQIS Access Roles and Profile (HARP) to gain access to the Databricks workspace:
    • QualityNet Analytics
    • Organization Name
    • Databricks Developer
  • SAS Users
    • Connect using the new macro to ensure you have access to the Notebook Structured Query Language (SQL) Cluster.
  • Zeppelin Decommission
    • Migrate Zeppelin notebooks to DBX notebooks.

Notebook SQL Cluster vs. Compute/R Clusters

  • What is the difference?
  • Notebook SQL Cluster:
    • Released in late July.
    • Allows SAS users to connect to Databricks through a new macro.
      • Documentation for this is available on the CCSQ site.
    • Utilize the power of the Databricks engine within SAS.
    • One cluster shared by the whole community.

Compute Clusters

  • Released in early April.
  • Using the DBX user interface (UI), you can open a notebook, attach it to your organization-specific cluster, and conduct your analysis.
  • Languages available: Python, R, SQL, Scala.
  • Supports all native notebook capabilities and library installations.
  • Works with Workbench and Centralized Data Repository (CDR) data.
  • Users can share their notebooks.
  • Users can visualize their results.

Issues Encountered: SAS vs. Databricks

  • Performance related
    • Writing large amounts of data from SAS datasets to Hive.
    • Interpreting Date, Time, and Datetime values with the DBX Open Database Connectivity (ODBC) driver.
    • Currently testing new drivers.
  • Format related
    • Writing SAS datasets with Date- and/or Time-formatted columns to Hive tables.

Documentation and Trainings 

Contact Us for Help!

Upcoming Improvement Efforts

Migration of Table and Database Metadata from Hive Relational Database Service (RDS) to Glue

  • This is an informational announcement for all users of the CDR environment.
  • Once the Ambari decommissioning takes place, we will follow it with a migration of table and database metadata from Hive RDS to Glue.
  • This is part of our effort to modernize the object store and metastore infrastructure in the CDR to make it more maintainable, scalable, and performant.
  • Testing has been done, and will continue, to ensure that moving the metadata does not change how you read tables and databases in the environment today (e.g., column order, column data types, column truncation).
  • More information will come as we get closer to implementation. Please note again that this will not take place until after the Ambari decommissioning, for which a new date has not yet been announced.

Consolidation of Tables and Databases in the CDR Lakehouse

  • Some users of the CDR environment may recall our efforts last quarter to migrate all workbench data from the legacy environment to the new consolidated CDR Lakehouse environment.
  • We are pleased to announce we will be continuing these modernization efforts in this upcoming quarter, with a migration of table and database object data from its current place in the legacy environment to the new consolidated CDR Lakehouse environment.
  • As last quarter, these efforts will be phased and will take place during maintenance windows over the next two quarters. As a reminder, production maintenance windows typically fall on the first Friday of each month.
  • A Confluence page will be released soon to indicate which databases will be migrated during which maintenance window.
  • If you have any concerns related to contractual deliverables during this period, please reach out to our team through the request form process and indicate when you would prefer to have your database migrated; we will do our best to accommodate.
  • CCSQ Data And Analytics Request Form - Data - QualityNet Confluence (cms.gov)

Q&A

9-23-2023 Townhall Q&A
