Submit a request through the CCSQ Data & Analytics Request Form to start the contribution process. The request will be reviewed and prioritized by the CMS Business Owners of the CDR. Use the following as a guide for what information to include in the request.
A request to become a contributor must include the following information:
- A short description of the data with the minimum required information.
- The data dictionary with the minimum required information.
- The point of contact for the contribution effort.
- The anticipated timeline to load, validate, and release data to end users.
- The list of any validation groups that will require access to the pre-prod data for testing.
- The Enterprise Privacy Policy Engine (EPPE) DUA Entry associated with this data.
- The access control model.
To forward user inquiries specific to the data, the CDR needs one of the following support contacts:
- A CCSQ ServiceNow Support Group name
- A support email address
The team will also need to know:
- The access control model for this data (see below).
- Whether an entry for this data can be published on the public data catalog, or whether access to the data documentation should be restricted.
Required Data Documentation
All documentation submitted for contributed data must meet the minimum required standards as outlined below.
Short Description
The short description of the data must include the following:
- Title - The name of the dataset or resource.
- Domain - The category of the resource (e.g., Clinical, Provider, Claims, Beneficiary).
- Source - The system or activity that generates or provides the dataset.
- Granularity - The level of data detail that the resource covers (e.g., Atomic, Aggregated, Hybrid).
- Load/Refresh Frequency - How often this data source will be refreshed/updated.
- Geographic Scope - The geographic level this data covers (e.g., national, regional, state, county, zip).
- Timespan Covered - The timespan this data covers (e.g., 2019-2022).
- Personally Identifiable Information (PII)/Protected Health Information (PHI)/Sensitive Information - Whether this data contains PII, PHI, or other sensitive information.
Data Dictionary
At minimum, documentation for the data contributed to the CDR should be in the form of a data dictionary that includes:
- Table name
- Table description
- Field Name (Short Name)
- Field Name (Long Name)
- Data Type/Length
- Field Description (provide details)
- Possible Values/Range
- Partition Key
- Comments
Submitted data documentation should be in a comma-separated values (CSV) format that meets the required data dictionary template.
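Before submitting, a contributor may want to confirm that the CSV header covers every field listed above. The following is a minimal sketch, not the official template check: the column spellings in `REQUIRED_COLUMNS` are copied from the list above and may differ from the actual template, and `missing_columns` is a hypothetical helper name.

```python
import csv
import io
from typing import List

# Column names copied from the list above. The exact header spelling in the
# official data dictionary template is an assumption.
REQUIRED_COLUMNS = [
    "Table name",
    "Table description",
    "Field Name (Short Name)",
    "Field Name (Long Name)",
    "Data Type/Length",
    "Field Description",
    "Possible Values/Range",
    "Partition Key",
    "Comments",
]


def missing_columns(csv_text: str) -> List[str]:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip().lower() for name in header}
    return [col for col in REQUIRED_COLUMNS if col.lower() not in present]


# Hypothetical sample data dictionary that omits the "Comments" column.
sample = (
    "Table name,Table description,Field Name (Short Name),"
    "Field Name (Long Name),Data Type/Length,Field Description,"
    "Possible Values/Range,Partition Key\n"
    "claims,Example table,clm_id,Claim Identifier,VARCHAR(20),"
    "Unique claim id,,N\n"
)

print(missing_columns(sample))
```

The comparison ignores case and surrounding whitespace so that small header variations are not reported as missing columns.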
The preferred data format for partnering systems is Parquet because of its superior performance and lower storage cost; CSV and JavaScript Object Notation (JSON) are also accepted. Store the data in the partnering Application Development Organization's (ADO's) Simple Storage Service (S3) bucket. Versioned directories are also recommended so that, if Parquet files need to be re-created, they can be written to a new version directory without affecting the current Parquet files.
- Preferred Data Format: Parquet
- Accepted Data Formats: CSV, JSON
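The directory versioning recommendation can be illustrated with a small sketch. The `v<N>` directory naming, the `s3://bucket/dataset/vN/` layout, and the bucket and dataset names are all assumptions made for illustration; the CDR does not prescribe them here.

```python
import re
from typing import Iterable

# Assumed naming convention: version directories are named v1, v2, v3, ...
VERSION_DIR = re.compile(r"^v(\d+)$")


def next_version(existing_dirs: Iterable[str]) -> int:
    """Return the next unused version number given existing directory names."""
    versions = []
    for name in existing_dirs:
        match = VERSION_DIR.match(name)
        if match:  # ignore directories that do not follow the vN pattern
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


def versioned_prefix(bucket: str, dataset: str, version: int) -> str:
    """Build the S3 prefix under which a given version's files are written."""
    return f"s3://{bucket}/{dataset}/v{version}/"


# Hypothetical bucket contents: two published versions and a scratch directory.
existing = ["v1", "v2", "tmp"]
new_version = next_version(existing)
print(versioned_prefix("example-ado-bucket", "claims", new_version))
```

Writing re-created files under a new prefix leaves the files under the current version untouched until consumers are pointed at the new directory.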
Access Control Model
The access control model refers to how contributors would like to restrict or grant access to the data that they are contributing. The CDR supports two access control models:
- Automated (DUA Based)
  - Access to this data is granted automatically to any end user organization that has the appropriate DUA entry on its DUA. Contributors provide the DUA entry that corresponds to the contributed data.
- Manual (Notification Based)
  - In addition to the DUA entry, access to this data is manually approved by the Business Owners of the data. End user organizations request access to the data, and the CDR team notifies the Business Owners, who approve or reject the request.
Contributors must indicate which access model they would like to use for their data.
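The difference between the two models can be summarized as a small decision function. This is an illustrative sketch of the rules described above, not the CDR's actual implementation; `AccessModel` and `access_granted` are hypothetical names.

```python
from enum import Enum


class AccessModel(Enum):
    AUTOMATED = "automated"  # DUA based
    MANUAL = "manual"        # notification based


def access_granted(
    model: AccessModel,
    has_dua_entry: bool,
    owner_approved: bool = False,
) -> bool:
    """Decide whether an end user organization gets access to the data.

    Both models require the DUA entry; the manual model additionally
    requires approval from the Business Owners of the data.
    """
    if not has_dua_entry:
        return False
    if model is AccessModel.AUTOMATED:
        return True
    return owner_approved


# Automated: the DUA entry alone is sufficient.
print(access_granted(AccessModel.AUTOMATED, has_dua_entry=True))
# Manual: the DUA entry is necessary but access still waits on approval.
print(access_granted(AccessModel.MANUAL, has_dua_entry=True))
print(access_granted(AccessModel.MANUAL, has_dua_entry=True, owner_approved=True))
```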
CDR Onboarding Steps
Onboarding into the CDR is central to becoming a Data Contributor. If the prospective contributor is not already onboarded, see the training and onboarding page for more information.
Onboarding into the CDR automatically provisions the following resources for the contributor's organization/group:
- A production database/schema
- A pre-production (test) database/schema
- A Service Principal/Service Account
Once the contributor is onboarded, the team will generate a schema specific to the contributor's Bring Your Own Data (BYOD) data in accordance with the CMS Data Taxonomy.
Contribution Workflow
The following is a generalized workflow that outlines the steps involved in contributing to the CDR.
Phase | Step | POC |
---|---|---|
Phase 1 | Request Submitted with Required Information | Contributor |
Phase 1 | Onboarding Initiated (if not already onboarded) | Contributor |
Phase 1 | Service Principal Automatically Generated | Automated upon onboarding |
Phase 2 | Bring Your Own Data (BYOD) PROD and pre-PROD schemas created | CDR |
Phase 2 | Elevated Service Principal granted r/w/x access | CDR |
Phase 2 | Data Documentation Reviewed | CDR |
Phase 2 | Data Catalog Entry Created (if applicable) | CDR |
Phase 3 (optional) | Data published to pre-PROD schema | Contributor |
Phase 3 (optional) | Green light for Validation Groups | Contributor |
Phase 3 (optional) | Validation groups granted r/x access | CDR |
Phase 3 (optional) | Validation period conducted | Contributor |
Phase 4 | Data published to PROD BYOD Schema | Contributor |
Phase 4 | Green light for end user access | Contributor |
Phase 4 | End users granted r/x access per access model | CDR |
Phase 4 | End user communication released | CDR |