- Created by Angel Tucker, last modified by Alex Hosch on Aug 10, 2022
QualityNet Operations Dashboard provides CMS Executives with:
- A layer of automation that conveys reporting from disparate sources to a dashboarding solution.
- A user interface that provides reports/visualizations of system information, status, and customer experience.
- A user interface that can be extended to CMS leadership/product owners, ADO DevSecOps, ADO Developers, HD and end-users.
- A multi-tenant solution that allows reports/visualizations for each team (common and custom reports/visualizations).
- A “single pane of glass” view across various HCQIS IT systems.
The QualityNet Operations Dashboard consists of three open-source software components (Grafana, Telegraf, and InfluxDB), as well as some AWS components (RDS, DynamoDB, Kinesis, etc.). In the future, the dashboard will incorporate Artificial Intelligence (AI) and Machine Learning (ML) to proactively detect anomalies.
Log in at https://qnetdashboard.cms.gov/
Getting Started
Step 1: If you do not yet have a HARP account, please register for a HARP ID. For instructions on the HARP registration process, refer to the HARP page.
Step 2: Once the HARP account has been created, log into HARP and request a "Service" entitlement via a HARP User Role.
- Select User Roles from the top of the page, and select Request a Role.
- Select QualityNet Operations Dashboard
- Select your Organization.
- Don't see your organization listed?
- Currently the personas are CMS and Ventech. If your organization would like to obtain early access to the dashboard, please reach out to our Slack Channel #help-qnod-dashboard.
- Select the following user role:
- Viewer
Step 3: The organization's Security Official reviews and approves/denies the user role request. You will be notified via email that your request has been submitted, and again when your role has been approved or denied.
Step 4: Log into QualityNet Operations Dashboard using your HARP credentials.
Note: Only 1 organization currently exists and that is ADO-HIDS-Ventech Solutions. On June 2, CMS users will have the Viewer role automatically assigned to their EUA account in HARP.
Environments:
- TEST - https://test.qnetdashboard.cms.gov/
- IMPL - https://impl.qnetdashboard.cms.gov/
- PROD - https://qnetdashboard.cms.gov/
FAQs
Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. Please visit https://grafana.com/ for more information.
Please contact us at #help-qnod-dashboard
For instructions on the HARP registration process, refer to the HARP page.
Users must register for a HARP ID. Once the HARP account has been created, log into HARP and request the QualityNet Operations Dashboard role (see Step 2 under "Getting Started" above).
Log in at https://idm.cms.gov/ using your HARP credentials. Select QualityNet Operations Dashboard after logging in.
Grafana plugins are either Panel (visualizations), Data source (communicate with external sources of data), or App (application monitoring). You can also choose to build your own plugin. For more information please visit: https://grafana.com/docs/grafana/latest/plugins/
- Alert List
- Azure Monitor
- Bar gauge
- Blendstat
- CloudWatch
- D3 Gauge
- Dashboard list
- Diagram
- Discrete
- Dynamic text
- Elasticsearch
- FlowCharting
- Gauge
- Google Cloud Monitoring
- Graph
- Graphite
- Heatmap
- InfluxDB
- Jaeger
- Logs
- Loki
- Microsoft SQL Server
- MySQL
- New Relic
- News
- Node Graph
- OpenTSDB
- Pie chart v2
- Polystat
- PostgreSQL
- Prometheus
- Singlestat
- Stat
- Status Panel
- Table
- Table (old)
- Tempo
- TestData DB
- Text
- Time series
- Zipkin
- ePict Panel
For a list of available plugins, please visit https://grafana.com/grafana/plugins/
Please contact us at #help-qnod-dashboard
QualityNet Operations Dashboard v5.2
New Feature(s):
- Synthetics for the following service(s):
- MedTrak
- Full Decomposition for the following service(s):
- QCOR
- QDIVS
- MedTrak
- Machine Learning Enablement – 30-Minute Uptime Prediction Model for the following service(s):
- Confluence – This AI/ML capability with deep learning models provides the service owner with a look ahead of 30 minutes into the future for any potential issues at the KPI level for the Confluence service, with an 86% confidence level in this prediction. This gives service owners an opportunity to investigate their service and look for potential issues.
- Machine Learning Enablement – Anomaly Detection for the following service(s):
- Barracuda – This AI/ML capability with deep learning models provides service owners with a 24-hour historical view of anomalies that may have occurred with their service. This capability will aid service owners with investigating the root cause and fixing any issues with their service that would otherwise lead to a potential future service issue or degradation.
Issue(s) Resolved:
- None this release
Known Issue(s):
- Time picker for service drilldown dashboards has been temporarily disabled to address possible issues with system stability.
QualityNet Operations Dashboard v5.1.2
New Feature(s):
- Machine Learning Enablement – 30 Minute Uptime Prediction Model for the following service(s):
- Barracuda
- Syslog
QualityNet Operations Dashboard v5.1.1
Roll Back New Feature(s):
- Roll Back Machine Learning Enablement – 30 Minute Uptime Prediction Model for the following service(s):
- Barracuda
- Syslog
QualityNet Operations Dashboard v5.1
New Feature(s):
- Full Decomposition for the following service(s):
- QIES
- QTSO
- Machine Learning Enablement – 30 Minute Uptime Prediction Model for the following service(s):
- Barracuda
- Syslog
Issue(s) Resolved:
- The following issues with service drilldown dashboards are fixed:
- AD – Process Count KPI panels updated to show process names along with host names.
- Certificate Authority – Process Count KPI panels updated to show process names along with host names.
- ClamAV – Process Count KPI panels updated to show process names along with host names.
- PRS – Updated legend name for the KPI panels.
- Hive – Average Healthy Host Count and Process Count KPI panels updated with correct legend names.
- Drilldown dashboards for Office365, DELWeb, and McAfee WG services fixed to show host names in KPI panel labels.
Known Issue(s):
- Time picker for service drilldown dashboards has been temporarily disabled to address possible issues with system stability.
QualityNet Operations Dashboard v4.6
New Feature(s):
- Synthetic Monitoring for the following service(s):
- MedTrax
- Full Decomposition for the following service(s):
- EQRS Portal Service
- HARP/HIDS Automation (Additional KPIs)
- WAN - New devices added into New Relic and reporting the same in QNOD
Issue(s) Resolved:
- Resolved an issue that prevented the Alerter process from sending notifications for service state changes
- Fixed the ‘Disk Free Percent’ KPI reported via New Relic for multiple services
- Updated F5 URLs synthetic test scripts to accommodate the F5 network device’s move to new hardware
QualityNet Operations Dashboard v4.5
New Feature(s)
- Synthetic Monitoring for the following service(s):
- EQRS Scoring and Feedback
- QIES
- QTSO
- Full Decomposition for the following service(s):
- iQIES/PASCID
- Additional reports have been added to the Grafana Metrics API:
- The Recovery Rate report calculates the ratio of failed deployments to the total number of deployments, shown on a quarterly basis.
- The Mean Time to Recover report shows, on a quarterly basis, the average amount of time it takes for an application to recover from a failed deployment.
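As a rough illustration of the two reports above, the following Python sketch computes both figures from a list of deployment records. The record fields (`finished_at`, `failed`, `recovered_at`) and the function names are hypothetical and are not taken from the Grafana Metrics API; the calculation simply follows the descriptions given above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; the real Metrics API payload may differ.
    finished_at: datetime
    failed: bool
    recovered_at: Optional[datetime] = None  # set only for failed deployments that recovered


def quarter_of(ts: datetime) -> str:
    """Return a label such as '2022-Q3' for a timestamp."""
    return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"


def quarterly_reports(deployments):
    """Group deployments by quarter and compute both report values."""
    by_quarter = defaultdict(list)
    for d in deployments:
        by_quarter[quarter_of(d.finished_at)].append(d)

    reports = {}
    for quarter, items in sorted(by_quarter.items()):
        failed = [d for d in items if d.failed]
        # Recovery Rate as described above: failed deployments / total deployments.
        recovery_rate = len(failed) / len(items)
        # Mean Time to Recover: average minutes from failure to recovery.
        durations = [
            (d.recovered_at - d.finished_at).total_seconds() / 60
            for d in failed
            if d.recovered_at is not None
        ]
        mttr_minutes = sum(durations) / len(durations) if durations else None
        reports[quarter] = {"recovery_rate": recovery_rate, "mttr_minutes": mttr_minutes}
    return reports


sample = [
    Deployment(datetime(2022, 7, 1, 10, 0), failed=False),
    Deployment(datetime(2022, 7, 8, 10, 0), failed=True, recovered_at=datetime(2022, 7, 8, 10, 30)),
    Deployment(datetime(2022, 8, 2, 9, 0), failed=True, recovered_at=datetime(2022, 8, 2, 10, 30)),
    Deployment(datetime(2022, 8, 9, 9, 0), failed=False),
]
reports = quarterly_reports(sample)
print(reports)
```

In this sample, two of the four deployments in the quarter failed and they took 30 and 90 minutes to recover, so the sketch reports a ratio of 0.5 and a mean recovery time of 60 minutes.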
QualityNet Operations Dashboard v4.4
New Feature(s)
- Synthetic Monitoring for the following services
- FAS
- QCOR
- HARP/HIDS Automation
- MFT
- AWS RSS messages
- Messages for the US-East-1 and global regions are continuously retrieved from AWS and are available in the #aws-rss-alerts Slack channel
- New dashboards available
- 24-hour Service Issues Summary
- Service Issues Reports
- New component added for New Relic service drilldown to capture New Relic minion (synthetic test monitors) health
Bug Fixes:
- FireEye ETP
- Updated Synthetic Availability so that it no longer reports a constant degraded state
- FireEye vNX
- Updated Interface KPIs so that they no longer report a constant degraded state
- McAfee GW
- Updated Interface KPIs so that they no longer report a constant degraded state
- Certificate Authority
- Both Disks are now reporting correctly, and the overall service health is more accurate
- Nexus
- Fixed thresholds for Disk, which previously reported “Insufficient Data” even when data was present
- Slack
- Broken incident links to https://status.slack.com have been fixed
QualityNet Operations Dashboard v4.3
New Feature(s)
- Synthetic Monitoring for the following services:
- QDIVS
- Bonnie/MAT
- QSEP/ITSP
- CCSQ QuickSight
- Full Decomposition for the following services:
- DEL
- Anomaly detection displayed for the following services:
- Confluence
- Improved reporting
- AWS evaluation now includes reports from the AWS global and us-east-1 public health API
- Landing page Enhancements
- Service panels now include a hover function to display current service health %.
- New service status icon to identify services which have KPI issues yet do not fully affect a service’s state.
- Metric Sources Dashboard
- A dashboard displaying a summary of metric sources is now available (metric summaries dashboard) and is available as a link from the landing page.
- A dashboard drilling into each metric source, detailing degraded, failed, and missing values is available from the metric summaries dashboard.
Bug Fixes:
- KPIs that appeared blank on the service decomposition diagram of a service drilldown now appear as No Data (grey color).
- Missing KPIs for services now contribute to the overall state of a service.
- Most services were not being evaluated against the ‘minutes_threshold’ value in their service decomposition definitions, so a single data point could change the overall state of a service. Service KPIs are now evaluated correctly.
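To illustrate the fix, the sketch below shows one way a `minutes_threshold` could be applied: a KPI counts as degraded only when every sample in the trailing window breaches its limit, so a single bad point is ignored. The function name, the one-sample-per-minute assumption, and the "higher is worse" limit are hypothetical; the actual QNOD evaluation logic is not shown in these notes.

```python
def kpi_is_degraded(samples, limit, minutes_threshold):
    """Return True only if the KPI breached `limit` for the whole trailing window.

    Assumes `samples` holds one value per minute, oldest first, and that
    higher values are worse (an illustrative convention, not QNOD's).
    """
    if minutes_threshold <= 0:
        # A zero threshold means any single breaching point changes the state.
        return bool(samples) and samples[-1] > limit
    if len(samples) < minutes_threshold:
        return False  # not enough history to confirm a sustained breach
    window = samples[-minutes_threshold:]
    return all(value > limit for value in window)


isolated_spike = [40, 42, 97, 41, 43]   # one breaching minute, then recovery
sustained = [40, 42, 95, 96, 97]        # three consecutive breaching minutes

print(kpi_is_degraded(isolated_spike, limit=90, minutes_threshold=3))
print(kpi_is_degraded(sustained, limit=90, minutes_threshold=3))
```

With a three-minute threshold the isolated spike does not change the result, while the sustained breach does.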
QualityNet Operations Dashboard v4.2
New Feature(s)
- Full Decomposition for the following services:
- Ambari Infrastructure
- TestRail
- SAS Viya
- Zeppelin
QualityNet Operations Dashboard v4.1
New Feature(s)
- Full Decomposition for the following services:
- Airflow
- Hive
- Ranger
Infrastructure Upgrade
- Grafana upgraded to v8.4.4 from v8.1.8
QualityNet Operations Dashboard v3.5
New Services
- EQRS Portal
- iQIES
New Features/Bug Fixes
- Thresholds for FileCloud and Routing services are adjusted to display the status of the application accurately.
- Fixed the Request API Count KPI so that it displays data; services that have a Request Count KPI now report data.
QualityNet Operations Dashboard v3.4
Issues Resolved
- DNS, Office365, AD, Certificate Authority - resolved issue where KPIs were not reporting in QNOD after a New Relic upgrade
New Features
- HQR - added Application and Network metrics, expanded Compute metrics, and added two more subsystems
- HARP - moved from Collaboration panel to Identity & Access panel
- Modified thresholds from 0 minutes to 3 minutes to reduce noise
QualityNet Operations Dashboard v3.3
New Services
- CDR (Ambari, HIVE, and Ranger subsystems)
- HQR
- DELWeb
New Features
- HARP Service drilldown updated to include the subsystems along with HOMER subsystem.
- Updated service drilldown dashboards to include Jira issues panel.
QualityNet Operations Dashboard v3.2
New Services
- Airflow
- SAS Viya
- Zeppelin
New Features
- FireEye vNX
- Added Device Availability and Response Time KPIs
- Metrics API
- Added new report APIs to communicate deployment recovery metrics and new roles to API keys to enhance security
QualityNet Operations Dashboard v3.1.1
Category Changes in Current Service Status Overview Dashboard:
- Zscaler has moved from Security to Network
- Syslog has moved from Security to Monitoring
Issues Resolved
- Provided a fix to the NewRelic service to reflect the current health status more accurately.
- Implemented update to the F5 service synthetic tests to reflect the current service health status in the dashboard.
QualityNet Operations Dashboard v3.1
New Features:
- New Services
- Network
- AWS
- QMARS Fax (Biscom)
- Network Routing
- Presentation Zone
- WAN Connectivity
- Collaboration
- TestRail
- Network
QualityNet Operations Dashboard v2.7.1
New Features:
- DevSecOps Metrics API MVP
- Usable functionality will be a dashboard presenting the number of deployments per day, per application.
- Please reach out to Tim Regulski for an API Key and instructions on configuration
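The per-day, per-application count described above amounts to a simple grouping. The sketch below shows that grouping in Python under stated assumptions: the event shape (an application name plus a timestamp) and the function name are hypothetical, and no actual Metrics API call is shown.

```python
from collections import Counter
from datetime import datetime


def deployments_per_day(events):
    """Count deployments keyed by (ISO date, application name).

    `events` is an iterable of (application, timestamp) pairs; this shape
    is an assumption made for the example.
    """
    counts = Counter()
    for application, timestamp in events:
        counts[(timestamp.date().isoformat(), application)] += 1
    return dict(counts)


events = [
    ("app-a", datetime(2022, 3, 1, 9, 15)),
    ("app-a", datetime(2022, 3, 1, 16, 40)),
    ("app-b", datetime(2022, 3, 1, 11, 5)),
    ("app-a", datetime(2022, 3, 2, 10, 0)),
]
daily = deployments_per_day(events)
print(daily)
```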
QualityNet Operations Dashboard v2.7.0
New Features:
- Entity Discovery Automation
- We can catalog all devices on our data sources.
- Provides functionality for us to be much faster in understanding what devices exist and what data gaps we may have for a particular service.
- DAS QNOD Integration Prep
- Worked with the HARP team to create a new DAS entitlement with HARP to provide the teams access to QNOD.
- SaaS Issue Reporting
- Slack service drill down can display RSS feeds for the latest active and resolved incidents.
New Services:
- SNOW
- F5
- FirePower IPS
- Office 365/Exchange
- QNET
- Mailman
- PRS
- FireEye VNX
QualityNet Operations Dashboard v2.6.0
New Features:
- Implemented Grafana upgrade from 8.1.2 to 8.1.8 to address CVE-2021-43798.
- QNOD alert notifications per service can be configured to send to multiple Email distros and Slack channels. Alerts will be sent on service state changes.
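Sending alerts only on state changes, to several destinations per service, can be sketched as follows. This is an illustration under assumptions rather than the QNOD Alerter itself: the class name, the state labels, and the use of plain callables as stand-ins for email distros and Slack channels are all hypothetical.

```python
class StateChangeAlerter:
    """Tracks the last known state per service and notifies only on transitions."""

    def __init__(self, destinations):
        # destinations: service name -> list of callables accepting a message string
        self.destinations = destinations
        self.last_state = {}

    def observe(self, service, state):
        """Record the state and return how many notifications were sent."""
        previous = self.last_state.get(service)
        self.last_state[service] = state
        if previous is None or previous == state:
            return 0  # first observation or no change: nothing to send
        message = f"{service}: {previous} -> {state}"
        targets = self.destinations.get(service, [])
        for send in targets:
            send(message)
        return len(targets)


sent = []
alerter = StateChangeAlerter({
    "Confluence": [
        lambda m: sent.append(("email", m)),  # stand-in for an email distro
        lambda m: sent.append(("slack", m)),  # stand-in for a Slack channel
    ]
})
alerter.observe("Confluence", "OK")        # first observation, no alert
alerter.observe("Confluence", "OK")        # unchanged, no alert
alerter.observe("Confluence", "Degraded")  # transition, alerts both destinations
print(sent)
```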
New Services:
- McAfee Web Gateway
- Trend Micro DS
QualityNet Operations Dashboard v2.5.0
Architecture Improvements:
- Data collection processes have been decoupled from each other on a per-service basis. This will improve performance and make it less likely for a failure in one service's data pull to affect others.
New Features:
- Enable QNOD notifications to send alerts to Email distribution or Slack channel
- Service Status will now be derived from the weighted system health score. This will improve the accuracy of the system status and make it less subject to a single KPI status
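A minimal sketch of how a weighted health score could drive service status is shown below. The weights, the KPI names, and the status cut-offs (90 and 50) are illustrative assumptions; these notes do not state the values QNOD uses.

```python
def weighted_health_score(kpis):
    """Return a 0-100 score from (weight, healthy) pairs keyed by KPI name."""
    total_weight = sum(weight for weight, _ in kpis.values())
    if total_weight == 0:
        return None  # nothing to evaluate
    healthy_weight = sum(weight for weight, healthy in kpis.values() if healthy)
    return 100.0 * healthy_weight / total_weight


def service_status(score, ok_at=90.0, degraded_at=50.0):
    """Map a score to a status using hypothetical cut-offs."""
    if score is None:
        return "No Data"
    if score >= ok_at:
        return "OK"
    if score >= degraded_at:
        return "Degraded"
    return "Failed"


kpis = {
    "synthetic_availability": (5, True),
    "cpu_utilization": (2, True),
    "disk_free_percent": (1, False),  # a single low-weight KPI failing
}
score = weighted_health_score(kpis)
print(score, service_status(score))
```

Because the failing KPI carries only one of eight weight units, the score in this example drops to 87.5 instead of the whole service being marked as failed.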
New Services:
- CA Certificates
- Survey Monkey
- FireEye ETP
Issues Resolved:
- KPIs that are designated to alarm only after a specified period of time will now alarm only after the specified period as intended
QualityNet Operations Dashboard v2.4.0
New Features:
- Added Service Health Panel to all dashboards to display the weighted score of the service over time
- Added Current Issues Dashboard so that KPI issues are seen on a single dashboard
New Service:
- Active Directory
QualityNet Operations Dashboard v2.3.1
Issues Resolved:
- Updated the view for Current Service Status Overview Dashboard
- Fixed the unittype in the FileCloud and MFT service drilldown panels
- Nexus service divided into subsystems for more visibility into the service
QualityNet Operations Dashboard v2.3
New Features
- Migrated to serverless metric ingestion, increasing reliability and efficiency
- Improved the layout of the current status dashboard to more succinctly show service groups
- Fixed the display of certain metrics for Jira and Confluence
- Implemented Notifications sent to Slack for service status change
New Services:
- DNS
- Slack
QualityNet Operations Dashboard v2.2
New Features
- Added weighted system health score to represent system health in a more dynamic way
- Implemented automatic creation of dashboard drill-downs to ensure consistency and improve velocity
- Produced POC/MVP of automatic discovery and metric ingestion engine
- Redesigned the current status overview dashboard
- Incorporated Unit Tests for Lambda functions
QualityNet Operations Dashboard v2.0
Issues Resolved
- KPIs, where no data is okay, will no longer affect the component or system status
- Tuning of the logging frequency for Flux tasks
QualityNet Operations Dashboard v1.5
New Services:
- MFT
- Tenable Nessus
New Features:
- Enhanced the layout of the drilldown and service dependency tree diagram to improve the visibility of the KPIs.
QualityNet Operations Dashboard v1.4.1
New Features:
- Improved Performance -- serverless computing (AWS Lambda) was deployed to increase the efficiency and timing of queries, which lowers the load on the database tier by up to 90%
- No Data -- the dashboard now displays 'No Data' if there is insufficient data to represent the service's status and KPI
- UI Improvements -- various small improvements to the UI, color uniformity, panel type, etc
Bug Fixes:
- "No Query Returned Results" error has been fixed on the Executive Dashboard
QualityNet Operations Dashboard v1.4
New Service(s):
The following service(s) will be integrated with the new release:
- Barracuda (Mailman)
- HARP
New Features:
- FileCloud, Nexus, and Splunk are fully decomposed with metrics provided in their drilldowns
- Syslog synthetic test panels added to drilldowns
- Confluence and Jira data ingest processing and visualization for Network and Database metrics added in drilldowns
Bug Fixes:
- Added info to panels in drilldown dashboard
QualityNet Operations Dashboard v1.3.1
Bug Fixes:
- Infrastructure upgraded to address the intermittent "query not returning results" error on the Dashboard panels
- Updated AMI to address security compliance findings
QualityNet Operations Dashboard v1.3
New Service(s):
The following service(s) will be integrated with the new release:
- GitHub
- NewRelic
New Feature(s):
Service Drill Downs
- Ansible, Jenkins, GitHub and New Relic drill-downs provide fully decomposed metrics.
Bug Fixes:
The following issues will be resolved with the new release:
- QNet Dashboard Logo updated with a new transparent icon.
- Logic to determine component status updated to reflect correct status of component.
- Service Dependency Diagram panel updated for better visibility.
- Updated the Executive Dashboard view to render the service status correctly.
QualityNet Operations Dashboard v1.2
New Service(s):
- ZScaler
What is new:
Service Drill Downs
- Confluence, JIRA, and ZScaler drill-downs provide fully decomposed metrics.
- Added scan latency panel to ClamAV User Experience component.
- Fixed the link to the Confluence page with Jira issues so that it opens in a new tab instead of the same tab.
QualityNet Operations Dashboard v1.1.1
Bug Fixes:
- Increased the number of containers to 2 for Grafana to fix the "503 service temporarily unavailable" error.
- Fixed the link to the Jira issues page on Confluence from the dashboard.
- Increased the query timeout and query concurrency in InfluxDB to resolve the "query length limit exceeded" error.
- Increased the CPU and memory allocation for Grafana, InfluxDB, and Telegraf.
QualityNet Operations Dashboard v1.1
What is New:
The following systems have their own drilldown dashboard:
- Confluence
- JIRA
- Service Now
- FileCloud
- Ansible
- Nexus
- Jenkins
- Splunk
- ClamAV
- Syslog
What is included in this release:
- Upgraded to Grafana v8.0.3
- 508 Accessibility
- Added 'alt' attributes to images
- Removed heading <h1,2,3,4> attributes
- Fixed some contrast issues
- Removed 3 semi-hidden panels, reduced code by 180 lines
QualityNet Operations Dashboard v1.0
The following Applications are currently being monitored for availability (up/down):
- Confluence
- JIRA
- Service Now
- FileCloud
- Ansible
- Nexus
- Jenkins
- Splunk
What is included in this release:
- Removed the ability for users to log into the dashboard with local accounts; users must have a HARP account
- Okta/HARP integration for authentication
- 4 hour authentication timeout after no activity
- Automated vulnerability scanning utilizing Netsparker and Nessus
- Implemented Sonar Scanner to validate code in GitHub for vulnerabilities and bugs
- Fixed Overlay UI issues
- Fixed Panel Lengths so they all match and are even
- Updated the queries to fix the service status results in Grafana
- Grafana synthetic testing to validate dashboard availability
- Container- and host-based alerts to Slack, i.e., CPU Utilization %, Memory Usage %, Disk Space, Host not responding, and Database storage utilization alerts
Known Issues:
- Internet Explorer 11 is not currently supported. For more information, see https://grafana.com/docs/grafana/latest/installation/requirements/#supported-web-browsers
- Application logs are not ingesting into Splunk
- ADO-HIDS-Ventech Solutions is the only available organization in HARP
For any questions please contact the Service Center by phone, email, or via Slack.
Phone: 1-866-288-8914 (TRS: 711)
Email: ServiceCenterSOS@cms.hhs.gov
Slack: #help-service-center-sos
Dashboard Help Channel: #help-qnod-dashboard
Visit the Post-Consumer Onboarding page