Horizontal Navigation Bar Page |
---|
| QualityNet Operations Dashboard provides CMS Executives with:
- A layer of automation that conveys reporting from disparate sources to a dashboarding solution.
- A user interface that provides reports/visualizations of system information, status, and customer experience.
- A user interface that can be extended to CMS leadership/product owners, ADO DevSecOps, ADO Developers, HD and end-users.
- A multi-tenant solution that allows reports/visualizations for each team (common and custom reports/visualizations).
- A “single pane of glass” view across various HCQIS IT systems.
The QualityNet Operations Dashboard consists of three open-source software components (Grafana, Telegraf, and InfluxDB), as well as some AWS components (RDS, DynamoDB, Kinesis, etc.). Future versions of this dashboard will incorporate Artificial Intelligence (AI) and Machine Learning (ML) to proactively detect anomalies.
Log in at https://qnetdashboard.cms.gov/ |
Horizontal Navigation Bar Page |
---|
| Getting Started
Step 1: If you do not yet have a HARP account, please register for a HARP ID. For instructions on the HARP registration process, refer to the HARP page.
Step 2: Once the HARP account has been created, log into HARP and request a "Service" entitlement via a HARP User Role.
- Select User Roles from the top of the page, and select Request a Role.
- Select QualityNet Operations Dashboard
- Select your Organization.
- Select the following user role:
Step 3: The organization's Security Official reviews and approves/denies the user role request. You will be notified via email that your request has been submitted, and again when your role has been approved or denied.
Step 4: Log into QualityNet Operations Dashboard using your HARP credentials.
Note: Only one organization currently exists: ADO-HIDS-Ventech Solutions. On June 2, CMS users will have the Viewer role automatically assigned to their EUA account in HARP. |
Horizontal Navigation Bar Page |
---|
| FAQs
Panel |
---|
title | General |
---|
|
Expand |
---|
| Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. Please visit https://grafana.com/ for more information. |
|
Panel |
---|
title | Access |
---|
|
Expand |
---|
title | How do I register for a HARP account? |
---|
| For instructions on the HARP registration process, refer to the HARP page. |
Expand |
---|
title | How do I request access to QualityNet Operations Dashboard? |
---|
| Users must register for a HARP ID. Once the HARP account has been created, log into HARP and request the QualityNet Operations Dashboard role (See "Requesting a User Role" process above). |
Expand |
---|
title | How do I log into QualityNet Operations Dashboard? |
---|
| Log in at https://idm.cms.gov/ using your HARP credentials. Select QualityNet Operations Dashboard after logging in. |
|
Panel |
---|
title | Plugins |
---|
|
Expand |
---|
| Grafana plugins are either Panel (visualizations), Data source (communicate with external sources of data), or App (application monitoring). You can also choose to build your own plugin. For more information please visit: https://grafana.com/docs/grafana/latest/plugins/ |
Expand |
---|
title | What Grafana plugins are currently installed? |
---|
| - Alert List
- Azure Monitor
- Bar gauge
- Blendstat
- CloudWatch
- D3 Gauge
- Dashboard list
- Diagram
- Discrete
- Dynamic text
- Elasticsearch
- FlowCharting
- Gauge
- Google Cloud Monitoring
- Graph
- Graphite
- Heatmap
- InfluxDB
- Jaeger
- Logs
- Loki
- Microsoft SQL Server
- MySQL
- New Relic
- News
- Node Graph
- OpenTSDB
- Pie chart v2
- Polystat
- PostgreSQL
- Prometheus
- Singlestat
- Stat
- Status Panel
- Table
- Table (old)
- Tempo
- TestData DB
- Text
- Time series
- Zipkin
- ePict Panel
|
|
|
Excerpt |
---|
Horizontal Navigation Bar Page |
---|
|
Tabs Container |
---|
|
Tabs Page |
---|
| QualityNet Operations Dashboard v5.2
New Feature(s):
- Synthetics for the following service(s):
- Full Decomposition for the following service(s):
- Machine Learning Enablement – 30-Minute Uptime Prediction Model for the following service(s):
- Confluence – This AI/ML capability with deep learning models provides the service owner with a look ahead of 30 minutes into the future for any potential issues at the KPI level for the Confluence service, with an 86% confidence level in this prediction. This gives service owners an opportunity to investigate their service and look for potential issues.
- Machine Learning Enablement – Anomaly Detection for the following service(s):
- Barracuda – This AI/ML capability with deep learning models provides service owners with a 24-hour historical view of anomalies that may have occurred with their service. This capability will aid service owners with investigating the root cause and fixing any issues with their service that would otherwise lead to a potential future service issue or degradation.
Issue(s) Resolved:
Known Issue(s):
- Time picker for service drilldown dashboards has been temporarily disabled to address possible issues with system stability.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v5.1.2
New Feature(s):
- Machine Learning Enablement – 30-Minute Uptime Prediction Model for the following service(s):
|
Tabs Page |
---|
| QualityNet Operations Dashboard v5.1.1
Roll Back New Feature(s):
- Roll Back Machine Learning Enablement – 30-Minute Uptime Prediction Model for the following service(s):
|
Tabs Page |
---|
| QualityNet Operations Dashboard v5.1
New Feature(s):
- Full Decomposition for the following service(s):
- Machine Learning Enablement – 30-Minute Uptime Prediction Model for the following service(s):
Issue(s) Resolved:
- The following issues with service drilldown dashboards are fixed:
- AD – Process Count KPI panels updated to show process names along with host names.
- Certificate Authority – Process Count KPI panels updated to show process names along with host names.
- ClamAV – Process Count KPI panels updated to show process names along with host names.
- PRS – Updated legend name for the KPI panels.
- Hive – Average Healthy Host Count and Process Count KPI panels updated with correct legend names.
- Drilldown dashboards for Office365, DELWeb, and McAfee WG services fixed to show host names in KPI panel labels.
Known Issue(s):
- Time picker for service drilldown dashboards has been temporarily disabled to address possible issues with system stability.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v4.6
New Feature(s):
- Synthetic Monitoring for the following service(s):
- Full Decomposition for the following service(s):
- EQRS Portal Service
- HARP/HIDS Automation (Additional KPIs)
- WAN - New devices added into New Relic and reporting the same in QNOD
Issue(s) Resolved:
- Resolved the issue that prevented the Alerter process from sending notifications for service state changes
- Fixed the ‘Disk Free Percent’ KPI reported via New Relic for multiple services
- Updated F5 URLs synthetic test scripts to accommodate the F5 network device’s move to new hardware
|
Tabs Page |
---|
| QualityNet Operations Dashboard v4.5
New Feature(s)
- Synthetic Monitoring for the following service(s):
- EQRS Scoring and Feedback
- QIES
- QTSO
- Full Decomposition for the following service(s):
- Additional reports have been added to the Grafana Metrics API:
- The Recovery Rate report calculates the ratio of failed deployments to the total number of deployments, shown on a quarterly basis.
- The Mean Time to Recover report shows, on a quarterly basis, the average amount of time it takes for an application to recover from a failed deployment.
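As a rough illustration of how the two reports described above could be derived, the sketch below groups deployment records by calendar quarter and computes the failed-to-total ratio and the average recovery time. The `Deployment` record shape and the function names are hypothetical; the actual Grafana Metrics API schema is not documented on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record shape, assumed for illustration only."""
    application: str
    deployed_at: datetime
    failed: bool
    recovered_at: Optional[datetime] = None  # set only when a failed deployment recovered


def quarter_of(ts: datetime) -> str:
    """Label a timestamp with its calendar quarter, e.g. '2022-Q1'."""
    return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"


def quarterly_reports(deployments):
    """Return {quarter: (failed/total ratio, mean time to recover or None)}."""
    buckets = defaultdict(list)
    for d in deployments:
        buckets[quarter_of(d.deployed_at)].append(d)

    reports = {}
    for quarter, items in buckets.items():
        failed = [d for d in items if d.failed]
        ratio = len(failed) / len(items)
        # Only failed deployments with a known recovery time count toward the mean.
        durations = [d.recovered_at - d.deployed_at for d in failed if d.recovered_at]
        mttr = sum(durations, timedelta()) / len(durations) if durations else None
        reports[quarter] = (ratio, mttr)
    return reports
```

A quarter with no recovered failures reports `None` for the mean rather than zero, so that "no failures" is not mistaken for "instant recovery".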
|
Tabs Page |
---|
| QualityNet Operations Dashboard v4.4
New Feature(s)
- Synthetic Monitoring for the following services
- FAS
- QCOR
- HARP/HIDS Automation
- MFT
- AWS RSS messages
- US-East-1 and global regions are continuously retrieved from AWS and available at #aws-rss-alerts Slack channel
- New dashboards available
- 24-hour Service Issues Summary
- Service Issues Reports
- New component added for New Relic service drilldown to capture New Relic minion (synthetic test monitors) health
Bug Fixes:
- FireEye ETP
- Updated Synthetic Availability so that it no longer reports a constant degraded state.
- FireEye vNX
- Updated Interface KPIs so that they no longer report a constant degraded state
- McAfee GW
- Updated Interface KPIs so that they no longer report a constant degraded state
- Certificate Authority
- Both Disks are now reporting correctly, and the overall service health is more accurate
- Nexus
- Fixed thresholds for Disk, which previously reported “Insufficient Data” even when data was present
- Slack
|
Tabs Page |
---|
| QualityNet Operations Dashboard v4.3
New Feature(s)
- Synthetic Monitoring for the following services:
- QDIVS
- Bonnie/MAT
- QSEP/ITSP
- CCSQ QuickSight
- Full Decomposition for the following services:
- Anomaly detection displayed for the following services:
- Improved reporting
- AWS evaluation now includes reports from the AWS global and us-east-1 public health API
- Landing page Enhancements
- Service panels now include a hover function to display current service health %.
- New service status icon to identify services which have KPI issues yet do not fully affect a service’s state.
- Metric Sources Dashboard
- A dashboard displaying a summary of metric sources is now available (metric summaries dashboard) and is available as a link from the landing page.
- A dashboard drilling into each metric source, detailing degraded, failed, and missing values is available from the metric summaries dashboard.
Bug Fixes:
- KPIs that appeared blank on the service decomposition diagram of a service drilldown now appear as No Data (grey color)
- Missing KPIs for services now contribute to the overall state of a service.
- Most services were not being evaluated against the ‘minutes_threshold’ value in their service decomposition definitions, which allowed a single data point to affect the overall state of a service. Service KPIs are now being evaluated correctly.
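The effect of a ‘minutes_threshold’ can be sketched as follows. This is only an illustration of the general idea (a KPI counts as breached once it has stayed past its limit for the configured number of minutes), not the dashboard's actual evaluation code. It assumes one sample per minute and that "above the limit" is the unhealthy direction; the function name and parameters are hypothetical.

```python
def kpi_breached(samples, limit, minutes_threshold):
    """Decide whether a KPI should be treated as breached.

    samples: per-minute KPI values, oldest first (assumed sampling rate).
    limit: value above which a sample is considered unhealthy (assumption).
    minutes_threshold: how many consecutive recent minutes must be unhealthy.
    """
    if minutes_threshold <= 0:
        # A threshold of 0 minutes means the latest point alone decides.
        return bool(samples) and samples[-1] > limit
    if len(samples) < minutes_threshold:
        # Not enough history yet to say the breach has persisted.
        return False
    return all(value > limit for value in samples[-minutes_threshold:])
```

With a threshold of 3, a single spike such as `[10, 95, 10]` against a limit of 90 does not change the state, whereas three consecutive high samples do.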
|
Tabs Page |
---|
| QualityNet Operations Dashboard v4.2
New Feature(s)
- Full Decomposition for the following services:
- Ambari Infrastructure
- TestRail
- SAS Viya
- Zeppelin
|
Tabs Page |
---|
| QualityNet Operations Dashboard v4.1
New Feature(s)
- Full Decomposition for the following services:
Infrastructure Upgrade
- Grafana upgraded to v8.4.4 from v8.1.8
|
Tabs Page |
---|
| QualityNet Operations Dashboard v3.5
New Services
New Features/Bug Fixes
- Thresholds for FileCloud and Routing services are adjusted to display the status of the application accurately.
- Fixed the Request API Count KPI so that it displays data; services that have a Request Count now report data.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v3.4
Issues Resolved
- DNS, Office365, AD, Certificate Authority - resolved issue where KPIs were not reporting in QNOD after a New Relic upgrade
New Features
- HQR - added Application and Network metrics, expanded Compute metrics, and added two more subsystems
- HARP - moved from Collaboration panel to Identity & Access panel
- Modified thresholds from 0 minutes to 3 minutes to reduce noise
|
Tabs Page |
---|
| QualityNet Operations Dashboard v3.3
New Services
- CDR (Ambari, HIVE, and Ranger subsystems)
- HQR
- DELWeb
New Features
- HARP Service drilldown updated to include the subsystems along with the HOMER subsystem.
- Updated service drilldown dashboards to include Jira issues panel.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v3.2
New Services
New Features
- FireEye vNX
- Added Device Availability and Response Time KPIs
- Metrics API
- Added new report APIs to communicate deployment recovery metrics and new roles to API keys to enhance security
|
Tabs Page |
---|
| QualityNet Operations Dashboard v3.1.1
Category Changes in Current Service Status Overview Dashboard:
- Zscaler has moved from Security to Network
- Syslog has moved from Security to Monitoring
Issues Resolved
- Provided a fix to the NewRelic service to reflect the current health status more accurately.
- Implemented update to the F5 service synthetic tests to reflect the current service health status in the dashboard.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v3.1
New Features:
- New Services
- Network
- AWS
- QMARS Fax (Biscom)
- Network Routing
- Presentation Zone
- WAN Connectivity
- Collaboration
|
Tabs Page |
---|
| QualityNet Operations Dashboard v2.7.1
New Features:
- DevSecOps Metrics API MVP
- Usable functionality will be a dashboard presenting the number of deployments per day, per application.
- Please reach out to Tim Regulski for an API Key and instructions on configuration
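The "deployments per day, per application" figure behind that dashboard amounts to a grouped count. The sketch below shows one way to compute it from a list of deployment events; the event shape (application name plus ISO-8601 timestamp) and the function name are assumptions for illustration and are not taken from the Metrics API.

```python
from collections import Counter
from datetime import datetime


def deployments_per_day(events):
    """Count deployments per (day, application).

    events: iterable of (application, ISO-8601 timestamp) pairs; this input
    shape is hypothetical.
    """
    counts = Counter()
    for application, timestamp in events:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(day, application)] += 1
    return dict(counts)
```

Keying the result by `(day, application)` keeps it easy to pivot into one series per application for a time-series panel.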
|
Tabs Page |
---|
| QualityNet Operations Dashboard v2.7.0
New Features:
- Entity Discovery Automation
- We can catalog all devices on our data sources.
- Provides functionality for us to be much faster in understanding what devices exist and what data gaps we may have for a particular service.
- DAS QNOD Integration Prep
- Worked with the HARP team to create a new DAS entitlement with HARP to provide the teams access to QNOD.
- SaaS Issue Reporting
- Slack service drill down can display RSS feeds for the latest active and resolved incidents.
New Services:
- SNOW
- F5
- FirePower IPS
- Office 365/Exchange
- QNET
- Mailman
- PRS
- FireEye VNX
|
Tabs Page |
---|
| QualityNet Operations Dashboard v2.6.0
New Features:
- Implemented Grafana upgrade from 8.1.2 to 8.1.8 to address CVE-2021-43798.
- QNOD alert notifications per service can be configured to send to multiple Email distros and Slack channels. Alerts will be sent on service state changes.
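The per-service fan-out described above can be pictured with a small sketch: compare the previous and current state of each service, and for every change produce one message per configured destination. The function name, the state labels, and the routing table format are all hypothetical, and the sketch deliberately stops short of sending anything.

```python
def plan_notifications(previous, current, routes):
    """Return (destination, message) pairs for services whose state changed.

    previous, current: {service name: state label} snapshots.
    routes: {service name: [destinations]}, e.g. email distros or Slack
    channels. All of these shapes are assumptions for illustration.
    """
    planned = []
    for service, state in current.items():
        old_state = previous.get(service)
        # Only a change from a known earlier state triggers an alert.
        if old_state is not None and old_state != state:
            for destination in routes.get(service, []):
                planned.append((destination, f"{service}: {old_state} -> {state}"))
    return planned
```

Separating "decide what to send" from the actual delivery keeps the state-change logic testable without contacting a mail server or Slack.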
New Services:
- McAfee Web Gateway
- Trend Micro DS
|
Tabs Page |
---|
| QualityNet Operations Dashboard v2.5.0
Architecture Improvements:
- Data collection processes have been decoupled from each other on a per-service basis. This will improve performance and make it less likely for a failure in one service's data pull to affect others.
New Features:
- Enable QNOD notifications to send alerts to Email distribution or Slack channel
- Service Status will now be derived from the weighted system health score. This will improve the accuracy of the system status and make it less subject to a single KPI status
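To make the idea of a weighted health score concrete, here is a minimal sketch. It assumes each KPI has a state and a weight, maps states to numeric values, and takes the weighted average as a percentage. The state values, weights, and status cut-offs below are invented for illustration; the dashboard's real scoring rules are not given on this page.

```python
# Hypothetical mapping from KPI state to a score contribution.
STATE_SCORE = {"ok": 1.0, "degraded": 0.5, "failed": 0.0}


def health_score(kpis):
    """Weighted health percentage for a service.

    kpis: list of (state, weight) pairs. Returns None when there is nothing
    to weigh, so "no data" is distinguishable from "0% healthy".
    """
    total_weight = sum(weight for _, weight in kpis)
    if total_weight == 0:
        return None
    weighted = sum(STATE_SCORE[state] * weight for state, weight in kpis)
    return 100.0 * weighted / total_weight


def service_status(score, ok_at=90.0, degraded_at=50.0):
    """Map a score to a status label using assumed cut-offs."""
    if score is None:
        return "no data"
    if score >= ok_at:
        return "ok"
    if score >= degraded_at:
        return "degraded"
    return "failed"
```

Because the status comes from the aggregate score, one failed low-weight KPI lowers the percentage without necessarily flipping the whole service to failed.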
New Services:
- CA Certificates
- Survey Monkey
- FireEye ETP
Issues Resolved:
- KPIs that are designated to alarm only after a specified period of time will now alarm only after the specified period, as intended |
Tabs Page |
---|
| QualityNet Operations Dashboard v2.4.0
New Features:
- Added Service Health Panel to all dashboards to display the weighted score of the service over time
- Added Current Issues Dashboard so that KPI issues are seen on a single dashboard
New Service: |
Tabs Page |
---|
| QualityNet Operations Dashboard v2.3.1
Issues Resolved:
- Updated the view for Current Service Status Overview Dashboard
- Fixed the unittype in the FileCloud and MFT service drilldown panels
- Nexus service divided into subsystems for more visibility into the service
|
Tabs Page |
---|
| QualityNet Operations Dashboard v2.3
New Features
- Migrated to serverless metric ingestion, increasing reliability and efficiency
- Improved the layout of the current status dashboard to more succinctly show service groups
- Fixed the display of certain metrics for Jira and Confluence
- Implemented Notifications sent to Slack for service status change
New Services: |
Tabs Page |
---|
| QualityNet Operations Dashboard v2.2
New Features
- Added weighted system health score to represent system health in a more dynamic way
- Implemented automatic creation of dashboard drill-downs to ensure consistency and improve velocity
- Produced POC/MVP of automatic discovery and metric ingestion engine
- Redesigned the current status overview dashboard
- Incorporated Unit Tests for Lambda functions
|
Tabs Page |
---|
| QualityNet Operations Dashboard v2.0
Issues Resolved
- KPIs for which no data is acceptable will no longer affect the component or system status
- Tuning of the logging frequency for Flux tasks
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.5
New Services:
New Features:
- Enhanced the layout of the drilldown and service dependency tree diagram to improve the visibility of the KPIs.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.4.1
New Features:
- Improved Performance -- serverless computing (AWS Lambda) was deployed to increase the efficiency and timing of queries, which lowers the load on the database tier by up to 90%
- No Data -- the dashboard now displays 'No Data' if there is insufficient data to represent the service's status and KPI
- UI Improvements -- various small improvements to the UI, color uniformity, panel type, etc.
Bug Fixes:
- "No Query Returned Results" error has been fixed on the Executive Dashboard
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.4
New Service(s): The following service(s) will be integrated with the new release:
New Features:
- FileCloud, Nexus, and Splunk are fully decomposed with metrics provided in their drilldowns
- Syslog synthetic test panels added to drilldowns
- Confluence and Jira data ingest processing and visualization for Network and Database metrics added in drilldowns
Bug Fixes:
- Added info to panels in drilldown dashboard
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.3.1
Bug Fixes:
- Infrastructure upgraded to address the intermittent "query not returning results" error on the Dashboard panels
- Updated AMI to address security compliance findings
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.3
New Service(s): The following service(s) will be integrated with the new release:
New Feature(s): Service Drill Downs
- Ansible, Jenkins, GitHub and New Relic drill-downs provide fully decomposed metrics.
Bug Fixes: The following issues will be resolved with the new release:
- QNet Dashboard Logo updated with a new transparent icon.
- Logic to determine component status updated to reflect correct status of component.
- Service Dependency Diagram panel updated for better visibility.
- Updated the Executive Dashboard view to render the service status correctly.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.2
New Service(s):
What is new: Service Drill Downs |
Tabs Page |
---|
| QualityNet Operations Dashboard v1.1.1
Bug Fixes:
- Increased the number of containers to 2 for Grafana to fix the "503 service temporarily unavailable" error.
- Fixed the link to the Jira issues page on Confluence from the dashboard.
- Increased the query timeout and query concurrency in InfluxDB to resolve the "query length limit exceeded" error.
- Increased the CPU and memory allocation for Grafana, InfluxDB and Telegraf.
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.1
What is New: The following systems have their own drilldown dashboard:
- Confluence
- JIRA
- Service Now
- FileCloud
- Ansible
- Nexus
- Jenkins
- Splunk
- ClamAV
- Syslog
What is included in this release:
- Upgraded to Grafana v8.0.3
- 508 Accessibility
- Added 'alt' attributes to images
- Removed heading <h1,2,3,4> attributes
- Fixed some contrast issues
- Removed 3 semi-hidden panels, reduced code by 180 lines
|
Tabs Page |
---|
| QualityNet Operations Dashboard v1.0
The following Applications are currently being monitored for availability (up/down):
- Confluence
- JIRA
- Service Now
- FileCloud
- Ansible
- Nexus
- Jenkins
- Splunk
What is included in this release:
- Removed the ability for users to log into the dashboard with local accounts; users must have a HARP account
- Okta/HARP integration for authentication
- 4-hour authentication timeout after no activity
- Automated vulnerability scanning utilizing Netsparker and Nessus
- Implemented Sonar Scanner to validate code in GitHub for vulnerabilities and bugs
- Fixed Overlay UI issues
- Fixed Panel Lengths so they all match and are even
- Updated the queries to fix the service status results in Grafana
- Grafana synthetic testing to validate dashboard availability
- Container- and host-based alerts to Slack, i.e., CPU Utilization %, Memory Usage %, Disk Space, Host not responding, and Database storage utilization alerts
Known Issues: |
|
|
|
|