Grafana-Based Visualization System for Application Monitoring
Find out how we implemented a visualization system for application monitoring and created custom dashboards for the system's components.
Executive Summary
Developing a Monitoring and Visualization System
Our customer
Trinity Audio develops an AI-driven ecosystem of solutions that help publishers and content creators manage audio experiences, including voice editing, content discovery, virtual assistant skills, and data dashboards.
The obstacles they faced
The client’s Node.js-based solutions needed a monitoring visualization system to observe the impact of new code deployments, detect possible issues early after each deployment, and gain insight into the performance, health, and status of application components.
How we helped
We implemented a visualization system for monitoring the required metrics and events of the client’s applications and created custom dashboards for each part of the system that needed to be monitored.
The Challenge
Performance and Usability Tracking
The previous lack of dedicated application monitoring visualization resulted in a few core challenges for Trinity Audio as the solution provider:
- Lack of confidence during and after the deployment
Without access to the required metrics and real-time insights, the deployment process was uncertain. This hindered the client’s ability to confirm that the application met expectations after features were released or updated.
- Ambiguous system health understanding
Without an intuitive visualization system, it was difficult for the client to grasp the health status of their applications promptly.
- Delayed response to incidents
The lack of a responsive alert system hindered the client’s ability to respond promptly to issues and intervene in time.
- Undetectable performance regressions
Without historical data overviews and metric comparisons, pinpointing the cause and timing of degradation in application components was problematic.
- Limited data on functionality usage
The client lacked proper metrics and meaningful insights, which made it hard to understand which functionality was actually used and which could potentially be phased out.
The Solution
Comprehensive Application Monitoring
To address all of the above-mentioned obstacles, we decided that Grafana was the best fit for our case. This required, first, hosting Grafana itself (either as SaaS or self-hosted) and, second, a data store for the required metrics and events (for example MySQL or any time series database).
To simplify management and support of the chosen platform, we decided to use Hosted Graphite, which provides both the database (Graphite) and Grafana as one SaaS package. Under the hood it runs Graphite, which provides fast, reliable storage of numeric data over time. A major advantage is the simple Grafana integration, which makes adding a query to render metrics nearly effortless for the end user.
Steps we’ve taken:
- Create a Grafana account on the Hosted Graphite cloud.
- Start sending metrics to the Hosted Graphite endpoint.
- Install a StatsD agent for our Node.js applications, which allows us to send these metrics to Hosted Graphite in the most efficient way.
- Create dashboards with all the required charts for monitoring the application.
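The metric-sending steps above can be sketched with nothing more than Node's built-in UDP socket and the plain-text StatsD line format (`name:value|type`). This is a minimal illustration, not the client's actual setup: the host, the metric prefix, and the helper names `formatMetric` and `sendMetric` are assumptions made for the example, and a real project would more likely rely on an existing StatsD client library.

```javascript
const dgram = require("node:dgram");

// Hypothetical values: replace with the endpoint and prefix from your own account.
const STATSD_HOST = "statsd.example.com"; // assumption, not a real endpoint
const STATSD_PORT = 8125; // conventional StatsD UDP port
const METRIC_PREFIX = "YOUR-API-KEY.myapp"; // assumption: namespace for all metrics

// Build one StatsD line: "<prefix>.<name>:<value>|<type>".
function formatMetric(name, value, type) {
  return `${METRIC_PREFIX}.${name}:${value}|${type}`;
}

// Fire-and-forget send over UDP; failures are logged and never thrown into app code.
function sendMetric(name, value, type) {
  const socket = dgram.createSocket("udp4");
  const payload = Buffer.from(formatMetric(name, value, type));
  socket.send(payload, STATSD_PORT, STATSD_HOST, (err) => {
    if (err) console.error("metric send failed:", err.message);
    socket.close();
  });
}
```

A call such as `sendMetric("api.requests", 1, "c")` would emit a single counter increment. UDP is used here because a lost datagram is usually an acceptable price for never blocking the application on monitoring traffic.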
Because the application stack consisted of different parts and components, it made sense to create a separate dashboard for each of them and add multiple charts to every dashboard.
Graphite features we used to build the solution:
- Counting: allows us to count how many events occurred during a specific period of time.
- Timing: allows us to measure the duration of any operation, e.g. function execution, an SQL query, etc.
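As a rough sketch of how counting and timing look from application code, the helpers below produce StatsD counter (`|c`) and timer (`|ms`) lines. The names `count`, `timed`, and `emit` are hypothetical, and the lines are collected in an array instead of being sent over the network so that the example stays self-contained.

```javascript
// Collected StatsD lines; in a real service these would be sent over UDP instead.
const sent = [];
const emit = (line) => sent.push(line);

// Counting: record that an event occurred `delta` times.
function count(name, delta = 1) {
  emit(`${name}:${delta}|c`);
}

// Timing: run an async operation and report its duration in milliseconds,
// whether it resolves or throws.
async function timed(name, operation) {
  const start = process.hrtime.bigint();
  try {
    return await operation();
  } finally {
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    emit(`${name}:${Math.round(elapsedMs)}|ms`);
  }
}
```

For example, `count("api.requests")` could be called in a request handler, and `timed("db.query", () => runQuery())` could wrap a database call, where `runQuery` stands for any function that returns a promise.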
We also used Grafana variables to switch between the two environments and compare stage vs. production (in our particular case).
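Switching environments with a dashboard variable works most easily when the environment is a fixed segment of every metric name. The sketch below assumes a hypothetical naming scheme, `<app>.<environment>.<component>.<metric>`, and a Grafana variable named `env`; neither is taken from the client's configuration, and the path under which a metric is finally stored depends on how the StatsD backend namespaces it.

```javascript
// Hypothetical naming scheme: <app>.<environment>.<component>.<metric>
function metricPath(environment, component, metric, app = "myapp") {
  // Keep each segment free of dots and spaces so it stays a single Graphite path node.
  const clean = (s) => String(s).toLowerCase().replace(/[^a-z0-9_-]+/g, "_");
  return [app, environment, component, metric].map(clean).join(".");
}

// The same scheme written as a dashboard query target that uses a Grafana
// variable named "env" (assumed to offer the values "stage" and "production").
const dashboardTarget = "myapp.$env.api.requests";
```

With this layout, one dashboard serves both environments: choosing `stage` or `production` in the variable drop-down changes which series the target resolves to.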
Finally, we configured alerts on the desired thresholds to establish a holistic notification system that lets us know if something goes wrong, and supplemented it with a Slack integration that notifies us immediately about such alerts.
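In this setup Grafana evaluates the alert rules and delivers the Slack notifications itself, so no custom code is needed. Purely as a conceptual illustration of what a threshold rule does, the sketch below averages recent datapoints, compares the result with a limit, and builds a message body in the `{"text": ...}` shape that Slack incoming webhooks accept. The function names and the averaging rule are assumptions for the example, not Grafana's implementation.

```javascript
// Conceptual threshold rule: alert when the average of recent datapoints exceeds the limit.
function evaluateAlert(datapoints, threshold) {
  // Time series often contain gaps (null values); ignore them.
  const values = datapoints.filter((v) => typeof v === "number");
  if (values.length === 0) return { state: "no_data", average: null };
  const average = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { state: average > threshold ? "alerting" : "ok", average };
}

// Build a JSON body with a "text" field, the shape Slack incoming webhooks accept.
function slackMessage(ruleName, result, threshold) {
  return JSON.stringify({
    text: `[${result.state}] ${ruleName}: avg ${result.average} (threshold ${threshold})`,
  });
}
```

Treating an empty window as a separate `no_data` state instead of `ok` matters in practice, because a service that has stopped reporting metrics is often the very incident the alert is meant to catch.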
Services utilized:
- Grafana + Hosted Graphite
- Amazon EC2
The Results
Elevating System Health Visibility
The implementation of the visualization system for application monitoring brings the following improvements:
- Release confidence
All visualized metrics and real-time insights allow for increased confidence in the deployment process. They enable users to monitor performance, compare staging versus production environments, and ensure that the application performs as expected.
- Effortless performance assessment
The system makes it easy to assess the performance of any deployed functionality and whether it meets its goals, so evaluating the success and efficiency of deployed features becomes straightforward.
- Intuitive system health assessment
System health can be understood at a glance: the visualization system presents the status of the apps’ health clearly and immediately.
- Quick incident response
The alert system allows for swift responses to incidents or issues. Notifications are generated so that unexpected events can be addressed promptly and interventions made in time.
- Overview for regression identification
A historical overview and the ability to compare against previous data make it easier to identify regressions in each component of the apps.
- Functionality usage analysis
Based on the metrics, we can understand which functionality is being used and which is not. Adding metrics to older endpoints helps assess their relevance, potentially leading to their deprecation.
Each of these accomplishments contributes to enhancing the overall efficiency of the client’s business operations, as follows:
- The reduction in deployment risk ensures less disruption to overall business activity.
- Stable application performance and system reliability improve user satisfaction.
- Swift incident response minimizes disruptions to business operations and customer experiences.
- Clear system health assessment leads to minimized application downtime and maintenance costs.
- Data-driven decisions on functionality usage lead to cost savings and more efficient allocation of development resources for features with higher user demand.
Why Romexsoft
Optimizing Applications Based on Data
Romexsoft is an AWS-certified Consulting Partner, trusted Software Development Company and Managed Service Provider, founded in 2004. We help customer-centric companies build, run, and optimize their cloud systems on AWS with creative, stable, and cost-efficient solutions.
Our key values
- Delivery of quality solutions
- Customer satisfaction
- Long-term partnership
We have successfully delivered 100+ projects and have a proven track record in FinTech, HealthCare, AdTech, and Media industries.
Romexsoft holds a 5-star rating on Clutch thanks to its strong expertise, responsiveness, and commitment. 60% of our clients have been working with us for over 4 years.