Even the best business application struggles from time to time: it responds slowly to user queries or becomes temporarily unavailable. Organizations have learned that monitoring critical components is the way to go. In this post, I will explain the benefits and drawbacks involved in this development.
More than 30 years ago, system resource utilization and the choice of algorithm were the most important performance criteria. Bad programming practices and a shortage of system resources resulted in long calculation cycles. In those good old days, engineers were intensely focused on code optimization because computing power was expensive. They monitored mainly program duration and resource utilization.
With the rise of new technology, client-server applications replaced the former centralized systems. Organizations started digitalizing their services and invested millions in new computerized products.
Computing power became less expensive while developer salaries rose at the same time. Therefore, additional hardware became the universal answer to performance bottlenecks. Operations staff reused the existing infrastructure monitoring to capture system resource metrics of their backend tiers and added robot-based uptime checks to verify system availability.
Service and infrastructure virtualization paved the way for new, highly distributed services. Nowadays, a typical business application runs in virtual machines deployed on virtual containers hosted on virtual servers. Along with this virtualization across all layers, system and application monitoring has fundamentally changed. Capturing uptime and system resource utilization metrics reached its limits because those metrics don't provide enough insight into the root cause of actual issues. Therefore, organizations started capturing all user interactions 24×7 across all involved tiers. User experience is the new mission-critical factor.
Eliminate the Blind Spots
Operating modern applications with an outdated monitoring stack is risky and a waste of money. Downtime hurts a company's reputation. Surprisingly, according to major research institutes, performance slowdowns cause 200% more customers to abandon your services than outright unavailability does. With this in mind, I highly recommend eliminating all blind spots in your monitoring chain.
Start by collecting infrastructure resource metrics of every server involved, such as CPU, memory, I/O and network statistics. Make sure that you keep historical data in raw format for at least six months, and use a sampling interval of less than one minute to get high-quality data.
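To make this concrete, here is a minimal sketch of such a metric collector in Python, using only the standard library. The `MetricSampler` class and the disk-usage reader are illustrative names of my own, not part of any monitoring product; a real deployment would ship samples to durable storage instead of keeping them in memory.

```python
import collections
import shutil
import time


class MetricSampler:
    """Collects raw metric samples and retains a bounded history for analysis."""

    def __init__(self, reader, retention=10_000):
        self.reader = reader  # callable returning the current metric value
        self.samples = collections.deque(maxlen=retention)  # raw (timestamp, value) history

    def sample(self):
        """Take one sample and store it with a timestamp."""
        value = self.reader()
        self.samples.append((time.time(), value))
        return value


# Example reader: percentage of disk space used on the root volume.
def disk_used_percent(path="/"):
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


sampler = MetricSampler(disk_used_percent)
sampler.sample()  # one data point; schedule this at sub-minute intervals
```

In practice you would run `sample()` on a sub-minute schedule, as recommended above, and register one reader per metric (CPU, memory, I/O, network).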
Secondly, put uptime or availability monitoring in place. Use monitoring tools, or implement your own uptime monitoring solution that executes relevant use cases or service requests on a 5-15 minute schedule. Add verifications to your scripts and send out alerts when thresholds are exceeded. This robot-based availability monitoring eliminates manual checks and informs you proactively whenever an outage arises.
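The check described above can be sketched as a single Python function. This is an assumption-laden illustration, not an existing tool: `uptime_check` is a hypothetical helper, and the probe is any callable that raises on failure (for example, an HTTP GET against a health endpoint).

```python
import time


def uptime_check(probe, latency_threshold_s, alert):
    """Run one availability check: execute the probe, time it, and alert on
    failure or when the latency threshold is exceeded."""
    start = time.perf_counter()
    try:
        probe()  # e.g. an HTTP GET against a health endpoint, raises on failure
        ok = True
    except Exception as exc:
        ok = False
        alert(f"service unavailable: {exc}")
    elapsed = time.perf_counter() - start
    if ok and elapsed > latency_threshold_s:
        alert(f"slow response: {elapsed:.2f}s > {latency_threshold_s:.2f}s")
    return ok, elapsed
```

In practice you would invoke this from a scheduler (cron or a simple loop) every 5-15 minutes and wire `alert` to your paging or notification channel.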
Lastly, collect all transactions from an end-user perspective across all components involved. User experience monitoring is an essential piece of your monitoring stack. Uptime monitoring can alert you when response times exceed their limits or services are unavailable, but it will not provide enough insight to tackle the cause of actual hotspots. User transaction details enable you to start a horizontal analysis to identify the problematic tier and continue with a vertical analysis through the affected stack.
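A minimal sketch of per-tier transaction timing, assuming Python and illustrative names of my own (`TransactionTrace` is not a real library): each tier a transaction touches is timed in a context manager, so the slowest tier, the starting point of the horizontal analysis, can be identified.

```python
import contextlib
import time


class TransactionTrace:
    """Records how long each tier spends handling one user transaction,
    enabling a horizontal (tier-by-tier) breakdown."""

    def __init__(self, name):
        self.name = name
        self.tiers = {}  # tier name -> elapsed seconds

    @contextlib.contextmanager
    def tier(self, tier_name):
        """Time the wrapped block and record it under the given tier name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.tiers[tier_name] = time.perf_counter() - start

    def slowest_tier(self):
        """The tier to investigate first in a horizontal analysis."""
        return max(self.tiers, key=self.tiers.get)


# Example: a "checkout" transaction crossing two tiers.
trace = TransactionTrace("checkout")
with trace.tier("web"):
    time.sleep(0.01)   # stands in for web-tier work
with trace.tier("database"):
    time.sleep(0.02)   # stands in for database-tier work
```

Once the slowest tier is known, the vertical analysis drills into that tier's own stack (code, queries, resources).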
Don’t put your successful business at risk. Start right now with the elimination of blind spots in your monitoring stack.