Why Monitoring is essential in DevOps

Time to market matters more than ever because competition is on the rise. Successful retailers deploy a new release every 12 seconds. In this blog post, I will outline the reasons for short release sprints and shine a light on the fundamental role of monitoring in a DevOps environment.

Benefits of the DevOps approach

Tight release sprints are a challenge for your development pipeline. Design-implement-test-deploy slots are short and require excellent collaboration between your teams. Daily standup meetings help to build and spread knowledge about the new product quickly.

Ongoing learning and product optimization are another key aspect for teams working in DevOps mode. Especially if you develop innovative products, the requirements are often not in place, and therefore the design is very tricky. Tight feedback loops are essential. Your teams will continuously create and develop new features, and your clients will review those. Due to this involvement, acceptance of new products is high.

Testing and deployment to production are also no longer a pain in DevOps projects. All repetitive tasks such as functional regression tests, performance tests, and necessary security checks are highly automated. QA specialists focus mainly on the new features. Automated tests verify the core functionality and quickly provide an overview of the actual quality of the new product.

Automated deployment solutions enable teams to promptly roll out new features or switch back to the previous version if something does not work as expected. Thanks to this high degree of automation in test and deployment activities, the failure rate is low. All parties involved are willing to share their product knowledge. Once the new release has been successfully deployed, your operations teams provide the necessary insight and feedback. They continuously collect key performance metrics and share those across the organization.

Monitoring in DevOps environments

DevOps is no guarantee of error-free software. The pace of new features and changed components is high. Typically, complexity is increasing, and often hundreds of microservices are used. Whenever a failure occurs, there is not much time to identify the root cause and work on an appropriate resolution.

Successful DevOps teams have found a solution for this dilemma. They understand that thorough monitoring is key to an acceptable mean time to repair. An appropriate monitoring approach covers all layers, all transactions, and all environments. Often they use a monitoring platform that captures the end user experience and allows both a drill-down through technical components and a vertical analysis of error hot spots.

Your Takeaways

  • Automation is essential in DevOps
  • Monitoring of all transactions provides the required insights
  • Horizontal and vertical drill-down allows a quick hotspot analysis
  • Share monitoring data across the organization

Don’t put the advantages of DevOps projects at risk. I recommend improving your monitoring strategy to keep pace with your competitors and realize more benefits of the powerful DevOps approach.


Performance is Everyone’s Matter

Retailers such as Amazon set the user experience bar extremely high, and it seems that this is one of their secret recipes. I am not a passionate Amazon shopper, but sometimes I buy technical stuff from their fantastic online shop. Regardless of whether I use my mobile, tablet or desktop computer, their websites load fast. Several clicks later I place my order, and within a few days, the new equipment arrives.

Performance is a vertical discipline

Maybe you are not in competition with those retailers, but in the past, they quickly adapted to new fields, and suddenly your former uniqueness may disappear. Once you are in direct competition, the available time to speed up your applications will be very short.

Responsive and reliable services require a holistic approach. Let’s assume that your developers did an excellent job and considered performance from day one in their design decisions, and that your test teams ran adequate multi-user tests on a production-like environment. Several months after this app has been deployed into production, your formerly responsive system sucks, and the blame game starts.

Your users become extremely frustrated. They avoid using your business application. Some of them raise tickets and talk for hours with your support team about the slowness of this application. Business units escalate this topic to your upper management, and the pressure on IT gets higher and higher. Daily war room sessions end without any outcome. There is a lot of trial and error, but nobody can tackle the performance issue.

After a while, your teams identify gaps in the monitoring chain because they are not able to correlate their data lakes, and there is no solution in place that captures the data flow. In fact, nobody has any idea how the application components interact. This learning is essential because from this point on your teams understand two things. The first is that performance is a vertical discipline, and the second is that they need a transaction monitoring solution which captures the flow across their application components 24x7.
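
To make the idea of capturing the flow across components more concrete, here is a minimal sketch in Python. It assumes that every component records a small timing entry tagged with a shared transaction id; the `Span` record, the component names and the sample numbers are hypothetical and only serve as an illustration, not as the data model of any particular monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One timed step of a transaction inside a single component (hypothetical record)."""
    transaction_id: str
    component: str
    start_ms: int
    duration_ms: int


def group_by_transaction(spans):
    """Group spans by transaction id and order each group by start time."""
    flows = defaultdict(list)
    for span in spans:
        flows[span.transaction_id].append(span)
    return {tx: sorted(group, key=lambda s: s.start_ms) for tx, group in flows.items()}


def slowest_component(flow):
    """Return the component that spent the most time within one transaction."""
    totals = defaultdict(int)
    for span in flow:
        totals[span.component] += span.duration_ms
    return max(totals, key=totals.get)


# Made-up sample data: two transactions passing through several components.
spans = [
    Span("tx-1", "web", 0, 40),
    Span("tx-1", "order-service", 40, 120),
    Span("tx-1", "database", 160, 900),
    Span("tx-2", "web", 10, 35),
    Span("tx-2", "order-service", 45, 80),
]

flows = group_by_transaction(spans)
for tx, flow in flows.items():
    path = " -> ".join(s.component for s in flow)
    print(tx, path, "slowest:", slowest_component(flow))
```

Once the timing data of all components shares one transaction id, the question "where did this request spend its time?" becomes a simple grouping exercise instead of a war room discussion.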

Benefits of a performance first enterprise

Businesses which consider performance from day one in their development pipeline save money. Often they learned this through an experience such as the one outlined above. Either way, they now implement, test and operate their business applications with performance in mind. All parties agree that ongoing analysis, optimization, and innovation are the best route to reliable and responsive business applications. Due to this proactive mentality, war room sessions are no longer required.

There is no reason for endless firefighting. Besides short resolution times and excellent user experience, your teams will have more time for challenging tasks such as optimization and innovation. Making performance everyone’s matter is a comfortable way to have more fun at work.

Turn the Ship Around: From Firefighting to Performance Driven

Jumping from one performance hotspot to the next can be very frustrating because there is never enough time to eliminate the issues. Successful companies addressed those troubles years ago. If you are still in firefighting mode, don’t worry. I will give you insights into this dilemma and also a way out of this frustrating situation.

The Problem

Slow-loading or poorly performing business applications are a nightmare. The longer the slowdown continues, the higher the pressure from your customers. They expect a quick bugfix and do not understand why their IT department is not able to provide better service quality. Endless war room sessions, trial-and-error experiments and regular detailed status reports make those exercises even more annoying.

Traditional software development initiatives often neglect non-functional aspects. Your business clearly describes its functional demands. Software designers and developers construct the required software. Testing specialists cover those requirements with sufficient test cases. After deployment to production, the response time sucks and your business application is almost unusable during peak hours.

It takes ages for your support teams to become aware of those slowdowns. Their log-file-based monitoring approach does not deliver the necessary insight. System resource utilization is below the agreed boundaries. Your infrastructure teams recommend solving this issue with a ramp-up in hardware. Several days later your user community is nevertheless frustrated because the performance is still unacceptable.

The Solution

The performance disaster described above is no one-way street. You can always turn the ship around by integrating performance considerations into your development chain. Obviously, organizations struggling with frustrated users due to slow-responding applications first need to understand the gaps which hold them back.

Based on my experience from hundreds of performance engineering projects, non-functional requirements are essential and should never be treated as an afterthought. Once you’ve documented the required aspects, your developers can consider those in their implementation decisions and your test teams can organize the required test scenarios.

Finally, you should monitor all transactions from the end user perspective. There are powerful application and user experience monitoring solutions out there which will give you the essential insights. Those tools also come with analytics features and enable your support teams to triage complex performance issues.

The easiest way to turn the ship around is to assess your performance engineering maturity and improve it step by step according to my maturity model.

Keep doing the good work!

A Forward-Looking Application Monitoring Strategy

Over the past few years, I’ve worked with companies on the transformation of their monitoring strategy, and the outcome was fantastic. User experience and reliability of their business-critical applications have improved dramatically. In fact, a modern application monitoring strategy is more a matter of doing the right things.

Organizations often rely on an outdated monitoring approach. They don’t have active monitoring of their business-critical applications in place. Only the customers who work with the applications create a ticket when the expected functionality doesn’t work properly. Whenever a ticket arrives, a support analyst tries to reproduce the reported problem, which is often not possible due to the lack of information and data available. Regrettably, the problem is solved only hours or even days later, and the customers are not happy that they had to wait so long for a solution to their issues.

Outages are a pain because they lead to lost revenue and, in the worst cases, to a bad reputation. There is no error-free software, and therefore you have to find ways to deal with this uncertainty. I will now give you three simple steps which help you to mitigate those risks and gain excellent insight into your business applications.

Step 1

Actively monitor user experience in production applications. A robot executes your important use cases according to a specified schedule, and depending on the result of those executions your support team will be alerted. This synthetic execution of important use cases is essential, especially outside working hours when nobody is using your application. When it comes to tools, I recommend using Silk Performance Manager from Micro Focus because it’s easy to use and very powerful.
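
The principle behind such a robot can be sketched in a few lines of Python. This is only an illustration of the idea and not how Silk Performance Manager or any other product works internally. The use cases `login` and `checkout`, the time limits and the alert function are hypothetical placeholders; in a real setup the use cases would drive your application and a scheduler such as cron would trigger each cycle.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckResult:
    name: str
    ok: bool
    duration_s: float
    detail: str = ""


def run_check(name: str, use_case: Callable[[], None], max_duration_s: float) -> CheckResult:
    """Execute one use case; it fails if it raises or takes longer than allowed."""
    start = time.monotonic()
    try:
        use_case()
    except Exception as exc:
        return CheckResult(name, False, time.monotonic() - start, f"failed: {exc}")
    duration = time.monotonic() - start
    if duration > max_duration_s:
        return CheckResult(name, False, duration, "too slow")
    return CheckResult(name, True, duration)


def run_cycle(checks: List[Tuple[str, Callable[[], None], float]],
              alert: Callable[[CheckResult], None]) -> List[CheckResult]:
    """Run all checks once and call alert() for every failed check."""
    results = [run_check(name, fn, limit) for name, fn, limit in checks]
    for result in results:
        if not result.ok:
            alert(result)
    return results


# Hypothetical use cases: real ones would log in, search, place an order and so on.
def login() -> None:
    pass  # pretend the login page answered correctly


def checkout() -> None:
    raise RuntimeError("payment page returned HTTP 500")


alerts: List[CheckResult] = []  # stand-in for e-mail, pager or chat notification
results = run_cycle([("login", login, 2.0), ("checkout", checkout, 5.0)], alerts.append)
for failed in alerts:
    print("ALERT:", failed.name, failed.detail)
```

Because the robot runs whether or not real users are online, a broken checkout at three in the morning raises an alert long before the first customer ticket arrives.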

Step 2

You should monitor all transactions from the end user’s perspective. Some problems have an impact on several users, while others affect the whole user community. For ongoing improvements and efficient root-cause analysis, this kind of monitoring is essential. dynaTrace is the market leader in this user experience and application monitoring field. Their platform provides many outstanding features such as automatic problem detection, artificial intelligence, and excellent integration possibilities.

Step 3

Finally, collect system monitoring metrics. Your application won’t deliver an adequate user experience if CPU, memory, network or IO metrics are permanently in the critical range. Therefore, collect low-level metrics and raise alerts if thresholds are exceeded. Tool-wise you can choose between commercial and open source solutions. Most companies have this kind of monitoring already in place. The low-level monitoring landscape is huge. Look at the solution from Nagios if you want to close gaps in this discipline. A good user and performance monitoring solution also provides infrastructure monitoring features.
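
As a rough illustration of threshold-based alerting, the following Python sketch flags a metric only when its most recent samples are all above the limit, so that a single short spike does not wake anybody up. The metric names, threshold values and sample history are assumptions made for this example; they are not recommendations and not taken from Nagios or any other tool.

```python
# Hypothetical limits in percent; agree on real values with your operations team.
THRESHOLDS = {
    "cpu_percent": 85.0,
    "memory_percent": 90.0,
    "disk_io_wait_percent": 20.0,
}


def sustained_breaches(history, thresholds, min_consecutive=3):
    """Return the metrics whose last `min_consecutive` samples all exceed their limit."""
    alerts = []
    for metric, limit in thresholds.items():
        recent = history.get(metric, [])[-min_consecutive:]
        if len(recent) == min_consecutive and all(value > limit for value in recent):
            alerts.append(metric)
    return alerts


# Made-up samples, oldest first.
history = {
    "cpu_percent": [70, 91, 93, 97],       # above the limit for the last three samples
    "memory_percent": [60, 95, 70, 65],    # one spike only
    "disk_io_wait_percent": [5, 6],        # not enough samples yet
}

breached = sustained_breaches(history, THRESHOLDS)
print("metrics in critical range:", breached)
```

Requiring several consecutive samples is one simple way to separate "permanently critical" from "briefly busy"; commercial and open source tools offer more refined variants of the same idea.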

Once you’ve implemented your proactive monitoring strategy, don’t forget to review the collected metrics continuously. Take 30 minutes per month for each of your applications and review the captured user experience, response times, throughput, error rate and system resource utilization metrics of the last 30 days.
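
If your monitoring tool lets you export the raw request data, a short script can prepare the numbers for such a review. The Python sketch below is one possible way to do it; the record layout with `duration_ms` and `error` fields and the sample data are assumptions for illustration, and the percentile uses the simple nearest-rank method.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct percent at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def review(requests, period_seconds):
    """Summarize response times, error rate and throughput for one review period."""
    durations = [r["duration_ms"] for r in requests]
    errors = sum(1 for r in requests if r["error"])
    return {
        "requests": len(requests),
        "median_ms": statistics.median(durations),
        "p95_ms": percentile(durations, 95),
        "error_rate": errors / len(requests),
        "throughput_per_s": len(requests) / period_seconds,
    }


# Made-up export: 20 requests captured within 10 seconds, two of them failed.
requests = [
    {"duration_ms": 100 * i, "error": i in (7, 19)}
    for i in range(1, 21)
]

summary = review(requests, period_seconds=10)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Comparing these few numbers month over month is usually enough to spot a creeping slowdown before your users do.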

Some data scientists argue:

“The truth is in your data”

I fully agree with this argument and I believe that, once you’ve implemented a forward-thinking application monitoring strategy, you will share the same opinion.