
Pitfalls in Performance Engineering

For those hoping that a satisfying user experience is a cakewalk, there is bad news: reliable and responsive applications do not come for free. Teams that skip certain principal activities in their software development chain will not reach a satisfactory user experience.

A first common failure is trying to solve application performance bottlenecks by ramping up infrastructure. Based on my experience from several hundred performance investigations, fewer than one in ten such problems can be settled by adding infrastructure.

Secondly, it is advisable to specify performance requirements at an early stage and make them part of the contracts with your software providers. At a minimum, the load pattern must be specified, including the number of concurrent users and the throughput during regular and peak business hours. Response time and system utilization thresholds are also desirable.

Finally, performance requirements must regularly be verified. Make sure that you have performance and capacity management specialists in charge who monitor the end-to-end user experience on development, test and production stages. Small changes in usage or data volumes could have a massive impact on your application response times.

Don’t be reactive; identify issues before your user community is affected.

Meaningful Performance Requirements

Slow responding applications can be a nightmare
Imagine that your customer sits next to you while you try to open their new account. You press the “Create Customer” button and expect that seconds later the account has been created. Surprisingly, after 60 seconds there is still no confirmation, and the Create Customer process does not seem to respond. This is a very embarrassing situation: you will lose the confidence of your new client and, in the worst case, you will have to start the whole process again, including re-entering the customer’s details.

The Problem
It’s common practice for functionality to be precisely specified. Requirement specialists outline use cases from a functional perspective, developers implement the expected features, and during system or user acceptance testing, quality assurance specialists verify and validate the requirements. However, after rollout, serious performance issues occur.

The Solution
Nobody would build a house without a detailed plan that outlines both the visual representation and the architectural characteristics. The same practice should apply to software: start by specifying both functional and non-functional requirements. Architects should incorporate both in their decisions, and testers should validate the former as well as the latter.

But vaguely formulated performance requirements are not useful at all. Based on my experience, meaningful performance requirements consist of:
•    Expected number of concurrent users
•    Expected number of use cases per interval
•    90th percentile response times for user interactions
•    90th percentile response times for backend service calls
•    Maximum error rate per interval
•    Acceptable response times on WAN locations
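Requirements like these are only useful if they are checked automatically. The sketch below is a hypothetical illustration of how two of the items above (90th percentile response time and maximum error rate) could be verified against collected measurements; the threshold values are my own assumptions, not figures from the text.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times (ms)."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def check_requirements(response_times_ms, errors, total_requests,
                       p90_limit_ms=2000, max_error_rate=0.01):
    """Return pass/fail results for two example performance requirements.

    The 2-second p90 limit and 1% error-rate ceiling are illustrative
    assumptions; real projects would take them from the requirement spec.
    """
    p90 = percentile(response_times_ms, 90)
    error_rate = errors / total_requests
    return {
        "p90_ms": p90,
        "p90_ok": p90 <= p90_limit_ms,
        "error_rate_ok": error_rate <= max_error_rate,
    }
```

A check like this can run after every load test, turning the requirement list into a gate in the delivery pipeline rather than a document that is read once.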

However, forward-thinking companies already have performance considerations integrated into their development chain. Keep up the good work and consider non-functional requirements in your development process.

Web Page Performance Analysis

Responsive websites consist of much more than a well-sized backend and fast service response times. Modern web applications provide rich content in order to attract a broad user community. Navigation needs to be as easy as possible, and sites must support many devices, such as mobiles, tablets and desktop computers.

With perfect user experience in mind, web developers often use rich frameworks, which allow them to create fancy user interfaces. In addition, every engineer comes up with their favourite framework and integrates this into their new web-based business applications.

I understand those developers, but the result is often a webpage that loads slowly because too many resources must be downloaded. My recommendation for avoiding such issues is therefore twofold:

  1. A lead developer or architecture specialist should provide strict policies and regularly verify the used frameworks.
  2. Web page design analysis should become part of your software development process to avoid major web page design faults.

So, keep web performance best practices in mind and avoid critical design mistakes. There are excellent tools such as PageSpeed or YSlow available for free. Use those tools during software development stages and verify the score of your application regularly.
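One of the simplest page design checks, counting how many external resources a page references, can be sketched in a few lines. This is only a minimal illustration of the idea; tools such as PageSpeed and YSlow perform far more thorough analyses.

```python
# Minimal sketch: count tags that trigger an additional HTTP request.
# Every extra script, stylesheet or image adds download time, especially
# over high-latency links.
from html.parser import HTMLParser

class ResourceCounter(HTMLParser):
    """Collects external resource references from an HTML document."""
    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "src" in attrs:
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet":
            self.resources.append(attrs.get("href"))
        elif tag == "img" and "src" in attrs:
            self.resources.append(attrs["src"])

def count_resources(html):
    parser = ResourceCounter()
    parser.feed(html)
    return len(parser.resources)
```

Run against each release candidate, a check like this can flag a page whose resource count suddenly jumps, before any user notices the slowdown.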


Performance Issues by Design

In the past, performance issues were often solved by investing in infrastructure. In many cases this approach worked fine, but those days are long gone. In our highly distributed, micro-services-based IT landscape, the biggest share of performance issues is no longer related to network or infrastructure.

Fat clients have been widely eliminated or migrated to web-based applications. While web-based applications provide many benefits, they also have their flipsides. One is that more components are involved, such as the presentation, business and data access layers. Another is that too much functionality is implemented on the client side, which cannot be reused and very often results in a bad user experience.

Based on my experience from several hundred performance investigations, the biggest proportion of performance issues is design related. Such issues cannot be solved with investments in infrastructure.

For instance, if a single user click results in ten web service calls, the response time might not be acceptable to users abroad. Your only chance of solving such performance problems is to rework your application design in order to improve the communication pattern.
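The cost of a chatty communication pattern can be modelled with simple arithmetic: each sequential call pays the full network round-trip. The sketch below contrasts ten chatty calls with one batched call; all latency and server-time figures are illustrative assumptions.

```python
def response_time_ms(calls, latency_ms, server_time_ms_per_call=20):
    """Total response time for sequential request/response calls.

    Each call pays the full round-trip latency plus server processing time.
    The default 20 ms of server time per call is an assumed figure.
    """
    return calls * (latency_ms + server_time_ms_per_call)

# Ten chatty calls from a remote office with 300 ms round-trip latency:
chatty = response_time_ms(calls=10, latency_ms=300)
# The same data fetched in one batched call (more server work, one round trip):
batched = response_time_ms(calls=1, latency_ms=300, server_time_ms_per_call=200)
```

Under these assumptions the chatty design takes 3.2 seconds against 0.5 seconds for the batched one, which is why reworking the communication pattern beats buying hardware: no amount of infrastructure removes the per-call latency.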

Therefore, don’t underestimate the performance impact of your application design. Verify application and page design during the development stage and follow web performance best practices provided by Google and Yahoo.

Network Performance Impact

Centrally hosted, globally used applications don’t always deliver a satisfying user experience. Some people may argue: “Don’t worry, let’s increase the bandwidth to solve our performance problem. Simple!” Based on my experience, the biggest proportion of WAN performance issues is not related to a shortage of bandwidth.

In one of my previous projects, I was involved in a performance-related fire-fighting task. Some business customers abroad complained of a very bad user experience. The initial analysis done by the responsible application manager demonstrated that the problem could not be reproduced and that all application components, including infrastructure and database, were delivering excellent response times. However, the business manager escalated the issue and increased the pressure on the IT department.

So how does one investigate and solve such problems? I analysed the performance metrics and identified that no real user monitoring was available. Therefore, I integrated a user-experience monitoring suite, which injected JavaScript into the web pages. Using this approach, I was able to measure response times from the end user’s perspective.

An analysis of the end user performance metrics gave me excellent insight into the root cause of the issue. The login procedure for this business application was very chatty and processed 50 sub-requests. For a local user with low latency, this would have been no problem at all. However, due to the high latency from Asia, for instance, the login took up to 50 requests × 300 ms latency = 15 seconds longer.
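The arithmetic from this story generalizes to a one-line cost model for any chatty transaction. The LAN latency figure below is an assumed value; the Asia figure is the 300 ms from the example above.

```python
def latency_penalty_ms(sub_requests, round_trip_ms):
    """Extra wall-clock time that sequential sub-requests add at a
    given network round-trip latency."""
    return sub_requests * round_trip_ms

# The 50-request login from the story:
local = latency_penalty_ms(50, 2)     # LAN user, ~2 ms round trip (assumed)
asia = latency_penalty_ms(50, 300)    # remote user, 300 ms round trip
```

The same transaction that costs a local user a tenth of a second costs the remote user 15 seconds, which is exactly why the problem could not be reproduced from the data centre.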

The moral of the story is never underestimate the performance impact of your network. Make web page design analysis and real user monitoring part of your application delivery pipeline to keep your user community satisfied.


Effective Synthetic Monitoring

Why do we need synthetic monitoring, and how should we integrate it into our software development chain?

Nowadays, downtime of business applications results in a loss of revenue: an order cannot be placed, and the user may not return in the future. Uptime has become a major concern of business and IT departments.

Based on my experience, manual monitoring of application availability, accuracy and performance is very time consuming and too expensive. A much better approach is to identify some critical use cases within the affected applications, automate those and execute them regularly at the required business locations.

This so-called synthetic monitoring allows you to identify downtimes before they affect the end user. In addition, performance, accuracy and availability metrics can be collected permanently and used to raise a ticket if certain thresholds are violated.
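The core of a synthetic check is small: run a scripted use case, record availability and response time, and flag an alert when a threshold is violated. The sketch below is a minimal illustration of that loop; the check function and the 2-second limit are assumptions, and a real suite would schedule this from each business location.

```python
import time

def run_synthetic_check(check, response_limit_s=2.0):
    """Execute one scripted use case and return a metrics record.

    `check` is any callable that performs the use case (e.g. log in and
    load the landing page) and returns True on success; an exception is
    treated as a downtime or accuracy failure.
    """
    start = time.monotonic()
    try:
        ok = check()
        error = None
    except Exception as exc:
        ok, error = False, str(exc)
    elapsed = time.monotonic() - start
    return {
        "available": ok,
        "response_s": elapsed,
        "alert": (not ok) or elapsed > response_limit_s,
        "error": error,
    }
```

Records like these, collected every few minutes, give exactly the permanent availability, accuracy and performance metrics described above, and the `alert` flag is what feeds the ticketing threshold.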

The flipside of synthetic monitoring is that a change in an application under monitoring can result in false alerts. To avoid such situations, make synthetic monitoring part of your development chain and test your synthetic scripts during acceptance testing of your business applications as well.

Ideally, you should make your performance engineering team responsible for the monitoring platform and the maintenance of the synthetic scripts. In addition, I recommend selecting a synthetic monitoring suite that allows you to reuse existing performance testing scripts.


Risk based Performance Engineering

Performance test effort estimation can be tricky. Especially in integration projects, when a high number of applications needs to be tested and the timeline is very limited, a better approach is required.

In one of my past projects, I led a non-functional testing stream and had to plan performance-testing efforts for more than 70 affected applications. My testing team consisted of eight experienced performance test experts, but the go-live date was only six months away. Initially, I thought we would never meet the target timeline; we needed a very efficient performance engineering approach.

Due to the high number of affected applications and the limited testing time, it was not possible to plan, implement and execute performance tests for all 70 applications. Our only option was a risk-based performance engineering approach.

First, we created a non-functional-requirement questionnaire, which captured performance-related topics such as response time, expected transaction volume and growth estimates. In addition, the questionnaire contained a formula that allowed us to calculate the performance risk. We used performance risk categories from 1 to 4, specified performance test activities for each category, and decided that performance tests would be conducted only for applications in risk categories 1 and 2.
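The idea of such a risk formula can be sketched as a simple scoring function. The weights, thresholds and cut-offs below are entirely my own illustrative assumptions, not the project's actual formula; what matters is the mechanism of mapping questionnaire answers onto categories 1–4, with 1 being the highest risk.

```python
def performance_risk_category(peak_users, tx_per_hour, growth_pct,
                              response_time_critical):
    """Map questionnaire answers to a risk category 1-4 (1 = highest).

    All weights and thresholds are hypothetical example values.
    """
    score = 0
    score += 2 if peak_users > 500 else (1 if peak_users > 50 else 0)
    score += 2 if tx_per_hour > 10_000 else (1 if tx_per_hour > 1_000 else 0)
    score += 1 if growth_pct > 20 else 0
    score += 2 if response_time_critical else 0
    # High scores mean high risk, so they map to low category numbers.
    if score >= 6:
        return 1
    if score >= 4:
        return 2
    if score >= 2:
        return 3
    return 4

def needs_performance_test(category):
    """Only categories 1 and 2 were load-tested in the project described."""
    return category <= 2
```

The payoff of this mechanism is that the test-or-skip decision becomes transparent and repeatable: anyone can trace an application's category back to its questionnaire answers.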

Secondly, we sent this non-functional-requirement questionnaire to the product owners of all affected applications, presented the idea behind it, and set a deadline for its return.

Finally, we received responses from all product owners. Surprisingly, just 10 per cent of our 70 applications had a performance risk category of 1 or 2. We prepared and implemented the required performance tests for these applications and were easily able to meet the tough timeline.

Risk-based performance engineering helped us focus our efforts on performance-critical applications and brought transparency to our performance engineering activities.