Web 2.0 Application Performance Testing

Fat client applications have been widely replaced by rich, browser-based, so-called Web 2.0 applications. Companies that have not followed this trend have already lost important customers or are very likely to lose them in the future.

But what is the magic behind Web 2.0 applications? Facebook, Amazon, and Zalando, to mention just a few, are successful examples of how Web 2.0 applications can captivate users and keep them coming back. Web 2.0 applications also run on any device, are easy to use, and provide a first-class user experience.

What performance test strategy should be used for Web 2.0 applications? Those who have conducted performance tests of such applications have sometimes failed to identify the root cause of performance issues. Based on my experience, Web 2.0 performance testers need to consider three main aspects:

  1. First, web page design should be analyzed against best practices from Google or Yahoo in the early stages (a small automated check is sketched at the end of this post)
  2. Second, WAN impact tests should be executed to verify that response time requirements can still be met when network latency is high
  3. Lastly, protocol-level virtual user simulation does not provide real end-to-end response times. Successful Web 2.0 performance tests require browser-based user simulation with a correct caching strategy, because first-time and returning visitors can show massively different response times (see the sketch below)
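
To illustrate the caching point, here is a minimal sketch using Playwright in Python. The URL is a placeholder, and the goto() timing is only a rough proxy for perceived load time:

```python
# Compare first-visit vs. revisit load time in a real browser.
# Assumes Playwright is installed (pip install playwright && playwright install).
import time
from playwright.sync_api import sync_playwright

URL = "https://example.com"  # placeholder target

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()  # fresh context = empty cache (first-time user)
    page = context.new_page()

    start = time.perf_counter()
    page.goto(URL, wait_until="load")
    first_visit = time.perf_counter() - start

    start = time.perf_counter()      # same context: cached resources may be reused
    page.goto(URL, wait_until="load")
    revisit = time.perf_counter() - start

    print(f"first visit: {first_visit:.2f}s, revisit: {revisit:.2f}s")
    browser.close()
```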

I recommend making performance testing part of your development process and adapting your Web 2.0 testing approach to avoid reliability issues.
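
If you want a quick starting point for the page design analysis from the first step, a few of the Google/Yahoo rules can be checked automatically. A minimal sketch, assuming the requests library is available; the URL and the 1 MB page-weight budget are placeholder values:

```python
# Check a handful of page-design best practices on a single page.
import requests

resp = requests.get("https://example.com", headers={"Accept-Encoding": "gzip"})

checks = {
    "text compression enabled": resp.headers.get("Content-Encoding") in ("gzip", "br"),
    "cache headers present": "Cache-Control" in resp.headers or "Expires" in resp.headers,
    "page weight within budget": len(resp.content) < 1_000_000,  # arbitrary 1 MB budget
}
for rule, ok in checks.items():
    print(f"{'PASS' if ok else 'FAIL'}: {rule}")
```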


What Wikipedia can’t tell you about Performance Testing in Production

Certainly, a realistic production test setting does not come for free. Due to the high costs involved, some companies skip real user load scenarios entirely, while others run such load tests on the production stage. In this post, I will share the advantages as well as the flip sides of performance testing in production.

Why companies should consider performance testing in production

First of all, companies can save money by using the production infrastructure for load testing. Especially if an application consists of many components, a realistic test environment can be a massive investment, and testing in production avoids it.

Secondly, they can verify the reliability of their new production stage under realistic conditions. A performance test in this brand-new environment also gives them more confidence regarding correct sizing, configuration, and production readiness.

Finally, realistic tests require actual production data. A small data set can lead to incorrect load testing results, while performance tests in production with realistic data lead to meaningful test results.

Why companies should avoid performance testing in production

Early detection of performance issues is essential. Bottlenecks such as chatty behavior require an application redesign, which is time-consuming. If such issues are identified too late, the fix becomes both expensive and time-intensive.

Another argument against performance testing in production is that cleaning up experimental data can become a real challenge. Mass load tests require thousands of input data sets, which must be removed after the test run has completed (a cleanup sketch follows below).
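
As a sketch of how such a cleanup could look, assuming every synthetic user created during the test shares a recognizable identifier prefix. Here, sqlite3 stands in for the real production database, and the table and column names are placeholders:

```python
# Hypothetical cleanup pass: all synthetic test users share the prefix
# 'loadtest_', so their data can be removed in a single statement.
import sqlite3

conn = sqlite3.connect("crm.db")  # placeholder connection
conn.execute("DELETE FROM orders WHERE customer_id LIKE 'loadtest_%'")
conn.commit()
conn.close()
```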

The simulation of realistic load in production can also have an impact on other business applications and, in the worst case, on your customers. Users are very sensitive when it comes to response time degradation, and they rarely forget slow-running applications. In the worst case, the system under test crashes during your performance test.

The risks related to performance tests in production are quite high, but there are also good reasons why companies should consider this test setting. In my opinion, however, early detection of bottlenecks and design failures is essential, and that speaks against relying on performance testing in production alone.

Have you found good strategies for your non-functional testing environments? Which ideas about performance test stages excite you, and how are you implementing them?

Do you know your app's single points of failure?

Based on my experience, a high availability (HA) analysis will help you identify weaknesses in your IT services. Conduct it regularly, because small changes in your landscape can have an enormous impact on the reliability of your business services. Once you've identified a gap between expected and actual availability, I highly recommend executing a single point of failure (SPOF) analysis.

I will give you some simple steps which you can use right away, followed by a complete HA and SPOF analysis example.

An outage of a sub-service will have an impact on the reliability of the entire system.
Therefore, verify all your sub-services regarding:

  • expected failures
  • preventive measures
  • business impact of an outage

If there is no preventive measure available, you've identified a single point of failure. Try to eliminate any SPOF as soon as possible to avoid critical outages. Good failure detection and redundancy will help you improve the reliability of your business-critical application.

Sample SPOF Analysis

As mentioned above, we use the single point of failure analysis to verify our services in terms of potential errors, mitigations, and the business impact of outages.

Our sample application used for this analysis consists of three components. The table below contains the result of this SPOF analysis for our sample application.

[Table: SPOF analysis results for the three components of the sample application]

We've identified a single point of failure because no mitigation for hardware failures is available.
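
The rule behind this verdict is simple enough to automate. Here is a minimal sketch with hypothetical component data in the spirit of the table above; a component whose expected failure has no preventive measure is flagged as a SPOF:

```python
# Minimal SPOF check: no mitigation means single point of failure.
components = [
    {"name": "web server", "failure": "hardware fault", "mitigation": "load-balanced pair"},
    {"name": "database",   "failure": "hardware fault", "mitigation": None},
    {"name": "network",    "failure": "link outage",    "mitigation": "redundant uplink"},
]

for c in components:
    if c["mitigation"] is None:
        print(f"SPOF: no mitigation for '{c['failure']}' on {c['name']}")
```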

I’ve prepared a high availability and single point of failure calculation matrix which you can use right away.

A Guide to Verifying High Availability Requirements

Our service portfolios are growing, and frustrated users can quickly move on to more reliable sites these days. Besides this commercial aspect, downtime can also result in serious problems with regulatory agencies. I'll give you some simple steps you can use right away to conduct a high availability analysis, followed by another post on how to identify single point of failure risks in your critical services.

A highly available system continues its business function even if a sub-service crashes. Any single point of failure (SPOF) risk is unacceptable in such a system. Therefore, you should consider redundancy for critical components and fault tolerance for your core system. Usually, a high availability analysis starts with a comparison of expected and actual availability, followed by a single point of failure identification.

Expected Availability

The expected availability is often part of service specifications or contracts with service providers. Alternatively, you can use this formula to calculate it:

Availability = MTBF / (MTBF + MTTR)

Service Time: the time the service is available to your users (below, in minutes per month)

MTTR: Mean time to repair

MTBF: Mean time between failures (Service Time – MTTR)

Let's assume that our CRM application's users work with it 21.5 hours a day, 20 days a month, and that the monthly downtime is 4 hours. What is the availability of our CRM application?

Service Time: 25800 minutes per month

MTTR: 240 minutes per month

MTBF: 25800 – 240 = 25560 minutes per month

Availability: 25560 / ( 25560 + 240) = 99.07%
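
If you prefer to script this calculation, here is a minimal Python helper implementing the formula above and reproducing the CRM example:

```python
def availability(service_minutes: float, mttr_minutes: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), where MTBF = service time - MTTR."""
    mtbf = service_minutes - mttr_minutes
    return mtbf / (mtbf + mttr_minutes)

# CRM example: 21.5 h/day * 60 min * 20 days = 25800 minutes, 240 minutes downtime
print(f"{availability(25800, 240):.2%}")  # -> 99.07%
```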

Actual Availability

Our applications consist of many sub-services such as databases, middleware, load balancers, networks, and external service providers. An outage of one load-balanced server will usually not affect availability, while a crash of a single database will often result in downtime. The former is a case of availability in parallel; the latter, of availability in series.

Availability in series:   A = Ax × Ay
Availability in parallel: A = 1 – (1 – Ax)²
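
Both rules translate directly into a few lines of Python. A minimal sketch; the inputs anticipate the worked example below, and the parallel call shows what redundancy, which our sample lacks, would buy:

```python
from math import prod

def series(*avail: float) -> float:
    # every component must be up, so availabilities multiply
    return prod(avail)

def parallel(*avail: float) -> float:
    # the system is only down if all redundant components are down
    return 1 - prod(1 - a for a in avail)

print(f"{series(0.9907, 0.9814, 0.9907):.2%}")  # three sub-services in series -> 96.32%
print(f"{parallel(0.9907, 0.9907):.4%}")        # two redundant instances of S1
```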

Typically, the actual availability of a given system will be calculated in the following four steps:

  1. Prepare a block diagram of the system
  2. Develop a reliability model of the system
  3. Calculate the availability of sub-services
  4. Determine availability of the entire system

The block diagram will help you visualize the essential components. Once you've created the visual representation of your services, you derive the reliability model by identifying serial and parallel availabilities. Finally, you can calculate the availability of your sub-services and of the entire system.

Let's continue with our sample CRM application, which consists of three sub-services with a daily service time of 21.5 hours. Sub-service 1 and sub-service 3 have an MTTR of 240 minutes per month, and sub-service 2 has an MTTR of 480 minutes per month.

What is the actual availability of this CRM application?

Step 1: Block Diagram

Our CRM application consists of three sub-services.

[Block diagram: the CRM application's three sub-services]

Step 2: Reliability Model

We have no load balancer or service redundancy in place, so this is availability in series.

[Reliability model: the three sub-services connected in series]

Step 3: Availability per sub-service

S1: 25560 / ( 25560 + 240) = 99.07%

S2: 25320 / ( 25320 + 480) = 98.14%

S3: 25560 / ( 25560 + 240) = 99.07%

Step 4: System Availability

Availability: 99.07% * 98.14% * 99.07% = 96.32%

The example above demonstrates how to compare expected with actual availability. Keep in mind that measures such as proactive error detection and redundancy improve availability, while factors such as complexity have an adverse impact on system reliability.

In my next post, I will outline a single point of failure analysis and provide a high availability calculation sheet which you can use right away.

The Best Ways to Triage Performance Problems

Slow-responding apps can be very nasty and lead to unsatisfied users. Based on my experience, performance analysis is no cakewalk because application complexity keeps increasing, and you can easily get lost in a blind alley. I'll give you three performance triage tips you can use right away.

You'll see that once you've switched to a user-centric performance analysis approach, you will identify annoying bottlenecks, and war rooms will become history.

Step 1 – User Experience

Firstly, start your investigation at the last mile. Maybe there are some long-running backend service requests, but those may not even be relevant to your users. User experience metrics such as conversion rate, abandonment rate, and user action response times, compared against the last 24 hours or the last 7 days, will give you a much better picture.
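
As a minimal sketch of such a baseline comparison: flag a regression when today's p90 response time clearly exceeds the 7-day baseline. The sample values and the 25% threshold are made up for illustration:

```python
from statistics import quantiles

def p90(samples: list[float]) -> float:
    return quantiles(samples, n=10)[-1]  # 90th percentile cut point

baseline = [0.8, 0.9, 1.1, 1.0, 0.95, 1.2, 0.85]  # last 7 days (seconds)
today    = [1.4, 1.6, 1.3, 1.5, 1.7, 1.2, 1.55]   # last 24 hours (seconds)

if p90(today) > 1.25 * p90(baseline):
    print("user-facing regression detected: triage before diving into backend metrics")
```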

Step 2 – Horizontal Analysis

Secondly, verify how response times are distributed across the client, network, application, and database layers. Are there any spikes? Has this distribution changed over the last 24 hours or the last 7 days? Once you've identified the critical component, proceed with the vertical analysis.
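
A minimal sketch of this horizontal comparison, with hypothetical per-layer timings and an arbitrary 0.5-second alert threshold:

```python
# Per-layer breakdown of an average user action (seconds); sample data only.
last_week = {"client": 0.4, "network": 0.3, "application": 0.9, "database": 0.4}
last_24h  = {"client": 0.4, "network": 0.3, "application": 2.1, "database": 0.5}

for layer, now in last_24h.items():
    delta = now - last_week[layer]
    if delta > 0.5:
        print(f"{layer} layer slowed down by {delta:.1f}s -> start vertical analysis here")
```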

Step 3 – Vertical Analysis

Finally, you've narrowed the problem down to the causing stack. Whether it's a client, network, or backend-related issue, drill down to the performance hotspot and figure out how you can eliminate the problem. Depending on the nature of the bottleneck, collaborate with your specialist teams to get a second opinion on the tuning recommendations.

Good job! If you follow these steps, your chances of tackling critical performance bottlenecks are very high.