Understanding Performance (Part 1)
As previously discussed in Software Quality Attributes, defining and measuring software quality attributes is critical to the success of any distributed application. Performance is no exception. A distributed application must respond quickly enough that its users perceive it as immediate. Companies must therefore test and measure how fast their solution responds under peak load.
Performance is defined as the system's ability to meet latency, throughput and resource utilization requirements. Generally speaking, an application's main objective is to maintain low latency and high throughput while keeping resource utilization low. Let's define these three concepts briefly before drilling down into the details.
Latency: Latency is defined as the period of time that one component in a system spends waiting for a response from another component. From an end user's perspective, latency can also be defined as the time that elapses between the moment the user clicks a Submit button and the moment the response arrives. Latency is measured in units of time such as seconds or milliseconds.
Throughput: Throughput is defined as the amount of work or data that is transferred from one component to another in a specified unit of time. It is typically measured in requests per second or bytes per second.
Utilization: Utilization refers to the usage level of system resources, such as memory or CPU. Resource utilization is usually measured as a percentage of the maximum available level of the specific resource.
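To make the three definitions concrete, here is a minimal sketch that derives each metric from one measurement window. The Sample class, its field names and the numbers are hypothetical and chosen only for illustration; the utilization figure also assumes a single CPU core.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement window from a hypothetical load test."""
    window_seconds: float          # length of the measurement window
    requests_completed: int        # requests that finished in the window
    total_response_seconds: float  # sum of all per-request response times
    cpu_busy_seconds: float        # time the (single) CPU core was busy


def average_latency(s: Sample) -> float:
    """Mean time a caller waited for a response, in seconds."""
    return s.total_response_seconds / s.requests_completed


def throughput(s: Sample) -> float:
    """Requests completed per second."""
    return s.requests_completed / s.window_seconds


def utilization(s: Sample) -> float:
    """CPU busy time as a percentage of the window."""
    return 100.0 * s.cpu_busy_seconds / s.window_seconds


# Made-up numbers for a 60-second window.
sample = Sample(window_seconds=60.0, requests_completed=1200,
                total_response_seconds=300.0, cpu_busy_seconds=45.0)

print(f"latency:     {average_latency(sample) * 1000:.0f} ms")
print(f"throughput:  {throughput(sample):.0f} requests/s")
print(f"utilization: {utilization(sample):.0f}%")
```

With these sample values the window works out to an average latency of 250 ms, a throughput of 20 requests per second and 75% CPU utilization.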
Latency
Generally speaking, latency increases along with resource utilization. At low levels of utilization, latency increases slowly in a linear fashion. However, once a system resource approaches 100% utilization, latency increases dramatically, as illustrated in the next diagram.

A sudden increase in latency during your performance tests usually means that at least one system resource is running at or near maximum capacity. In other words, because a resource is overutilized, the components that depend on it can't keep up with incoming requests, and therefore can't respond immediately to other components. As a result, the component waiting for a response just sits idle.
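The shape of this relationship can be approximated with a simple queueing formula. The sketch below assumes an M/M/1 queue (a single resource with random arrivals and service times), which is a textbook simplification and not something measured from a real system. Under that assumption, average latency is the service time divided by (1 - utilization). The 100 ms service time is a hypothetical value.

```python
def estimated_latency(service_time: float, utilization: float) -> float:
    """Average response time under the M/M/1 assumption.

    service_time: seconds needed to serve one request on an idle resource
    utilization:  fraction of the resource in use, 0.0 <= utilization < 1.0
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time / (1.0 - utilization)


SERVICE_TIME = 0.1  # hypothetical: 100 ms per request on an idle resource

for pct in (10, 50, 80, 90, 95, 99):
    latency = estimated_latency(SERVICE_TIME, pct / 100)
    print(f"{pct:3d}% utilized -> {latency * 1000:8.1f} ms")
```

In this model, going from 10% to 50% utilization adds less than 100 ms, while going from 90% to 99% multiplies latency roughly tenfold. That is the sudden climb to watch for in test results.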
Waiting is not always caused by exhausted hardware resources. A perfect example is a Web server that runs out of idle threads. Web servers are generally configured to handle a maximum number of threads. If that limit is set too low, the server cannot handle the number of concurrent users. As a result, requests get queued up and must wait for a thread to become available. This waiting period inevitably increases the latency experienced by the end user.
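The effect of a thread limit that is set too low can be illustrated with a small deterministic simulation. This is a sketch under simplifying assumptions: all requests arrive at the same instant, every request takes the same time to serve, and the function name and numbers are hypothetical.

```python
import heapq
from typing import List


def simulate_latencies(num_requests: int, max_threads: int,
                       service_time: float) -> List[float]:
    """Return the latency (queue wait + service) of each request.

    All requests are assumed to arrive at time 0.
    """
    # Min-heap holding the time at which each worker thread becomes free.
    free_at = [0.0] * max_threads
    heapq.heapify(free_at)

    latencies = []
    for _ in range(num_requests):
        start = heapq.heappop(free_at)   # earliest available thread
        finish = start + service_time
        heapq.heappush(free_at, finish)
        latencies.append(finish)         # arrival was at t=0, so latency == finish
    return latencies


for threads in (5, 20):
    lat = simulate_latencies(num_requests=20, max_threads=threads,
                             service_time=0.5)
    print(f"{threads:2d} threads: worst-case latency {max(lat):.1f} s")
```

With 20 simultaneous requests and only 5 threads, the last requests wait for three earlier batches to finish and see 2.0 seconds of latency. With 20 threads, every request is served in 0.5 seconds, even though the work per request is unchanged.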
While measuring the overall latency of the system allows you to evaluate the end user's experience, it doesn't help you identify bottlenecks and specific problem areas. For that reason, you should also measure latency carefully between each pair of components, and present both the segmented and the aggregated results.

In the diagram above, measuring the latency strictly from the client's perspective might tell you that it takes 7 seconds for the client to receive a response. However, it doesn't tell you where the majority of that time is spent. Is the high latency caused by the low-bandwidth network (network latency), or is the Web server taking more time than expected to process the request (application latency)? How much time does it take the DBMS to respond to a Web server request? Answering these questions will help you identify the real source of the latency, and therefore develop a plan of action to remedy the situation.
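One way to collect segmented measurements is to wrap each hop in a timer and report the segments alongside the total. The sketch below is illustrative only: the segment names are hypothetical, and the time.sleep calls stand in for the real network, application and database work.

```python
import time
from contextlib import contextmanager

timings = {}  # segment name -> elapsed seconds


@contextmanager
def timed(segment: str):
    """Add the wall-clock time spent inside the block to timings[segment]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[segment] = timings.get(segment, 0.0) + elapsed


def handle_request():
    with timed("network"):
        time.sleep(0.02)   # stand-in for the client <-> Web server hop
    with timed("application"):
        time.sleep(0.01)   # stand-in for Web server processing
    with timed("database"):
        time.sleep(0.03)   # stand-in for the DBMS round trip


handle_request()

# Report segments from slowest to fastest, then the aggregate.
total = sum(timings.values())
for segment, elapsed in sorted(timings.items(), key=lambda kv: kv[1],
                               reverse=True):
    print(f"{segment:12s} {elapsed * 1000:6.1f} ms  ({elapsed / total:5.1%})")
print(f"{'total':12s} {total * 1000:6.1f} ms")
```

Sorting the segments by elapsed time puts the most likely bottleneck at the top of the report, while the total line still reflects what the end user experiences.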
Once you've measured the application and network latency within and between each component, and identified the bottlenecks, follow these guidelines:
- Optimize components that create a traffic jam. If your Web server runs out of idle threads, change its configuration so that it can handle the maximum number of threads required during a peak load. If the problem area is your DBMS or LDAP server, tune it, for example with stored procedures or indexes.
- Minimize the number of requests whenever possible. Requests in a distributed environment have two negative impacts on latency. First, each request adds to the aggregate network latency. Second, each request consumes resources, which creates bottlenecks once those resources are overutilized. Minimize the number of round trips your application requires by combining some of your requests, or by caching data that is frequently used and seldom changed.
- Scale your system horizontally or vertically. Add CPU capacity, add memory, or load balance your Web servers. If resource utilization is the problem, adding resources to the system will usually improve its performance, and thereby reduce overall latency.
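As an illustration of the second guideline, the sketch below caches the result of a lookup so that repeated requests for the same data do not trigger another round trip. The fetch_country_name_remote function and its data are hypothetical stand-ins for a remote call; functools.lru_cache is part of the Python standard library.

```python
from functools import lru_cache

round_trips = 0  # counts simulated network round trips


def fetch_country_name_remote(code: str) -> str:
    """Hypothetical remote lookup; each call stands for one round trip."""
    global round_trips
    round_trips += 1
    return {"CA": "Canada", "FR": "France"}[code]


@lru_cache(maxsize=128)
def country_name(code: str) -> str:
    """Cached wrapper: only the first lookup per code reaches the remote side."""
    return fetch_country_name_remote(code)


lookups = ["CA", "FR", "CA", "CA", "FR"]
for code in lookups:
    country_name(code)

print(f"lookups: {len(lookups)}, round trips: {round_trips}")
```

Five lookups result in only two round trips, one per distinct code. This kind of cache suits data that is frequently used and seldom changed; data that changes often would also need an expiry or invalidation policy, which this sketch does not cover.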
This article was originally published on www.gantthead.com.

