Archive for category Performance

A developer’s guide to load testing

Found this interesting presentation made by Simon Brown about performance testing. Probably nothing new to senior performance testers, but it might help explaining to others what we do, like a Performance Test for Dummies ;-)

A developer’s guide to load testing (PDF Version)

Via codingthearchitecture.com

, , , , , , , , ,

No Comments

How MySpace Tested Their Live Site with 1 Million Concurrent Users

In December of 2009 MySpace launched a new wave of streaming music video offerings in New Zealand, building on the previous success of MySpace music.  These new features included the ability to watch music videos, search for artist’s videos, create lists of favorites, and more. The anticipated load increase from a feature like this on a popular site like MySpace is huge, and they wanted to test these features before making them live.

If you manage the infrastructure that sits behind a high traffic application you don’t want any surprises.  You want to understand your breaking points, define your capacity thresholds, and know how to react when those thresholds are exceeded.  Testing the production infrastructure with actual anticipated load levels is the only way to understand how things will behave when peak traffic arrives.

For MySpace, the goal was to test an additional 1 million concurrent users on their live site stressing the new video features.  The key word here is ‘concurrent’.  Not over the course of an hour or day… 1 million users concurrently active on the site. It should be noted that 1 million virtual users are only a portion of what MySpace typically has on the site during its peaks.  They wanted to supplement the live traffic with test traffic to get an idea of the overall performance impact of the new launch on the entire infrastructure.  This requires a massive amount of load generation capability, which is where cloud computing comes into play. To do this testing, MySpace worked with SOASTA to use the cloud as a load generation platform.

Here are the details of the load that was generated during testing.  All numbers relate to the test traffic from virtual users and do not include the metrics for live users:

  • 1 million concurrent virtual users
  • Test cases split between searching for and watching music videos, rating videos, adding videos to favorites, and viewing artist’s channel pages
  • Transfer rate of 16 gigabits per second
  • 6 terabytes of data transferred per hour
  • Over 77,000 hits per second, not including live traffic
  • 800 Amazon EC2 large instances used to generate load (3200 cloud computing cores)

Test Environment Architecture

SOASTA CloudTest™  manages calling out to cloud providers, in this case Amazon, and provisioning the servers for testing.  The process for grabbing 800 EC2 instances took less than 20 minutes.  Calls we made to the Amazon EC2 API and requests servers in chunks of 25.  In this case, the team was requesting EC2 Large instances with the following specs to act as load generators and results collectors:

  • 7.5 GB memory
  • 4 EC2 Compute Units (2 virtual CPU cores with 2 EC2 Compute Units each)
  • 850 GB instance storage (2×420 GB plus 10 GB root partition)
  • 64-bit platform
  • Fedora Core 8
  • In addition, there were 2 EC2 Extra-Large instances to act as the test controller instance and the results database with the following specs:
  • 15 GB memory
  • 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
  • 1,690 GB instance storage (4×420 GB plus 10 GB root partition)
  • 64-bit platform
  • Fedora Core 8
  • PostgreSQL Database

Once it has all of the servers that it needs for testing it begins doing health checks on them to ensure that they are responding and stable.  As it finds dead servers it discards them and requests additional servers to fill in the gaps.  Provisioning the infrastructure was relatively easy.  The diagram (figure 1.) below shows how the test cloud on EC2 was set up to push massive amounts of load into MySpace’s datacenters.

While the test is running, batches of load generators report their performance test metrics back to a single analytics service.  Each of the analytics services connect

to the PostgreSQL database to store the performance data in an aggregated repository.  This is part of the way that tests of this magnitude can scale to generate and store so much data – by limiting access to the database to only the metrics aggregators and scaling out horizontally.

Challenges

Because scale tends to break everything, there were a number of challenges encountered throughout the testing exercise.

The test was limited to using 800 EC2 instances

SOASTA is one of the largest consumers of cloud computing resources, routinely using hundreds of servers at a time across multiple cloud providers to conduct these massive load tests.  At the time of testing, the team was requesting the maximum number of EC2 instances that it could provision.  The limitation in available hardware meant that each server needed to simulate a relatively large number of users.  Each load generator was simulating between 1,300 and 1,500 users.  This level of load was about 3x what a typical CloudTest™ load generator would drive, and it put new levels of stress on the product that took some creative work by the engineering teams to solve.  Some of the tactics used to alleviate the strain on the load generators included:

  • Staggering every virtual user’s requests so that the hits per load generator were not all firing at once
  • Paring down the data being collected to only include what was necessary for performance analysis

A large portion of MySpace assets are served from Akamai, and the testing repeatedly maxed out the service capability of parts of the Akamai infrastructure

CDN’s typically serve content to site visitors based on their geographic location from a point of presence closest to them.  If you generate all of the test traffic from, say, Amazon’s East coast availability zone, then you are likely going to be hitting only one Akamai point of presence.

Under load, the test was generating a significant amount of data transfer and connection traffic towards a handful of Akamai datacenters.  This equated to more load on those datacenters than what would probably be generated during typical peaks, but that would not necessarily be unrealistic given that this feature launch was happening for New Zealand traffic only.  This stress resulted in new connections being broken or refused by Akamai at certain load levels, and generating lots of errors in the test.

This is a common hurdle that needs to be overcome when generating load against production sites.  Large-scale production tests need to be designed to take this into account and accurately stress entire production ecosystems.  This means generating load from multiple geographic locations so that the traffic is spread out over multiple datacenters. Ultimately, understanding the capacity of geographic POPs was a valuable takeaway from the test.

Because of the impact of the additional load, MySpace had to reposition some of their servers on-the-fly to support the features being tested

During testing the additional virtual user traffic was stressing some of the MySpace infrastructure pretty heavily.  MySpace’s operations team was able to grab underutilized servers from other functional clusters and use them to add capacity to the video site cluster in a matter of minutes.

Probably the most amazing thing about this is that MySpace was able to actually do it.  They were able to monitor capacity in real time across the whole infrastructure and elastically shrink and expand where needed.  People talk about elastic scalability all of the time and it’s a beautiful thing to see in practice.

Lessons Learned

  1. For high traffic websites, testing in production is the only way to get an accurate picture of capacity and performance.  For large application infrastructures there are far too many ‘invisible walls’ that can show up if you only test in a lab and then try to extrapolate.
  2. Elastic scalability is becoming an increasingly important part of application architectures.  Applications should be built so that critical business processes can be independently monitored and scaled.  Being able to add capacity relatively quickly is going to be a key architecture theme in the coming year and the big players have known this for a long time.  Facebook, Ebay, Intuit, and many other big web names have evangelized this design principle.  Keeping things loosely coupled has a whole slew of benefits that have been advertised before, but capacity and performance are quickly moving to the front of that list.
  3. Real-time monitoring is critical.  In order to react to capacity or performance problems, you need real-time monitoring in place.  This monitoring should tie in to your key business processes and functional areas, and needs to be as real time as possible.

Via highscalability.com

, , , , , , , , , , ,

1 Comment

A Formal Performance Tuning Methodology: Wait-Based Tuning

Performance tuning enterprise Java applications can be an arduous and sometimes unfruitful task because of both the complexity of modern applications as well as a lack of formal tuning methodologies. Modern enterprise applications differ significantly from their counterparts as recent as a decade ago in that they support multiple inputs, multiple outputs, and complex frameworks and business processing engines. Ten years ago, web-based enterprise applications could expect input from a web browser, backend processing through interactions with a database or a legacy system, and output back out to a web browser (HTML). Today, input can come in the form of an HTML browser, a thick client, a mobile device, or a web service, which can pass through servlets running in one of a dozen different architectures or a portal container, that in turn may call enterprise beans, external web services, or delegate processing to a business rules engine. Each of these components may then interact with a content management system, a caching layer, a plethora of databases, and legacy systems. The output is then usually contained in a presentation independent form that is then translated to HTML, XML, WML, or any other format that client applications expect. Modern applications have more moving parts and more “black boxes” than in the past, which presents significant performance tuning challenges.

In addition to this increase in complexity, performance tuning is still more “art” than “science” with most performance tuning guides focusing on performance metrics that are sometimes cryptic and may or may not impact the end user experience. This article attempts to transition the process of performance tuning into the realm of “science” by presenting a repeatable process that focuses on the end user experience by analyzing an application’s architecture in terms of “wait-points”, or portions of an application that can cause a request to wait. In short, Wait-Based Tuning allows performance engineers to quickly realize measurable performance gains by optimizing the end-user experience.

Performance Tuning Process

Before reviewing the details of Wait-Based Tuning and Wait-Point Analysis, this section presents an overview, or roadmap, of the process of effective performance tuning. Performance tuning can be summarized simply in four steps:

  1. Load Test
  2. Container Tuning
  3. Application Tuning
  4. Iterate

As with most of computer science, performance tuning is an iterative process. It begins by constructing a proper load test, which contains both balanced and representative service requests, that is met by a container tuning exercise. As containers are tuned, application bottlenecks will emerge, resulting from the increased load. As application bottlenecks are identified and resolved, the application will behave differently, which will require the container to be retuned. This process of alternating between container and application tuning can be repeated until performance is acceptable (or until the project has run out of time and needs to be released.)

Load Testing Methodology

A prerequisite to being able start a performance tuning exercise is the construction of a proper load test suite. A load test must address the following two points:

  • The load must be representative of what end users are doing (or expected to do)
  • The load must be balanced in the same proportion to mimic end user behavior

That is to say that the load must reproduce end user actions in the same proportion that end users are performing them. To illustrate the importance of balancing end user actions, consider the following scenario: in an insurance claims department, employees exhibit the following behavior:

  1. Users login at 8am
  2. On average they process five claims in the morning
  3. About 80% of users forget to logoff before leaving for lunch and hence their sessions expire
  4. After lunch, users re-login into the application
  5. They process an average of five claims in the afternoon
  6. They generate two reports before leaving
  7. 80% of the users logout from the system before going home

This example is probably an over simplification of a real-world application, but it suffices to establish a balance between service requests. This scenario presents the following balance: two logins, ten claims, two reports, and one logout.

What would happen if the load generator distributed load equally among the different service requests? In such a scenario, the user login and logout functionality would receive the same amount of load as the claim processing functionality. Considering an expected load of 1000 simultaneous users, the login functionality might quickly fall apart and cause the organization to invest money to build out a login component that can handle load that it will never receive. Worse yet, tuning efforts focused on tuning the login functionality, which presented the greatest bottleneck in this scenario, but to the expense of missing the claim processing functionality. In short, an unbalanced load can result in tuning portions of an application to support load that they will never receive while not tuning other portions of an application to support load that they will receive!

Determining what load is balanced and representative for an application is different when examining an existing application (or a new version of an existing application) than when building a new application.

Existing Applications

An existing application presents a distinct advantage over its new application counterparts: real user behaviors can be observed in a production environment. Depending on the nature of requests and how they are identified by an application, there are two options to identify end user behavior:

  • Access Logs
  • End User Experience Monitor

For most web-based applications, access logs provide enough insight to facilitate the discovery of the nature of service requests as well as their relative balance. Web servers can be configured to capture end user request information and store it in a log file referred to as an “Access Log” (because the file is typically named “access.log”.) The key to being able to use an access log to identify user behavior is that application interactions need to be distinguishable by their URIs. For example, if the actions in the previous example equated to URIs like “/login.do”, “/processClaim.do”, and “/logout.do”, then it would be very simple to find those in the access log file to determine their relative balance. Furthermore, sorting an access log file by the most frequent URIs would quickly identify the top “n” percent of requests – where “n” should be around 80%.

Access logs are text files that can be examined manually (not a very fruitful task), can be programmatically parsed, or can be analyzed by a tool. There are many commercial solutions, but Quest Software has a product called Funnel Web Analyzer that was retired some years ago, but due to popular demand, they renewed the product as Freeware. Funnel Web Analyzer can analyze most access log files and present the information required to construct proper load tests.

Some applications are not quite as simple and user interactions cannot be easily identified by a simple URI. For example, consider an application that has a single front-controller servlet that accepts an XML payload – and the business logic to process is contained inside the payload. In such a scenario, another tool is needed to inspect that payload to determine the business case being satisfied. This could potentially be built manually using a servlet filter or could require a hardware device known as an end user experience monitor.

Irrespective of how user behavior is obtained, it is a core prerequisite before starting any performance tuning exercise.

New Applications

New applications present a unique challenge because there are not any end user behaviors to analyze. There are three steps to consider when identifying user behaviors in a new application, as illustrated in Figure 1 .

Figure 1 Estimating End User Behavior for a New Application

The first step is to estimate what end users are expected to do. This step is a formal way of saying “take a guess,” but an educated guess. The estimation should come from a discussion between two parties: the application business owner and the application technical owner. The application business owner, which is typically a product manager, is responsible for detailing how the end user is expected to use the application – for example, he might report that the end user is expected to login, process five claims, timeout, process five more claims, generate two reports, and then logout. The application technical owner, which might be the architect or technical lead, is responsible for translating this abstract list of business interactions to technical steps needed to generate the load test – for example, he might report that login is accomplished through the “/login.do” URI and there are five URIs that comprise the steps in processing a claim. Together, these individuals (or groups or committees in some large projects) should provide enough information to construct a baseline load test.

After the load test has been created and used to tune the application and containers and the application has been deployed to a production environment, the tuning work is not complete. The next step in this load testing methodology is to validate the load test suite. This is typically a multi-stage activity:

  • Smoke test validation: validate the estimations against live production end user behavior in the first week or two of operations. This validation step is required to ensure that there were not any gross errors made during the estimation process.
  • Production Validation: some applications require time before users fall into a consistent pattern of usage. This amount of time is application dependent and may take a month or a quarter, but it is important to validate end user behavior against estimations once users settle into using the application.
  • Regression Validation: it is a best practice to validate user behavior periodically throughout an application’s production lifecycle in case user behavior changes or new features or new workflows are introduced that change end user behavior.

The final step, which is typically overlooked, is reflection. It is important to reflect upon the accuracy of estimations against actual end user behavior, because it is only through reflection that user behaviors become better understood and estimations improve in subsequent applications. Without reflection, the same mistakes will be made time after time, which will increase the amount of tuning work in the end.

Wait-Based Tuning

With a load test in hand, it is time to determine where tuning efforts are best spent. Most tuning guides are concerned with “performance ratios” or the relationships between metrics. For example, a tuning guide might emphasize that a cache hit ratio should be 80% or higher, so load test the application while adjusting the cache size until the hit ratio is at 80%. Then move to the next metric in the list, while constantly validating that tuning the new metric does not invalidate the tuning of the previous metrics.

Not only is this is difficult task, but it can also be highly unfruitful. For example, tuning the cache hit ratio to 80% might be a good thing, but there are more important questions such as:

  • How dependent is the application on the cache (what percentage of requests interact with the cache)?
  • How important are these requests with respect to the other requests in the application?
  • What is the nature of the items being cached? Should they be cached at all?

Wait-based tuning promotes the concept of analyzing the business interactions of an application, the underlying architecture that implements those business interactions, and optimizing the processing of those business interactions. The first step is to analyze the architecture of an application to identify the technologies that are employed in satisfying requests. Each employed technology may present a “wait-point”, or a location in the application in which a request may have to wait for something before it can continue processing. For example, if a request performs a database query then it must obtain a database connection from a connection pool – if the connection pool does not have an available connection then the request must wait for a connection to become available. Likewise, if the request invokes a web service, that web service will have a request queue (with an associated thread pool that processing incoming requests) that can potentially cause the request to wait before a thread becomes available.

From this analysis, referred to as Wait-Point Analysis, two categories of wait-points can be identified:

  • Tier-based wait points
  • Technology-based wait points

This section begins by reviewing Wait-Point Architectural Analysis and then surveys the various types of wait-points.

Wait-Point Architectural Analysis

The most important take away from this discussion is that performance tuning must be performed in the context of the architecture of the application being tuned. This is one reason why tuning performance ratios can be so ineffective: tuning an arbitrary performance metric to a best practice setting may or may not be good for the application being tuned – and may or may not positively affect the end user experience.

Wait-Point Analysis is the process of dissecting the major request processing paths through an application in order to identify resources that can potentially cause a request to wait. The most effective strategy to employ in a wait-point analysis exercise is to identify the core processing paths in the application and white board those paths. Include all tiers that a request may pass between, all external services that the request may interact with, all objects that are pooled, and all objects that are cached.

Tier-Based Wait Points

Any time a request passes across a physical tier, such as between a web tier and a business tier, or makes a call to an external server, such as when invoking a web service, there is an implicit wait-point associated with that transition. Consider that servers would not be effective if they were to only service a single request at a time, so they implement a multi-threading strategy. Typically a server listens on a socket for incoming requests – when it receives a request then it quickly places that request in a request queue and returns to listening for additional incoming requests. The request queue then has an associated thread pool that removes the request from the queue and processes it. Figure 2 illustrates how this process is performed with three tiers: a web server, a dynamic web tier, and a business tier.

Figure 2 Tier-Based Wait Points

Because the action of a request passing across a tier involves a request queue, which is serviced by an associated thread pool, the thread pool presents a potentially significant wait-point. The size of each thread pool must be tuned with the following considerations:

  • The pool must be large enough so that incoming requests do not need to wait unnecessarily for a thread
  • The pool must not be so large that it saturates the server. Too many threads will cause the server to spend more time switching between the individual thread contexts and less time processing requests. This is typified by a high CPU utilization and a decrease in request throughput
  • The pool should be optimally sized so as not to saturate any backend resources that it interacts with. For example, if a database can only support 50 requests from an individual server then that server should not send more than 50 requests to the database.

The optimal size for a server thread pool is the number of threads that generate sufficient load on its limiting dependencies – to maximize their usage, but without causing them to saturate. See the “Tuning Backwards” section below for more on sizing limiting dependency pools.

Technology-Based Wait Points

While tier-based wait points are concerned with moving a request between servers, technology-based wait points are concerned with moving a request efficiently through the inner workings of an individual server. Tier-based tuning, which is somewhat analogous to IBM’s Queue Tuning, is an effective first step in tuning an application, but neglecting to tune the inner workings of an application server can have huge ramifications on the performance of an application. This is analogous to tuning JDBC connection pools to send the most optimal amount of load to the database, but neglecting to review the SQL being executed – if the query is joining ten tables each with a million records then the optimal load may be two connections but if the query is optimized then the database may be able to support two hundred connections.

Looking inside an application server and the potential technologies that an application can utilize yields the following common technology-based wait points:

  • Pooled objects (such as Stateless Session Beans or any business objects that the application pools)
  • Caching infrastructure
  • Persistent storage or external dependency pools
  • Messaging infrastructure
  • Garbage collection

In most cases, Stateless Session Bean pool sizes are optimized by the application server and do not present a significant wait-point, unless the pool size has been manually configured improperly. But there are objects that are pooled in applications that must be manually sized – and these can present valid wait-points. Consider that when an application needs a pooled resource, it must obtain an instance of that resource from the pool, use it, and then return it to the pool. If the pool is sized too small and all object instances are in use then a request will be forced to wait for an object to become available. Waiting for a pooled resource increases response time (obviously), but can cause a significant performance degradation if more and more requests continue to backup waiting on the pooled resource. If, on the other hand, the pool is sized too large then it may consume too much memory and negatively affect the performance of the JVM as a whole. It is a tradeoff and the optimal size for these pools can only be determined after a thorough analysis of the pool’s usage.

Pooled objects are stateless, meaning that it does not matter which instance the application obtains from the pool – any instance will suffice. Cached objects, on the other hand, are stateful, meaning that when the application requests an object from the cache, it needs a specific instance. A very crude analogy that illustrates this difference is this: consider two common activities that occur in many people’s day: shopping at a supermarket and then picking up one’s child from school. At the supermarket, any cashier can check out any customer, it does not matter which cashier a customer selects, any cashier will suffice. Therefore cashiers would be pooled. But when picking up a child from school, a parent wants his or her child, another child will not suffice. Therefore children would be cached.

With that said, caches present a unique tuning challenge. The purpose of a cache, from a simplistic perspective, is to store objects locally in memory and make them readily available to the application rather than obtaining them on demand. A properly sized cache can provide a significant performance improvement over making a remote call to load an object. An improperly sized cache, however, can create a significant performance hindrance. Because caches hold stateful objects, it is important for the cache to maintain the most frequently accessed objects in the cache and provide enough additional space in the cache for infrequently accessed objects to pass through. Consider the behavior of a request that accesses a cache that is sized too small:

  1. The request checks the cache for an object, but it is not in the cache
  2. The request then needs to query the external resource for the object’s data and build an object from that data
  3. Because caches are typically meant to maintain the most recently accessed data, the new item needs to be added to the cache (it is being accessed now)
  4. But if the cache is full, an object must be selected from the cache to be removed using an algorithm like the “least recently used” algorithm
  5. If the cached object’s state is not persisted to the external resource then the external resource must be updated before the object is discarded
  6. The new object can now be added to the cache
  7. The new object can finally be returned to the request

This is a cumbersome process and if the majority of requests have to perform each of these steps then the cache will truly hinder performance. The cache must be sized large enough to minimize cache “misses”, where a miss essentially equates to performing each of the seven aforementioned steps, but not so large as to consume too much JVM memory. If the cache needs to be substantially large in order to be effective then it is important to reconsider the nature of the objects being cached and whether they should be cached at all.

Similar to object pools, external resource pools, such as database connection pools, must be sized large enough so that requests are not forced to wait for a connection to become available in the pool, but not so large that the application saturates the external resource. The “Tuning Backwards” section below discusses how to determine the optimal size for these pools, but in the context of this section, be aware that they present another significant wait-point.

Tuning messaging infrastructures is well beyond the scope of this article, with implementations varying significantly between major products like MSMQ, MQSeries, TIBCO, and so forth, but be aware that, if an application is interacting with a messaging server, it must be properly tuned or it too can present a wait-point.

The final wait-point that can significantly impact the performance of a JVM is garbage collection. It does not fit nicely into the wait-point analysis process described in this article (examining a request with the intention of identifying technologies that can cause a request to wait), but because it can have such a profound impact on performance, it is listed here. Different JVM implementations and different garbage collection strategies affect how garbage collection is performed, but in many cases, a major garbage collection (or a mark-sweep-compact garbage collection) can cause an entire JVM to freeze until the garbage collection is complete. One of the single biggest performance improvements that can be made to a JVM is to optimize its garbage collection behavior. For more information on garbage collection, join the GeekCap discussions on Application Infrastructure Tuning.

Tuning Backwards

Now that all of the tier-based and technology-based wait-points have been called out, the final step is to optimize the configuration of each wait-point. This step is sometimes referred to as “tuning backwards” and is conceptually very simple:

  1. Open all tier-based wait-points and external dependency pools – in other words configure them to allow too much load to pass through the server
  2. Generate balanced and representative service requests against the application
  3. Identify the wait-points that saturate first, which will typically be external dependencies, such as a database
  4. Tighten the configuration of the limiting wait-points to allow enough load to pass to the external dependency without saturating it
  5. Tune all other tier-based wait-points to only send enough load through the server to maximize the limiting wait-points but not cause requests to wait
  6. Allow all other requests to wait at a business logic-lite tier, such as at the web server

The principle in place here is that the application should only send the amount of load to its external dependencies to maximize their usage without causing saturation – and all other wait-points should be configured to only pass enough load to maximize the limiting wait-points. For example, if a database becomes saturated by 50 connections from each application server then the database connection pool should be configured to send less than 50 requests to the database (for example, configure the pool to contain 40 or 45 connections.) Next, if 80 threads generate 40 database connections, then the thread pool for the application should be configured to 80. Finally, the web server should not send more than 80 requests to each application server at any given time.

All technology wait-points, such as object pools, caches, and garbage collection, should be tuned to maximize the throughput of a request so that it can pass through a server, or between tier-based wait points, as quickly as possible.

Summary

Performance tuning was once more “art” than “science”, but after a combination of abstract analysis and trial-and-error, wait-based tuning has proven to make the exercise far more scientific and far more effective. Wait-based tuning begins by performing a wait-point analysis of an application’s architecture in order to identify technologies employed by the architecture that can potentially cause a request to wait. Wait-points come in two flavors: tier-based wait-points, which are indicative of any transition between application tiers, and technology-based wait-points, which are technology features such as caches, pools, and messaging infrastructures that can improve or hinder performance. With a set of wait-points identified, the tuning process is implemented by opening all tier-based wait-points and external dependency pools, generating balanced and representative load against the application, and tuning backwards, or tightening wait-points to maximize the performance of a request’s weakest link, but without saturating it.

Wait-based tuning has proven itself time and time again in real-world production environments to not only be effective, but to allow a performance engineer to realize measurable performance improvements very quickly.

By Steven Haines (Via InfoQ)

, , , , , , ,

No Comments

Performance Anti-Patterns in Database-Driven Applications

Nearly every modern application relies on databases for data persistence. The database access layer is very often responsible for serious performance problems. In the case of database problems most people start searching in the database itself. Appropriate indexes and database structures are vital for achieving adequate performance. Often, however, the application layer is responsible for poor performance or scalability problems.

The application layer controls and drives database access. Problems at this layer cannot be compensated in the database itself. Therefore the design of adequate data-access logic is vital for achieving performance and scalability. While there a nearly endless different use cases for database-driven applications, the problems can be nailed down to a small set of anti-patterns. Analyzing whether your application implements the following anti-patterns and resolving them will help to easily implement faster and more scalable software with minimal additional effort.

Misuse of O/R Mappers

O/R mappers have become a central part in modern database applications. O/R mappers take away the burden of translating and accessing relational data from object-oriented software. They hide great parts of the complexity of data access from the application programmer. This results in higher productivity as the developer can concentrate on the actual application logic rather than infrastructural details. Complex data graphs can be easily navigated at the object-relational layer without seeing what is going on under the hood. This often creates the wrong impression that these frameworks take away the burden of designing data-access logic.

Often developers think that their data-access framework will simply do things right; however, using O/R mapping frameworks without understanding their inner workings in many cases results in poor application performance. There are two central misunderstandings that cause these problems – loading behavior and load time.

O/R mappers load data on a per-object base. This means when an object is requested or accessed the necessary SQL statements are created and executed. This principle is very generic and at first sight works well in most situations. At the same time it is very often the source of performance and scalability problems.

Let’s take a simple example. In a database for storing address information, we have one table for persons and one for addresses. If we want to get the name for each person and the city they live in, we have to iterate over the persons and then access the address information. The image below shows the result if the out-of-the box query mechanisms are used. This simple use case results in a high number of database queries.

This directly brings up the second important detail of O/R mappers – load time. O/R mappers – if not told otherwise – try to load data as late as possible. This behaviour is referred to as lazy loading. Lazy loading ensures that data is loaded as late as possible with the goal to perform as few database queries as possible and at the same time avoid unnecessary creation of objects. While this approach is generally feasible, it may result in serious performance problems and so-called LazyLoadingExceptions on accessing data that has not been loaded when no database connection is present.

In situations like the one described above data loading and at the same time performance can be significantly improved by using specialized data queries.

So while O/R mappers can be of great help in the development of data access they still leave the burden of designing proper data access logic. Dynamic architecture validation with tools such as dynaTrace can be of great help here to identify performance weak points in the application and proactively resolve them.

Load More Data Then Needed

Another anti-pattern in database access that can often be found is that much more data is loaded that actually needed. There are a number of reasons for this. Rapid Application development tools provide easy ways of linking data structures to user interface controls. As the data layer is built of domain objects, they very often contain much more data than is actually visualized. The example uses the address book scenario again. This time the names of the persons and their home cities are visualized. Instead of just loading these three items both objects – addresses and persons – are loaded. This results in massive overhead at the database, network and application level. The usage of specific queries can help to massively reduce the amount of queried data. This performance improvement however comes along with additional effort for maintenance. Adding a new column to the table might require several changes to the data access layer.

This anti-pattern can also be found very often in case of improperly designed service interfaces. Service interfaces are often designed to be generic enough to support a large number of use cases. This has the advantage that services have small contracts which can be used in a wide variety of use cases. Additionally uses cases change faster than the backend service implementations. This can result in services interfaces being inappropriate for certain scenarios. Developers will then have to use workarounds which might result in highly inefficient data access logic. This problem often arises in data-driven Web Services.

In order to overcome these problems data access patterns should be continuously analyzed during development. In the case of agile development approaches, data access logic should be checked for each finished user story. Additionally data access patterns should also be analyzed across application use cases to understand data access logic to be able to optimize data access logic according during development.

Inadequate Usage of Resources

Databases are a bottleneck for resources in applications, so they should be used as little as possible. Very often too little attention is paid to the usage of database connections. As with any shared resource such connections massively affect overall system performance. Specifically, web applications and applications using O/R mapping frameworks with lazy initialization tend to keep database connections longer than needed. Connections are acquired at the beginning of processing and kept until rendering is finished or no further data access is required. In applications using O/R mappers, they are often kept to avoid nasty lazy initialization problems. By redesigning data access logic and separating it from post-processing (like rendering), the performance and scalability of an application can be dramatically improved.

The chart below shows the response time of ten concurrent data processing threads. In the first part one database connection is used. In the second scenario ten connections are used. In the third scenario two database connections are used but two thirds of the processing is performed after having returned the connection. With better designed data access the third scenario nearly achieves the same performance with a tenth of the resources.

One Bunch of Everything

One Bunch of Everything is an anti-pattern that can generally be observed in development but even more in agile teams. The characteristic of this anti-pattern is that primarily features are developed and all data access is treated equally, as if there would not be any differences. However treating different types of data and queries differently can significantly improve application performance and scalability.

Data should be analyzed regarding its lifetime characteristics. How often does it change or if it is modified or only read. Access frequency of data, together with access patterns, provides hints on potential sources for caching. Access frequency also provides hints as to where optimizations make the most sense. This avoids premature and unnecessary optimization and guarantees the highest impact of performance tuning.

Analyzing usage patterns of data also helps to tune the data access layer. Understanding which data is really used helps to optimize loading strategies. Understanding how users browse search results, for example, helps to optimize fetch sizes. Knowing whether users look at order details helps to select lazy or eager loading for order positions.

In addition to data, queries should also be analyzed and categorized. Important factors are query duration, execution frequency and whether they are used in an interactive user context or batch-processing scenario. Transactional characteristics further help to fine tune isolation levels of queries.

Running short-running interactive queries of users and long-running reporting queries on the same connection for example may easily result in bad end user experience. Long-running reporting queries can greedily acquire database connections leaving end-user queries starving. Using different connection pools for different query types results in much more predicable end user performance. Softening isolation level on database queries where they are not required can also lead to significantly improved performance and scalability.

Bad Testing

Finally, missing or improper testing is one of the major reasons for performance and scalability problems in database-accessing applications. I recently gave a talk on this topic and asked the audience whether they see database access as a performance problem in their applications. While all of them agreed, nobody had testing procedures in place to test data access performance. So while it seems to be an important topic, people do not seem to invest in it.

However, even if testing procedures are in place, this does not necessarily mean that testing is done correctly. Although a lot of problems in data access logic can be found right after the code has been developed, testing is performed much later in the load testing phase. This introduces unnecessarily high costs as changes are performed late in the lifecycle possibly requiring architectural changes leading to additional development and testing efforts.

Furthermore test cases have to be designed to test real world data access scenarios. Data access has to be tested in a concurrent mode and using different access types. Only combined read/write access helps to identify locking and concurrency problems. Additionally adequate variation of input data is required to avoid unrealistically frequent cache hits.

Very often people also do not know for which load to test as they have no adequate information on expected load. In my experience this is very often the case – unfortunately. This, however, is not an excuse for not defining load and performance criteria. It is still better to have some criteria defined instead of not defining them at all.

In case you really have no clue on performance characteristics the best approach is to use load testing criteria with increasing load until the saturation point of the application is reached. Then you have identified the peak load of the application. If this sounds reasonable and realistic you are on a good way. Otherwise you know where you have to improve performance. In most cases initial tests show that application can cope with much less load as expected.

Conclusion

Database access is one of the most critical areas impacting performance and scalability in modern applications. While frameworks support in building data access logic, a serious amount of thought still has to be put into the design of data access logic to avoid pitfalls and problems. The key is to understand the details of the dynamics and characteristics of an application’s data-access layer.

Via InfoQ

, , , ,

1 Comment

How to Change the TimeOut on LoadRunner

I’m seeing a lot of searches for Timeout issues landing on the blog these days.

This is a very common issue when executing a scenario, meaning basically that the server has not responded in a specified amount of time. LoadRunner defaults to 120 seconds on all Web based protocols (HTTP, WebServices, Click & Script), but this can be easily changed using a command on the begging of the script, web_set_timeout.

The command have only two parameters, the operation and the new value. The operation can be one of these three:

  • CONNECT: To establish the connection to the Web server.
  • RECEIVE: Time–out on the next “portion” of server response to arrive.
  • STEP: Time–out on each VuGen step.

Usually the one we see expiring the most is STEP, for obvious reasons. The error message should look something like “Step Download Timeout”.

The second parameter is the new value, expressed in seconds. So if we want to set up a new value for STEP we have to insert this code in the beginning of our action:

web_set_timeout("STEP","240");

Being 240 seconds our new value.

Usually I change the timeout value of all three operations, just to be sure:

web_set_timeout("STEP","240");
web_set_timeout("CONNECT","240");
web_set_timeout("RECEIVE","240");

Simple, isn’t it??

We just have to be careful when changing this configuration, because the default value is already too high for most user actions.

From my point of view, this should only be used in two cases:

  • When we really expect a transaction to be slow, like a large report or file upload, something that the users already expect to be slow;
  • Also when we need to troubleshoot a slow transaction, meaning, waiting for a longer period to get the response.

That’s it!!

, , , , , , ,

No Comments