At Rigor, we know that performance matters to our users. As a monitoring company, we build tools for tracking site reliability and analyzing web performance trends. However, like the users we serve, we are always striving to build faster web applications and to optimize our user experience. This can be challenging because performance work is often overshadowed by new feature requests and other resource constraints.
Historically we have solely used Rigor’s monitoring data to pinpoint performance issues, but with the acquisition of Zoompf we now have another tool to improve both our code and the experience for our users. While we are still working on adding new features (and have some really cool stuff in the pipeline), we are also committed to continual improvement of our existing features.
In our most recent efforts to optimize performance, we decided to start by tackling our login page. Although this is a simple page, we wanted to see what sort of performance gains we could achieve using the information from Zoompf’s scans.
In order to track our progress, the first step was to set up one of Rigor’s Real Browser checks to track and trend page performance. The next step was to set up a recurring analysis of the page within Zoompf. The initial Zoompf scan found 40 defects and rated the site as “fair.” Luckily no defects were critical, but there was certainly room for improvement. Zoompf made it easy to prioritize the work by categorizing the defects by severity, role, and ease of fix. Since we have a small team, we decided to prioritize by severity and ease of fix. In a larger organization it might make sense to refer to Zoompf’s categorization by role.
We chose to tackle the following defects in this initial round of fixes:
- CDN – Zoompf recommended we utilize a CDN to deliver our assets. This had the potential to benefit us in two ways: our assets would be served from locations physically closer to clients, reducing round-trip time, and asset requests would no longer hit our web servers, reducing their load.
We decided to go with Amazon’s CloudFront service, as it fits well with our current infrastructure. Setup was easy from both the Rails and the CloudFront side, but we needed to provide CORS headers to properly load all of the assets. The configuration itself was relatively straightforward, but figuring out how to inject it into our deployment process was a bit more difficult. Our development process was to SSH into our staging environment and modify the NGINX configuration by hand. After being stumped by some unexpected behavior, we discovered that NGINX’s “if” statement does not behave the way we had assumed (see the “If is Evil” blog post). Once we had a working configuration, we modified the Chef template to include our settings and to let us adjust some of them through a custom JSON. To measure our progress, we added a Real Browser check that monitors our app from all of our locations.
The above scatterplot shows the performance history of a login test we scripted in Rigor. This test loads our login page, enters test credentials, clicks sign-in, and waits for our application homepage to load. We segmented the graph by location, with each colored circle representing a test run from a specific Rigor testing node. The large blue vertical line in the middle marks the date and time we deployed CloudFront, clearly separating performance before and after the CDN went live. On the left of the vertical line we can see that our tests from Sydney, Australia, signified by the dark blue circles, performed consistently worse than tests from Northern Virginia, signified by the light blue circles. This makes sense because our app web servers are located in Northern Virginia.
On the right, we can see that after we deployed the changes to the CDN, the performance in general is much better, but the performance gains are most pronounced with the load times of tests run from locations farther away from our web servers. As a result, all the test results after deployment are grouped more tightly together.
Overall we greatly improved the performance of the page by adding a CDN. The average login load time from all locations over 24 hours dropped by more than a second. However, if we analyze the performance data for international locations we see a much more pronounced improvement. Using the Sydney location called out above, we see the average login load time decrease from over 11 seconds before the use of a CDN to around 8 seconds after deployment.
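The CORS requirement mentioned earlier comes down to a small decision: when a page on one origin requests an asset served from another hostname (such as a CDN), the asset response has to carry an `Access-Control-Allow-Origin` header naming the page's origin. The sketch below expresses that decision in Python rather than in NGINX configuration. It is illustrative only and not our production setup: the `cors_headers` helper and the `app.example.com` hostname are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical allowlist: origins of pages permitted to load assets cross-origin.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def cors_headers(origin: Optional[str], allowed=ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return the CORS-related headers to attach to an asset response."""
    # "Vary: Origin" tells shared caches (such as a CDN) not to reuse a
    # response generated for one origin when answering a different origin.
    headers = {"Vary": "Origin"}
    if origin is not None and origin in allowed:
        # Echo the specific origin back only when it is on the allowlist.
        headers["Access-Control-Allow-Origin"] = origin
    return headers


if __name__ == "__main__":
    print(cors_headers("https://app.example.com"))
    print(cors_headers("https://unknown.example.org"))
```

Echoing a single allowed origin, instead of sending a wildcard, keeps the policy explicit; the `Vary` header matters once a CDN sits in front of the web servers and caches responses.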
- Cache headers – Zoompf recommended that we add far-future cache headers to all of our assets. This allows the browser to cache the resources and avoid unnecessary network requests. We append the timestamp of each file’s last change to its URL, so if the file changes the browser will request the new version. This does not speed up the initial loading of a page, but it does speed up subsequent loads of pages that share the same assets. Luckily, adding the headers is easy, and we lumped this change in with our CORS changes for faster deployment. To measure and track this change using Rigor, we added a Real Browser check that opens a URL and then opens the same URL again. With no caching both loads should take about the same time, but with caching the second load should be faster.
The above two waterfall charts are created from a Real Browser check that loads our login page and then upon completion reloads the same login page. The first screenshot is of the initial load where the browser has to load everything. On the second load, we can identify the impact of caching on page load.
We can see that in the initial waterfall chart the browser had to make 16 requests totaling roughly 584 KB of content, which took roughly 1.7 seconds. In the second waterfall we can see that the browser fetches significantly less content. In fact, the browser only has to make a total of 3 requests totaling 2.5 KB of content, which took less than a second to load.
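The combination of far-future cache headers and a timestamp on each asset can be sketched in a few lines. This is a minimal Python illustration of the idea, not the code our application runs: the `versioned_url` and `far_future_headers` helpers, the one-year lifetime, and the `/assets/app.css` path are all assumptions made for the example.

```python
import os

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed "far future" lifetime


def versioned_url(url_path: str, file_path: str) -> str:
    """Append the file's last-modified time so a changed file gets a new URL."""
    mtime = int(os.path.getmtime(file_path))
    return f"{url_path}?{mtime}"


def far_future_headers(max_age: int = ONE_YEAR_SECONDS) -> dict:
    """Headers that let browsers and CDNs keep the asset for a long time."""
    return {"Cache-Control": f"public, max-age={max_age}"}


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(suffix=".css") as asset:
        # Hypothetical URL path; the timestamp comes from the file on disk.
        print(versioned_url("/assets/app.css", asset.name))
        print(far_future_headers())
```

Because the timestamp is part of the URL, an updated file is fetched under a URL the browser has never cached, so the long cache lifetime never serves stale content.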
- N+1 query – Once we implemented all of these changes, we were pleased with our gains. After the initial login and loading of resources, the assets were loaded from the browser cache, making the site noticeably faster. These improvements, however, exposed another problem. When we started looking at some of the waterfalls, we could see that the initial loading of the HTML was the slow point. After some troubleshooting, we discovered that we had neglected to eager load a joined resource, resulting in an N+1 query (see section 13). Luckily, this problem only applied to Rigor employees, but fixing it saved close to 700 SQL queries on EVERY PAGE LOADED by a Rigor employee and reduced the loading of this resource from over 3 seconds per page to under a second. This is the kind of information that Rigor and Zoompf can provide when used in concert. We can’t always tell you why the back-end is slow, but by eliminating slowness on the front end we can expose performance bottlenecks on the back-end.
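For readers unfamiliar with the N+1 pattern, here is a small, self-contained Python sketch using the standard library's `sqlite3` module. It is not our Rails code (in Rails the fix is typically an eager-loading call such as `includes`), and the `accounts` and `checks` tables are hypothetical. It only shows why the query count grows with the number of parent rows, and how loading the children in one extra query avoids that.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny in-memory database: 3 accounts with 2 checks each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE checks (id INTEGER PRIMARY KEY, account_id INTEGER, name TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO accounts (id, name) VALUES (?, ?)",
        [(i, f"account-{i}") for i in range(1, 4)],
    )
    conn.executemany(
        "INSERT INTO checks (account_id, name) VALUES (?, ?)",
        [(i, f"check-{i}-{j}") for i in range(1, 4) for j in range(2)],
    )
    return conn


def load_n_plus_one(conn):
    """One query for the accounts, then one more query per account."""
    accounts = conn.execute("SELECT id, name FROM accounts ORDER BY id").fetchall()
    queries = 1
    result = {}
    for account_id, name in accounts:
        rows = conn.execute(
            "SELECT name FROM checks WHERE account_id = ? ORDER BY id",
            (account_id,),
        ).fetchall()
        queries += 1
        result[name] = [row[0] for row in rows]
    return result, queries


def load_eager(conn):
    """Two queries in total, no matter how many accounts exist."""
    accounts = conn.execute("SELECT id, name FROM accounts ORDER BY id").fetchall()
    queries = 1
    result = {name: [] for _, name in accounts}
    if not accounts:
        return result, queries
    names_by_id = dict(accounts)  # maps account id -> account name
    placeholders = ",".join("?" for _ in accounts)
    rows = conn.execute(
        "SELECT account_id, name FROM checks "
        f"WHERE account_id IN ({placeholders}) ORDER BY id",
        list(names_by_id),
    ).fetchall()
    queries += 1
    for account_id, check_name in rows:
        result[names_by_id[account_id]].append(check_name)
    return result, queries


if __name__ == "__main__":
    conn = build_db()
    print("N+1 queries:", load_n_plus_one(conn)[1])
    print("Eager queries:", load_eager(conn)[1])
```

Both functions return the same data; only the number of round trips to the database differs, which is where the time goes once there are hundreds of parent rows.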
Overall, by implementing the above techniques we saw a marked improvement in the performance and reliability of our login page. Here are the highlights:
- Increased the amount of content we cache by 1.5 MB (99% of total weight)
- Decreased our content size by 116 KB (8%)
- Reduced the number of resources loaded on the page by 8 (42%)
- Improved overall load time from 2.2 seconds to 1.6 seconds (a decrease of about 27%)
We have committed as an engineering team to make continual performance improvements to our products. This is an ongoing process, and we are building a culture where performance is an integral part of the development process. In this instance, we focused on a single page in our application, but found that we laid the groundwork to make significant improvements to the whole application.
Combining Rigor’s performance monitoring with Zoompf’s performance analysis made it easy to identify defects and measure the results of our efforts. Performance directly impacts the user experience, so we’re excited to release these enhancements and improve our platform’s performance for our users.