Requesting a static file from a web server is a quick, simple way to monitor application health. If we get a successful response (200), we know that the web server is up. But unless your site is composed entirely of static files, the availability of the web server is only part of the equation. What if your application has stopped running? What if your database is unavailable? Are users able to complete their tasks on your site?

To better capture the health of your application, we can build and monitor a more comprehensive health check endpoint. This endpoint, purpose-built for validating the health of the application, can perform deeper checks on application components and return a response to signify the status of the system.

Building a health check endpoint

A simple health check endpoint allows you to answer questions like:

  • Is the web server up?
  • Is the application running?
  • Is the database connected?

Deeper checks to implement as part of a health check endpoint might include:

  • Validating the status of important processes or background jobs
  • Monitoring the size of job queues to ensure they are at expected levels
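To make this concrete, here is a minimal Python sketch of how such checks might be composed. It assumes each check is a plain function returning a success flag and a message; `run_checks` and `check_app` are hypothetical names for illustration, not part of any framework.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical convention: every check returns (success, message).
CheckResult = Tuple[bool, str]


def run_checks(checks: Dict[str, Callable[[], CheckResult]]) -> Dict[str, dict]:
    """Run each named check, isolating failures so one broken check can't hide the rest."""
    report: Dict[str, dict] = {}
    for name, check in checks.items():
        try:
            success, message = check()
        except Exception as exc:
            # A check that crashes is reported as a failure instead of crashing the endpoint.
            success, message = False, f"{type(exc).__name__}: {exc}"
        report[name] = {"success": success, "message": message}
    return report


def check_app() -> CheckResult:
    # If this function runs at all, the application process is up and handling requests.
    return True, "Application is running"


if __name__ == "__main__":
    print(json.dumps(run_checks({"application": check_app}), indent=2))
```

Catching exceptions per check is a deliberate choice: a single failing dependency still produces a complete report, with that component marked `"success": false`, instead of taking down the whole request.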

Here’s an example of one of our application health checks:

Example health check response

(pretty JSON brought to you by the nifty JSON Formatter Chrome extension)

This example endpoint checks several components of the system. Let’s walk through the JSON response and identify what each piece tells us.

The first part of the response tells us that our application is running. Since the endpoint is served by the application itself (rather than by the web server returning a static file), receiving any response at all tells us the app is running.
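To illustrate why a response proves the app is up, here is a bare-bones endpoint built on Python's standard-library `http.server`. It is only a sketch: the `/health` path and the `build_health_report` helper are assumptions, and in a real application this would be an ordinary route in your web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_health_report() -> dict:
    # Hypothetical: a real app would run its database, worker, and queue checks here.
    return {"application": {"success": True, "message": "Application is running"}}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        # This body is produced by application code, so receiving it proves the app is running.
        body = json.dumps(build_health_report()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Calling `HTTPServer(("127.0.0.1", 8000), HealthHandler).serve_forever()` would serve the report at `http://127.0.0.1:8000/health` (the port is an arbitrary choice).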

The second part of the response indicates that our database is connected. If the database server goes down, this section will display an error and its success field will be false.
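A database check usually amounts to running a trivial query and reporting any error. The sketch below uses SQLite from Python's standard library purely as a stand-in for your real database; `check_database` is a hypothetical helper, and with a networked database you would want a short connection timeout so the health check itself cannot hang.

```python
import sqlite3
from typing import Tuple


def check_database(path: str) -> Tuple[bool, str]:
    """Return (success, message) after running a trivial query against the database."""
    try:
        conn = sqlite3.connect(path, timeout=2)
        try:
            # A round-trip query proves we can actually talk to the database, not just open a handle.
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return False, f"Database error: {exc}"
    return True, "Database is connected"
```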

The final two sections of the response deal with background workers and job queues. The information in these sections indicates the health of the worker processes and their respective job queues.

We can see that background worker A is connected and its job queue is healthy. Background worker B, on the other hand, is dealing with an unhealthy queue. Since we know it is connected, the backed-up queue is likely due to an increased volume of incoming jobs.
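The reasoning about worker B can be captured in a small function that checks connectivity first and queue depth second, so the message distinguishes a dead worker from an overwhelmed one. This is a sketch under assumptions: how you obtain `connected` and `queue_size` depends on your job system, `check_worker` is a hypothetical name, and the 100-job threshold is arbitrary.

```python
from typing import Tuple

# Hypothetical threshold: tune per queue based on its normal traffic.
DEFAULT_MAX_QUEUE_SIZE = 100


def check_worker(name: str, connected: bool, queue_size: int,
                 max_queue_size: int = DEFAULT_MAX_QUEUE_SIZE) -> Tuple[bool, str]:
    """Combine worker connectivity and queue depth into a single (success, message) result."""
    if not connected:
        # Nothing is consuming the queue, so jobs will pile up regardless of volume.
        return False, f"{name}: worker is not connected"
    if queue_size > max_queue_size:
        # The worker is up, so a deep queue more likely means incoming volume is outpacing it.
        return False, f"{name}: queue backed up ({queue_size} > {max_queue_size} jobs)"
    return True, f"{name}: connected, {queue_size} jobs queued"
```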

These checks give us deeper insight into the health of the system. Are emails backed up? Are we generating reports? With these checks in place, we can easily answer those questions.

Monitoring application health

With our health check endpoint up and running, we can automate the monitoring of application health with an Uptime check.
Create a new HTTP check
Since our health check is a single URL, we can use a single HTTP check to monitor it.
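Conceptually, an HTTP check requests a single URL and inspects the result. The following sketch is not how Rigor works internally; it simply uses Python's `urllib` to show what one check run retrieves. `fetch_health` is a hypothetical helper.

```python
import urllib.error
import urllib.request
from typing import Tuple


def fetch_health(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET the health endpoint and return (status_code, body), including for non-2xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; a monitoring check still wants the status and body.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Connection failures and timeouts raise exceptions that propagate to the caller, which should count them as a failed check, just like a non-200 status.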

Configure the HTTP check
Monitoring application health is as simple as pointing our Rigor check to our new health endpoint.

We can run the check as often as once every minute, and we can configure custom notification settings to route alerts where we want them.

Add success criteria
By default, Rigor will check for a successful response code (200). To customize this behavior, we can use “Success Criteria” in the advanced tab.

In our example endpoint, a failed health check sets its success field to false. To confirm that all health checks pass, we can verify that this text is absent from the response. In Rigor parlance, we’ll add an “Absence of Text” criterion for "success": false.
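One subtlety worth testing locally: “Absence of Text” is a literal substring match, so it depends on exactly how the endpoint serializes its JSON. The sketch below uses Python's `json` module as a stand-in for your application's serializer; `passes_success_criteria` and `has_failed_check` are hypothetical helpers, not Rigor features, and the second one assumes every top-level value in the report is a section with a `success` field.

```python
import json

# The literal text used in the "Absence of Text" criterion described above.
FAILURE_TEXT = '"success": false'


def passes_success_criteria(status: int, body: str) -> bool:
    """Approximate the two criteria: a 200 response whose body lacks the failure text."""
    return status == 200 and FAILURE_TEXT not in body


def has_failed_check(report: dict) -> bool:
    """Structural alternative: parse the JSON and inspect every section's success flag."""
    return any(section.get("success") is False for section in report.values())


failing = {"database": {"success": False, "message": "connection refused"}}
spaced = json.dumps(failing)                           # default output has a space after ':'
compact = json.dumps(failing, separators=(",", ":"))  # compact output has no spaces

print(passes_success_criteria(200, spaced))   # False: failure text found
print(passes_success_criteria(200, compact))  # True: the literal match misses the failure
print(has_failed_check(json.loads(compact)))  # True: parsing catches it either way
```

Some serializers emit compact JSON with no space after the colon, and a browser's pretty-printed view may not match the raw response body, so it is worth confirming the raw bytes your endpoint returns before relying on a text match.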

With our Rigor check running, we can rest assured that our application is getting routine check-ups. If a health check fails, we’ll receive an alert (via Slack and OpsGenie) notifying us of the issue.

For those using Rails, here are a couple of gems we’ve encountered that aim to simplify the process of standing up a health check endpoint:

Are you using health check endpoints to monitor your applications? What other checks do you perform in your endpoints? Let us know in the comments or on Twitter @RigorEng.
