Requesting a static file from a web server is a quick, simple way to monitor application health. If we get a successful response (200), we know that the web server is up. But unless your site is composed of static files, ensuring the availability of the web server is only part of the equation. What if your application stopped running? What if your database is unavailable? Are users able to complete their tasks on your site?
To better capture the health of your application, we can build and monitor a more comprehensive health check endpoint. This endpoint, purpose-built for validating the health of the application, can perform deeper checks on application components and return a response to signify the status of the system.
Building a health check endpoint
A simple health check endpoint allows you to answer questions like:
- Is the web server up?
- Is the application running?
- Is the database connected?
Deeper checks to implement as part of a health check endpoint might include:
- Validating the status of important processes or background jobs
- Monitoring the size of job queues to ensure they are at expected levels
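The checks listed above can sit behind a single endpoint that runs each one and reports the results together. The following is a minimal Python sketch of that idea, not the code behind our own endpoint. The function names, the component keys, and the placeholder database check are assumptions for illustration; only the message and success fields come from the example response discussed in this post.

```python
import json
from typing import Callable, Dict, Tuple

# A check returns (success, message). The names here are hypothetical.
Check = Callable[[], Tuple[bool, str]]


def check_application() -> Tuple[bool, str]:
    # If this code runs at all, the application is up.
    return True, "Application is running"


def check_database() -> Tuple[bool, str]:
    # Placeholder: a real check would run a trivial query against the
    # database and return False (or raise) when it fails.
    return True, "Database is connected"


def run_checks(checks: Dict[str, Check]) -> Dict[str, dict]:
    """Run every check and collect a message/success pair per component."""
    report = {}
    for name, check in checks.items():
        try:
            success, message = check()
        except Exception as exc:  # a crashing check counts as a failure
            success, message = False, f"{name} check raised: {exc}"
        report[name] = {"message": message, "success": success}
    return report


def render_health(checks: Dict[str, Check]) -> str:
    """Serialize the report as the JSON body of the health check endpoint."""
    return json.dumps(run_checks(checks), indent=2)
```

A web framework route would return `render_health({"application": check_application, "database": check_database})` as its response body. Catching exceptions inside `run_checks` keeps one broken check from hiding the status of the others.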
Here’s an example of one of our application health checks:
(pretty JSON brought to you by the nifty JSON Formatter Chrome extension)
This example endpoint checks several components of the system. Let’s walk through the JSON response and identify what each piece tells us.
The first part of the response tells us that our application is running. Since the endpoint is served by the application itself (vs. just returning a static file), we know that the app is running because we received a response.
"message": "Application is running",
The second part of the response indicates that our database is connected. If the database server goes down, this will display an error and the success field will be false.
"message": "Database is connected",
The final two sections of the response deal with background workers and job queues. The information in these sections indicates the health of the worker processes and their respective job queues.
"message": "Background worker A is connected and not backed up",
"message": "Background worker B is connected but IS backed up",
"success": false // unhealthy!
We can see that background worker A is connected and its job queue is healthy. Background worker B, on the other hand, is dealing with an unhealthy queue. We know that it is connected, so the backed-up queue is likely due to an increased volume of incoming jobs.
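A worker check of this kind boils down to two questions: is the worker connected, and is its queue within an expected size? Here is a hedged Python sketch. The function name, its parameters, the threshold, and the "NOT connected" message are assumptions; the other two messages mirror the example response above.

```python
from typing import Tuple


def check_worker(name: str, connected: bool, queue_size: int,
                 max_queue_size: int) -> Tuple[bool, str]:
    """Return (success, message) for one background worker.

    max_queue_size is a hypothetical threshold; pick a value that
    reflects what "expected levels" means for each queue.
    """
    if not connected:
        return False, f"Background worker {name} is NOT connected"
    if queue_size > max_queue_size:
        return False, f"Background worker {name} is connected but IS backed up"
    return True, f"Background worker {name} is connected and not backed up"
```

For example, `check_worker("B", True, 5000, 100)` reports a connected worker whose queue has grown past the threshold, which is the situation worker B is in.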
These checks give us deeper insight into the health of the system. Are emails backed up? Are we generating reports? With these checks in place, we can easily answer those questions.
Monitoring application health
With our health check endpoint up and running, we can automate the monitoring of application health with an Uptime check.
Create a new HTTP check
Since our health check is a single URL, we can use a single HTTP check to monitor it.
We can run the check as often as once a minute, and we can configure custom notification settings to get alerts where we want them.
In our example endpoint, we set success to false if a health check fails. To ensure all health checks pass, we can verify that such text is absent. In Rigor parlance, we’ll verify “Absence of Text”:
With our Rigor check running, we can rest assured that our application is getting routine check-ups. If a health check fails, we’ll receive an alert (via Slack and OpsGenie) notifying us of the issue.
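The absence-of-text rule is simple enough to reproduce by hand, which helps when testing an endpoint before pointing a monitor at it. The sketch below is our own illustration in Python and says nothing about how Rigor implements the check. The function names and the exact failure string are assumptions, and the string match depends on how your endpoint formats its JSON (here, a single space after the colon).

```python
import urllib.request

# Text that must NOT appear in a healthy response (assumed formatting).
FAILURE_TEXT = '"success": false'


def body_is_healthy(status: int, body: str,
                    failure_text: str = FAILURE_TEXT) -> bool:
    """Healthy means a 200 response whose body lacks the failure text."""
    return status == 200 and failure_text not in body


def fetch_and_check(url: str, timeout: float = 10.0) -> bool:
    """Request the health check URL and apply the absence-of-text rule."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return body_is_healthy(response.status, body)
    except OSError:
        # Covers connection errors, timeouts, and non-2xx HTTP errors.
        return False
```

Keeping the rule in `body_is_healthy` separate from the network call makes it easy to try against saved responses.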
For those using Rails, here are a couple of gems we’ve encountered that aim to simplify the process of standing up a health check endpoint:
Are you using health check endpoints to monitor your applications? What other checks do you perform in your endpoints? Let us know in the comments or on Twitter @RigorEng.