
Yesterday, the S3 web-based storage service from Amazon Web Services (AWS) suffered an outage lasting several hours that left half of the Internet melting down (and the other half Googling “what is s3”). AWS isn’t as commonly known as Amazon’s shopping site, but its reach became evident yesterday when the Internet became difficult to use: sites were unavailable, many services (like Slack and Trello) were affected, and users of IoT devices were left sitting in the dark.


What should you take away from the Great Amazon S3 Outage of 2017?

Such outages can’t be predicted, but that doesn’t mean you can’t plan for them: they can, and they will, happen. Such planning is the difference between being proactive and being reactive when disaster strikes. One way to stay on top of things is to implement a thorough monitoring system so that you have the information you need when you need it.

We also recommend you take control of your monitoring, so that you don’t have to rely on your partners to report on their statuses:

…The five-hour breakdown was so bad, Amazon couldn’t even update its own AWS status dashboard: its red warning icons were stranded, hosted on the broken-down side of the cloud.


If something is important or critical to your business, you should be monitoring it.

You may be using third-party services, but you’re still responsible for your site’s overall usability

Sure, you could simply put up a banner saying your site is down because of issues at one of your third-party service providers. However, that is a reactive move: you’re responding to a problem that has already occurred. Worse still is not seeing the problem before your users do (how many times have you discovered that critical functionality isn’t available through upset calls and emails from your clients?). The disruption might not be your fault per se, but it still hurts customers’ perceptions of you if they’re the ones alerting you to problems.

Keep an eye on the third-party partners of your third-party partners

While this sounds a bit silly, consider that your service chain is only as strong as its weakest link. For example, if your partner relies on Heroku or GitHub Webhooks (two of the many services affected by yesterday’s outage), then monitoring just your partner may not be enough to get you the information you need to respond appropriately and promptly. It might be difficult to track down every link in the chain, but it is worth doing so that you can see the whole lay of the land, so to speak.
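One way to picture the chain is as a simple dependency map that you walk to find everything sitting behind your partners. The sketch below is only an illustration: the service names and the links between them are made up for the example, and `transitive_dependencies` is a hypothetical helper, not part of any monitoring product.

```python
def transitive_dependencies(service, depends_on):
    """Return every service that `service` relies on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(service, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against circular links
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    seen.discard(service)  # a service is not its own dependency
    return seen


# Illustrative map only: these relationships are assumptions for the example.
depends_on = {
    "my-site": ["partner-api"],
    "partner-api": ["Heroku", "GitHub Webhooks"],
    "Heroku": ["Amazon S3"],
}

for name in sorted(transitive_dependencies("my-site", depends_on)):
    print("worth monitoring:", name)
```

Every name the walk returns is a candidate for its own check, including the ones you never signed a contract with.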

What you can do to monitor those assets critical to your business

Rigor Monitoring provides tools you can use to keep an eye on both your critical systems and those of your partners (and your partners’ partners):

  • Real Browser Checks: Real Browser Checks allow you to monitor the performance of the full user experience for a given user flow. They’re an easy way to ensure everything on your site works the way it’s supposed to, and if it doesn’t, you’ll receive notifications within minutes that something is impacting your site. You can also use this tool to monitor for warning conditions, such as slow response times. Such information alerts you to problems that could pop up in the future, allowing you to proactively eliminate their root causes.


  • API Checks: API Checks are a great way to ensure that the integrations between you and your partners, as well as between your partners and their partners, are working well.


Rigor allows you to easily set up and monitor first- and third-party APIs

By making HTTP calls and analyzing both the way the response was sent and the response received, API Checks allow you to monitor, trend, and alert on an endpoint’s availability, response times, and response values and structure. You’re left with the oversight you need to ensure the APIs your business relies on are doing what you need them to do.
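To make the idea concrete, here is a minimal sketch of what a check like this might look like using only Python’s standard library. It is not Rigor’s implementation or API: the function names, the two-second threshold, and the example URL are all assumptions chosen for illustration. It looks at the same three things described above: availability (status code), response time, and response structure (required JSON keys).

```python
import json
import time
import urllib.error
import urllib.request


def evaluate_response(status, elapsed_seconds, body_text, required_keys, max_seconds=2.0):
    """Return a list of problems found in one response (empty list means healthy)."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if elapsed_seconds > max_seconds:
        problems.append(f"slow response: {elapsed_seconds:.2f}s > {max_seconds:.2f}s")
    try:
        payload = json.loads(body_text)
    except ValueError:
        problems.append("body is not valid JSON")
        return problems
    if not isinstance(payload, dict):
        problems.append("body is not a JSON object")
        return problems
    for key in required_keys:
        if key not in payload:
            problems.append(f"missing key: {key}")
    return problems


def run_check(url, required_keys, max_seconds=2.0, timeout=10.0):
    """Call the endpoint once, time it, and evaluate what came back."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body_text = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 500 or 503.
        return [f"unexpected status {error.code}"]
    except (urllib.error.URLError, OSError) as error:
        # DNS failure, refused connection, timeout, and similar.
        return [f"endpoint unreachable: {error}"]
    elapsed = time.monotonic() - started
    return evaluate_response(status, elapsed, body_text, required_keys, max_seconds)
```

A call such as `run_check("https://api.example.com/health", ["status"])` (a placeholder URL) returns an empty list when all is well. Running it on a schedule and alerting on any non-empty result gives you a very rough version of the monitoring described here.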

Takeaways

Use Amazon’s outage and its widespread impact as a wake-up call to be proactive, instead of reactive, with regard to the uptime of your systems and infrastructure. Don’t let surprises catch you off guard; implement a thorough monitoring system that gives you clear oversight into the critical aspects of your business today.

