Serverless Site Monitoring with AWS Lambda, DynamoDB, and SNS

The term 'serverless' is awfully in-vogue today, and I get the sense its utterances have started provoking eye-rolls and sighs from beleaguered technologists, desperate for a respite from the bluster of marketeers and buzzword-enthusiasts.

But with that said, a quiet revolution is underway. Data storage was once a difficult and expensive proposition: hard to scale, fraught with complexity and the risk of data loss, and requiring all manner of 'babysitting' to get right. Within the space of my not-especially-lengthy career, it has transitioned from a thorny in-house problem to a barely considered, outsourceable service in the same vein as electricity[1]. Important, to be sure, but a problem that someone else has solved, and can supply to us far more cheaply and reliably than we could possibly achieve in-house. Since the introduction of S3 and its contemporaries, your average firm has been spared this burden entirely. As it was with storage, so I believe it will be with compute.

It's early days. There's no "Facebook but without servers", and there won't be for some time. However, with the addition of managed[2] compute resources to AWS in the form of Lambda functions, the range of things that can be built without running and managing servers has grown considerably.

Downtime Notifier

To this end, I've been experimenting with some of the basic managed building blocks in AWS to build a system that monitors websites and sends alerts (via email and/or SMS) in the event of any downtime. I've come up with something I'm really happy with, and it's fully managed (i.e. serverless) and fully automated. Detailed instructions about how this works and how to set it up can be found here.

The Lambda code itself is a set of Python modules that implements the following basic approach:


def lambda_handler(event, context):
    """This is invoked every 5 minutes by an AWS::Events::Rule."""

    should_notify = False

    for site in configured_sites:
        # 1. In a separate thread, check if the site is up.
        #    Use retries/exponential-backoff in case of failure.
        # 2. Record the result to a DynamoDB table.
        # 3. Compare the result with the previous outcome stored in DynamoDB.
        # 4. If anything has changed, set should_notify = True.
        ...

    if should_notify:
        # 1. Build a message detailing the current state of things.
        # 2. Publish to the SNS topic.
        ...
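
To make the outline above a little more concrete, here is a minimal sketch of the check, compare, and notify steps. It is not the project's actual code: the function names, the retry count, and the delays are all hypothetical, and the DynamoDB read/write (steps 2 and 3) is left out, with the previous results simply passed in as a dict. The SNS client is passed in as a parameter and is assumed to be what boto3.client('sns') returns.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_ok(url, timeout=10):
    """Return True if the site answers; any error (DNS, timeout, HTTP 4xx/5xx) counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        return False


def check_with_retries(url, probe=fetch_ok, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Probe a site, doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    return False


def check_all(urls, check=check_with_retries, max_workers=8):
    """Check the sites on worker threads; returns {url: is_up}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(check, urls)))


def changed_sites(previous, current):
    """Return {url: is_up} for sites whose state differs from the last recorded one.

    A site with no previous record is treated as changed.
    """
    return {url: up for url, up in current.items() if previous.get(url) != up}


def publish_changes(sns_client, topic_arn, changes):
    """Publish one summary message; sns_client is assumed to be boto3.client('sns')."""
    lines = [f"{url} is {'UP' if up else 'DOWN'}" for url, up in sorted(changes.items())]
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="Site status change",
        Message="\n".join(lines),
    )
```

Passing the probe, the sleep function, and the SNS client in as parameters is purely a convenience for this sketch: it lets the retry and notification logic be exercised without touching the network.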

AWS Infrastructure

This code sits at the core of a CloudFormation[3] stack that contains the following resources:

  • An AWS::DynamoDB::Table to store results from each check
  • An AWS::SNS::Topic to notify about state changes, configured with email and/or SMS endpoints
  • An AWS::Lambda::Function 'container'[4] for the code
  • An AWS::IAM::Role under which the function runs that grants the following permissions:
    • publishing to the SNS topic
    • querying the DynamoDB table and writing a record
    • writing logs to CloudWatch
  • An AWS::Events::Rule to trigger the function every 5 minutes
  • An AWS::Lambda::Permission to permit the events service to invoke the function
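
The least obvious pieces in that list are probably the schedule, the invoke permission, and the role's policy, so here is a rough sketch of how they might look, written as Python dicts that serialize to a CloudFormation JSON fragment. This is an illustration under assumptions rather than the stack's real template: the logical IDs (MonitorFunction, MonitorSchedule, AlertTopic, ResultsTable and so on) are made up, and the exact IAM actions are my reading of the permissions listed above.

```python
import json


def schedule_resources(function_id="MonitorFunction", minutes=5):
    """Build the AWS::Events::Rule and AWS::Lambda::Permission pair as a template fragment."""
    unit = "minute" if minutes == 1 else "minutes"  # rate() wants the singular for 1
    return {
        "MonitorSchedule": {
            "Type": "AWS::Events::Rule",
            "Properties": {
                "ScheduleExpression": f"rate({minutes} {unit})",
                "State": "ENABLED",
                "Targets": [
                    {"Arn": {"Fn::GetAtt": [function_id, "Arn"]}, "Id": "MonitorTarget"}
                ],
            },
        },
        "MonitorInvokePermission": {
            "Type": "AWS::Lambda::Permission",
            "Properties": {
                "Action": "lambda:InvokeFunction",
                "FunctionName": {"Ref": function_id},
                "Principal": "events.amazonaws.com",
                "SourceArn": {"Fn::GetAtt": ["MonitorSchedule", "Arn"]},
            },
        },
    }


def role_policy_statements(topic_id="AlertTopic", table_id="ResultsTable"):
    """IAM statements matching the three permissions the role grants (assumed actions)."""
    return [
        {"Effect": "Allow", "Action": "sns:Publish", "Resource": {"Ref": topic_id}},
        {
            "Effect": "Allow",
            "Action": ["dynamodb:Query", "dynamodb:PutItem"],
            "Resource": {"Fn::GetAtt": [table_id, "Arn"]},
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ]
```

Calling json.dumps(schedule_resources(), indent=2) gives a fragment that could sit under the Resources key of a template, alongside the table, topic, function, and role.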

A single provision operation creates all of this. The code itself gets built into a single zip file and deployed with build and deploy operations.
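
For a sense of what the build and deploy steps involve, here is a small sketch, assuming the function's code is a directory of .py files with no compiled dependencies. The function names are hypothetical and this is not the project's actual tooling; the Lambda client is passed in and is assumed to be what boto3.client('lambda') returns.

```python
import io
import zipfile
from pathlib import Path


def build_zip(source_dir):
    """Zip every .py file under source_dir and return the archive as bytes."""
    root = Path(source_dir)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.py")):
            # Store paths relative to the source root so modules import cleanly.
            zf.write(path, path.relative_to(root).as_posix())
    return buf.getvalue()


def deploy(lambda_client, function_name, zip_bytes):
    """Upload new code; lambda_client is assumed to be boto3.client('lambda')."""
    lambda_client.update_function_code(FunctionName=function_name, ZipFile=zip_bytes)
```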

If you follow the directions, you can easily deploy this into your own AWS account.

Pricing

The cost[5] to operate this indefinitely is... zero dollars! With the current configuration, the function runs once every five minutes, with a typical runtime of ~1100 ms. It stores a modest amount of data in DynamoDB, and delivers SMS messages a couple of times a month. All of this usage is well within the thresholds offered by the AWS free tier for the respective services. In a pre-Lambda world, we'd be looking at a minimum spend of ~$5 a month for a t2.nano EC2 instance to mostly sit idle.
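
As a back-of-the-envelope check on the Lambda portion, the arithmetic looks roughly like this. Two things here are assumptions rather than figures from this post: the 128 MB memory setting, and the free-tier allowance of 1,000,000 requests and 400,000 GB-seconds per month, which is how I understand the Lambda free tier and should be re-checked against current AWS pricing.

```python
FREE_REQUESTS = 1_000_000    # assumed monthly Lambda free-tier requests
FREE_GB_SECONDS = 400_000    # assumed monthly Lambda free-tier compute


def lambda_usage(interval_minutes=5, runtime_s=1.1, memory_mb=128, days=30):
    """Return (invocations, gb_seconds) for one month of scheduled runs."""
    invocations = (days * 24 * 60) // interval_minutes
    gb_seconds = invocations * runtime_s * (memory_mb / 1024)
    return invocations, gb_seconds


invocations, gb_seconds = lambda_usage()
print(f"invocations: {invocations} ({invocations / FREE_REQUESTS:.2%} of the allowance)")
print(f"GB-seconds:  {gb_seconds:.0f} ({gb_seconds / FREE_GB_SECONDS:.2%} of the allowance)")
```

Under those assumptions, a 30-day month comes to 8,640 invocations and about 1,188 GB-seconds, both under one percent of the allowance.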

Operation

More significantly, though, there's nothing to manage to ensure this continues to work on a long-term basis. This will happily run indefinitely, at least until something profound changes with AWS itself. No security patches to install, no servers running out of disk space.

So, serverless. There's something to this, and I'm very bullish on the concept.


  1. At least for a great many use cases, if not yet all. 

  2. A 'managed' service is anything that does not provide access to the base servers, and exposes a higher abstraction layer instead. S3 and Lambda are the classic examples, but this also includes RDS, DynamoDB, ElastiCache, etc. 

  3. CloudFormation is an AWS tool to create and manage sets of related infrastructure, called stacks. 

  4. That's lowercase-c 'container', though I'm pretty sure Lambda itself is implemented using the uppercase-C sort. 

  5. Standard disclaimer: Be sure you understand how AWS bills for its services, and how to set up billing alerts. What was true at the time of writing may not always be true, so monitor your account accordingly.