
TDD for Serverless - My Setup

Abhay Bhargav

01 Dec 2019 • 4 min read

I have been building serverless applications for the last 10 months. I am currently working on a "not-small" project that spans nearly 100 functions in Python 3.7. I started working with Test-Driven Development (TDD) around a year ago, and it has been quite a large productivity boost for me. This post details my stack and my setup, for you, the interested reader, and partly for myself, in case I forget later down the road. 😬

Why TDD? What is TDD?

What is TDD? TL;DR Version

TDD (Test-Driven Development) is the practice of writing the test first and then writing the code that makes the test pass. It takes an "end-game first" approach to writing functionality: you define how the functionality should work (in the form of a test) and then write the code to ensure that the test passes.

Example: A User wants to sign up for my application with their email and password.

I first write a unit test, simulating the JSON object that a user would send to my function with email, password and confirm_password. Once I am done defining my unit test, I write the function to make the (unit) test pass, refactoring as I go, and I move forward only when that test passes.
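A minimal sketch of that flow, under stated assumptions: the handler name sign_up, the validation rules and the response shape are hypothetical and chosen only for illustration. The tests are written first, and the handler contains just enough code to make them pass.

```python
# Step 1: the tests, written before the handler exists.
def test_sign_up_success():
    event = {
        "body": {
            "email": "user@example.com",
            "password": "s3cret!",
            "confirm_password": "s3cret!",
        }
    }
    response = sign_up(event, None)
    assert response["body"]["success"]


def test_sign_up_password_mismatch():
    event = {
        "body": {
            "email": "user@example.com",
            "password": "s3cret!",
            "confirm_password": "different",
        }
    }
    response = sign_up(event, None)
    assert not response["body"]["success"]


# Step 2: the smallest handler that makes both tests pass.
# (Hypothetical handler; a real one would also persist the user.)
def sign_up(event, context):
    body = event.get("body") or {}
    email = body.get("email", "")
    password = body.get("password", "")
    confirm = body.get("confirm_password", "")
    if "@" not in email:
        return {"body": {"success": False, "message": "Invalid email"}}
    if not password or password != confirm:
        return {"body": {"success": False, "message": "Passwords do not match"}}
    return {"body": {"success": True, "message": "User signed up"}}
```

Running pytest against this file before step 2 exists fails with a NameError, which is the "red" state that the handler then turns "green".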

This helps in the following ways:

Forces the developer to define how the implementation will be used before actually implementing it
Reduces the chance of the developer committing non-functional code to the source repo
Gives immediate feedback to the developer
Reduces time spent on debugging and rework
among several others...

With Functions as a Service (FaaS), TDD is a perfect complement because you are focused on writing the function itself as the unit of compute.

My Stack

I am working on a mid-size FaaS Project with ~100 functions:

written in Python 3.7 on AWS Lambda
DynamoDB as the Database
with CloudWatch Events (for specific scheduled events)
API Gateway events (mostly web-service functionality)
S3 to process specific types of uploads
Amazon Cognito UserPool - for SSO (OIDC and OAuth 2.0)
Serverless Framework - to deploy the functions to AWS and orchestrate deployment

Common Constraints:

Since I am using Cognito, I don't need to handle authentication and authorization management myself. AWS allows me to leverage User Pool Authorizers with API Gateway, i.e. API Gateway checks the user's JWT and invokes the Lambda function only if the user is authorized. However, I still need to check for:
- Whether the user belongs to a specific group
- User's email based on the specific function in question
I am mostly using my FaaS stack as a web service, which means that I largely expect to receive JSON objects in the request and respond with JSON.
A few events are triggered from non-API Gateway sources such as S3 and CloudWatch Events.
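The group and email checks mentioned above can be sketched as a small helper. This is an illustration under assumptions, not the project's actual code: it assumes the claims arrive under a cognitoPoolClaims key, that the groups claim is a comma-separated string and that the email sits under an email claim. The name is_authorized is hypothetical.

```python
def is_authorized(event, allowed_groups, allowed_emails=None):
    """Return True if the caller's claims satisfy the function's rules.

    Assumptions (not confirmed by any AWS contract): claims live under
    event["cognitoPoolClaims"], "groups" is a comma-separated string,
    and "email" holds the user's email address.
    """
    claims = event.get("cognitoPoolClaims") or {}
    raw_groups = claims.get("groups", "")
    groups = {g.strip() for g in raw_groups.split(",") if g.strip()}

    # The user must belong to at least one of the allowed groups.
    if not groups & set(allowed_groups):
        return False

    # Optional per-function restriction on the user's email.
    if allowed_emails is not None and claims.get("email") not in allowed_emails:
        return False

    return True
```

A handler would call this first and return an error response when it yields False, so the check itself stays easy to unit test without API Gateway in the loop.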

My TDD Setup

Test Library

I like pytest, the popular Python library for unit testing. I have been using it for many years, and I continue to use it for my serverless stack.

Database

Since DynamoDB is a managed, cloud-hosted DB, I would otherwise have had to access it on the cloud with Python's boto3 library. However, DynamoDB has a nifty local version that you can run in Docker for developer-friendly workflows:

docker run -d -p 8000:8000 amazon/dynamodb-local

To use this in my Python test library, all I have to do is initialize boto3 with the endpoint_url argument, which ensures that the library communicates with the local instance of DynamoDB rather than the one in the cloud:

import boto3

db = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
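If the same module has to talk to the local container during tests and to the real service once deployed, one option is to build the boto3 arguments from the environment. The sketch below is an assumption-laden illustration: the environment variable DYNAMODB_ENDPOINT and the helper names dynamodb_resource_kwargs and get_db are made up for this example. The dummy credentials are there because boto3 needs some credentials and a region to be configured even when it talks to a local endpoint.

```python
import os


def dynamodb_resource_kwargs(env=None):
    """Build keyword arguments for boto3.resource("dynamodb", ...).

    DYNAMODB_ENDPOINT is a hypothetical variable name. When it is unset,
    an empty dict is returned and boto3 falls back to its defaults.
    """
    env = os.environ if env is None else env
    endpoint = env.get("DYNAMODB_ENDPOINT")
    if not endpoint:
        return {}
    return {
        "endpoint_url": endpoint,
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
        # Placeholder credentials, only meant for the local container.
        "aws_access_key_id": "local",
        "aws_secret_access_key": "local",
    }


def get_db(env=None):
    # Imported lazily so the helper above can be tested without boto3.
    import boto3

    return boto3.resource("dynamodb", **dynamodb_resource_kwargs(env))
```

With this in place, running the tests with DYNAMODB_ENDPOINT=http://localhost:8000 points them at the Docker container, while leaving the variable unset keeps the default behaviour.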

Setup and Teardown

As with most unit testing, I create a fixture, here called resource_a_setup, that defines a baseline for the tests. In this specific baseline, I define a DynamoDB table (locally) with all its attributes, and I add some code to delete it once the tests in the module are done running.

import pytest


@pytest.fixture(scope="module")
def resource_a_setup():
    table = db.create_table(
        TableName="test-table",
        KeySchema=[
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "sk", "KeyType": "RANGE"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
        ],
        GlobalSecondaryIndexes=[
            {
                "IndexName": "bti-gsi",
                "KeySchema": [{"AttributeName": "sk", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 5,
                    "WriteCapacityUnits": 5,
                },
            }
        ],
        BillingMode="PROVISIONED",
        ProvisionedThroughput={
            "ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
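    # Block until the table is ready before any test uses it
    table.meta.client.get_waiter("table_exists").wait(TableName="test-table")
    print("created table")
    yield
    # Teardown: runs once, after the last test in the module has finished
    table.delete()
    print("deleted table")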

An Example TDD Function for FaaS

Now that we're done with the setup, here's an example of one of my tests for a single Lambda function:

def test_create_event(resource_a_setup):
    event = {}
    event["body"] = {"event_name": "some_event"}
    event["cognitoPoolClaims"] = {"groups": "somegroup"}
    response = create_event(event, None)
    assert "body" in response
    assert response["body"]["success"]

Let's break this down:

The function test_create_event runs whenever I run pytest on my command line. Pytest collects functions whose names are prefixed with test and runs them.
Every Lambda function expects an event passed in as an argument. In this case, my event has some data (the JSON payload) from the API Gateway in the body parameter.
Since the function is invoked only after a user has been successfully authorized by the API Gateway, I expect the event object to additionally have a key called cognitoPoolClaims. This key contains all the key:value pairs that are gathered from the user's ID token, post authorization. In my application, the ID token additionally contains the user's group affiliation. My function validates the user based on the group.
The response variable holds the return value of the target function create_event. The body of the response typically looks like this:

{
  "success": True,
  "error": False,
  "message": "Successfully created event `SomeEvent`",
}

I assert that the response has a body key and that success in response["body"] is set to True, which essentially means that the function ran successfully, as per specification.
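For completeness, here is a hedged sketch of a handler that would satisfy such a test. It is not the real create_event from my project: the allowed group, the item layout (pk and sk) and the error messages are assumptions, and an in-memory stand-in replaces the DynamoDB table so the sketch runs without any infrastructure. In a real handler, the table would come from something like db.Table("test-table").

```python
ALLOWED_GROUPS = {"somegroup"}  # assumption: groups allowed to create events


class InMemoryTable:
    """Stand-in that mimics the put_item call of a boto3 Table resource."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


# Hypothetical wiring: swap this for the real DynamoDB table.
table = InMemoryTable()


def _response(success, message):
    return {"body": {"success": success, "error": not success, "message": message}}


def create_event(event, context):
    claims = event.get("cognitoPoolClaims") or {}
    if claims.get("groups") not in ALLOWED_GROUPS:
        return _response(False, "User is not authorized to create events")

    name = (event.get("body") or {}).get("event_name")
    if not name:
        return _response(False, "event_name is required")

    # pk/sk mirror the key schema of the test table; the values are illustrative.
    table.put_item(Item={"pk": "event", "sk": name})
    return _response(True, f"Successfully created event `{name}`")
```

Because the handler returns plain dictionaries, the test can assert on response["body"] directly without parsing anything.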

Where do we go from here?

I repeat this flow for every.single.function I write. I mock the event with the data that is typically required for the function and add additional params based on the type of event. These run as unit tests, which means that they run quickly, and since all the infrastructure is on the developer's machine, they don't make those expensive trips to DynamoDB on the cloud.
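Mocking events gets repetitive across many functions, so small builder helpers can keep the tests short. The sketch below is illustrative only: the helper names are hypothetical, the API Gateway shape follows the test shown earlier in this post, and the S3 and scheduled-event payloads are heavily trimmed versions of what AWS sends, keeping just the fields a handler usually reads.

```python
def api_gateway_event(body, groups="somegroup", email=None):
    """Event shaped like the one used in test_create_event."""
    claims = {"groups": groups}
    if email is not None:
        claims["email"] = email
    return {"body": body, "cognitoPoolClaims": claims}


def s3_put_event(bucket, key):
    """Trimmed S3 "object created" notification with a single record.

    Note: in real notifications the object key is URL-encoded.
    """
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": key},
                },
            }
        ]
    }


def scheduled_event():
    """Trimmed CloudWatch Events payload for a scheduled rule."""
    return {
        "source": "aws.events",
        "detail-type": "Scheduled Event",
        "detail": {},
    }
```

A test then reads as one line of setup, for example event = s3_put_event("uploads", "report.csv"), followed by the call to the handler and the assertions.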