I have been developing a bunch of serverless apps and experimenting with serverless security, both for our (we45’s) work in Pentesting and for our training on Serverless Security at OWASP AppSecUSA 2018. I came across this interesting scenario during my research.
If you are working with AWS Lambda (Serverless), chances are that you would be working with AWS’s NoSQL Database, DynamoDB. DynamoDB is AWS’s cloud NoSQL solution that supports both Document models (like MongoDB) and Key-Value models (like Redis). DynamoDB and Lambda are a popular combination that several developers use to develop and run serverless applications on AWS infrastructure.
A Quick note on DynamoDB
As DynamoDB is a cloud-based NoSQL solution, it comes with a plethora of features, from in-memory caching (DAX) to Seamless Scaling to Encryption at rest, and many more (full list here). As a Database, it supports a bunch of CRUD operations on Data, including INSERT, UPDATE, DELETE, QUERY and SCAN operations. DynamoDB has Tables (similar to SQL Tables). Tables hold Items, and each Item is a collection of attributes (much like a document).
DynamoDB has two types of primary keys. The first is a simple primary key made up of only a Partition Key. DynamoDB passes the Partition Key's value to an internal hash function, and the result decides where the item is stored. In tables where there’s only the Partition Key, the Partition Key has to be unique. For example, in a User table with a partition key of UserID, the UserID has to be unique and you can GET the user’s attributes by referencing the UserID.
DynamoDB also supports tables with both a Partition Key and a Sort Key. The first attribute is the Partition Key and the second attribute is the Sort Key, and together they form a composite primary key. In tables with both a Partition Key and a Sort Key, several items can share the same Partition Key value, as long as their Sort Key values differ. The Sort Key is also called a Range Key.
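To make the two key layouts concrete, here is a minimal sketch of the arguments one might pass to the boto3 client's create_table call. The table names (users, user_logins) and attribute names (user_id, login_time) are hypothetical and chosen only for illustration. In the KeySchema, HASH marks the Partition Key and RANGE marks the Sort Key.

```python
def simple_key_table(table_name="users"):
    """Arguments for a table keyed only by a Partition Key (HASH)."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def composite_key_table(table_name="user_logins"):
    """Arguments for a table keyed by Partition Key (HASH) plus Sort Key (RANGE)."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "login_time", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "login_time", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_tables(client):
    """Create both tables; `client` is assumed to be boto3.client("dynamodb")."""
    client.create_table(**simple_key_table())
    client.create_table(**composite_key_table())
```

In the second table, many items may share one user_id, provided each has a different login_time.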
The two major ways to perform SELECT-like functions on DynamoDB tables are Queries and Scans. A query requires users to specify the primary key attributes (at minimum the Partition Key), and can apply additional filters to narrow down the data returned. The scan function, however, scans the entire table and returns results based on the ScanFilters. It goes without saying that the query feature is more efficient, but the scan feature is equally useful for fuzzy searches or searches by attribute, without having to deal with the partition key.
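The difference can be sketched with the boto3 client as follows. The table name (users) and attribute names (user_id, first_name) are hypothetical. The helpers only build the request arguments, so they can be inspected without AWS credentials, and both use the same condition style (AttributeValueList plus ComparisonOperator) that the scan examples in this post rely on.

```python
TABLE_NAME = "users"  # hypothetical table name


def query_args(user_id):
    """Query: a Partition Key value is mandatory, so only matching items are read."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditions": {
            "user_id": {
                "AttributeValueList": [{"S": user_id}],
                "ComparisonOperator": "EQ",
            }
        },
    }


def scan_args(first_name_prefix):
    """Scan: every item in the table is read, then the filter is applied."""
    return {
        "TableName": TABLE_NAME,
        "ScanFilter": {
            "first_name": {
                "AttributeValueList": [{"S": first_name_prefix}],
                "ComparisonOperator": "BEGINS_WITH",
            }
        },
    }


def run_both(client, user_id, first_name_prefix):
    """`client` is assumed to be boto3.client("dynamodb")."""
    by_key = client.query(**query_args(user_id))["Items"]
    by_attribute = client.scan(**scan_args(first_name_prefix))["Items"]
    return by_key, by_attribute
```

Note that the scan needs no key at all, which is exactly what makes it convenient for search features.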
This attack scenario is very similar to NoSQL Injection attacks against MongoDB (link). I have been able to validate it against the scan() function of DynamoDB.
Full disclosure: I have reported this issue to AWS. However, they (rightly) mentioned that this is the intended behavior of the DB and it's up to the developers to ensure the security of their applications when they use DynamoDB. Therefore, it's something that developers need to be watchful of.
Let’s examine a simple scan operation against my DynamoDB table. I have a User table with some attributes. I am using the Python boto3 client for this example.
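A minimal sketch of such a scan is shown here. The table name (users), the function names and the sample values are assumptions made for illustration; only the first_name and last_name attributes and the EQ string filter come from the description in this post.

```python
TABLE_NAME = "users"  # hypothetical table name


def build_eq_filter(first_name, last_name):
    """ScanFilter that matches items whose names equal the given strings."""
    return {
        "first_name": {
            "AttributeValueList": [{"S": first_name}],  # S = String type
            "ComparisonOperator": "EQ",
        },
        "last_name": {
            "AttributeValueList": [{"S": last_name}],
            "ComparisonOperator": "EQ",
        },
    }


def find_users(client, first_name, last_name):
    """Run the scan; `client` is assumed to be boto3.client("dynamodb")."""
    response = client.scan(
        TableName=TABLE_NAME,
        Select="ALL_ATTRIBUTES",
        # Multiple ScanFilter conditions are combined with AND by default.
        ScanFilter=build_eq_filter(first_name, last_name),
    )
    return response["Items"]
```

Calling find_users(boto3.client("dynamodb"), "John", "Doe") would return only the items where both attributes match exactly.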
In the above code snippet, I am using a scan operation to filter based on the first_name and last_name attributes. Each filter value is a String, denoted by S in the AttributeValueList, and the ComparisonOperator is EQ (the code for equals). You have a bunch of other comparison operators to choose from, including NE, LT, LE, GT, GE, CONTAINS, NOT_CONTAINS and BEGINS_WITH.
With String attributes, comparison gets tricky, because it depends on the ASCII (lexical) ordering of the strings. If you compare stored string values against a string that sorts very low in that ordering, like * or a string made of whitespace, nearly every stored value will compare as greater than the string you supplied. Here’s another code snippet demonstrating this…
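A minimal sketch of such a filter follows. The sample items and helper names are made up for illustration, and matches_gt is only a local stand-in for the comparison DynamoDB performs, valid for plain ASCII strings, so the effect can be seen without touching a real table.

```python
def build_gt_filter(first_name="*", last_name=" "):
    """ScanFilter using GT against strings that sort very low in ASCII."""
    return {
        "first_name": {
            "AttributeValueList": [{"S": first_name}],
            "ComparisonOperator": "GT",
        },
        "last_name": {
            "AttributeValueList": [{"S": last_name}],
            "ComparisonOperator": "GT",
        },
    }


def matches_gt(item, scan_filter):
    """Local stand-in for the GT check on string attributes (ASCII data only)."""
    return all(
        item[attr]["S"] > cond["AttributeValueList"][0]["S"]
        for attr, cond in scan_filter.items()
    )


# Hypothetical items, shaped the way the low-level client returns them.
sample_items = [
    {"first_name": {"S": "Alice"}, "last_name": {"S": "Smith"}},
    {"first_name": {"S": "bob"}, "last_name": {"S": "jones"}},
    {"first_name": {"S": "3PO"}, "last_name": {"S": "Droid"}},
]

matched = [item for item in sample_items if matches_gt(item, build_gt_filter())]
print(len(matched), "of", len(sample_items), "sample items match")
```

Against a real table, the same dictionary would be passed as the ScanFilter argument of the boto3 client's scan call.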
In the above example, I am using the “Greater than” ComparisonOperator (GT) to check if the first_name and last_name attributes are greater than * or " ". In most cases, this would match all records in my Database, as WhateverString > * holds for practically any string that starts with a letter or a digit. So if the attacker is able to manipulate (through the app/interface) the search value in the AttributeValueList, or can supply a ComparisonOperator like GT, LT, NE, NOT_CONTAINS, etc., then the attacker may be able to retrieve more data than the developer intends to expose to the user.
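To illustrate how this could surface in an application, here is a hedged sketch of a hypothetical request handler. The function names and the JSON request shape are assumptions, not code from a real app. The first helper passes client-supplied JSON straight into the ScanFilter, so the client controls the operator. The second is one possible way to be watchful: it accepts only a plain string and fixes the operator on the server side.

```python
import json


def vulnerable_filter(request_body):
    """Unsafe: whatever structure the client sends becomes the filter condition."""
    params = json.loads(request_body)
    return {"first_name": params["first_name"]}


def stricter_filter(request_body):
    """Accepts only a plain string value and always uses EQ."""
    params = json.loads(request_body)
    value = params["first_name"]
    if not isinstance(value, str):
        raise ValueError("first_name must be a string")
    return {
        "first_name": {
            "AttributeValueList": [{"S": value}],
            "ComparisonOperator": "EQ",
        }
    }


# A request body an attacker might send instead of a simple name.
attack_body = json.dumps({
    "first_name": {
        "AttributeValueList": [{"S": "*"}],
        "ComparisonOperator": "GT",
    }
})

print(vulnerable_filter(attack_body)["first_name"]["ComparisonOperator"])

try:
    stricter_filter(attack_body)
except ValueError as error:
    print("rejected:", error)
```

With the first helper the injected GT condition reaches the scan unchanged, while the second rejects the request because the value is not a string.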
As I mentioned before, this is very similar to the MongoDB NoSQL Injection attack possibilities, where MongoDB operators like $gte, $lte and $ne can be used to manipulate the returned result set.