DynamoDB Setup

Create the DynamoDB table that stores file hashes, user associations, and S3 keys.

Amazon DynamoDB serves as the metadata store for deduplicated files. For every uploaded file, a SHA-256 hash of its contents is computed and stored together with the file's S3 key and the list of users associated with that file. This design ensures that:

- Only one copy of each file is stored in S3
- Each user's link to a file is tracked independently
- A file is deleted from S3 only when no users remain linked to it
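The hash computation described above can be sketched with Python's standard hashlib module. This is a minimal illustration, not the project's actual Lambda code: the helper names sha256_of_stream and sha256_of_bytes and the 1 MiB chunk size are assumptions, and the digest is hex-encoded on the assumption that FileHash holds a hex string (as the example value in Step 2 suggests).

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex-encoded SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Read fixed-size chunks so a large upload never has to fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_bytes(data: bytes) -> str:
    """Convenience wrapper for content that is already in memory."""
    return sha256_of_stream(io.BytesIO(data))
```

Because identical bytes always produce the same digest, two users uploading the same file end up with the same FileHash, which is what lets both of them point at a single item and a single S3 object.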

Step 1: Create a DynamoDB Table

Go to the AWS Console → DynamoDB
Click Create Table
Configure the table as follows:
- Table name: dedupTable
- Partition key: FileHash (Type: String)

⚠️ Do not add a sort key. The file hash alone identifies each item, so every distinct file maps to exactly one item.

Leave all other settings at their defaults, unless you want to switch to on-demand capacity or enable auto-scaling
Click Create Table
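If you prefer scripting this setup, the console settings above correspond to the parameters of DynamoDB's CreateTable request. The sketch below only builds those parameters and makes no AWS call, so it runs without credentials. The helper name dedup_table_params, the choice of on-demand billing as its default, and the placeholder throughput values are assumptions for illustration.

```python
def dedup_table_params(table_name: str = "dedupTable", on_demand: bool = True) -> dict:
    """Build CreateTable request parameters matching the console settings above (hypothetical helper)."""
    params = {
        "TableName": table_name,
        # FileHash is the only key attribute: a partition (HASH) key with no sort (RANGE) key.
        "KeySchema": [{"AttributeName": "FileHash", "KeyType": "HASH"}],
        # Only key attributes are declared; S3Key and Users need no schema entry.
        "AttributeDefinitions": [{"AttributeName": "FileHash", "AttributeType": "S"}],
    }
    if on_demand:
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned mode requires explicit throughput; these numbers are placeholders.
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    return params
```

With boto3 installed and credentials configured, boto3.client("dynamodb").create_table(**dedup_table_params()) would create the table; that call is left out of the sketch so it runs offline.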

Step 2: DynamoDB Item Structure

Each entry in the table looks like this:

{
  "FileHash": "7321348c889467...",
  "S3Key": "unique_id_test.txt",
  "Users": ["userA", "userB"]
}

| Field | Type | Description |
|---|---|---|
| FileHash | String (partition key) | SHA-256 hash of the file contents |
| S3Key | String | Unique object key under which the file is stored in S3 |
| Users | List of Strings | IDs of the users who have uploaded this file |
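When the low-level DynamoDB API is used (as in the test item in Step 4), every value must carry a type descriptor: S for String and L for List. The hypothetical helpers below show how the fields in this table map onto that format and back. boto3's resource-level Table interface and its TypeSerializer perform this conversion automatically, so hand-written code like this is mainly useful for understanding the raw JSON.

```python
from typing import Dict, List


def to_dynamodb_item(file_hash: str, s3_key: str, users: List[str]) -> Dict[str, dict]:
    """Convert the plain item shape into DynamoDB's typed attribute-value JSON."""
    return {
        "FileHash": {"S": file_hash},  # String, partition key
        "S3Key": {"S": s3_key},        # String
        # List<String> becomes an "L" list whose elements are each typed as "S".
        "Users": {"L": [{"S": user} for user in users]},
    }


def from_dynamodb_item(item: Dict[str, dict]) -> dict:
    """Inverse conversion, e.g. for turning a GetItem response back into plain values."""
    return {
        "FileHash": item["FileHash"]["S"],
        "S3Key": item["S3Key"]["S"],
        "Users": [entry["S"] for entry in item["Users"]["L"]],
    }
```

For example, to_dynamodb_item("abcdef123456...", "test.txt", ["userA", "userB"]) yields the same structure as the test item in Step 4.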

Step 3: IAM Policy for Lambda Access

Make sure your Lambda function has permissions to interact with this table.

Attach this inline policy to your Lambda execution role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:<region>:<account-id>:table/dedupTable"
    }
  ]
}

Replace <region> with your AWS Region (for example, us-east-1) and <account-id> with your 12-digit AWS account ID.
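To avoid typos when filling in the placeholders, the policy can also be generated. The sketch below reproduces the inline policy above with the Region and account ID substituted; the helper names table_arn and lambda_table_policy and the 12-digit check are assumptions for illustration, not part of any AWS tool.

```python
import json

# The four item-level actions granted by the inline policy above.
ACTIONS = [
    "dynamodb:GetItem",
    "dynamodb:PutItem",
    "dynamodb:UpdateItem",
    "dynamodb:DeleteItem",
]


def table_arn(region: str, account_id: str, table: str = "dedupTable") -> str:
    """Build the table ARN used in the Resource field (standard aws partition assumed)."""
    # AWS account IDs are 12-digit strings; reject anything else early.
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit account ID, got {account_id!r}")
    return f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"


def lambda_table_policy(region: str, account_id: str, table: str = "dedupTable") -> str:
    """Return the inline policy as a JSON string with the placeholders filled in."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ACTIONS,
                "Resource": table_arn(region, account_id, table),
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The ARN uses the standard aws partition; AWS GovCloud (US) and China Regions use aws-us-gov and aws-cn instead.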

Step 4: Verify Table Access

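To test, insert a dummy item through the AWS Console or the AWS CLI. The item below uses DynamoDB's typed JSON format, in which each value is wrapped in a type descriptor (S for String, L for List):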

{
  "FileHash": { "S": "abcdef123456..." },
  "S3Key": { "S": "test.txt" },
  "Users": {
    "L": [
      { "S": "userA" },
      { "S": "userB" }
    ]
  }
}

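A successful insert confirms that the table exists and accepts items with this structure. Note that the insert runs with your own Console or CLI credentials, so it does not verify the Lambda execution role's permissions; those are exercised only when the Lambda function itself calls the table.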
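Before inserting, it can help to check a test item locally. The hypothetical check_item_shape helper below only inspects the typed JSON shape shown above; it does not contact AWS, so it complements rather than replaces the insert test. From the CLI, a file containing that JSON can then be inserted with aws dynamodb put-item --table-name dedupTable --item file://item.json (the file name item.json is an assumption).

```python
def check_item_shape(item: dict) -> list:
    """Return a list of problems with a typed DynamoDB item; an empty list means it matches."""
    problems = []
    # FileHash and S3Key must both be String attributes.
    for name in ("FileHash", "S3Key"):
        value = item.get(name)
        if not (isinstance(value, dict) and isinstance(value.get("S"), str)):
            problems.append(f"{name} must be a String attribute ({{'S': ...}})")
    # Users must be a List attribute whose elements are all Strings.
    users = item.get("Users")
    if not (isinstance(users, dict) and isinstance(users.get("L"), list)):
        problems.append("Users must be a List attribute ({'L': [...]})")
    else:
        for i, entry in enumerate(users["L"]):
            if not (isinstance(entry, dict) and isinstance(entry.get("S"), str)):
                problems.append(f"Users[{i}] must be a String element ({{'S': ...}})")
    return problems
```

Running it on the test item above returns an empty list; an item whose FileHash is stored as a Number ({"N": ...}) would be reported.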

What Happens Next?

Your Lambda functions will:

- Check whether a file hash already exists in this table
- Append the new user to the Users list if the file already exists
- Delete the S3 object and the table item when the last user unlinks

This is the core of your deduplication logic.
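The flow above can be simulated without AWS to see how the pieces interact. In this sketch, which is hypothetical and not the project's Lambda code, one dict stands in for the DynamoDB table and another for the S3 bucket; the comments note which DynamoDB operation each step would correspond to. The class name, the uuid-based key scheme (modeled on the unique_id_test.txt example), and the membership check before appending are assumptions. A real implementation would append with an UpdateItem expression such as SET Users = list_append(Users, :new_users) and would need conditional writes to handle concurrent uploads, which this sketch ignores.

```python
import hashlib
import uuid
from typing import Dict


class InMemoryDedupStore:
    """Hypothetical stand-in: dicts play the roles of the DynamoDB table and the S3 bucket."""

    def __init__(self) -> None:
        self.table: Dict[str, dict] = {}    # FileHash -> {"S3Key": ..., "Users": [...]}
        self.bucket: Dict[str, bytes] = {}  # S3Key -> object bytes

    def upload(self, user: str, filename: str, data: bytes) -> str:
        file_hash = hashlib.sha256(data).hexdigest()
        item = self.table.get(file_hash)  # corresponds to GetItem
        if item is None:
            # First upload of this content: store one copy and create the item (PutItem).
            s3_key = f"{uuid.uuid4().hex}_{filename}"  # hypothetical key scheme
            self.bucket[s3_key] = data
            self.table[file_hash] = {"S3Key": s3_key, "Users": [user]}
        elif user not in item["Users"]:
            # Content already stored: only link the new user (UpdateItem).
            item["Users"].append(user)
        return file_hash

    def unlink(self, user: str, file_hash: str) -> None:
        item = self.table.get(file_hash)
        if item is None or user not in item["Users"]:
            return  # nothing to unlink
        item["Users"].remove(user)
        if not item["Users"]:
            # Last user gone: delete the S3 object and the table item (DeleteItem).
            del self.bucket[item["S3Key"]]
            del self.table[file_hash]
```

Because a DynamoDB List can hold duplicates, the sketch checks membership before appending. A String Set updated with the ADD action would deduplicate user IDs automatically, at the cost of changing the Users type from the List shown in Step 2.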

