DynamoDB Setup

Create the DynamoDB table that stores file hashes, user associations, and S3 keys.

Amazon DynamoDB is the metadata store for deduplicated file entries. For every uploaded file, a SHA-256 hash is computed and stored along with the file's S3 key and the list of users associated with that file. This ensures:

  • Only one copy of a file is stored in S3

  • Each user's link to a file is tracked independently

  • Files are deleted from S3 only when no users are linked
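
As a minimal sketch of the hashing step, the function below computes a SHA-256 hex digest using only the Python standard library. The function name and the chunk size are illustrative assumptions; your Lambda code may read the file differently.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a binary file-like object.

    Reading in chunks keeps memory use flat for large uploads.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_bytes(data):
    """Convenience wrapper for content that is already in memory."""
    return sha256_of_stream(io.BytesIO(data))
```

The resulting 64-character hex string is what would be stored as FileHash, so two uploads with identical bytes map to the same table entry.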

Step 1: Create a DynamoDB Table

  1. Go to the AWS Console → DynamoDB

  2. Click Create Table

  3. Configure the table as follows:

    • Table name: dedupTable

    • Partition key: FileHash (Type: String)

⚠️ Do not add a sort key

  4. Leave all other settings at their defaults (unless you want to change the capacity mode, e.g. on-demand capacity or auto scaling)

  5. Click Create Table
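
If you prefer to script the table instead of using the console, the sketch below shows one way to do it with boto3. The helper names are hypothetical, and the on-demand billing mode is an assumption rather than something this guide requires. boto3 is imported inside the function so the definition can be inspected without the SDK installed.

```python
def dedup_table_definition(table_name="dedupTable"):
    """Build create_table arguments: one String partition key, no sort key."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "FileHash", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "FileHash", "KeyType": "HASH"},  # partition key only
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def create_dedup_table(region_name):
    """Create the table and block until DynamoDB reports that it exists."""
    import boto3  # lazy import; requires AWS credentials at call time

    client = boto3.client("dynamodb", region_name=region_name)
    definition = dedup_table_definition()
    client.create_table(**definition)
    client.get_waiter("table_exists").wait(TableName=definition["TableName"])
```

Calling create_dedup_table with your region would produce the same table as the console steps above, apart from the capacity mode.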


Step 2: DynamoDB Item Structure

Each entry in the table looks like this:

{
  "FileHash": "7321348c889467...",
  "S3Key": "unique_id_test.txt",
  "Users": ["userA", "userB"]
}

Field       Type            Description
--------    ------------    ------------------------------------
FileHash    String (PK)     SHA-256 hash of the file
S3Key       String          The unique filename stored in S3
Users       List<String>    User IDs who have uploaded this file
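
To make the item shape concrete, here is a small sketch of the same structure as a Python dataclass. The class name DedupEntry and the hash check are illustrative assumptions; the check relies only on the fact that a SHA-256 hex digest is 64 hexadecimal characters.

```python
import re
from dataclasses import dataclass, field

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


@dataclass
class DedupEntry:
    file_hash: str                             # partition key (FileHash)
    s3_key: str                                # object key in S3 (S3Key)
    users: list = field(default_factory=list)  # linked user IDs (Users)

    def __post_init__(self):
        # Reject anything that is not a lowercase hex SHA-256 digest.
        if not _SHA256_HEX.fullmatch(self.file_hash):
            raise ValueError("file_hash must be a 64-character lowercase hex digest")

    def to_item(self):
        """Return the entry using the attribute names stored in the table."""
        return {
            "FileHash": self.file_hash,
            "S3Key": self.s3_key,
            "Users": list(self.users),
        }
```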


Step 3: IAM Policy for Lambda Access

Make sure your Lambda function has permissions to interact with this table.

Attach this inline policy to your Lambda execution role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:<region>:<account-id>:table/dedupTable"
    }
  ]
}

Replace <region> and <account-id> with your AWS values.
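
The substitution can also be scripted. The sketch below fills in the placeholders and, optionally, attaches the result as an inline policy with boto3. The function names and the policy name DedupTableAccess are assumptions made for illustration.

```python
import json


def dedup_table_policy(region, account_id, table_name="dedupTable"):
    """Return the policy document above with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                ],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            }
        ],
    }


def attach_inline_policy(role_name, region, account_id):
    """Attach the policy to a Lambda execution role as an inline policy."""
    import boto3  # lazy import; requires IAM credentials at call time

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName="DedupTableAccess",  # hypothetical policy name
        PolicyDocument=json.dumps(dedup_table_policy(region, account_id)),
    )
```

Scoping Resource to the single table ARN, as the policy above does, keeps the function from touching other tables in the account.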


Step 4: Verify Table Access

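To test, insert a dummy item via the AWS Console or CLI. The item below is written in DynamoDB JSON, where each value is tagged with its type (S for String, L for List):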

{
  "FileHash": { "S": "abcdef123456..." },
  "S3Key": { "S": "test.txt" },
  "Users": {
    "L": [
      { "S": "userA" },
      { "S": "userB" }
    ]
  }
}

A successful insert confirms the table's structure and the permissions of the credentials used for the insert.
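
The same check can be run from code. The sketch below converts a plain item into DynamoDB's typed JSON and then writes, reads back, and deletes a dummy entry with the low-level boto3 client. The function names and the dummy hash value are hypothetical, and the check reflects whichever credentials boto3 picks up.

```python
def to_attribute_values(item):
    """Convert a plain dedup item into DynamoDB's typed JSON."""
    return {
        "FileHash": {"S": item["FileHash"]},
        "S3Key": {"S": item["S3Key"]},
        "Users": {"L": [{"S": user} for user in item["Users"]]},
    }


def verify_table_access(region_name, table_name="dedupTable"):
    """Exercise PutItem, GetItem and DeleteItem; return True if the round trip matches."""
    import boto3  # lazy import; requires AWS credentials at call time

    client = boto3.client("dynamodb", region_name=region_name)
    dummy = to_attribute_values(
        {
            "FileHash": "dummy-hash-for-access-check",  # hypothetical test value
            "S3Key": "test.txt",
            "Users": ["userA", "userB"],
        }
    )
    key = {"FileHash": dummy["FileHash"]}
    client.put_item(TableName=table_name, Item=dummy)
    stored = client.get_item(TableName=table_name, Key=key, ConsistentRead=True).get("Item")
    client.delete_item(TableName=table_name, Key=key)  # clean up the dummy entry
    return stored == dummy
```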


What Happens Next?

Your Lambda functions will:

  • Check if a file hash already exists in this table

  • Append new users to the Users list if the file exists

  • Delete the S3 object and the table entry when the last user unlinks

This is the core of your deduplication logic.
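
The flow above can be sketched in plain Python. In this illustration a dict stands in for the DynamoDB table, and the function names are hypothetical; a real Lambda would make the equivalent GetItem, PutItem, UpdateItem, and DeleteItem calls, and would likely want conditional writes to stay safe under concurrent uploads.

```python
def link_user(table, file_hash, s3_key, user_id):
    """Record an upload.

    Returns True only when the hash is new, meaning the caller should
    store the file in S3. `table` is a dict standing in for dedupTable.
    """
    entry = table.get(file_hash)
    if entry is None:
        table[file_hash] = {"FileHash": file_hash, "S3Key": s3_key, "Users": [user_id]}
        return True
    if user_id not in entry["Users"]:
        entry["Users"].append(user_id)  # existing file: only add the user
    return False


def unlink_user(table, file_hash, user_id):
    """Remove a user's link.

    Returns the S3 key to delete when the last user unlinks, else None.
    """
    entry = table.get(file_hash)
    if entry is None or user_id not in entry["Users"]:
        return None
    entry["Users"].remove(user_id)
    if entry["Users"]:
        return None  # other users still reference the file
    del table[file_hash]  # last user gone: drop the table entry too
    return entry["S3Key"]
```

For example, if userA and userB both upload the same bytes, only the first call to link_user reports a new file, and the S3 key is returned for deletion only after both users have been unlinked.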
