DynamoDB Setup
Create the DynamoDB table structure for storing file hashes, user associations, and S3 keys.
Amazon DynamoDB is used as the metadata store for managing deduplicated file entries. For every file uploaded, a hash is computed using SHA-256 and stored along with the list of users associated with that file. This ensures:
Only one copy of a file is stored in S3
Each user is tracked independently
Files are deleted from S3 only when no users are linked
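The hashing step described above can be sketched in Python. This is a minimal illustration; `compute_file_hash` is a hypothetical helper name, not code from the project itself:

```python
import hashlib

def compute_file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest used as the DynamoDB partition key."""
    return hashlib.sha256(data).hexdigest()
```

The resulting hex string is what gets stored in the `FileHash` attribute, so two users uploading byte-identical files always map to the same table entry.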

Step 1: Create a DynamoDB Table
Go to the AWS Console → DynamoDB
Click Create Table
Configure the table as follows:
Table name: dedupTable
Partition key: FileHash (Type: String)
Leave all other settings at their defaults (unless you want to enable on-demand capacity or auto scaling)
Click Create Table
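If you prefer the command line, the same table can be created with the AWS CLI (assuming your credentials and default region are already configured; `PAY_PER_REQUEST` is one reasonable billing choice, not a requirement of this setup):

```shell
aws dynamodb create-table \
  --table-name dedupTable \
  --attribute-definitions AttributeName=FileHash,AttributeType=S \
  --key-schema AttributeName=FileHash,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```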
Step 2: DynamoDB Item Structure
Each entry in the table looks like this:
{
  "FileHash": "7321348c889467...",
  "S3Key": "unique_id_test.txt",
  "Users": ["userA", "userB"]
}
| Attribute | Type | Description |
| --- | --- | --- |
| FileHash | String (PK) | SHA-256 hash of the file |
| S3Key | String | The unique filename stored in S3 |
| Users | List&lt;String&gt; | User IDs who have uploaded this file |
Step 3: IAM Policy for Lambda Access
Make sure your Lambda function has permissions to interact with this table.
Attach this inline policy to your Lambda execution role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:<region>:<account-id>:table/dedupTable"
    }
  ]
}
Replace <region> and <account-id> with your AWS values.
Step 4: Verify Table Access
You can test by inserting a dummy item via the AWS Console or CLI:
{
"FileHash": { "S": "abcdef123456..." },
"S3Key": { "S": "test.txt" },
"Users": {
"L": [
{ "S": "userA" },
{ "S": "userB" }
]
}
}
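With the AWS CLI (assuming credentials are configured), the same dummy item can be inserted and read back. The hash value here is a placeholder, not a real digest:

```shell
# Insert a dummy item into the table
aws dynamodb put-item \
  --table-name dedupTable \
  --item '{
    "FileHash": {"S": "abcdef123456"},
    "S3Key": {"S": "test.txt"},
    "Users": {"L": [{"S": "userA"}, {"S": "userB"}]}
  }'

# Read it back to confirm access
aws dynamodb get-item \
  --table-name dedupTable \
  --key '{"FileHash": {"S": "abcdef123456"}}'
```

If both commands succeed, the Lambda role's permissions and the table schema are wired up correctly.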
What Happens Next?
Your Lambda functions will:
Check whether a file's hash already exists in this table
Append new users to the Users list if the file exists
Delete the S3 object and the table entry when the last user unlinks
This is the core of your deduplication logic.
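The flow above can be sketched as two Python functions. This is a hypothetical illustration, not the project's actual Lambda handlers: function names, the bucket name, and the read-modify-write update are all assumptions, and the clients are passed in as parameters (e.g. boto3 `dynamodb` and `s3` clients) so the logic is easy to test:

```python
TABLE = "dedupTable"
BUCKET = "my-dedup-bucket"  # assumption: your S3 bucket name

def link_user(ddb, file_hash, s3_key, user_id):
    """Create the table entry on first upload, or append the user to an existing one."""
    resp = ddb.get_item(TableName=TABLE, Key={"FileHash": {"S": file_hash}})
    if "Item" not in resp:
        # First upload of this content: record the single S3 copy and its owner.
        ddb.put_item(TableName=TABLE, Item={
            "FileHash": {"S": file_hash},
            "S3Key": {"S": s3_key},
            "Users": {"L": [{"S": user_id}]},
        })
        return
    users = [u["S"] for u in resp["Item"]["Users"]["L"]]
    if user_id not in users:
        users.append(user_id)
        # Read-modify-write for clarity; production code should prefer
        # list_append / conditional expressions to avoid lost updates.
        ddb.update_item(
            TableName=TABLE,
            Key={"FileHash": {"S": file_hash}},
            UpdateExpression="SET #u = :u",
            ExpressionAttributeNames={"#u": "Users"},
            ExpressionAttributeValues={":u": {"L": [{"S": u} for u in users]}},
        )

def unlink_user(ddb, s3, file_hash, user_id):
    """Remove a user; delete the S3 object and the entry when no users remain."""
    item = ddb.get_item(TableName=TABLE, Key={"FileHash": {"S": file_hash}})["Item"]
    users = [u["S"] for u in item["Users"]["L"] if u["S"] != user_id]
    if users:
        ddb.update_item(
            TableName=TABLE,
            Key={"FileHash": {"S": file_hash}},
            UpdateExpression="SET #u = :u",
            ExpressionAttributeNames={"#u": "Users"},
            ExpressionAttributeValues={":u": {"L": [{"S": u} for u in users]}},
        )
    else:
        # Last user unlinked: remove the stored copy, then the metadata entry.
        s3.delete_object(Bucket=BUCKET, Key=item["S3Key"]["S"])
        ddb.delete_item(TableName=TABLE, Key={"FileHash": {"S": file_hash}})
```

Note the `#u` placeholder in the update expressions: aliasing the attribute name sidesteps any clash with DynamoDB's reserved words.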