Lambda Function Setup

Configure the serverless functions that handle file uploads, deduplication logic, and cleanup — the core of our efficient, user-aware storage system.

We will now set up the core Lambda functions that power the file deduplication logic. These serverless functions are the brain of the project — handling file validation, deduplication checks, S3 operations, and user tracking in DynamoDB.
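The deduplication check rests on one property of content hashing: identical bytes always produce the identical digest, regardless of file name or who uploaded it. A minimal illustration (the variable names are just for this example):

```python
import hashlib

# Two "uploads" with different names but identical bytes.
report_v1 = b"quarterly results: all good"
report_copy = b"quarterly results: all good"

hash_a = hashlib.sha256(report_v1).hexdigest()
hash_b = hashlib.sha256(report_copy).hexdigest()

# Identical content always yields the identical digest,
# so the second upload can be detected as a duplicate
# and stored only once.
print(hash_a == hash_b)  # True
```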

There are two main Lambda functions in this project:

  1. Upload Handler — triggered when a user uploads a file

  2. Delete Handler — triggered when a user deletes/unlinks a file


Step 1: Create the Upload Lambda Function

  1. Go to AWS Console → Lambda

  2. Click Create Function

  3. Select:

    • Author from scratch

    • Function name: owncloud-dedup-function

    • Runtime: Python 3.9

    • Permissions: Attach the existing role owncloud-dedup-role, which has basic Lambda execution permissions.

  4. Click Create function

Once created, open the code editor and paste the upload logic below:

import json
import boto3
import hashlib
import base64
import uuid

# AWS Clients
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

# Configurations
TABLE_NAME = "dedupTable"
DEDUP_BUCKET = "owncloud-dedup-files"
table = dynamodb.Table(TABLE_NAME)

def lambda_handler(event, context):
    try:
        # Parse and validate input
        body = json.loads(event['body'])
        file_name = body.get('fileName')
        file_content_b64 = body.get('fileContent')
        user_id = body.get('userID')

        if not file_name or not file_content_b64 or not user_id:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Missing fileName, fileContent, or userID'})
            }

        # Decode base64 and hash the file
        file_data = base64.b64decode(file_content_b64)
        file_hash = hashlib.sha256(file_data).hexdigest()

        # Check for file existence by hash
        response = table.get_item(Key={'FileHash': file_hash})

        if 'Item' in response:
            item = response['Item']
            s3_key = item['S3Key']
            existing_users = item.get('Users', [])

            # Only update if user not already listed
            if user_id not in existing_users:
                table.update_item(
                    Key={'FileHash': file_hash},
                    UpdateExpression="SET #u = list_append(if_not_exists(#u, :empty_list), :new_user)",
                    ExpressionAttributeNames={'#u': 'Users'},
                    ExpressionAttributeValues={
                        ':new_user': [user_id],
                        ':empty_list': []
                    }
                )

            return {
                'statusCode': 200,
                'body': json.dumps({
                    'message': 'File already exists. Linked to you.',
                    'stored_path': f"s3://{DEDUP_BUCKET}/{s3_key}"
                })
            }
        # Upload new file to S3
        unique_key = f"{uuid.uuid4()}_{file_name}"
        s3.put_object(Bucket=DEDUP_BUCKET, Key=unique_key, Body=file_data)

        # Store new metadata
        table.put_item(Item={
            'FileHash': file_hash,
            'S3Key': unique_key,
            'Users': [user_id]
        })

        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'File stored successfully',
                'stored_path': f"s3://{DEDUP_BUCKET}/{unique_key}"
            })
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

This function handles:

  • File hash computation (SHA-256)

  • Duplicate check in DynamoDB

  • Upload to S3 (if new)

  • Append user reference to existing files

  • Return stored path (symlink-style)
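To exercise the function from the Lambda console's Test feature (or through API Gateway), the request body must carry the file as base64. A sketch of how a client might build that payload — the fileName/fileContent/userID keys match what the handler parses; the helper name and sample values are illustrative:

```python
import base64
import json

def build_upload_event(file_name: str, file_bytes: bytes, user_id: str) -> dict:
    """Wrap a file in the event shape the upload handler expects."""
    body = {
        'fileName': file_name,
        'fileContent': base64.b64encode(file_bytes).decode('ascii'),
        'userID': user_id,
    }
    # API Gateway (proxy integration) delivers the body as a JSON string,
    # which is why the handler calls json.loads(event['body']).
    return {'body': json.dumps(body)}

event = build_upload_event('notes.txt', b'hello dedup', 'user-123')
parsed = json.loads(event['body'])
print(parsed['fileName'])                        # notes.txt
print(base64.b64decode(parsed['fileContent']))   # b'hello dedup'
```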


Step 2: Create the Delete Lambda Function

  1. Go back to the Lambda console

  2. Click Create Function again

  3. Select:

    • Function name: deleteFileLambda

    • Runtime: Python 3.11

    • Permissions: Attach the existing role deleteFileLambda-role-wpp128e3, which has S3 + DynamoDB access

  4. Paste the delete logic:

import json
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

# Constants
TABLE_NAME = "dedupTable"
DEDUP_BUCKET = "owncloud-dedup-files"
table = dynamodb.Table(TABLE_NAME)

def lambda_handler(event, context):
    try:
        # Parse request
        body = json.loads(event['body'])
        file_hash = body.get('fileHash')
        user_id = body.get('userID')

        if not file_hash or not user_id:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Missing fileHash or userID'})
            }

        # Fetch item from DynamoDB
        response = table.get_item(Key={'FileHash': file_hash})
        if 'Item' not in response:
            return {'statusCode': 404, 'body': json.dumps({'error': 'File not found'})}

        item = response['Item']
        users = item.get('Users', [])

        if user_id not in users:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'User not linked to this file'})
            }

        # Remove the user from the list
        users.remove(user_id)

        if users:
            # Update remaining users in the DB
            table.update_item(
                Key={'FileHash': file_hash},
                UpdateExpression="SET #U = :u",
                ExpressionAttributeNames={"#U": "Users"},
                ExpressionAttributeValues={':u': users}
            )
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'User reference removed. File still linked to others.'})
            }
        else:
            # Delete file from S3 and remove DB entry
            s3.delete_object(Bucket=DEDUP_BUCKET, Key=item['S3Key'])
            table.delete_item(Key={'FileHash': file_hash})
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'File deleted from system. No users remaining.'})
            }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

This function:

  • Removes a user's reference from the Users list in DynamoDB

  • Deletes the file from S3 if it has no users remaining
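The unlink decision itself is plain reference counting and can be reasoned about without AWS. Here is the same branch logic in isolation (the function name is illustrative, not part of the handler):

```python
def unlink_user(users: list, user_id: str):
    """Return (remaining_users, should_delete_blob) after removing user_id.

    Mirrors the delete handler's branches: an unlinked user is an error,
    and removing the last user triggers deletion of the stored object.
    """
    if user_id not in users:
        raise ValueError('User not linked to this file')
    remaining = [u for u in users if u != user_id]
    return remaining, len(remaining) == 0

print(unlink_user(['alice', 'bob'], 'alice'))  # (['bob'], False)
print(unlink_user(['bob'], 'bob'))             # ([], True)
```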


(Optional) Step 3: Verify IAM Role Permissions

Ensure both Lambda functions have access to:

  • S3 (PutObject, DeleteObject)

  • DynamoDB (GetItem, PutItem, UpdateItem, DeleteItem)

  • KMS (if using SSE-KMS encryption)

IAM Inline Policy Example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::owncloud-dedup-files/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:<region>:<account-id>:table/dedupTable"
    }
  ]
}
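The inline policy can also be rendered and attached programmatically with boto3. A hedged sketch that builds a least-privilege document restricted to the actions this project actually needs (the helper and policy names are illustrative; the attachment call, `put_role_policy`, requires AWS credentials and is shown commented out):

```python
import json

def build_dedup_policy(bucket: str, table_arn: str) -> str:
    """Render the inline policy as a JSON document string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem", "dynamodb:PutItem",
                    "dynamodb:UpdateItem", "dynamodb:DeleteItem",
                ],
                "Resource": table_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)

doc = build_dedup_policy(
    "owncloud-dedup-files",
    "arn:aws:dynamodb:<region>:<account-id>:table/dedupTable",
)
# To attach (requires credentials):
# boto3.client('iam').put_role_policy(
#     RoleName='owncloud-dedup-role',
#     PolicyName='dedup-inline',
#     PolicyDocument=doc,
# )
print(doc)
```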

Optional: Enable CloudWatch Logging

To help with debugging, make sure logs reach CloudWatch:

  • Confirm the execution role includes the basic logging permissions (logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents); Lambda then streams logs automatically

  • View them under Lambda → Monitor tab → View CloudWatch logs

  • Add print() statements inside your Lambda code; each one appears as a log line
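Instead of bare print() calls, the standard logging module gives you log levels for free in CloudWatch. A minimal sketch of the common Lambda pattern (the handler body here is a stub for illustration):

```python
import logging

# Lambda provides a root logger; setting the level enables INFO output.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # Each logger call becomes one CloudWatch log line.
    logger.info("received event with keys: %s", list(event.keys()))
    return {'statusCode': 200}

print(lambda_handler({'body': '{}'}, None))  # {'statusCode': 200}
```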


Lambda Functions Ready!

These two serverless functions are now capable of:

  • Deduplicating files intelligently

  • Tracking user-level file ownership

  • Cleaning up storage as users unlink
