Deploying λ Functions with CDK

I've started using AWS Lambda again for a few projects at Standard Metrics.

Our core codebase is Python for the backend and TypeScript for the frontend (with React/Next).

Most of our existing lambda functions are in TypeScript, and they work pretty well. We get reasonable cold-start times, complete TypeScript tooling with CDK support, bundling and tree-shaking with esbuild, fast install/deployment times with pnpm, tagging for Vanta compliance, and full observability with DataDog.

However, for my most recent project we needed Python's powerful data-wrangling capabilities, which meant working with both Python and TypeScript in the same project. I couldn't find documentation out in the wild for this use case, so I had to figure out how to seamlessly integrate the two languages.

That integration is the primary subject of this blog post: we'll go over examples of how we define and configure these functions and the infrastructure we use to deploy them to production.

NodeJS Lambda Functions

We use the NodeJS functions to handle HTTP requests. We define a set of “sane” defaults as base configuration for all these functions:

typescript
const LAMBDA_DEFAULTS: Omit<NodejsFunctionProps, 'functionName' | 'entry' | 'handler'> = {
    memorySize: 1024,
    runtime: lambda.Runtime.NODEJS_20_X,
    timeout: Duration.minutes(5),
    tracing: Tracing.DISABLED, 
    bundling: {
        minify: true,
        sourceMap: false,
        sourcesContent: false,
        target: 'es2022',
        loader: {
            '.node': 'file',
        },
        esbuildArgs: {
            '--tree-shaking': true,
        },
        nodeModules: ['aws-cdk-lib'],
        externalModules: [
            '@aws-sdk/*',
            'aws-lambda',
            '@datadog/native-metrics',
            '@datadog/pprof',
            'dd-trace',
            'datadog-lambda-js',
        ],
    },
    environment: {
        stage: config.stage,
        identifier: `${config.identifier || ''}`,
        WEBHOOK_CLIENT_ID: 'Serverless-Doc-Processing',
    },
}

We use Node 20, which was the newest Node version supported by Lambda at the time. Node 22 was just released this month, and v18 remains supported until mid/late 2025.

We disable tracing (X-Ray) because we use DataDog (via their CDK constructs and layer extension).
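
Hooking DataDog up looks roughly like this; a minimal sketch using their datadog-cdk-constructs-v2 package, where the layer versions and the secret variable are illustrative rather than our real values:

typescript
import { Datadog } from 'datadog-cdk-constructs-v2'

// Layer versions are illustrative -- check DataDog's docs for the
// versions matching your runtime and region
const datadog = new Datadog(this, 'datadog', {
    nodeLayerVersion: 115,
    extensionLayerVersion: 62,
    apiKeySecretArn: datadogApiKeySecret.secretArn, // hypothetical secret
})

// Instrument the functions in place of X-Ray tracing
datadog.addLambdaFunctions([fn])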

For bundling, we exclude the AWS SDK and the DataDog libraries from the bundle, target ES2022, and minify the code. When we bundle DuckDB with NodeJS, we also use the afterBundling command hook to package the native binary correctly for the target architecture (arm64 or x86).
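
Here's a rough sketch of what that hook can look like; the entry path and the binding location inside node_modules are illustrative, not our exact setup:

typescript
const duckdbFn = new NodejsFunction(this, 'duckdb-lambda', {
    ...LAMBDA_DEFAULTS,
    functionName: 'duckdb-lambda',
    entry: './src/handlers/events/duckdb-handler.ts', // illustrative path
    handler: 'index.handler',
    bundling: {
        ...LAMBDA_DEFAULTS.bundling,
        commandHooks: {
            beforeBundling: () => [],
            beforeInstall: () => [],
            // Copy the prebuilt binding for the target architecture next to
            // the bundled handler (the binding path here is illustrative)
            afterBundling: (inputDir: string, outputDir: string) => [
                `cp ${inputDir}/node_modules/duckdb/lib/binding/duckdb.node ${outputDir}/`,
            ],
        },
    },
})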

We override these defaults with function-specific settings: the function name and handler entry point, memory and disk configuration (when needed), the number of concurrent executions (to limit the impact of spikes when calling our webhooks), and any extra environment variables, such as secrets.

Here is an example:

typescript
const fn = new NodejsFunction(this, `lambda-${config.identifier}`, {
      ...LAMBDA_DEFAULTS,
      functionName: `json-processing-lambda-${config.identifier}`,
      entry: './src/handlers/events/json-processing-handler.ts',
      handler: 'index.handler',
      memorySize: 2560,
      ephemeralStorageSize: Size.mebibytes(1024),
      reservedConcurrentExecutions: 5,
      environment: {
        ...LAMBDA_DEFAULTS.environment,
        WEBHOOK_SECRET: webhook_secret.secretName,
        WEBHOOK_API_CACHE_TTL: `${timeout.toSeconds()}`,
      },
    })

We use NodejsFunction instead of the more generic Function construct. This gives us finer control over bundling, letting us tweak the resulting code via esbuild using the options in our defaults.

Python Lambda Functions as Docker Containers

For this project our data-wrangling and transformation workloads use DuckDB, Polars, and Ibis. These dependencies exceed the 250MB (unzipped) deployment-size limit of a PythonFunction. Docker containers don't have this limit (images can be up to 10GB).

Each function runs as its own container, built from this base Dockerfile:

dockerfile
FROM public.ecr.aws/lambda/python:3.12-arm64
ARG DEBIAN_FRONTEND=noninteractive

# uv for arm64
COPY --from=ghcr.io/astral-sh/uv:0.4.28@sha256:3a996474ab73047f1f8a626771c029a7b82176b62965dfa90cce0d497feffb8c /uv /bin/uv

ENV PYTHONPATH=$LAMBDA_TASK_ROOT
ENV UV_COMPILE_BYTECODE=1
ENV UV_SYSTEM_PYTHON=1

WORKDIR $LAMBDA_TASK_ROOT
# Resolve and install third-party dependencies first, so this layer is
# cached until the lock file or project metadata changes
ADD uv.lock $LAMBDA_TASK_ROOT
ADD pyproject.toml $LAMBDA_TASK_ROOT
RUN uv sync --no-python-downloads --no-progress --no-dev --frozen --no-install-project

ADD python/README.md $LAMBDA_TASK_ROOT
ADD python/__init__.py $LAMBDA_TASK_ROOT

# Copy our own package into the image and install it into the environment
COPY python/package $LAMBDA_TASK_ROOT/package

RUN uv sync --frozen --no-dev

RUN uv pip install ./

ENV PATH="$LAMBDA_TASK_ROOT/.venv/bin:$PATH"

CMD [ "package.module.lambda_handler" ]

We build from Lambda's Python base image (using the arm64 variant). Then we copy the pyproject.toml and lock files, install all dependencies, and copy our packages into the image.

We use DockerImageFunction to instantiate our functions, and override the main handler CMD in the function definition itself.

The function definition looks like this:

typescript
const image = DockerImageCode.fromImageAsset('../', {
    cmd: ['package.module.lambda_handler'],
})

const fn = new DockerImageFunction(this, `python-function-${config.identifier}`, {
    functionName: `python-function-${config.identifier}`,
    architecture: Architecture.ARM_64,
    memorySize: 10240,
    ephemeralStorageSize: Size.gibibytes(10),
    code: image, 
    vpc: core_network_config.vpc,
    vpcSubnets: { subnetType: SubnetType.PRIVATE_WITH_EGRESS },
    securityGroups: [this.securityGroup],
    reservedConcurrentExecutions: 1,
    retryAttempts: 1,
    environment: {
      ...env,
    },
})

We are using arm64 (Graviton) purely for cost experimentation (these are some chubby functions with 10GB of memory and disk).

We want each task to run "one at a time" and retry only once on failure (see the reservedConcurrentExecutions and retryAttempts options). We do that to limit concurrency contention on the tables they query in the read replica.

Because these are all event-driven tasks, we don't worry too much about cold starts. Generally speaking, container-based Python Functions take longer to bootstrap than our Node Functions.

Once bootstrapped, they are just as fast and behave pretty much like any other Lambda function.

Infra as Code

The infrastructure itself is deployed with AWS's CDK and is also written in TypeScript.

Each set of functions has its own CloudFormation stack, and they belong to the same Application in AWS Lambda. They can interact with each other and share resources (e.g., an HTTP request can queue a Python task or serve a Parquet file built by a Python function).
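
As a hedged sketch of what that interaction can look like (the queue and the function variables are hypothetical, not our actual wiring):

typescript
import { Queue } from 'aws-cdk-lib/aws-sqs'
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources'

// A shared queue lets the Node HTTP handler hand work to the Python function
const taskQueue = new Queue(this, `task-queue-${config.identifier}`)

// The Node function may enqueue tasks...
taskQueue.grantSendMessages(nodeFunction)
nodeFunction.addEnvironment('TASK_QUEUE_URL', taskQueue.queueUrl)

// ...and the Python container consumes them one at a time, matching
// its reserved concurrency of 1
pythonFunction.addEventSource(new SqsEventSource(taskQueue, { batchSize: 1 }))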

Codebase setup

NodeJS / TypeScript

We manage the NodeJS codebase, dependencies, and scripts with pnpm. Formatting and linting are done with Biome, and TypeScript is run via tsx. This is the first project in which we've used vitest for testing; its developer experience is excellent, a big step up from jest, and so far we haven't run into any issues.

tsx runs both CDK and all our node code. tsc is only used to validate our typing "on demand" (pnpm run build just runs tsc --noEmit), but our IDEs do most of the heavy lifting as we code.
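
For illustration, the CDK entry point is just another TypeScript file that tsx executes directly. A minimal sketch, where the stage-config helper is hypothetical:

typescript
#!/usr/bin/env tsx
import { App } from 'aws-cdk-lib'
import { AppStack } from '../src/app-stack'
import { getStageConfig } from '../src/config/stages' // hypothetical helper

const app = new App()

// Pick the stage from the environment and synthesize its stack
const config = getStageConfig(process.env.STAGE ?? 'dev')
new AppStack(app, `app-stack-${config.stage}`, { config })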

Python

We are using uv as our package manager for its ease of use and its ability to build reproducible environments. It's extremely fast, can manage multiple packages, and supports workspaces just like pnpm.

Max, one of the other engineers in the Platform team, set up a pyproject.toml configuration matching our core backend coding standards, which uv also understands. We use Ruff for formatting and linting, pytest for unit tests, and pyright for static type-checking.

TIP

Both VS Code and PyCharm/WebStorm have solid support for our tooling out of the box.

Folder structure

Our folder structure is slightly different from our other projects too. It looks like this:

shell

.
├── CHANGELOG.md
├── Dockerfile
├── Makefile
├── README.md
├── biome.json
├── infra
│   ├── README.md
│   ├── bin
│   │   └── infra.ts
│   ├── package.json
│   ├── src
│   │   ├── api.ts
│   │   ├── config
│   │   │   └── stages.ts
│   │   ├── constructs
│   │   │   └── base-constructs.ts
│   │   ├── external-imports.ts
│   │   ├── node-functions.ts
│   │   ├── python-functions.ts
│   │   ├── storage.ts
│   │   └── app-stack.ts
│   └── tsconfig.json
├── node
│   ├── coverage
│   ├── package.json
│   ├── src
│   │   ├── core
│   │   │   └── ...
│   │   ├── handlers
│   │   │   ├── http
│   │   │   │   ├── test-handler.ts
│   │   │   │   └── ...
│   │   │   └── events
│   │   │       └── ...
│   │   ├── shared
│   │   │   ├── duckdb.ts
│   │   │   ├── environment.ts
│   │   │   └── s3.ts
│   │   └── types
│   ├── tests
│   │   ├── core
│   │   │   └── ...
│   │   └── shared
│   │       ├── duckdb.test.ts
│   │       ├── environment.test.ts
│   │       └── s3.test.ts
│   ├── tsconfig.json
│   └── vitest.config.ts
├── package.json
├── pnpm-lock.yaml
├── pnpm-workspace.yaml
├── pyproject.toml
├── python
│   ├── README.md
│   ├── __init__.py
│   └── modules
│       ├── __init__.py
│       ├── module1
│       │   ├── __init__.py
│       │   ├── core.py
│       │   └── handlers.py
│       ├── db
│       │   ├── __init__.py
│       │   └── duckdb.py
│       ├── module2
│       │   ├── __init__.py
│       │   ├── core.py
│       │   └── handlers.py
│       └── ...
└── uv.lock

Highlights:

  • infra: This is the main CDK folder; it handles all our infrastructure as code, deployment to all stages, and "bundling" for both languages.
  • node: These are our core TypeScript functions. It has its own dependencies, tests and coverage.
  • python: These are our core Python functions. Each function is a separate module; they share some packages, like our lightweight duckdb abstraction.
  • Dockerfile: This is the main Dockerfile for all Python containers.
  • pnpm-workspace.yaml: This is the workspace configuration for both infra and node. The root package.json file has the formatting/linting config and the release scripts.
  • pyproject.toml and uv.lock: These hold the configuration for all Python modules, including dependencies, development tools, linting, formatting, and tests.

Run scripts with make

Because we use both Python and Node, we rely on make targets to standardize how we run tasks across both languages.

Running make deploy calls cdk deploy to push the code to Lambda. Running make test runs the tests for both Python and Node (calling pytest and vitest).

Here's an excerpt of our makefile:

makefile
.DEFAULT_GOAL := help

...

.PHONY: test-node
test-node: ## Runs tests for node code
	pnpm --filter node run test

.PHONY: test-python
test-python: ## Run python tests
	uv run pytest

.PHONY: test
test: test-node test-python ## Runs all tests.

.PHONY: deploy
deploy: ## Builds and deploys the project to AWS
	pnpm --filter infra run deploy
