DuckDB on Serverless | Coffee Engineering

Intro

DuckDB is impressive. It has a delightful DevEx, its SQL dialect is concise and practical. It talks to S3, Postgres, MySQL, Snowflake, and more. It can read Parquet files, CSVs, JSONs, even Excel files. It's the most portable database you'll ever use. If you work with data (and every engineer should), it's the best.

I use it for everything. Parsing logs to find issues. Doing analytics on our Evaluations. Transforming data from S3 and Postgres, and reshaping it so that other consumers can use it. Several of our use cases are "serverless". They run once or as side-effects of events in our platform, and we run them on AWS Lambda.

DuckDB is built as an in-process data engine, and Lambda is short-lived, reused in ways you can't easily control. We've been running DuckDB in production on Lambda for a while now, querying our S3 data lake, attaching to our Postgres read-replica, and reaching into MotherDuck.

These are the field notes: the three sharp edges we hit before, and the small Python wrapper we built so the rest of the team never has to think about any of them.

Why DuckDB and serverless seem incompatible

DuckDB is built for the "one big machine" case. It assumes a persistent process, a stable filesystem, and a connection you set up once and keep. Lambda is exactly the opposite.

We're not the first to feel the mismatch. A widely cited DuckDB discussion puts it bluntly — "AWS Lambda is currently not a good environment for running DuckDB, at least if the tables to be processed don't fit for sure in the RAM" — after people hit OutOfMemory errors on multi-gigabyte tables, and even tobilg's reference setup admits it isn't the most performant option when Fargate and big EC2 boxes exist. Fair enough.

The trick is staying in the right lane: we use DuckDB for read-heavy analytical queries whose results fit in Lambda's memory (a hard limit of 10GB) most of the time. We run aggregations and transformations on S3 Parquet and our Postgres replica. Even when our database is more than 20x that, we've never had a use case where we are grinding tens of gigabytes through a GROUP BY. In that lane, cold starts and statefulness are the problems worth solving, and they're solvable.

A Lambda execution environment is cold on first use — no memory of any previous run — and its filesystem is read-only except for /tmp. It's also reused: after the first invocation, AWS keeps the environment warm and runs your next invocation in the same process, with whatever module-scope state you left behind. One correction, because it's widely misunderstood: an environment handles one invocation at a time: concurrency comes from AWS spinning up more of them, not from two requests fighting inside one process. The in-process trouble only shows up when your own code shares a state across threads.

So the three things to be deliberate about are: getting extensions in without a runtime download, wiring up credentials without paying for it on every cold start, and handing out connections without leaking them or sharing them unsafely.

Bundling extensions for faster cold starts

DuckDB's extension ecosystem is one of the reasons I love it. httpfs gets us S3, postgres_scanner attaches Postgres, and the motherduck extension connects us to a hybrid execution environment that can use compute from Lambda and from their ducklings. Locally you'd just run INSTALL httpfs; LOAD httpfs; and move on.

The catch is that INSTALL reaches out over the network to download the extension binary from extensions.duckdb.org. On a laptop you won't even notice. In a cold start it's a latency hit every time the runtime spins up a fresh container, and in a locked-down VPC, that outbound call may simply fail.

So we don't do it at runtime. We pre-bake the exact extensions we need into the Lambda container image during the Docker build, matching our architecture and DuckDB version, and drop them into /opt/duckdb_extensions:

dockerfile

# Pre-bake DuckDB extensions so the runtime doesn't need outbound HTTP to extensions.duckdb.org.
# DUCKDB_VERSION MUST match the `duckdb==` pin in pyproject.toml 
ARG DUCKDB_VERSION=1.5.4
ARG DUCKDB_PLATFORM=linux_arm64
ENV DUCKDB_EXTENSIONS_DIR=/opt/duckdb_extensions
COPY scripts/install_duckdb_extensions.py /tmp/install_duckdb_extensions.py

RUN python3 /tmp/install_duckdb_extensions.py \
    --version "${DUCKDB_VERSION}" \
    --platform "${DUCKDB_PLATFORM}" \
    --target "${DUCKDB_EXTENSIONS_DIR}" \
    && rm /tmp/install_duckdb_extensions.py

That version pin matters: the comment is there because the extension binaries are built against a specific DuckDB version, and a mismatch fails at run time. Our DuckDB wrapper then intercepts the extension loading so application code never has to think about where the binary lives:

python

# Extensions occasionally use different canonical names on the CDN
_EXTENSION_FILENAMES = {
    "postgres": "postgres_scanner",
}

def _load_extension(curr: duckdb.DuckDBPyConnection, name: str) -> None:
    filename = _EXTENSION_FILENAMES.get(name, name)
    
    # Look for the extension in our pre-baked directory
    bundled = pathlib.Path(f"/opt/duckdb_extensions/{filename}.duckdb_extension")
    
    if bundled.exists():
        # Fast path: load the local binary
        curr.execute(f"LOAD '{bundled}';")
        return
        
    # Fallback: network install (useful for local development!)
    curr.execute(f"INSTALL {name}; LOAD {name};")

One small detail worth calling out: the name you use when loading an extension isn't always the filename on disk. The postgres extension ships as postgres_scanner, so we keep a _EXTENSION_FILENAMES map to translate the exceptions. The fallback INSTALL path is deliberate too: it's the network install that makes local development painless when you don't have the pre-baked directory on your laptop.

Bootstrapping and secret management

When our application code asks for a database cursor, it shouldn't have to think about fetching credentials or attaching external databases. That's plumbing, and plumbing should be invisible. So we built a bootstrapping system that wires up the external connections on demand.

We use aws_lambda_powertools to pull secrets from AWS Systems Manager (SSM) Parameter Store. It caches them, which keeps us from hammering SSM on every warm invocation. When a connection is bootstrapped, we automatically:

Attach Postgres. We fetch the read-replica password from SSM and run an ATTACH to map our production Postgres replica as a read-only DuckDB database, pg_replica. We also inject aggressive keepalives settings into the connection string so the connection doesn't get dropped out from under a long-running query or while the lambda is idle.
Configure S3 credentials. We load the httpfs and aws extensions and set up a credential chain so DuckDB can read directly from our data lake on S3.
Connect to MotherDuck. We fetch our MotherDuck token and initialize the extension our cross-cloud data warehousing needs.

By rolling this into a DEFAULT_BOOTSTRAP tuple, any developer who simply requests a cursor gets a connection already wired up to the rest of our infrastructure securely.

Thread safety and lifecycle management

This one is the easiest to get wrong. Because a warm Lambda environment reuses the same process across invocations, anything you cache at module scope outlives the request that created it. And the moment your code touches that cached connection from more than one thread — a background worker, an async fan-out, a library that threads under the hood — DuckDB's in-process model needs guardrails. It was built assuming a single owner.

So we don't hand out raw connections. We built a multiton _Connection class, guarded by a threading.Lock(), that caches the underlying DuckDB connections (one per logical database) and creates them safely under the lock. On top of that, we expose clean context managers for application code:

python

@contextlib.contextmanager
def cursor(
    database: Databases = "default",
    *,
    bootstrap: tuple[BootstrapStep, ...] = DEFAULT_BOOTSTRAP,
) -> Generator[duckdb.DuckDBPyConnection]:
    """Retrieve thread-safe cursor for DuckDB."""
    cursor = _connections.get(database, bootstrap).cursor()
    try:
        yield cursor
    finally:
        cursor.close()

Which means the business logic gets to stay boring, exactly what you want from business logic:

python

import db.duckdb

with db.duckdb.cursor() as cursor:
    cursor.execute("SELECT * FROM pg_replica.agent_results LIMIT 100")

All the complexity — extension loading, credential fetching, locking — is hidden from the consumer. They ask for a cursor; they get one that works.

Three abstractions for different access patterns

Lambda functions do very different things — from a quick API endpoint that fetches a handful of rows to a heavy background batch job that chews through an ETL. One connection model doesn't fit both, so we expose three abstractions, each tuned to a different shape of work.

`cursor()`

For quick, short-lived operations like fetching a few rows for an API response, we provide the cursor() context manager. It taps into the shared, thread-safe connection pool (the multiton _Connection class), so you skip the overhead of standing up a new DuckDB instance every time.

python

with db.duckdb.cursor() as cursor:
    cursor.execute("SELECT count(*) FROM pg_replica.events")

`session()`

When a background job needs to do heavy ETL, attach specific local files, or change settings without polluting a shared connection, it reaches for session(). This hands back a fresh, dedicated DuckDB connection that gets torn down cleanly when the block exits. Nothing it does leaks into the shared pool.

python

with db.duckdb.session(bootstrap=("postgres", "s3")) as cursor:
    cursor.execute("ATTACH '/tmp/evals.duckdb'")
    # Run heavy reporting queries...

`DuckDB`

For complex orchestrations — where a connection needs to be passed across multiple functions or files — we expose the DuckDB class. This gives you dependency injection and explicit lifecycle control: the handler can configure the database with clean_on_start=True and tear it down with database.destroy() at the very end of the run.

python

database = DuckDB(db_name="base_runners.duckdb", clean_on_start=True)
try:
    generate_reports(database)
finally:
    database.destroy()

Three abstractions, one underlying engine: a shared shot for the rush, a dedicated brew for the slow pour-over, and a full setup when you're orchestrating a tasting flight.

Closing

DuckDB is excellent for analytical workloads: fast, zero-infrastructure, and increasingly capable of querying across storage systems without standing up a warehouse or dedicated server first. The friction in serverless is that you have to treat it like the stateful, in-process engine it is, and wrap Lambda's stateless assumptions around it carefully.

Good tooling is mostly good defaults, and good defaults are mostly someone caring enough to measure and decide on your behalf. It's the same reason an automatic coffee machine feels effortless: all the precision happened before you pressed the button.

There's still room to make this better: I'd love to put real cold-start numbers behind the extension bundling, and the bootstrap steps could probably be lazier. Lots of room for improvement, as always. But for our needs it works well, and it's been a fun problem to chew on. The wrapper pattern is simple: it covers the gap between "works on my laptop" and "works at 3am when traffic spikes and a fresh container has to wake up cold."

Stay tuned, and thanks for reading! Now, coffee time. Enjoy!

Coffee Time: Cold Brew Concentrate

Method: Cold brew (immersion)
Brew time: 16 hours

Grind: Coarse, like raw sugar
Ratio: 1:5 concentrate

Steps

Add 100g of coarsely ground coffee to a jar or large French press
Pour 500ml of room-temperature filtered water, making sure all the grounds are saturated
Give it a gentle stir and cap it loosely
Let it steep in the fridge for 16 hours
Strain through a paper filter or a fine mesh into a clean container
Dilute the concentrate 1:1 with water or milk over ice to taste

A drink bootstrapped the night before. ☕️

Learning More 📚

DuckDB Extensions overview: the canonical list and how loading/installing actually works.
DuckDB Postgres extension: the ATTACH semantics behind our read-only pg_replica.
DuckDB httpfs extension: reading directly from S3 and other object stores.
AWS Lambda Powertools — Parameters: the caching SSM fetch we use for secrets.
MotherDuck docs: the cloud side of the DuckDB story.

DuckDB on Serverless
Field notes on the most portable database

Intro

Why DuckDB and serverless seem incompatible

Bundling extensions for faster cold starts

Bootstrapping and secret management

Thread safety and lifecycle management

Three abstractions for different access patterns

`cursor()`

`session()`

`DuckDB`

Closing

Coffee Time: Cold Brew Concentrate

Steps

Learning More 📚

Join the Conversation via Github and Giscus

DuckDB on Serverless Field notes on the most portable database

Intro ​

Why DuckDB and serverless seem incompatible ​

Bundling extensions for faster cold starts ​

Bootstrapping and secret management ​

Thread safety and lifecycle management ​

Three abstractions for different access patterns ​

cursor() ​

session() ​

DuckDB ​

Closing ​

Coffee Time: Cold Brew Concentrate

Steps

Learning More 📚 ​

Join the Conversation via Github and Giscus

DuckDB on Serverless
Field notes on the most portable database

Intro

Why DuckDB and serverless seem incompatible

Bundling extensions for faster cold starts

Bootstrapping and secret management

Thread safety and lifecycle management

Three abstractions for different access patterns

`cursor()`

`session()`

`DuckDB`

Closing

Learning More 📚