Skip to content

DuckDB on Serverless
Field notes on a database that wakes up cold

Intro

You brew install duckdb, fire up the CLI, ATTACH a database, and query an S3 Parquet file — and the whole thing feels almost unfairly fast. On a laptop, DuckDB is a joy: no server to run, no warehouse to stand up first, just a binary and your data. Then you drop that same code into an AWS Lambda handler, deploy it, and the first invocation either times out fetching an extension, crashes because it can't write to its home directory, or trips over its own connection state when the container gets reused.

DuckDB isn't doing anything wrong here — it's just built for a persistent process, and Lambda is cold, short-lived, and reused in ways you don't control. We've been running DuckDB in production on Lambda for a while now — querying our S3 data lake, attaching to our Postgres read-replica, and reaching into MotherDuck — and these are the field notes: the three sharp edges we hit most often, and the small Python wrapper we built so the rest of the team never has to think about any of them.

Think of it like dialing in a grinder. Most of the work is invisible once it's set — and that's the point.

Why DuckDB and serverless seem incompatible

DuckDB is built for the laptop case. It assumes a persistent process, a stable filesystem, and a connection you set up once and keep. Lambda offers none of those for free.

You're not the first to feel the mismatch. A widely-cited DuckDB discussion puts it bluntly — "AWS Lambda is currently not a good environment for running DuckDB, at least if the tables to be processed won't fit for sure in the RAM" — after people hit OutOfMemory errors on multi-gigabyte tables, and even tobilg's reference setup admits it isn't the most performant option when Fargate and big EC2 boxes exist. Fair enough. The trick is staying in the right lane: we use DuckDB for read-heavy analytical queries whose results fit in memory — aggregations over S3 Parquet and the Postgres replica — not for grinding tens of gigabytes through a GROUP BY. In that lane, cold starts and statefulness are the problems worth solving, and they're solvable.

A Lambda execution environment is cold on first use — no memory of any previous run — and its filesystem is read-only except for /tmp. It's also reused: after the first invocation, AWS keeps the environment warm and runs your next invocation in the same process, with whatever module-scope state you left behind. One correction, because it's widely misunderstood: an environment handles one invocation at a time — concurrency comes from AWS spinning up more of them, not from two requests fighting inside one process. The in-process trouble only shows up when your own code shares a connection across threads.

So the three things to be deliberate about are: getting extensions in without a runtime download, wiring up credentials without paying for it on every cold start, and handing out connections without leaking them or sharing them unsafely. None of it is glamorous. All of it compounds.

Bundling extensions for fast cold starts

DuckDB's extension ecosystem is one of the reasons we reach for it. httpfs gets us S3, postgres_scanner attaches Postgres, and the motherduck extension connects us to the cloud. Normally you'd just run INSTALL httpfs; LOAD httpfs; and move on.

The catch is that INSTALL reaches out over the network to download the extension binary from extensions.duckdb.org. On a laptop that's a one-time blip you never notice. In a Lambda cold start it's unpredictable latency you pay every time the runtime spins up a fresh container — and in a locked-down VPC, that outbound call may simply fail.

So we don't do it at runtime. We pre-bake the exact extensions we need into the Lambda container image during the Docker build, matching our architecture and DuckDB version, and drop them into /opt/duckdb_extensions:

dockerfile
# Pre-bake DuckDB extensions so the runtime doesn't need outbound HTTP to extensions.duckdb.org.
# DUCKDB_VERSION MUST match the `duckdb==` pin in pyproject.toml 
ARG DUCKDB_VERSION=1.5.3
ARG DUCKDB_PLATFORM=linux_arm64
ENV DUCKDB_EXTENSIONS_DIR=/opt/duckdb_extensions
COPY scripts/install_duckdb_extensions.py /tmp/install_duckdb_extensions.py

RUN python3 /tmp/install_duckdb_extensions.py \
    --version "${DUCKDB_VERSION}" \
    --platform "${DUCKDB_PLATFORM}" \
    --target "${DUCKDB_EXTENSIONS_DIR}" \
    && rm /tmp/install_duckdb_extensions.py

That version pin matters — the comment is there because the extension binaries are built against a specific DuckDB version, and a mismatch fails at load time. The wrapper then intercepts the extension loading so application code never has to think about where the binary lives:

python
# Extensions occasionally use different canonical names on the CDN
_EXTENSION_FILENAMES = {
    "postgres": "postgres_scanner",
}

def _load_extension(curr: duckdb.DuckDBPyConnection, name: str) -> None:
    filename = _EXTENSION_FILENAMES.get(name, name)
    
    # Look for the extension in our pre-baked directory
    bundled = pathlib.Path(f"/opt/duckdb_extensions/{filename}.duckdb_extension")
    
    if bundled.exists():
        # Fast path: load the local binary
        curr.execute(f"LOAD '{bundled}';")
        return
        
    # Fallback: network install (useful for local development!)
    curr.execute(f"INSTALL {name}; LOAD {name};")

A small detail worth calling out — the name you ask for isn't always the filename on disk. The postgres extension ships as postgres_scanner, so we keep a _EXTENSION_FILENAMES map to translate the exceptions. The fallback INSTALL path is deliberate too: it's the network install that makes local development painless when you don't have the pre-baked directory on your laptop. Bundled in production, dynamic locally — you get the cold-start savings where you need them and the flexibility where you want it.

If you want a feel for the cold-start difference the bundling buys you, the honest answer is measure it on your own image — the gap is real, but the size depends on your extensions and region.

Bootstrapping and secret management

When our application code asks for a database cursor, it shouldn't have to think about fetching credentials or attaching external databases. That's plumbing, and plumbing should be invisible. So we built a bootstrapping system that wires up the external connections on demand.

We use aws_lambda_powertools to pull secrets from AWS Systems Manager (SSM) Parameter Store — it caches them, which keeps us from hammering SSM on every warm invocation. When a connection is bootstrapped, we automatically:

  1. Attach Postgres. We fetch the read-replica password from SSM and run an ATTACH to map our production Postgres database as a read-only DuckDB database, pg_replica. We also inject aggressive keepalives settings into the connection string so the connection doesn't get dropped out from under a long-running query.
  2. Configure S3 credentials. We load the httpfs and aws extensions and set up a credential chain so DuckDB can read directly from our data lake.
  3. Connect to MotherDuck. We fetch our MotherDuck token and initialize the extension for cloud-based data warehousing.

By rolling this into a DEFAULT_BOOTSTRAP tuple, any developer who simply requests a cursor gets a connection already wired up to the rest of our infrastructure — securely, with no copy-pasted credential code. It's the difference between grinding your beans fresh every morning and having the grounds measured out the night before.

Thread safety and lifecycle management

Here's the edge that's easiest to get wrong. Because a warm Lambda environment reuses the same process across invocations, anything you cache at module scope outlives the request that created it. And the moment your code touches that cached connection from more than one thread — a background worker, an async fan-out, a library that threads under the hood — DuckDB's in-process model needs guarding. It was built assuming a single owner.

So we don't hand out raw connections. We built a multiton _Connection class, guarded by a threading.Lock(), that caches the underlying DuckDB connections — one per logical database — and creates them safely under the lock. On top of that we expose clean context managers for application code:

python
@contextlib.contextmanager
def cursor(
    database: Databases = "default",
    *,
    bootstrap: tuple[BootstrapStep, ...] = DEFAULT_BOOTSTRAP,
) -> Generator[duckdb.DuckDBPyConnection]:
    """Retrieve thread-safe cursor for DuckDB."""
    cursor = _connections.get(database, bootstrap).cursor()
    try:
        yield cursor
    finally:
        cursor.close()

Which means the business logic gets to stay boring — exactly what you want from business logic:

python
import db.duckdb

with db.duckdb.cursor() as cursor:
    cursor.execute("SELECT * FROM pg_replica.users LIMIT 10")

All the complexity — extension loading, credential fetching, connection locking — is hidden from the consumer. They ask for a cursor; they get one that works.

Three abstractions for different access patterns

Lambda functions do wildly different things — from a quick API endpoint that fetches a handful of rows to a heavy background batch job that chews through an ETL. One connection model doesn't fit both, so we expose three abstractions, each tuned to a different shape of work.

cursor()

For quick, short-lived operations like fetching a few rows for an API response, we provide the cursor() context manager. It taps into the shared, thread-safe connection pool (the multiton _Connection class), so you skip the overhead of standing up a new DuckDB instance every time.

python
with db.duckdb.cursor() as cursor:
    cursor.execute("SELECT count(*) FROM pg_replica.events")

session()

When a background job needs to do heavy ETL, attach specific local files, or change settings without polluting a shared connection, it reaches for session(). This hands back a fresh, dedicated DuckDB connection that gets torn down cleanly when the block exits — nothing it does leaks into the shared pool.

python
with db.duckdb.session(bootstrap=("postgres", "s3")) as cursor:
    cursor.execute("ATTACH '/tmp/evals.duckdb'")
    # Run heavy reporting queries...

DuckDB

For complex orchestrations — where a connection needs to be passed across multiple functions or files — we expose the DuckDB class. This gives you dependency injection and explicit lifecycle control: the handler can configure the database with clean_on_start=True and tear it down with database.destroy() at the very end of the run.

python
database = DuckDB(db_name="base_runners.duckdb", clean_on_start=True)
try:
    generate_reports(database)
finally:
    database.destroy()

Three abstractions, one underlying engine — a shared shot for the rush, a dedicated brew for the slow pour-over, and a full setup when you're orchestrating a tasting flight.

The bigger point

DuckDB is genuinely excellent for analytical workloads — fast, zero-infrastructure, and increasingly capable of querying across storage systems without standing up a warehouse first. The friction in serverless isn't a sign it doesn't belong there. It's a sign you have to treat it like the stateful, in-process engine it is, and wrap Lambda's stateless assumptions around it carefully.

There's a craft angle here I keep coming back to. The interesting work wasn't writing clever SQL; it was building the defaultsDEFAULT_BOOTSTRAP, the pre-baked directory, the right context manager for the right job — so the obvious thing to do is also the correct thing. Good tooling is mostly good defaults, and good defaults are mostly someone caring enough to measure and decide on your behalf. It's the same reason a dialed-in espresso machine feels effortless: all the precision happened before you pressed the button.

There's still room to make this better — I'd love to put real cold-start numbers behind the extension bundling, and the bootstrap steps could probably be lazier. Lots of room for improvement, as always. But for our needs it works well, and it's been a fun problem to chew on. The wrapper pattern isn't exotic; it's just the gap between "works on my laptop" and "works at 3am when traffic spikes and a fresh container has to wake up cold."

Stay tuned, and thanks for reading! Now, coffee time. Enjoy!

Coffee Time: Cold Brew Concentrate
Method: Cold brew (immersion)
Brew time: 16 hours
Grind: Coarse, like raw sugar
Ratio: 1:5 concentrate — 100g coffee for 500ml water
Steps
  1. Add 100g of coarsely ground coffee to a jar or large French press
  2. Pour 500ml of room-temperature filtered water, making sure all the grounds are saturated
  3. Give it a gentle stir and cap it loosely
  4. Let it steep in the fridge for 16 hours — patience, like a warm Lambda, rewards you later
  5. Strain through a paper filter or a fine mesh into a clean container
  6. Dilute the concentrate 1:1 with water or milk over ice to taste
  7. Keeps in the fridge for about a week, ready to pour cold the moment you need it

A connection pool you can drink — bootstrapped the night before. ☕️

Learning More 📚


Join the Conversation via Github and Giscus

Last updated: