ProFor.Pro Profor.Pro
profor.pro / notes / postgres-logical-replication-office-move
Databases 5 min read

Postgres Logical Replication for Office Moves: Zero-Downtime Migration on a Budget

How we use Postgres logical replication to move databases between offices, clouds, or major versions with seconds of cutover, no enterprise tooling, and no late-night drama.

Office moves and provider switches are where databases get scary. The web tier you can re-deploy in minutes; the 200 GB Postgres cluster is the thing that decides how long the maintenance window actually is. Logical replication, built into Postgres since 10, is the cheapest tool we have for shrinking that window from hours to seconds. No Bucardo, no pglogical extension, no paid agents.

This is the runbook we use for office-to-office and on-prem-to-cloud moves. It assumes Postgres 12+ on both sides and a network path between them.

When logical beats physical

Physical replication (streaming, pg_basebackup) is simpler but rigid:

  • Source and replica must run the same major version.
  • The whole cluster ships, including roles and other databases.
  • You cannot write to the replica until you promote it.

Logical replication ships row changes per table, which buys us:

  • Cross-version migrations (e.g., 13 to 16) without a dump/restore window.
  • Selective tables — leave audit logs behind, migrate only what matters.
  • A writable target during the sync, so we can pre-create indexes, run smoke tests, even point read-only services at it early.

The price: no DDL replication, every replicated table needs a primary key (or REPLICA IDENTITY FULL), and large initial copies can take a while.

Preconditions on the source

# postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

A restart is required to flip wal_level. Plan that ahead of the move; this is the only mandatory restart on the source.

Then in pg_hba.conf, allow the destination to connect as a replication-capable user:

host   appdb   repl_user   10.0.0.0/8   scram-sha-256

Create the user and grant it what it needs:

CREATE ROLE repl_user WITH LOGIN REPLICATION PASSWORD '...';
GRANT CONNECT ON DATABASE appdb TO repl_user;
GRANT USAGE ON SCHEMA public TO repl_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO repl_user;

Bootstrap the target

Logical replication does not copy schema. We dump it from the source:

pg_dump -h source.local -U postgres \
  --schema-only --no-publications --no-subscriptions \
  appdb > schema.sql

psql -h target.local -U postgres -d appdb -f schema.sql

A few things we always check before going further:

  • Every table to be replicated has a primary key. For the few that do not, either add one or set ALTER TABLE x REPLICA IDENTITY FULL (slower, but works).
  • Sequences are present but not advanced by logical replication. We will fix them at cutover.
  • Extensions exist on both sides at compatible versions.

Wire it up

On the source:

CREATE PUBLICATION move_pub FOR ALL TABLES;

On the target:

CREATE SUBSCRIPTION move_sub
  CONNECTION 'host=source.local dbname=appdb user=repl_user password=...'
  PUBLICATION move_pub
  WITH (copy_data = true, create_slot = true, slot_name = 'move_sub');

At this point Postgres starts a parallel initial copy, then streams changes from the slot. Watch progress:

-- on the target
SELECT subname, srrelid::regclass, srsubstate
FROM pg_subscription_rel
JOIN pg_subscription ON oid = srsubid;

-- on the source
SELECT slot_name, active, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots;

srsubstate walks through i (initializing), d (data copy), s (synchronized), r (ready). When everything is r and the lag stays in the kilobytes, you are ready to cut over whenever you want.

Watch the slot. If the subscription is stopped or the network drops for a long weekend, WAL piles up on the source and can fill the disk. Set up a monitor on pg_replication_slots.lag.

The 30-second cutover

The actual move looks like this:

  1. Stop application writes (we usually do it at the load balancer or by revoking connect on the app role).
  2. Wait until source pg_current_wal_lsn() equals target pg_last_wal_replay_lsn() for the slot. Typically a second or two.
  3. Bump sequences on the target:
-- run on source, execute output on target
SELECT format('SELECT setval(%L, %s);', seq, last_value)
FROM (
  SELECT schemaname||'.'||sequencename AS seq, last_value
  FROM pg_sequences
) s;
  1. Drop the subscription cleanly on the target:
ALTER SUBSCRIPTION move_sub DISABLE;
ALTER SUBSCRIPTION move_sub SET (slot_name = NONE);
DROP SUBSCRIPTION move_sub;
  1. Repoint the application DNS or connection string and re-enable writes.

In our office moves the actual write-blocked window has consistently been under a minute, dominated by the time it takes humans to verify steps, not by Postgres.

Gotchas we have hit

  • DDL during the sync. Adding a column on the source mid-migration will not propagate. We freeze migrations during the cutover week, or apply schema changes manually on both sides in the right order (target first for additive, source first for destructive).
  • Large tables choke the initial copy. For multi-hundred-GB tables, we exclude them from the publication, copy them out-of-band with COPY over a fast link during off-hours, then add them with ALTER PUBLICATION ... ADD TABLE and ALTER SUBSCRIPTION ... REFRESH PUBLICATION.
  • TOAST and REPLICA IDENTITY FULL. On wide tables, FULL identity makes updates expensive on the source. Add a primary key instead if you can.
  • Foreign keys and triggers on the target. During catch-up, the target applies changes in commit order, which is usually fine, but user-defined triggers fire by default. Disable them on the target with ALTER TABLE ... DISABLE TRIGGER USER until cutover, then re-enable.
  • Sequences again. We have been bitten more than once by forgetting to advance them. The first insert after cutover then collides with an existing PK. Make this step a checklist item, not a memory item.

Rollback plan

Until you drop the subscription and start writing to the target, rollback is free: re-point the app at the source. After cutover, rollback means setting up replication in the opposite direction, which logical replication also supports — but only changes from that point forward will flow back. If you anticipate needing a real rollback window, set up the reverse publication/subscription before the cutover and just do not enable it.

Why we like this for small budgets

No licenses. No extensions to compile. The only moving parts are two config flags, a publication, and a subscription. For a typical 50–500 GB office database, one engineer can plan, rehearse, and execute the move in a week, with a cutover that fits inside a coffee break. That is hard to beat.

— need similar work done?

If what's described here resembles your situation — drop us a line. We'll talk through how we'd approach it for you.