We connected an Intel Mac in Iasi to a Linux VPS in Timisoara over an encrypted P2P channel, ran PostgreSQL initdb on the Linux side, and watched 977 files appear on the Mac in real time. Then we tried to start PostgreSQL on the Mac using those synced files. It almost worked.
This post documents what happened, what we learned, and what remains to be done.
The setup
KEIBIDROP is a peer-to-peer file sharing tool we are building. Two machines connect via a relay for signaling, perform a handshake using pre-exchanged fingerprints, and establish an encrypted gRPC channel (AES-256-GCM or ChaCha20-Poly1305, negotiated based on hardware support). Each peer mounts a FUSE virtual filesystem. Files written on one mount appear on the other.
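Conceptually, the cipher selection step looks like this. This is an illustrative sketch, not KEIBIDROP's actual code: the function name and the "both peers must have hardware AES" rule are assumptions for the example; the only committed fact is that the cipher is negotiated based on hardware support.

```go
package main

import "fmt"

// chooseCipher picks the session AEAD. Each peer advertises whether its CPU
// has hardware AES (AES-NI on x86, the crypto extensions on ARM). AES-256-GCM
// is chosen only when both sides can accelerate it; otherwise the sketch falls
// back to ChaCha20-Poly1305, which is fast in pure software.
func chooseCipher(localHWAES, peerHWAES bool) string {
	if localHWAES && peerHWAES {
		return "AES-256-GCM"
	}
	return "ChaCha20-Poly1305"
}

func main() {
	// Both an x86 VPS and an Intel Mac typically expose AES-NI.
	fmt.Println(chooseCipher(true, true))
}
```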
For this experiment we used the kd CLI, which is the non-interactive interface designed for scripting. Alice ran on a Contabo VPS in Timisoara (4 vCPUs, 16 GB RAM, Ubuntu 24.04). Bob ran on an Intel MacBook Pro in Iasi (i7-9750H, 32 GB RAM, macOS). The two cities are about 600 km apart with ~15 ms RTT. Both connected through the bridge relay at bridge.keibisoft.com:26600 since the Mac sits behind NAT and cannot accept inbound connections.
# Alice (Linux VPS)
KD_RELAY=https://keibidroprelay.keibisoft.com/ \
KD_BRIDGE=bridge.keibisoft.com:26600 \
KD_MOUNT_PATH=/tmp/kd-alice-mount \
KD_SAVE_PATH=/tmp/kd-alice-save \
kd start
# Bob (Mac)
KD_RELAY=https://keibidroprelay.keibisoft.com/ \
KD_BRIDGE=bridge.keibisoft.com:26600 \
KD_MOUNT_PATH=/tmp/kd-bob-mount \
KD_SAVE_PATH=/tmp/kd-bob-save \
kd start
After exchanging fingerprints and connecting, both peers reported connection_status: healthy and connection_mode: bridge.
What worked
We ran initdb on Alice's FUSE mount as the postgres user. It created the full database cluster: 968 files across the directory tree, configuration files, WAL segments, system catalogs. This was the same operation that failed 90% of the time a day earlier due to a file descriptor reuse bug (described in a separate post). After fixing that bug, initdb completes reliably.
We started PostgreSQL, created a table with three indexes, inserted 10,000 rows, then ran 5,000 more inserts interleaved with UPDATE statements that archived completed orders every 100 rows. We ran VACUUM ANALYZE. We queried aggregates across 15,000 rows. Everything returned correct results. We shut down PostgreSQL cleanly.
On Alice's backing store: 980 files, 42 MB.
On Bob's FUSE mount: 977 files appeared. Bob could see the entire directory tree, PG_VERSION, postgresql.conf, the base/ and global/ directories, WAL segments, everything. The file count was visible within seconds of the files being created on Alice. 812 files had been fully downloaded to Bob's local save directory (41 MB), with the rest available for on-demand streaming from Alice.
What did not work
We stopped PostgreSQL on Alice. We removed the stale postmaster.pid from Bob's mount (this lock file contains Linux-specific PID and shared memory information that is meaningless on the Mac). We set directory permissions to 0700, which PostgreSQL requires. We ran pg_ctl start on Bob's mount.
PostgreSQL started, bound to its ports, and began reading configuration. Then it stopped:
FATAL: could not load /tmp/kd-bob-mount/pgdata/pg_hba.conf
The file existed on Bob's mount but contained 0 bytes. On Alice, the same file was 5,711 bytes (126 lines). The content never made it across.
We checked Bob's save directory. The file was there, also 0 bytes. It had been created as a placeholder when Alice's notification arrived, but the actual content was never streamed. postgresql.conf (29,959 bytes) transferred correctly. Most of the binary catalog files transferred correctly. But pg_hba.conf and an unknown number of other small files arrived empty.
Why some files arrive empty
We have not fully diagnosed this yet, but the likely cause involves notification timing during initdb. initdb creates a file, writes its content, and often closes and reopens it in rapid succession. KEIBIDROP sends an ADD_FILE notification to the peer when a file is released (closed). If the first release happens while the file is still empty, the notification carries a size of 0; if the write that fills the file lands after that notification is sent but before the peer starts streaming, the peer can end up trusting the stale size and never fetching the content.
There is also a debounce mechanism (200 ms per path) that coalesces rapid ADD_FILE/EDIT_FILE notifications. If the initial create (size 0) and the final write (size 5,711) are debounced into one notification, that notification might carry the size from the first event.
We need to trace the notification flow for these specific files to confirm which scenario applies. The fix is likely straightforward once we know: either the notification must re-stat the file after the debounce window, or the streaming must always read the file at request time rather than trusting the notified size.
The PostgreSQL version question
Both machines ran PostgreSQL 16 (16.13 on Linux, 16.11 on Mac). Minor releases within the same major version share an on-disk format, and both hosts are x86_64 and little-endian, so the binary data files are compatible. An ARM Mac would very likely work too, since ARM64 matches x86_64 on endianness and alignment, but PostgreSQL's on-disk format is not formally architecture-independent: pg_control records properties like endianness and maximum alignment, and the server refuses to start on a mismatch.
The bridge relay
The two machines communicate through a bridge relay because the Mac sits behind NAT. Both peers connect outbound to bridge.keibisoft.com:26600. The bridge forwards encrypted traffic between them. Neither peer needs open inbound ports. The relay cannot read the traffic since it is encrypted end-to-end with keys derived from the fingerprint exchange.
This adds latency compared to a direct connection. At ~15 ms RTT between Iasi and Timisoara, the bridge overhead is small relative to the base network latency. For the small metadata files that failed to sync, latency is not the issue.
What we measured
On a standalone FUSE mount (no peer, local only), the numbers from the same day:
- PostgreSQL initdb: 968 files created correctly, matching a native (non-FUSE) initdb exactly (168 legitimately empty placeholder files in both cases).
- Full database lifecycle: initdb, start, CREATE TABLE with 3 indexes, INSERT 15,000 rows with concurrent UPDATEs, VACUUM ANALYZE, aggregation queries, clean shutdown. All on FUSE.
- fstest POSIX compliance: 75.8% pass rate (1256/1657 tests, excluding symlink and hardlink tests which we intentionally do not support for security reasons in a P2P sync context).
Bugs uncovered
This experiment surfaced two bugs, one fixed and one open.
The first was the file descriptor reuse race. POSIX mandates that open() returns the lowest-numbered unused fd. When our FUSE handler closed fd 42 in Release and another goroutine immediately opened a new file, the kernel handed out fd 42 again. Our OpenFileHandlers map was keyed by fd number, so the new file overwrote the old entry. Any in-flight Write still referencing fd 42 would write data to the wrong file. During initdb, which creates 224 files in under a second, this corrupted roughly half the files. The fix was to replace fd numbers with monotonically increasing opaque handle IDs using an atomic counter. About 80 lines of code across four files. We verified with a stress test: 250 concurrent goroutines doing Create/Write/Release, ten runs with the Go race detector, zero cross-contamination.
The second bug is the 0-byte file sync. Some files created on Alice arrive on Bob as empty files. The file metadata (name, path, existence) propagates correctly, but the content does not stream. We observed this on pg_hba.conf (5,711 bytes on Alice, 0 on Bob) while postgresql.conf (29,959 bytes) transferred fine. Both are created during initdb in the same burst of writes. We have not identified the exact cause yet. Candidates include the 200ms per-path debounce window swallowing a size update, the notification carrying a stale size from before the final write, or a race between the ADD_FILE handler creating a 0-byte placeholder and the peer's prefetch stream starting before the content is fully flushed on the source side. This bug only manifests under high-throughput cross-peer sync. The standalone FUSE mount does not exhibit it because there is no peer notification path.
Where we are
The FUSE filesystem can run PostgreSQL locally without data corruption. This was blocked for weeks by a file descriptor reuse race condition that caused 50% data loss during high-throughput file creation. That bug is fixed.
The cross-peer sync gets most of the files across correctly. Out of 977 files, 812 were fully downloaded within seconds. The remaining 165 were available for on-demand streaming. But some files, including at least pg_hba.conf, arrived with 0 bytes of content. Until this is fixed, the cross-peer PostgreSQL failover scenario is not reliable.
The gap between "PostgreSQL works on local FUSE" and "PostgreSQL works across peers" is a notification timing bug, not an architectural limitation. We expect it to be a small fix in the debounce or notification path. We will document the resolution when we have it.