After fixing a file descriptor reuse race that caused 50% data loss during PostgreSQL initdb, we ran our benchmark suite on two machines. The numbers below document where we are. Some of them are good, some reflect the constraints of the hardware, and all of them are honest.
Test machines
The Mac is an Intel Core i7-9750H at 2.60 GHz with 6 cores and 12 threads, 32 GB of RAM, and a 500 GB NVMe SSD over PCIe. It runs macOS with macFUSE.
The Linux machine is a Contabo VPS with 4 vCPUs (Intel Xeon Gold 6140 at 2.30 GHz, single-threaded), 16 GB of RAM, and a 96 GB virtual disk that reports as rotational. The VPS shares its physical CPU and disk I/O with other tenants. IOPS are throttled by the hypervisor, which affects write-heavy workloads. It runs Ubuntu 24.04 with libfuse.
Both machines run Go 1.24. KEIBIDROP negotiates AES-256-GCM on both since the i7-9750H and the Xeon Gold 6140 both have AES-NI.
Encrypted channel throughput
The raw encrypted pipe, measured in isolation without FUSE or disk I/O.
| Metric | Mac | VPS |
|---|---|---|
| SecureConn 1 MiB blocks | 1,707 MB/s | 460 MB/s |
| SecureConn 16 MiB blocks | 1,740 MB/s | 502 MB/s |
| Raw net.Pipe (no encryption) | 19,432 MB/s | 4,901 MB/s |
Encryption costs roughly 10x in throughput relative to raw memory copies on both machines. The absolute throughput still exceeds what any realistic network link between two peers will deliver, so the crypto is not the bottleneck for WAN transfers.
File transfer without FUSE
Encrypted gRPC streaming, peer-to-peer over loopback. This is the PullFile path: one peer has a file, the other requests it. Measures gRPC + encryption + disk write.
| Size | Mac | VPS |
|---|---|---|
| 10 MB | 290 MB/s | 100 MB/s |
| 100 MB | 547 MB/s | 213 MB/s |
| 1 GB | 585 MB/s | 288 MB/s |
The VPS plateaus at 288 MB/s, limited by virtual disk write throughput.
Round-trip transfer: write on one peer, pull from the other
Alice (FUSE) writes a file to her mount. The FUSE handler writes to the backing store and sends a notification to Bob (no-FUSE) over the encrypted gRPC channel. Bob receives the notification, then pulls the full file from Alice. Both peers run on the same machine over loopback, so network latency is zero. The timer starts before Alice's write and stops after Bob has the complete file on his disk.
| Size | FUSE write | Pull (gRPC) | Total time | End-to-end MB/s |
|---|---|---|---|---|
| 1 MB | 494 MB/s | 99 MB/s | 315 ms | 3.2 MB/s |
| 10 MB | 736 MB/s | 244 MB/s | 358 ms | 28 MB/s |
| 100 MB | 1,018 MB/s | 547 MB/s | 584 ms | 171 MB/s |
| 1 GB | 756 MB/s | 498 MB/s | 3,714 ms | 276 MB/s |
The 1 MB total of 3.2 MB/s is almost entirely notification overhead. The file itself transfers in about 10 ms, but the FUSE Release handler runs a deferred notification path that lstats the file multiple times to ensure the size has stabilized (a workaround for macOS fcopyfile behavior). At 100 MB this fixed latency becomes negligible and total throughput reaches 171 MB/s.
The reverse direction (Bob adds file via API, Alice reads full file from FUSE mount, streaming from Bob over gRPC) shows different characteristics because it skips the deferred notification path:
| Size | MB/s |
|---|---|
| 1 MB | 154 MB/s |
| 10 MB | 261 MB/s |
| 100 MB | 227 MB/s |
| 1 GB | 259 MB/s |
The asymmetry is expected. Writing through FUSE is fast (kernel page cache), but the notification adds latency. Reading through FUSE triggers an on-demand gRPC stream, which starts immediately.
Where the time goes (100 MB read through FUSE)
| Layer | Cumulative time | Overhead |
|---|---|---|
| Encrypted gRPC alone | 210 ms | baseline |
| + copy into user buffer | 210 ms | 0.1% |
| + pwrite to local cache | 241 ms | 8.2% |
| Full FUSE end-to-end | 371 ms | 35% |
The FUSE kernel overhead is 35% of total time for a 100 MB read. Each read makes two kernel-userspace round trips: the reading process traps into the kernel, the kernel upcalls into our FUSE handler, and the data returns along the same path. This cost is fixed per operation, not per byte, so it amortizes over larger reads.
Latency
FUSE mount operations on Mac, measured per-file:
| Size | Create+Write | Read | Total |
|---|---|---|---|
| 1 KB | 914 us | 412 us | 1.3 ms |
| 1 MB | 2.2 ms | 1.8 ms | 3.9 ms |
For comparison, the same operations on local disk without FUSE: 1 KB takes 1.3 ms total, 1 MB takes 1.1 ms. FUSE adds up to about 3 ms per operation, the kernel round-trip cost; for the 1 KB case the difference disappears into measurement noise.
Open/Close latency for 100 iterations: average open 218 us, average close 99 us.
Optimal block size and worker count
Block size sweep on Mac (no-FUSE PullFile):

| Block size | MB/s |
|---|---|
| 256 KiB | 406 |
| 1 MiB | 571 |
| 4 MiB | 657 |
| 16 MiB | 690 |

Diminishing returns above 4 MiB.
Worker count sweep:

| Workers | Mac | VPS |
|---|---|---|
| 1 | 544 MB/s | 153 MB/s |
| 4 | 741 MB/s | 329 MB/s |
| 8 | 523 MB/s | 301 MB/s |

Four workers is the sweet spot on both machines. Beyond that, goroutine scheduling overhead and lock contention eat the parallelism gains.
PostgreSQL on FUSE
The reason we ran these benchmarks. After fixing the fd reuse race, PostgreSQL runs on the FUSE mount without data corruption.
On the Linux VPS we ran the following sequence: initdb (968 files, 168 legitimately empty, matching native exactly), pg_ctl start, CREATE TABLE with three indexes, INSERT 10,000 rows, INSERT 5,000 more rows interleaved with UPDATEs (archiving completed orders every 100 iterations), VACUUM ANALYZE, aggregation queries over 15,000 rows, clean shutdown. Final state: 980 files, 42 MB on the backing store.
POSIX compliance
We run the pjdfstest suite on Linux against the FUSE mount. Excluding symlink and hardlink tests (which we do not support, intentionally, because symlinks in a P2P sync context create path traversal risks):
1,256 passed out of 1,657 tests. 75.8% pass rate.
The remaining failures are chown tests that require root privileges and rename/unlink edge cases involving nlink counts, which depend on hardlink semantics. We do not plan to add hardlink or symlink support.
What these numbers mean in practice
For our target use case of syncing files between two machines, the throughput is sufficient. A medium git repository clones between peers in seconds. PostgreSQL runs with full ACID guarantees on the FUSE mount. The FUSE kernel overhead of 35% is a fixed cost of the architecture and we cannot reduce it without moving to a kernel-native filesystem, which would eliminate the cross-platform advantage.
The bottleneck for real WAN transfers between two machines will be the network bandwidth, not the crypto or the FUSE overhead. A 100 Mbit/s connection tops out at 12.5 MB/s, well below any of the numbers above. We have not yet benchmarked over a real WAN link, only over loopback. That test is planned.