b5a7a3e. Findings: 30 to 38 MB/s into the Windows host and 13 to 25 MB/s out; connect time about 6.3 s, dominated by an IPv6 dial timeout before bridge fallback; FUSE first byte under 1.5 s with sequential on-demand read near 1 MB/s. The throughput ceiling is the bandwidth-delay product, not link capacity or CPU. FUSE mode did not start on the Windows host on this build; a fix is in review.
Setup
Three machines, two pairings. One end is always the Singapore box.
| Name | Details |
|---|---|
| SG | Singapore, Windows Server, 4 vCPU / 8 GB, WinFsp installed. IPv4 only. |
| VPS | Timisoara, Linux. Also runs the relay and the TCP bridge. |
| laptop | Iasi, Linux, residential link about 500 to 600 Mbps wired (62 to 75 MB/s). |
| Pairing A | VPS to SG, about 200 ms RTT, datacenter to datacenter. |
| Pairing B | laptop to SG, about 330 ms RTT, home to datacenter. |
Method: each cell is the median across repeated trials, and every table shows the trial count n. Bulk uses n = 5 at 10 and 100 MB, 3 at 1 GB, and 1 at 5 and 10 GB; handshake uses 10; FUSE uses 3. Where n = 1, the p95 column repeats the single measured value. All three peers run the same commit. X to Y means X sends and Y receives. Bulk transfer uses kd add then a timed pull, with no FUSE in the path.
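As a rough illustration of how a table cell could be derived from raw trials, the sketch below computes a median and a p95 with linear interpolation between ranks. The interpolation rule is an assumption, since the report does not state which percentile definition was used, and the sample values are made up for the example rather than taken from any run.

```python
import math
import statistics


def percentile(samples, q):
    """Percentile with linear interpolation between closest ranks.

    Assumption: the report does not say which percentile rule it used.
    With a single sample the result is that sample.
    """
    ordered = sorted(samples)
    if len(ordered) == 1:
        return ordered[0]
    pos = q * (len(ordered) - 1)          # fractional index into the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(samples):
    """Return (median, p95) for one table cell."""
    return statistics.median(samples), percentile(samples, 0.95)


# Made-up throughput samples in MB/s, for illustration only.
example_trials = [10.0, 12.0, 11.0, 15.0, 13.0]
median, p95 = summarize(example_trials)
print(f"median={median:.1f} p95={p95:.1f}")
```

With n = 1 this returns the same number for both columns, which is the pattern the 5 and 10 GB rows show.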
Bulk throughput
Pairing A, VPS to SG, about 200 ms
| Transfer | Size | n | Median MB/s | p95 |
|---|---|---|---|---|
| SG to VPS | 10 MB | 5 | 25.6 | 28.0 |
| VPS to SG | 100 MB | 5 | 35.5 | 41.5 |
| VPS to SG | 1 GB | 3 | 37.6 | 40.4 |
| SG to VPS | 1 GB | 3 | 13.2 | 13.6 |
| VPS to SG | 5 GB | 1 | 34.6 | 34.6 |
| SG to VPS | 5 GB | 1 | 8.6 | 8.6 |
| VPS to SG | 10 GB | 1 | 18.8 | 18.8 |
| SG to VPS | 10 GB | 1 | 12.9 | 12.9 |
Pairing B, laptop to SG, about 330 ms
| Transfer | Size | n | Median MB/s | p95 |
|---|---|---|---|---|
| SG to laptop | 10 MB | 5 | 20.4 | 21.8 |
| laptop to SG | 100 MB | 5 | 29.6 | 30.5 |
| laptop to SG | 1 GB | 3 | 29.0 | 29.3 |
| SG to laptop | 1 GB | 3 | 13.4 | 15.1 |
| laptop to SG | 5 GB | 1 | 32.9 | 32.9 |
| laptop to SG | 10 GB | 1 | 31.3 | 31.3 |
Handshake
Time to connect to the Singapore box, parsed from the joiner's own log (relay lookup to connected).
| Connects to SG from | n | Median ms | p95 | min | max |
|---|---|---|---|---|---|
| VPS (pairing A) | 10 | 6316 | 6473 | 6094 | 6476 |
| laptop (pairing B) | 10 | 6220 | 6481 | 5977 | 6507 |
FUSE on-demand
A Linux peer mounts the virtual drive and reads a file hosted on SG. Content streams on demand over the link. ttfb is the time to the first byte (the first 512 KB chunk). The mount is Linux because the Windows mount has a bug, covered below.
| Mount reads from SG | Size | n | ttfb median ms | p95 | read MB/s |
|---|---|---|---|---|---|
| VPS mount (pairing A) | 10 MB | 3 | 632 | 1242 | 0.9 |
| VPS mount (pairing A) | 100 MB | 3 | 509 | 635 | 1.0 |
| laptop mount (pairing B) | 10 MB | 3 | 709 | 1423 | 0.8 |
| laptop mount (pairing B) | 100 MB | 3 | 482 | 549 | 0.8 |
What the numbers say
The limit is round-trip time, not bandwidth and not CPU. The home link is 500 to 600 Mbps, which is 62 to 75 MB/s, and the best run (32.9 MB/s) used about half of it. The box is a 4 vCPU machine that stayed near idle, and ChaCha20 runs at gigabytes per second per core, so encryption is not the wall either. The wall is the bandwidth-delay product: one gRPC stream with a 16 MiB window cannot keep a 330 ms pipe full, because filling it needs roughly 20 to 25 MB in flight at the home link rate (62 to 75 MB/s times 0.33 s). A bigger window or parallel pull streams would close that gap.
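The bandwidth-delay arithmetic for pairing B can be checked with a few lines. This is a simplified model, not a description of the transport: it assumes at most one full window is delivered per round trip and ignores loss, congestion control, and protocol overhead, so the ceiling it prints is an upper bound and measured rates sit below it. The function names are illustrative.

```python
MIB = 1024 * 1024
MB = 1_000_000


def bdp_bytes(rate_mb_per_s, rtt_s):
    """Bytes that must be in flight to keep a link of this rate full."""
    return rate_mb_per_s * MB * rtt_s


def window_ceiling_mb_per_s(window_bytes, rtt_s):
    """Upper bound on throughput if one window is delivered per round trip."""
    return window_bytes / rtt_s / MB


RTT_B = 0.330            # pairing B, laptop to SG
WINDOW = 16 * MIB        # single-stream window quoted in the text

low = bdp_bytes(62.0, RTT_B) / MB
high = bdp_bytes(75.0, RTT_B) / MB
print(f"in flight needed to fill the home link: {low:.1f} to {high:.1f} MB")
print(f"window size: {WINDOW / MB:.1f} MB")
print(f"single-window ceiling: {window_ceiling_mb_per_s(WINDOW, RTT_B):.1f} MB/s")
```

Under this model the window is smaller than the amount needed in flight at either end of the home link range, which is the gap that a larger window or several parallel streams would address.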
The Singapore box sends at roughly a third to a half of the rate it receives. At 1 GB on pairing A it pulls in 37.6 MB/s but pushes out 13.2 MB/s; on pairing B the 1 GB figures are 29.0 and 13.4 MB/s. The single-run 5 and 10 GB transfers on pairing A point the same way. The receive path is fine. The send path is where the work is.
Nearly all of the 6.3 s connect time is a single timeout. The box has no IPv6, so the direct IPv6 dial always fails and waits out its deadline before the bridge takes over. Skipping the direct dial when a peer advertises no IPv6 removes most of that. The same change helps phones, which are usually IPv4 only and hit the same wait.
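The proposed skip can be sketched as a small decision function. This is a hypothetical illustration, not the project's dialer: the names `has_ipv6` and `dial_plan`, the step labels, and the idea that peers advertise plain IP literals are all assumptions made for the example.

```python
import ipaddress


def has_ipv6(advertised):
    """True if any advertised address string parses as an IPv6 literal."""
    for addr in advertised:
        try:
            if ipaddress.ip_address(addr).version == 6:
                return True
        except ValueError:
            continue  # skip anything that is not a bare IP literal
    return False


def dial_plan(peer_addrs, local_addrs):
    """Ordered connection attempts (hypothetical step names).

    The direct IPv6 dial is attempted only when both ends advertise IPv6;
    otherwise the plan goes straight to the bridge and never waits out
    the direct-dial deadline.
    """
    plan = []
    if has_ipv6(peer_addrs) and has_ipv6(local_addrs):
        plan.append("direct-ipv6")
    plan.append("bridge")
    return plan


# An IPv4-only peer, like the Singapore box, skips the direct dial.
print(dial_plan(["203.0.113.7"], ["2001:db8::1", "192.0.2.10"]))
```

Checking both ends rather than only the remote peer is a design choice in this sketch; it also covers the case where the joiner itself has no IPv6.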
On-demand FUSE is for opening and seeking, not for streaming a whole large file. The first byte comes back in half a second to a second and a half, which is fine for opening a document, scrubbing a video, or reading part of a dataset. Sequential read settles near 1 MB/s because each 512 KB chunk costs one to three round trips at this latency. For a whole large file, a bulk pull at 30 MB/s or a prefetch is the right path.
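The sequential read rate follows from the chunk size and the round-trip cost. The sketch below is a back-of-the-envelope model that assumes chunks are fetched strictly one after another and that nothing but round trips costs time; the function name is illustrative.

```python
CHUNK_BYTES = 512 * 1024   # chunk size stated in the FUSE section
MB = 1_000_000


def sequential_read_mb_per_s(rtt_s, round_trips_per_chunk):
    """Throughput when each chunk waits for a fixed number of round trips."""
    return CHUNK_BYTES / (rtt_s * round_trips_per_chunk) / MB


for label, rtt in (("pairing A, 200 ms", 0.200), ("pairing B, 330 ms", 0.330)):
    best = sequential_read_mb_per_s(rtt, 1)
    worst = sequential_read_mb_per_s(rtt, 3)
    print(f"{label}: {worst:.2f} to {best:.2f} MB/s")
```

The measured 0.8 to 1.0 MB/s falls inside the range this model gives for one to three round trips per chunk on both pairings.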
Defects identified
FUSE mode did not start on the Windows host on this build. host.Mount returned false and the daemon fell back to no-FUSE: kd list showed the files, but the mount point stayed an empty folder and neither local nor remote files surfaced. The cause is the mount point being created before WinFsp runs. WinFsp takes the mount point as a drive letter (for example K:), an auto-assigned drive (*), or a directory it creates itself, and it rejects a path that already exists. PR #181 removes the pre-creation on Windows. No-FUSE transfer is unaffected, which is why the FUSE tables above use Linux mounts.
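The mount point rule described above can be expressed as a pre-flight check. This is a hypothetical helper, not code from PR #181, and it encodes only the three cases stated in this report; it does not call WinFsp and may not cover every condition WinFsp enforces.

```python
import os
import re


def winfsp_mount_point_ok(mount_point, exists=os.path.exists):
    """Check a mount point against the three accepted forms in the report.

    Accepted: "*" (auto-assigned drive), a drive letter such as "K:",
    or a directory path that does not exist yet, so WinFsp can create it.
    `exists` is injectable so the check can be exercised without touching disk.
    """
    if mount_point == "*":
        return True
    if re.fullmatch(r"[A-Za-z]:", mount_point):
        return True
    return not exists(mount_point)


# A directory that was created ahead of time is the failing case.
print(winfsp_mount_point_ok(r"C:\kd-mount", exists=lambda p: True))   # False
print(winfsp_mount_point_ok(r"C:\kd-mount", exists=lambda p: False))  # True
```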
One 10 GB transfer looked like it stalled at 12 MB. A re-run moved 3.7 GB in 4 minutes at a steady 15.6 MB/s, so it was a blip on the residential link, not a defect. Not filed.
Planned changes
Skip the direct dial when a peer advertises no IPv6. This removes most of the 6.3 s connect time on the common case and applies to mobile, which is also IPv4 only.
Land the Windows mount fix (PR #181) so FUSE starts on the Windows host and surfaces files.
Add a push-based send to remove the asymmetry where the host sends at roughly a third to a half of its receive rate.
For bulk throughput, raise the window or pull with several streams. Tradeoff: saturating the link degrades on-demand read latency, so this stays opt-in for bulk transfers rather than the default for the interactive mount.
Medians over the per-table trial counts (n), all peers on commit b5a7a3e. Pairing A is VPS to SG at about 200 ms. Pairing B is laptop to SG at about 330 ms. All traffic over the TCP bridge.