100,000 Files: What Eager Metadata Costs

Summary. When you share a folder, KEIBIDROP announces every file in it to the peer up front, so the peer sees the whole tree immediately and reads any file on demand. That is eager metadata, as opposed to fetching each file's metadata lazily on first access. We measured the choice at 100,000 files two ways. On a single host the round-trip is effectively zero (under 1 ms), which isolates the pure metadata path: the sharer indexes and announces all 100,000 in 24.6 s (about 4,000 files/s), and because announcements stream while indexing runs, the peer holds the complete index by the time sharing finishes. Then we ran the same share over a real WAN: an Intel Mac in Iasi to a VPS in the Timisoara datacenter, 17 ms round-trip, mount to mount over on-demand FUSE. There the 100,000 land in about 72 s in each of two runs, all received. Roughly 65 s of that is the sharer writing 100,000 small files through its own FUSE mount; the metadata rides along concurrently (the peer holds about 45,000 by 20 s in) and finishes about 7 s after the writes, so the link is not the bottleneck. Reading any one file on demand from the peer then takes 30 to 40 ms, about a round trip. The receiver carries that index in 60 MiB, roughly 270 bytes per file, and a byte-for-byte sample read-back matches. Cold-path git over the same mount stays sub-second. The trade against lazy metadata is paying about 25 s and 60 MiB once, in return for instant browsing and no per-open round trip afterward. We also show why the obvious byte-saving optimization would not make any of this faster.

Eager versus lazy

Two ways to make a peer's files appear on your machine. Lazy: send nothing up front, and when something opens a file, go fetch its metadata then. Cheap to connect, but every first touch pays a round trip, and a tool that stats a whole directory (a file manager, a build, git status) pays one per entry. Eager: announce every file's path, size, mode, and timestamps when the share starts, so the index is local and complete from the first second.

Large shares reach 100,000 files and more, and tools at that scale often choose the lazy model to keep the up-front cost near zero. We run eager, so the fair question is what eager costs at that size. The answer is a known, bounded up-front price instead of a latency tax spread across every later access.
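To make the latency tax concrete, here is a minimal back-of-the-envelope sketch. It assumes the worst case for lazy metadata: a tool that stats every entry one at a time, paying one serialized round trip each, with no pipelining or batched directory reads. The function names are illustrative only; the 17 ms and 24.6 s figures are the ones measured in this article.

```python
def lazy_stat_seconds(entries: int, rtt_s: float) -> float:
    """Time to stat every entry when each first touch costs one serialized round trip."""
    return entries * rtt_s


def eager_stat_seconds(announce_s: float) -> float:
    """With the index already local, stats cost no round trips; only the one-time announce is paid."""
    return announce_s


if __name__ == "__main__":
    n = 100_000
    lazy = lazy_stat_seconds(n, 0.017)   # 17 ms WAN round trip
    eager = eager_stat_seconds(24.6)     # measured index + announce time
    print(f"lazy, serialized: about {lazy:.0f} s ({lazy / 60:.0f} min)")
    print(f"eager, one-time:  about {eager:.1f} s")
```

Under those assumptions a full-tree stat over the 17 ms link costs on the order of 1,700 s lazily, against a one-time 24.6 s eagerly. A real lazy implementation can narrow that gap with prefetching, so treat the lazy figure as an upper bound rather than a measurement.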

100,000 files, measured

The setup is two peers on a single host (round-trip under 1 ms) on the no-FUSE path, so the run measures only the ADD_FILE metadata path. The sharer creates the files (a test artifact, timed separately), then shares them; the receiver tracks every announcement.

Phase	Time	Rate
create 100k files on disk (test artifact)	5.0 s	—
share: index + announce 100k	24.6 s	4,062 files/s
peer receives the full index	concurrent (0 extra)	—
receiver heap holding the index	60 MiB	~270 B/file
sample read-back (7 spread across the set)	byte-perfect	7/7

The 24.6 s is the eager metadata itself: each file gets indexed locally and an ADD_FILE queued for the peer. The 5 s of file creation is an artifact of the test (a real share already has its files on disk). Nothing was dropped: the receiver ended with exactly 100,000, and the wait after the share loop was zero, because announcements stream out while the sharer is still indexing. The whole index lives in 60 MiB on the receiver, which is the standing memory price of browsing 100k files without a round trip.

Why it keeps up

Announcements do not go one RPC per file. A worker debounces per path (a file written in bursts, like an LFS download, settles for 200 ms before it is announced, so the peer never restarts on a half-written file), then a 100 ms tick flushes everything that is ready in a single BatchNotify. During a bulk share that means a few hundred files per batch, ordered, so the receiver applies them in sequence. The batching is bound by the share rate, not a fixed count, which is why 100k flows out in big batches rather than a storm of tiny calls.
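The debounce-then-batch behaviour described above can be sketched as a small queue driven by an explicit clock. This is an illustration under stated assumptions, not KEIBIDROP's actual worker: the class and method names are hypothetical, time is passed in as integer milliseconds so the example is deterministic, and "ordered" is interpreted here as ordered by last write time. Only the 200 ms settle time and the 100 ms tick come from the text.

```python
from dataclasses import dataclass, field

DEBOUNCE_MS = 200  # quiet time a path needs before it is announced (from the text)
TICK_MS = 100      # how often the flush runs (from the text)


@dataclass
class AnnounceBatcher:
    """Hypothetical per-path debounce queue with a periodic batch flush."""

    last_write: dict = field(default_factory=dict)  # path -> time of last write, in ms

    def touch(self, path: str, now_ms: int) -> None:
        # Every write restarts the path's quiet timer, so a file written
        # in bursts is not announced while it is still changing.
        self.last_write[path] = now_ms

    def flush(self, now_ms: int) -> list:
        # Called once per tick: take every path that has been quiet for at
        # least DEBOUNCE_MS and return them as one ordered batch.
        ready = [p for p, t in self.last_write.items() if now_ms - t >= DEBOUNCE_MS]
        ready.sort(key=lambda p: self.last_write[p])
        for p in ready:
            del self.last_write[p]
        return ready


if __name__ == "__main__":
    b = AnnounceBatcher()
    b.touch("a.bin", 0)
    b.touch("b.txt", 50)
    b.touch("a.bin", 150)        # a.bin is still being written, timer restarts
    for tick in range(200, 500, TICK_MS):
        print(tick, b.flush(tick))
```

In this run the tick at 200 ms flushes nothing, the tick at 300 ms flushes b.txt alone, and a.bin only goes out at 400 ms once it has been quiet for the full settle time. The batch size is whatever happens to be ready at the tick, which is why it follows the share rate rather than a fixed count.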

Cold path stays fast

Holding 100k metadata entries does not slow reads, because reads are on demand and independent of the index size. Real git between the two peers over the cold mount, every object fetched on first touch:

Operation	Time	Check
cold clone, 5 MB repo (.git + a 5 MB binary)	0.09 s	fsck clean, binary byte-perfect
cold checkout of a new commit (truncate path)	0.18 s	working files byte-perfect
bidirectional commit (peer B's branch back to A)	0 s propagate	fsck clean both sides
git-lfs 20 MB object, cold clone	0.97 s	byte-perfect

Over a real WAN, measured

The single-host run isolates the metadata path, but the link is not in it, and a loopback number is not a WAN result. So we ran the same share over a real one: an Intel Mac in Iasi to a VPS in the Timisoara datacenter, peer to peer over IPv6, 17 ms round-trip, mounted on both ends so the receiver browses the tree and reads files on demand through FUSE.

Mount-to-mount, on-demand FUSE, 17 ms link	Result
round-trip time (ping6, the peer-to-peer path)	17 ms
share 10,000 files	4.9 s, all 10,000 received
share 100,000 files	~72 s, two runs (72.3 / 72.0 s), all 100,000 received
— sharer writes the 100k files through FUSE	~65 s
— metadata propagation, riding concurrently	~45,000 by +20 s, then ~7 s tail
on-demand first-read of one file	30–40 ms (about a round trip)

The split is the point. The 100,000 land mount-to-mount in about 72 s, but roughly 65 s of that is the sharer creating 100,000 small files through its own FUSE mount, which a real share never pays because the files already exist. The eager metadata is not the bottleneck: it streams while the sharer writes, the receiver holds about 45,000 entries 20 s in, and the last of the 100,000 arrives about 7 s after the writes finish. Around 9 MB crosses the wire for the full index, so bandwidth is not the constraint at this latency. Once the index is there, opening any one file pulls its bytes on demand in 30 to 40 ms, about a round trip, no matter how many files the share holds — the browse-everything, read-anything experience eager metadata buys.

One thing the run surfaced honestly: a 100k burst flushes a single large BatchNotify (about 9 MB), and on a link that turned jittery after a laptop sleep that one flush could stall. Capping the flush into bounded batches removes that cliff and is the next hardening step; on a steady 17 ms link, every fresh 100k run went through clean.
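Capping the flush could look something like the sketch below, which splits an ordered list of already-serialized announcements into batches bounded by both byte size and entry count. The function name and both cap values are assumptions for illustration; the article does not say what limits the real fix will use.

```python
from typing import Iterator, List


def bounded_batches(entries: List[bytes], max_bytes: int, max_count: int) -> Iterator[List[bytes]]:
    """Yield consecutive batches, each within max_bytes and max_count.

    Order is preserved. A single entry larger than max_bytes still goes
    out, alone in its own batch, so nothing is ever dropped.
    """
    batch: List[bytes] = []
    size = 0
    for entry in entries:
        too_big = size + len(entry) > max_bytes
        too_many = len(batch) >= max_count
        if batch and (too_big or too_many):
            yield batch
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield batch


if __name__ == "__main__":
    # 100,000 announcements of about 90 bytes each, roughly the 9 MB burst.
    burst = [b"x" * 90 for _ in range(100_000)]
    batches = list(bounded_batches(burst, max_bytes=256 * 1024, max_count=1000))
    print(len(batches), "batches, largest", max(sum(map(len, b)) for b in batches), "bytes")
```

With these hypothetical caps the single 9 MB flush becomes many small sends, so a stall on a jittery link delays one bounded batch instead of the whole index.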

The optimization that would not help

The per-file Attr we send has nine fields, and across one directory most repeat: same device, same mode, the same flags. The obvious move is to send a template once per batch and only the per-file differences, which trims roughly 30% of the metadata bytes. It is a real bandwidth saving and worth doing for metered or slow links, as long as it is done in a way that keeps older peers working.
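As a rough illustration of the template idea, the sketch below picks the most common value of each field as the batch template and sends only the fields that differ per file. The nine field names are hypothetical stand-ins, since the article does not list the real Attr layout, and the message shape is invented for the example. The inode and change time are always kept per file here, since those are the two fields the article says cannot be shared.

```python
from collections import Counter

# Hypothetical field names; the real Attr layout is not given in the article.
PER_FILE_ALWAYS = {"ino", "ctime"}  # always sent per file, never templated


def encode_batch(attrs: list) -> dict:
    """Build a template from the most common value of each field, plus per-file deltas."""
    if not attrs:
        return {"template": {}, "deltas": []}
    template = {
        f: Counter(a[f] for a in attrs).most_common(1)[0][0]
        for f in attrs[0]
        if f not in PER_FILE_ALWAYS
    }
    deltas = [
        {f: v for f, v in a.items() if f in PER_FILE_ALWAYS or template[f] != v}
        for a in attrs
    ]
    return {"template": template, "deltas": deltas}


def decode_batch(msg: dict) -> list:
    """Rebuild full attrs: template values, overridden by each file's delta."""
    return [{**msg["template"], **d} for d in msg["deltas"]]


if __name__ == "__main__":
    base = {"dev": 1, "mode": 0o100644, "size": 5, "uid": 501, "gid": 20,
            "mtime": 1000, "flags": 0}
    attrs = [
        {**base, "ino": 10, "ctime": 1000},
        {**base, "ino": 11, "ctime": 1001},
        {**base, "ino": 12, "ctime": 1002, "mode": 0o100755, "size": 7},
    ]
    msg = encode_batch(attrs)
    print(msg["deltas"])
    assert decode_batch(msg) == attrs
```

A peer that does not understand the template would need the old full-Attr form, so a real rollout would have to negotiate this per connection; that part is not sketched here.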

It would not make 100k faster. The breakdown shows the wall-clock is the 24.6 s of local indexing plus, on a WAN, the round trips, and neither scales with the byte count. Two of those nine fields also have to stay per file: the inode (the kernel identifies files by it) and the change time (git uses ctime to decide what changed), so dropping them to chase more savings would break exactly the git workflow we lean on as the integrity test. The honest levers for speed are the local index path at 246 µs/file and cutting round trips, not shrinking the bytes.
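The arithmetic behind that conclusion can be worked through in a few lines. The index time, file count, wire size, and 30% saving come from this article; the link bandwidth is an assumption made only for this example (100 Mbit/s), since the article does not report one.

```python
FILES = 100_000
INDEX_S = 24.6               # measured: index + announce on one host
WIRE_BYTES = 9_000_000       # "around 9 MB" for the full index
TEMPLATE_SAVING = 0.30       # rough byte saving from a per-batch template
LINK_BITS_PER_S = 100_000_000  # ASSUMPTION: 100 Mbit/s link, not a measured value

per_file_us = INDEX_S / FILES * 1e6          # local index cost per file
wire_s = WIRE_BYTES * 8 / LINK_BITS_PER_S    # time to push the whole index at full rate
saved_s = wire_s * TEMPLATE_SAVING           # what the template would shave off

if __name__ == "__main__":
    print(f"index cost:      {per_file_us:.0f} us/file")
    print(f"wire time:       {wire_s:.2f} s for the full index")
    print(f"template saves:  {saved_s:.2f} s, {saved_s / INDEX_S:.1%} of the index time")
```

At that assumed bandwidth the template saves a fraction of a second against 24.6 s of indexing, and because announcements stream while indexing runs, even that fraction mostly overlaps with work that is happening anyway. On a much slower or metered link the byte saving matters more, which is the case the previous paragraph carves out.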

Takeaway

Eager metadata at 100k is a bounded, measured cost: about 25 s to index and announce on one host, 60 MiB to hold, byte-perfect, with on-demand reads unaffected by the count. Over a real 17 ms WAN the same holds mount to mount — the index populates as fast as the sharer can write the files, and any file then opens on demand in about a round trip (30 to 40 ms). You pay the index once when sharing, instead of paying a round trip on every later open. That is the right trade for browsing and working in a large shared tree, and the scale check is a one-line opt-in (KD_SCALE_N=100000) so it never slows the normal test run.