git clone, git commit, branch switching, and Git LFS all work between encrypted P2P peers.
The Setup
KEIBIDROP mounts a virtual FUSE filesystem for each peer. When Peer A creates or modifies files, the changes sync to Peer B's mount in real time over encrypted gRPC.
Blog post #8 covered making git work on a single FUSE mount: per-file direct_io for mmap compatibility, fsync race conditions, macOS fcopyfile quirks. After those fixes, you can git clone into your own FUSE mount and use git normally.
But what happens when Peer A clones a git repo into their mount, and Peer B tries to use it through their mount?
Bob (FUSE mount)                          Alice (FUSE mount)
├── .git/                                 ├── .git/   <- synced from Bob
│   ├── HEAD                              │   ├── HEAD
│   ├── config                            │   ├── config
│   ├── objects/pack/                     │   ├── objects/pack/
│   │   └── pack-abc123.pack (2MB)        │   │   └── pack-abc123.pack
│   └── refs/heads/main                   │   └── refs/heads/main
├── go.mod                                ├── go.mod
└── README.md                             └── README.md
Bob runs git clone into his mount. Alice should be able to cd into her mount and run git status, git log, even git commit.
The Debugging Approach
We added one-liner trace logs to every major FUSE operation -- getattr, open, read, write, rename, mkdir, readdir, release. Each log includes the path and key state:
FUSE getattr path=/go-fp/.git/config fh=0xFFFF
FUSE open path=/go-fp/.git/config flags=0
FUSE read path=/go-fp/.git/config offset=0 len=4096 src=remote
FUSE release path=/go-fp/.git/config fh=16
FUSE rename old=/go-fp/.git/config.lock new=/go-fp/.git/config
Then we ran git clone on Bob, switched to Alice, and ran git status. Seven bugs appeared across three debugging sessions.
Bug 1: The Rename Race (Stale .lock Files)
Symptom: git status on Alice returns fatal: bad config line 12 in file .git/config. Also, stale .lock files (HEAD.lock, config.lock) appear in .git/.
What git does: Git uses atomic write patterns. To update config, it:
- Creates config.lock
- Writes the new content
- Closes config.lock -- KEIBIDROP sends an ADD_FILE notification
- Renames config.lock to config -- KEIBIDROP sends a RENAME_FILE notification
What we expected: Our prefetch system downloads config.lock in the background. The prefetch goroutine has a deferred cleanup that checks if the file was renamed (by comparing f.RealPathOfFile to the original path) and atomically moves the content.
The race: The prefetch goroutine finishes downloading config.lock before the RENAME_FILE notification arrives. The deferred cleanup runs, sees f.RealPathOfFile hasn't changed yet (still config.lock), and skips the disk rename. Then the RENAME_FILE notification arrives and updates the in-memory maps -- but the file on disk stays at config.lock.
Timeline:
t=0ms  ADD_FILE config.lock -> prefetch starts
t=1ms  Prefetch complete -> deferred cleanup runs
       -> f.RealPathOfFile == "config.lock" (no change yet)
       -> NO disk rename
t=2ms  RENAME_FILE arrives -> maps updated
       -> but config.lock still on disk!
The fix: The RENAME_FILE handler now does os.Rename on the actual disk file before updating the maps. The prefetch deferred cleanup becomes a redundant safety net:
case bindings.NotifyType_RENAME_FILE:
    oldDiskPath := filepath.Join(root, req.OldPath)
    newDiskPath := filepath.Join(root, req.Path)
    os.Rename(oldDiskPath, newDiskPath) // disk rename FIRST
    // then update maps...
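To make the ordering concrete, here is a minimal, self-contained sketch of a rename handler that touches the disk before the in-memory index. The pathIndex type, applyRename, and renameDemo are hypothetical names invented for this illustration, not KEIBIDROP's actual types. The sketch also assumes a missing source file should be tolerated, since the content may not have been prefetched yet.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// pathIndex is a hypothetical stand-in for the in-memory path maps.
type pathIndex struct {
	mu    sync.Mutex
	files map[string]int64 // relative path -> last known size
}

// applyRename moves the file on disk first, then updates the index.
// Assumption: a missing source is not an error, because the content may
// not have been downloaded yet; in that case only the index entry moves.
func (p *pathIndex) applyRename(root, oldPath, newPath string) error {
	oldDisk := filepath.Join(root, oldPath)
	newDisk := filepath.Join(root, newPath)
	if err := os.Rename(oldDisk, newDisk); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("rename %s -> %s: %w", oldPath, newPath, err)
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if size, ok := p.files[oldPath]; ok {
		delete(p.files, oldPath)
		p.files[newPath] = size
	}
	return nil
}

// renameDemo replays the Bug 1 timeline: config.lock is already fully on
// disk when the rename notification arrives. It reports whether config
// exists, config.lock is gone, and the index follows the new name.
func renameDemo() (bool, error) {
	root, err := os.MkdirTemp("", "rename-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	data := []byte("[core]\n")
	if err := os.WriteFile(filepath.Join(root, "config.lock"), data, 0o644); err != nil {
		return false, err
	}
	idx := &pathIndex{files: map[string]int64{"config.lock": int64(len(data))}}
	if err := idx.applyRename(root, "config.lock", "config"); err != nil {
		return false, err
	}
	_, errNew := os.Stat(filepath.Join(root, "config"))
	_, errOld := os.Stat(filepath.Join(root, "config.lock"))
	_, inIndex := idx.files["config"]
	return errNew == nil && os.IsNotExist(errOld) && inIndex, nil
}

func main() {
	ok, err := renameDemo()
	fmt.Println(ok, err)
}
```

Because the handler itself performs the disk rename, the result no longer depends on whether the prefetch finished before or after the notification arrived.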
Bug 2: Pack File Corruption (20 Missing Bytes)
Symptom: error: packfile .git/objects/pack/pack-abc123.pack does not match index. MD5 checksums differ between peers.
Root cause: Alice's pack file was 2,182,186 bytes. Bob's was 2,182,206 bytes. Exactly 20 bytes short -- the SHA-1 checksum that git's index-pack appends.
What git does during clone:
1. Writes pack data to tmp_pack_xyz (2,182,186 bytes)
2. Closes the file -- ADD_FILE notification sent with size 2,182,186
3. index-pack appends the 20-byte SHA-1 checksum
4. Renames tmp_pack_xyz to pack-abc123.pack
5. RENAME_FILE notification sent with Attr showing size 2,182,206
The problem: Alice downloads 2,182,186 bytes (from step 2). The RENAME_FILE handler moves the file on disk but doesn't check that the size changed. The 20-byte checksum was written between the close and the rename -- a window where KEIBIDROP doesn't see the update.
The fix: After renaming on disk, compare the local file size with the size in the RENAME_FILE notification's Attr. If they differ, trigger a re-download:
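localInfo, err := os.Stat(newDiskPath)
// A failed stat (e.g. the file is missing) is treated like a size mismatch,
// which also avoids dereferencing a nil FileInfo.
if err != nil || localInfo.Size() != req.Attr.Size {
    // Re-download with correct size
    kd.FS.Root.EditRemoteFile(logger, req.Path, ...)
}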
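The size check can be exercised in isolation. The sketch below is a simplified illustration rather than the real handler: needsRedownload and packDemo are hypothetical helpers, and the file sizes are scaled down (1000 bytes of pack data plus a 20-byte trailer) so the demo stays small.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// needsRedownload reports whether the local copy's size differs from the
// size announced in the rename notification. A missing local file also
// counts as needing a download.
func needsRedownload(localPath string, announcedSize int64) (bool, error) {
	info, err := os.Stat(localPath)
	if os.IsNotExist(err) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	return info.Size() != announcedSize, nil
}

// packDemo mimics Bug 2 at a small scale: the first local copy lacks the
// 20-byte trailer, the second one has it. It returns the check result for
// the short copy and for the complete copy.
func packDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "pack-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	const bodySize, trailerSize = 1000, 20
	path := filepath.Join(dir, "pack-abc123.pack")

	// Local copy downloaded from the size seen at close time.
	if err := os.WriteFile(path, make([]byte, bodySize), 0o644); err != nil {
		return false, false, err
	}
	short, err := needsRedownload(path, bodySize+trailerSize)
	if err != nil {
		return false, false, err
	}

	// Local copy after a re-download with the size from the rename.
	if err := os.WriteFile(path, make([]byte, bodySize+trailerSize), 0o644); err != nil {
		return false, false, err
	}
	complete, err := needsRedownload(path, bodySize+trailerSize)
	return short, complete, err
}

func main() {
	short, complete, err := packDemo()
	fmt.Println(short, complete, err)
}
```

Comparing sizes is cheap and catches the truncated pack here, but it would not detect a same-size content change; that limitation is inherent to this sketch.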
Bug 3: The Invisible File (macFUSE Cache Poisoning)
Symptom: git status shows deleted: go.mod -- but the file exists on disk (verified with ls -la). Running git status again still shows it deleted.
Root cause: macFUSE's negative_vncache mount option. This tells the kernel to cache "file not found" (ENOENT) results. The sequence:
- Bob clones repo -- notifications stream to Alice
- macOS probes Alice's mount (Spotlight, fsevents) -- getattr /go-fp/go.mod -- ENOENT (file hasn't arrived yet)
- Kernel caches: "go.mod doesn't exist"
- Notification arrives -- prefetch writes go.mod to disk
- git status runs -- kernel returns cached ENOENT without asking our FUSE handler
- Git thinks the file is deleted
The cache is sticky. Even after the file exists, the kernel keeps returning ENOENT from cache. Only unmounting clears it.
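The sequence above can be reproduced with a toy model. This is not macFUSE code: toyKernel is a hypothetical type that only mimics the idea of remembering "not found" answers, and it deliberately ignores everything else a real kernel cache does.

```go
package main

import "fmt"

// toyKernel models a lookup layer that can remember "not found" answers,
// in the spirit of negative_vncache. It is a toy, not macFUSE behavior.
type toyKernel struct {
	cacheNegatives bool
	negative       map[string]bool // paths remembered as missing
	files          map[string]bool // what the FUSE handler would report
	handlerCalls   int             // how often the handler was consulted
}

func newToyKernel(cacheNegatives bool) *toyKernel {
	return &toyKernel{
		cacheNegatives: cacheNegatives,
		negative:       make(map[string]bool),
		files:          make(map[string]bool),
	}
}

// lookup reports whether the path is visible to the calling process.
func (k *toyKernel) lookup(path string) bool {
	if k.cacheNegatives && k.negative[path] {
		return false // answered from cache; the handler never runs
	}
	k.handlerCalls++
	if k.files[path] {
		return true
	}
	if k.cacheNegatives {
		k.negative[path] = true
	}
	return false
}

// probeThenSync replays the sequence: an early probe, the file arriving
// from the peer, then a later lookup such as the one git status performs.
func probeThenSync(cacheNegatives bool) (bool, int) {
	const path = "/go-fp/go.mod"
	k := newToyKernel(cacheNegatives)
	k.lookup(path)       // Spotlight-style probe before the file exists
	k.files[path] = true // prefetch writes the file
	visible := k.lookup(path)
	return visible, k.handlerCalls
}

func main() {
	v, calls := probeThenSync(true)
	fmt.Println("negative caching on: ", v, calls)
	v, calls = probeThenSync(false)
	fmt.Println("negative caching off:", v, calls)
}
```

With negative caching on, the second lookup never reaches the handler, which matches the observation that nothing about the failure appears in the FUSE logs.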
The fix: Remove negative_vncache from mount options and add defer_permissions (for exec support):
func getMountOptions() []string {
return []string{
"-o", "volname=KeibiDrop",
"-o", "local",
"-o", "slow_statfs",
"-o", "allow_other",
"-o", "defer_permissions",
}
}
The performance impact is minimal -- a few extra kernel-to-FUSE round-trips for genuinely non-existent files.
Bug 4: The 612-File Hang (Notification Flood)
Symptom: Cloning a small repo (24 objects, 2MB) works perfectly. Cloning a large repo (2339 objects, 257MB, 612 files) hangs after Updating files: 100% (612/612), done.
Root cause: Every FUSE Release (file close) sends a gRPC notification to the peer. With 612 files closing in rapid succession, the notifications overwhelm the connection. Making them async just created 612 goroutines competing for the gRPC transport.
The fix -- three parts:
First, a BatchNotify RPC at the protocol level. Instead of sending one gRPC call per file notification, we batch multiple notifications into a single RPC. 612 individual Notify calls = 612 round-trips. 10 BatchNotify calls with up to 64 notifications each = 10 round-trips.
rpc BatchNotify (BatchNotifyRequest) returns (BatchNotifyResponse);

message BatchNotifyRequest {
  repeated NotifyRequest notifications = 1;
  uint64 seq = 2;       // Monotonic batch sequence for ordering.
  uint64 timestamp = 3; // Sender's unix nano timestamp.
}
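The round-trip arithmetic can be checked with a small sketch that groups queued notifications into batches. The notifyRequest and batch types here are hypothetical, trimmed stand-ins for the generated protobuf types, and the limit of 64 per batch is taken from the numbers above.

```go
package main

import "fmt"

// notifyRequest is a hypothetical, trimmed stand-in for NotifyRequest.
type notifyRequest struct{ Path string }

// batch loosely mirrors BatchNotifyRequest: a group of notifications plus
// a monotonically increasing sequence number.
type batch struct {
	Notifications []notifyRequest
	Seq           uint64
}

// makeBatches splits pending notifications into batches of at most
// maxPerBatch entries, numbering them from firstSeq upward.
func makeBatches(pending []notifyRequest, maxPerBatch int, firstSeq uint64) []batch {
	if maxPerBatch <= 0 {
		return nil
	}
	var out []batch
	seq := firstSeq
	for start := 0; start < len(pending); start += maxPerBatch {
		end := start + maxPerBatch
		if end > len(pending) {
			end = len(pending)
		}
		out = append(out, batch{Notifications: pending[start:end], Seq: seq})
		seq++
	}
	return out
}

func main() {
	pending := make([]notifyRequest, 612)
	for i := range pending {
		pending[i].Path = fmt.Sprintf("file-%03d", i)
	}
	batches := makeBatches(pending, 64, 1)
	last := batches[len(batches)-1]
	fmt.Println(len(batches), len(last.Notifications)) // prints "10 36"
}
```

Nine full batches of 64 cover 576 notifications, and the tenth carries the remaining 36.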
Second, a client-side notification worker with per-path debounce. Instead of spawning goroutines or using a simple batch timer, the worker maintains a pending map per path:
- ADD_FILE / EDIT_FILE: Stored in pending map with a 200ms deadline. Each new notification for the same path resets the deadline. Only sent when the path is stable for 200ms.
- RENAME: Sent immediately. Any pending ADD_FILE for the old path is retargeted to the new path (not deleted -- deleting it would mean the peer never gets the content).
- REMOVE / ADD_DIR: Sent immediately.
The retarget was a critical insight we discovered through testing. Our first attempt simply deleted the pending ADD_FILE on RENAME, but this meant the peer never downloaded the file content -- the RENAME moved nothing because nothing had arrived. Retargeting preserves the content download while updating the path.
Third, a prefetch semaphore on the receiving side. A channel-based semaphore limits concurrent prefetches to 8, preventing 612 simultaneous StreamFile gRPC streams from overwhelming the connection.
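A channel-based semaphore of this kind can be sketched as follows. The function names are hypothetical and the sleep stands in for a StreamFile download. One assumption to note: this sketch acquires the slot before spawning the goroutine, which also bounds the number of goroutines but blocks the dispatching loop; acquiring inside the goroutine is the alternative when the dispatcher must not block.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runPrefetches runs work(i) for every file index but lets at most limit
// calls run at the same time. It returns the peak concurrency observed.
func runPrefetches(files, limit int, work func(i int)) int64 {
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	var active, peak int64

	for i := 0; i < files; i++ {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while limit are in flight
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			n := atomic.AddInt64(&active, 1)
			for { // record the highest concurrency seen so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			work(i)
			atomic.AddInt64(&active, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

// demoPeak simulates 612 prefetches with a limit of 8.
func demoPeak() int64 {
	return runPrefetches(612, 8, func(int) {
		time.Sleep(time.Millisecond) // stand-in for a download
	})
}

func main() {
	fmt.Println("peak concurrent prefetches:", demoPeak())
}
```

The reported peak never exceeds the limit, regardless of how many files are queued.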
Bug 5: The 190,000 Log Lines (Hot-Path Logging)
Symptom: Even after fixing the notification flood, the 612-file clone still hangs for 30 seconds.
Root cause: We added FUSE trace logs for debugging. But Getattr is called on every file access. macOS probes hundreds of paths per directory (Spotlight, fsevents, .DS_Store, resource forks). Without negative_vncache, every probe goes through our handler. Result: 207,000 getattr logs and 190,000 ENOENT error logs -- all written synchronously to disk.
grep -c "FUSE getattr" Log_Bob.txt -> 207,137
grep -c "Failed to lstat" Log_Bob.txt -> 190,614
The fix: Two changes:
- Remove all trace logs from hot-path handlers (getattr, open, read, write, release). Keep logs only on low-frequency operations (rename, mkdir, remote-add).
- Don't log ENOENT as an error -- it's normal for macOS probing non-existent paths:
if err != nil {
    if !os.IsNotExist(err) { // only log real errors
        logger.Error("Failed to lstat", "path", cleanPath, "error", err)
    }
    return int(convertOsErrToSyscallErrno("lstat", err))
}
Lesson: Logging on a hot path is invisible until you have enough files. 20 files? Fine. 612 files with macOS probing? 400,000 synchronous disk writes.
Bug 6: Hook Permission Denied
Symptom: fatal: cannot exec '.git/hooks/post-checkout': Operation not permitted
Root cause: macOS Gatekeeper blocks executing scripts from FUSE mounts by default.
The fix: Add defer_permissions to mount options, which tells macFUSE to use standard Unix permission checks instead of Gatekeeper restrictions.
Bug 7: LFS File Corruption (Intermediate Sizes + Stale Notifications)
Symptom: Cloning a repo with Git LFS (3.71 GiB, 13 large XML files) results in corrupted LFS objects on the receiving peer. git status shows Files don't match and clean filter 'lfs' failed.
Root cause -- two interacting problems:
First, intermediate sizes. Git LFS downloads a 420MB XML file incrementally into .git/lfs/incomplete/<hash>. Each intermediate close triggers ADD_FILE with the current size. The peer starts prefetching at 100MB, gets a new notification at 200MB, restarts, gets another at 300MB, restarts again. The file ends up with content from whichever intermediate prefetch happened to complete.
Second, stale notifications after rename. LFS renames incomplete/<hash> to objects/<sha>/<hash> when the download finishes. The RENAME is sent immediately, but the debounced ADD_FILE for the old incomplete/<hash> path fires after the rename. The peer tries to download a file that no longer exists at the old path.
Timeline without fix:
t=0s    ADD_FILE incomplete/<hash> size=100MB -> debounced (pending)
t=1s    ADD_FILE incomplete/<hash> size=200MB -> debounced (replaces)
t=2s    ADD_FILE incomplete/<hash> size=420MB -> debounced (replaces)
t=2.5s  RENAME incomplete/<hash> -> objects/<sha>/<hash> -> sent immediately
t=3.5s  Debounce fires: ADD_FILE incomplete/<hash> size=420MB -> STALE!
        Peer tries to download from old path -> File not found
The fix -- per-path debounce with RENAME retargeting:
ADD_FILE and EDIT_FILE get a 200ms per-path debounce -- each update resets the deadline. When a RENAME arrives, it's sent immediately AND the pending ADD_FILE for the old path is retargeted to the new path:
case bindings.NotifyType_ADD_FILE, bindings.NotifyType_EDIT_FILE:
    pending[req.Path] = &pendingNotify{
        req:      req,
        deadline: time.Now().Add(200 * time.Millisecond),
    }
case bindings.NotifyType_RENAME_FILE:
    if old, exists := pending[req.OldPath]; exists {
        delete(pending, req.OldPath)
        old.req.Path = req.Path // retarget to new path
        pending[req.Path] = old
    }
    immediate = append(immediate, req)
Timeline with fix:
t=0s    ADD_FILE incomplete/<hash> size=100MB -> debounced (pending)
t=0.1s  ADD_FILE incomplete/<hash> size=200MB -> debounced (replaces)
t=0.2s  ADD_FILE incomplete/<hash> size=420MB -> debounced (replaces)
t=0.3s  RENAME incomplete/<hash> -> objects/<sha>/<hash> -> sent immediately
        Pending ADD_FILE retargeted: path = objects/<sha>/<hash>
t=0.5s  Debounce fires: ADD_FILE objects/<sha>/<hash> size=420MB -> correct path!
        Peer downloads file at final location with final size
Result: The RENAME arrives first (peer updates maps + renames on disk). Then the retargeted ADD_FILE arrives with the correct final path and size, triggering a prefetch that downloads the complete content. No stale paths, no intermediate sizes, no lost content.
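The timeline with the fix can be replayed in a small, deterministic simulation. This is a sketch under stated assumptions, not the actual worker: the notify and debouncer types are hypothetical, and the clock is passed in explicitly instead of using timers so the behavior is reproducible.

```go
package main

import (
	"fmt"
	"time"
)

// notify is a hypothetical, trimmed stand-in for the real NotifyRequest.
type notify struct {
	Kind    string
	Path    string
	OldPath string
	Size    int64
}

type pendingNotify struct {
	req      notify
	deadline time.Time
}

// debouncer holds at most one pending notification per path.
type debouncer struct {
	delay   time.Duration
	pending map[string]*pendingNotify
}

func newDebouncer(delay time.Duration) *debouncer {
	return &debouncer{delay: delay, pending: make(map[string]*pendingNotify)}
}

// add records an ADD_FILE or EDIT_FILE and restarts that path's deadline.
func (d *debouncer) add(req notify, now time.Time) {
	d.pending[req.Path] = &pendingNotify{req: req, deadline: now.Add(d.delay)}
}

// rename retargets any pending notification from oldPath to newPath and
// returns the RENAME_FILE notification, which the caller sends right away.
func (d *debouncer) rename(oldPath, newPath string) notify {
	if p, ok := d.pending[oldPath]; ok {
		delete(d.pending, oldPath)
		p.req.Path = newPath
		d.pending[newPath] = p
	}
	return notify{Kind: "RENAME_FILE", OldPath: oldPath, Path: newPath}
}

// due removes and returns every notification whose deadline has passed.
func (d *debouncer) due(now time.Time) []notify {
	var out []notify
	for path, p := range d.pending {
		if !now.Before(p.deadline) {
			out = append(out, p.req)
			delete(d.pending, path)
		}
	}
	return out
}

// replayLFS replays the timeline and returns what is due at 300ms and 500ms.
func replayLFS() (early, late []notify) {
	const mb = 1 << 20
	const oldPath, newPath = "incomplete/<hash>", "objects/<sha>/<hash>"
	t0 := time.Unix(0, 0)
	at := func(ms int) time.Time { return t0.Add(time.Duration(ms) * time.Millisecond) }

	d := newDebouncer(200 * time.Millisecond)
	d.add(notify{Kind: "ADD_FILE", Path: oldPath, Size: 100 * mb}, at(0))
	d.add(notify{Kind: "ADD_FILE", Path: oldPath, Size: 200 * mb}, at(100))
	d.add(notify{Kind: "ADD_FILE", Path: oldPath, Size: 420 * mb}, at(200))
	d.rename(oldPath, newPath) // the returned RENAME would be sent immediately
	return d.due(at(300)), d.due(at(500))
}

func main() {
	early, late := replayLFS()
	fmt.Println(len(early), len(late), late[0].Path, late[0].Size)
}
```

Nothing is due at 300ms because the last update pushed the deadline to 400ms; at 500ms a single ADD_FILE is released, carrying the final path and the final size.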
Bonus: macFUSE Mount Option Tuning
While debugging, we also tuned the macFUSE mount options based on benchmarking and the macFUSE wiki:
| Option | Why |
|---|---|
| noappledouble | Blocks ._ and .DS_Store probes -- eliminates 100K+ getattr calls during large clones |
| iosize=524288 | 512KB I/O block size -- matches our ChunkSize. Benchmarked: 245 MB/s (512KB) vs 209 MB/s (1MB) vs 232 MB/s (default 64KB) |
| defer_permissions | Standard Unix permission checks -- enables executing git hooks on the mount |
| negative_vncache | Removed -- caches ENOENT results, breaks file sync from peers |
| auto_cache | Tested, not used -- interfered with in-flight reads during peer sync |
The Final Result
After all fixes, the full git workflow works between encrypted P2P FUSE peers:
# Bob clones a repo
$ cd MountBob && git clone git@github.com:user/repo.git
Cloning into 'repo'... done.
# Alice sees it immediately
$ cd MountAlice/repo && git status
On branch main -- nothing to commit, working tree clean
# Alice creates a branch and commits
$ git checkout -b test_me
$ echo "asdsadasda" > tst.txt && git add tst.txt && git commit -m "Commit for test"
# Bob sees the branch and commit
$ cd MountBob/repo && git status
On branch test_me -- nothing to commit, working tree clean
$ git log
commit 38e23bc (HEAD -> test_me)
Commit for test
commit f2b9b1f (origin/main, origin/HEAD, main)
Scale image
Tested with:
- Small repo: 24 objects, 2MB, 8 files -- instant sync, bidirectional commits
- Large repo: 2339 objects, 257MB + 3.71 GiB LFS, 612 files -- completes without hanging
- Git operations: clone, status, log, branch, checkout, commit, add -- all work on both peers
Lessons Learned
- FUSE handlers must rename on disk, not just in memory. Our RENAME_FILE handler only updated maps, trusting the prefetch goroutine to move files. But goroutines race with notification handlers -- especially for tiny files that download in milliseconds.
- The close-to-rename gap is real. Git writes data, closes, then renames. Between close and rename, other processes can modify the file (index-pack appending checksums). Notifications sent at close time carry stale sizes. The rename notification has the truth -- use it.
- Kernel caches are invisible bugs. macFUSE's negative_vncache silently caches ENOENT results. Our FUSE handler was never even called -- the kernel answered from cache. These bugs don't show up in FUSE logs because the kernel short-circuits the call. Disable negative caching for any FUSE filesystem where files can appear asynchronously.
- Never do unbounded concurrency from a tight loop. 612 files = 612 goroutines = 612 gRPC streams = connection collapse. A bounded channel + batching worker gives you backpressure without blocking the caller.
- Hot-path logging is a silent killer. 190,000 synchronous log writes caused a 30-second hang. The logging was added for debugging and seemed harmless -- until we hit 612 files with macOS probing. Always guard hot-path logs with level checks or keep them commented out.
- Debounce per-path, retarget on rename. LFS downloads a 420MB file over seconds, triggering 10+ notifications at intermediate sizes. Per-path debounce (200ms deadline, reset on each update) catches rapid-fire writes. But the tricky part is RENAME: you can't just delete the pending ADD_FILE (the peer never gets the content) and you can't send the stale ADD_FILE after the rename (wrong path). The answer is to retarget the pending notification to the new path. Three attempts to get this right: delete (broke content delivery), send stale (wrong path), retarget (correct).
- Git is just syscalls. There's nothing git-specific about these bugs. Any application that creates files through one peer's mount and reads them through another would hit the same rename race, size mismatch, cache poisoning, notification flood, and intermediate-size corruption. Git just exercises every edge case because of its atomic .lock-to-rename pattern, large checkouts, and LFS incremental downloads.