Git Clone Between Two FUSE Peers: The Last Puzzle Piece

Making git work on a single FUSE mount was hard (blog post #8). Making git work when the repo is shared between two P2P FUSE peers required fixing eight bugs: a rename race, pack file truncation, macOS kernel cache poisoning, notification flooding, hot-path log spam, exec permissions, LFS intermediate-size corruption, and stale branch state from a cache coherency flaw. The result: git clone, git commit, branch switching, and Git LFS all work between encrypted P2P peers on both macOS and Linux.

The Setup

KEIBIDROP mounts a virtual FUSE filesystem for each peer. When Peer A creates or modifies files, the changes sync to Peer B's mount in real-time through encrypted gRPC.

Blog post #8 covered making git work on a single FUSE mount: per-file direct_io for mmap compatibility, fsync race conditions, macOS fcopyfile quirks. After those fixes, you can git clone into your own FUSE mount and use git normally.

But what happens when Peer A clones a git repo into their mount, and Peer B tries to use it through their mount?

Bob (FUSE mount)                     Alice (FUSE mount)
├── .git/                            ├── .git/          <- synced from Bob
│   ├── HEAD                         │   ├── HEAD
│   ├── config                       │   ├── config
│   ├── objects/pack/                │   ├── objects/pack/
│   │   └── pack-abc123.pack (2MB)   │   │   └── pack-abc123.pack
│   └── refs/heads/main              │   └── refs/heads/main
├── go.mod                           ├── go.mod
└── README.md                        └── README.md

Bob runs git clone into his mount. Alice should be able to cd into her mount and run git status, git log, even git commit.

The Debugging Approach

We added one-liner trace logs to every major FUSE operation -- getattr, open, read, write, rename, mkdir, readdir, release. Each log includes the path and key state:

FUSE getattr  path=/go-fp/.git/config fh=0xFFFF
FUSE open     path=/go-fp/.git/config flags=0
FUSE read     path=/go-fp/.git/config offset=0 len=4096 src=remote
FUSE release  path=/go-fp/.git/config fh=16
FUSE rename   old=/go-fp/.git/config.lock new=/go-fp/.git/config

Then we ran git clone on Bob, switched to Alice, and ran git status. Eight bugs appeared across multiple debugging sessions -- seven on macOS, the eighth (cache coherency) found when testing cross-platform consistency.

Bug 1: The Rename Race (Stale `.lock` Files)

Symptom: git status on Alice returns fatal: bad config line 12 in file .git/config. Also, stale .lock files (HEAD.lock, config.lock) appear in .git/.

What git does: Git uses atomic write patterns. To update config, it:

1. Creates config.lock
2. Writes the new content
3. Closes config.lock -- KEIBIDROP sends ADD_FILE notification
4. Renames config.lock to config -- KEIBIDROP sends RENAME_FILE notification

What we expected: Our prefetch system downloads config.lock in the background. The prefetch goroutine has a deferred cleanup that checks if the file was renamed (by comparing f.RealPathOfFile to the original path) and atomically moves the content.

The race: The prefetch goroutine finishes downloading config.lock before the RENAME_FILE notification arrives. The deferred cleanup runs, sees f.RealPathOfFile hasn't changed yet (still config.lock), and skips the disk rename. Then the RENAME_FILE notification arrives and updates the in-memory maps -- but the file on disk stays at config.lock.

Timeline:
  t=0ms  ADD_FILE config.lock -> prefetch starts
  t=1ms  Prefetch complete -> deferred cleanup runs
         -> f.RealPathOfFile == "config.lock" (no change yet)
         -> NO disk rename
  t=2ms  RENAME_FILE arrives -> maps updated
         -> but config.lock still on disk!

The fix: The RENAME_FILE handler now does os.Rename on the actual disk file before updating the maps. The prefetch deferred cleanup becomes a redundant safety net:

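case bindings.NotifyType_RENAME_FILE:
    oldDiskPath := filepath.Join(root, req.OldPath)
    newDiskPath := filepath.Join(root, req.Path)
    // Disk rename FIRST. A missing source is expected when the prefetch
    // has not written the file yet, so only other errors are logged.
    if err := os.Rename(oldDiskPath, newDiskPath); err != nil && !os.IsNotExist(err) {
        logger.Error("Failed to rename", "old", req.OldPath, "new", req.Path, "error", err)
    }
    // then update maps...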

Bug 2: Pack File Corruption (20 Missing Bytes)

Symptom: error: packfile .git/objects/pack/pack-abc123.pack does not match index. MD5 checksums differ between peers.

Root cause: Alice's pack file was 2,182,186 bytes. Bob's was 2,182,206 bytes. Exactly 20 bytes short -- the SHA-1 checksum that git's index-pack appends.

What git does during clone:

1. Writes pack data to tmp_pack_xyz (2,182,186 bytes)
2. Closes -- ADD_FILE notification sent with size 2,182,186
3. index-pack appends 20-byte SHA-1 checksum
4. Renames tmp_pack_xyz to pack-abc123.pack
5. RENAME_FILE notification sent with Attr showing size 2,182,206

The problem: Alice downloads 2,182,186 bytes (from step 2). The RENAME_FILE handler moves the file on disk but doesn't check that the size changed. The 20-byte checksum was written between the close and the rename -- a window where KEIBIDROP doesn't see the update.

The fix: After renaming on disk, compare the local file size with the size in the RENAME_FILE notification's Attr. If they differ, trigger a re-download:

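localInfo, err := os.Stat(newDiskPath)
// Guard on err: localInfo is nil when the file is not on disk.
if err == nil && localInfo.Size() != req.Attr.Size {
    // Local copy predates the final write: re-download with correct size
    kd.FS.Root.EditRemoteFile(logger, req.Path, ...)
}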

Bug 3: The Invisible File (macFUSE Cache Poisoning)

Symptom: git status shows deleted: go.mod -- but the file exists on disk (verified with ls -la). Running git status again still shows it deleted.

Root cause: macFUSE's negative_vncache mount option. This tells the kernel to cache "file not found" (ENOENT) results. The sequence:

1. Bob clones repo -- notifications stream to Alice
2. macOS probes Alice's mount (Spotlight, fsevents) -- getattr /go-fp/go.mod -- ENOENT (file hasn't arrived yet)
3. Kernel caches: "go.mod doesn't exist"
4. Notification arrives -- prefetch writes go.mod to disk
5. git status runs -- kernel returns cached ENOENT without asking our FUSE handler
6. Git thinks the file is deleted

The cache is sticky. Even after the file exists, the kernel keeps returning ENOENT from cache. Only unmounting clears it.

The fix: Remove negative_vncache from mount options and add defer_permissions (for exec support):

func getMountOptions() []string {
    return []string{
        "-o", "volname=KeibiDrop",
        "-o", "local",
        "-o", "slow_statfs",
        "-o", "allow_other",
        "-o", "defer_permissions",
    }
}

The performance impact is minimal -- a few extra kernel-to-FUSE round-trips for genuinely non-existent files.

Bug 4: The 612-File Hang (Notification Flood)

Symptom: Cloning a small repo (24 objects, 2MB) works perfectly. Cloning a large repo (2339 objects, 257MB, 612 files) hangs after Updating files: 100% (612/612), done.

Root cause: Every FUSE Release (file close) sends a gRPC notification to the peer. With 612 files closing in rapid succession, the notifications overwhelm the connection. Making them async just created 612 goroutines competing for the gRPC transport.

The fix -- three parts:

First, a BatchNotify RPC at the protocol level. Instead of sending one gRPC call per file notification, we batch multiple notifications into a single RPC. 612 individual Notify calls = 612 round-trips. With up to 64 notifications per batch, the same 612 notifications fit in 10 BatchNotify calls (ceil(612/64) = 10) = 10 round-trips.

rpc BatchNotify (BatchNotifyRequest) returns (BatchNotifyResponse);

message BatchNotifyRequest {
  repeated NotifyRequest notifications = 1;
  uint64 seq = 2;        // Monotonic batch sequence for ordering.
  uint64 timestamp = 3;  // Sender's unix nano timestamp.
}

Second, a client-side notification worker with per-path debounce. Instead of spawning goroutines or using a simple batch timer, the worker maintains a pending map per path:

ADD_FILE / EDIT_FILE: Stored in pending map with a 200ms deadline. Each new notification for the same path resets the deadline. Only sent when the path is stable for 200ms.
RENAME: Sent immediately. Any pending ADD_FILE for the old path is retargeted to the new path (not deleted -- deleting it would mean the peer never gets the content).
REMOVE / ADD_DIR: Sent immediately.

The retarget was a critical insight we discovered through testing. Our first attempt simply deleted the pending ADD_FILE on RENAME, but this meant the peer never downloaded the file content -- the RENAME moved nothing because nothing had arrived. Retargeting preserves the content download while updating the path.

Third, a prefetch semaphore on the receiving side. A channel-based semaphore limits concurrent prefetches to 8, preventing 612 simultaneous StreamFile gRPC streams from overwhelming the connection.

Bug 5: The 190,000 Log Lines (Hot-Path Logging)

Symptom: Even after fixing the notification flood, the 612-file clone still hangs for 30 seconds.

Root cause: We added FUSE trace logs for debugging. But Getattr is called on every file access. macOS probes hundreds of paths per directory (Spotlight, fsevents, .DS_Store, resource forks). Without negative_vncache, every probe goes through our handler. Result: 207,000 getattr logs and 190,000 ENOENT error logs -- all written synchronously to disk.

grep -c "FUSE getattr" Log_Bob.txt -> 207,137
grep -c "Failed to lstat" Log_Bob.txt -> 190,614

The fix: Two changes:

Remove all trace logs from hot-path handlers (getattr, open, read, write, release). Keep logs only on low-frequency operations (rename, mkdir, remote-add).
Don't log ENOENT as an error -- it's normal for macOS probing non-existent paths:

if err != nil {
    if !os.IsNotExist(err) {  // only log real errors
        logger.Error("Failed to lstat", "path", cleanPath, "error", err)
    }
    return int(convertOsErrToSyscallErrno("lstat", err))
}

Lesson: Logging on a hot path is invisible until you have enough files. 20 files? Fine. 612 files with macOS probing? 400,000 synchronous disk writes.

Bug 6: Hook Permission Denied

Symptom: fatal: cannot exec '.git/hooks/post-checkout': Operation not permitted

Root cause: macOS Gatekeeper blocks executing scripts from FUSE mounts by default.

The fix: Add defer_permissions to mount options, which tells macFUSE to use standard Unix permission checks instead of Gatekeeper restrictions.

Bug 7: LFS File Corruption (Intermediate Sizes + Stale Notifications)

Symptom: Cloning a repo with Git LFS (3.71 GiB, 13 large XML files) results in corrupted LFS objects on the receiving peer. git status shows Files don't match and clean filter 'lfs' failed.

Root cause -- two interacting problems:

First, intermediate sizes. Git LFS downloads a 420MB XML file incrementally into .git/lfs/incomplete/<hash>. Each intermediate close triggers ADD_FILE with the current size. The peer starts prefetching at 100MB, gets a new notification at 200MB, restarts, gets another at 300MB, restarts again. The file ends up with content from whichever intermediate prefetch happened to complete.

Second, stale notifications after rename. LFS renames incomplete/<hash> to objects/<sha>/<hash> when the download finishes. The RENAME is sent immediately, but the debounced ADD_FILE for the old incomplete/<hash> path fires after the rename. The peer tries to download a file that no longer exists at the old path.

Timeline without fix:
  t=0s   ADD_FILE incomplete/<hash> size=100MB -> debounced (pending)
  t=1s   ADD_FILE incomplete/<hash> size=200MB -> debounced (replaces)
  t=2s   ADD_FILE incomplete/<hash> size=420MB -> debounced (replaces)
  t=2.5s RENAME incomplete/<hash> -> objects/<sha>/<hash> -> sent immediately
  t=3.5s Debounce fires: ADD_FILE incomplete/<hash> size=420MB -> STALE!
         Peer tries to download from old path -> File not found

The fix -- per-path debounce with RENAME retargeting:

ADD_FILE and EDIT_FILE get a 200ms per-path debounce -- each update resets the deadline. When a RENAME arrives, it's sent immediately AND the pending ADD_FILE for the old path is retargeted to the new path:

case bindings.NotifyType_ADD_FILE, bindings.NotifyType_EDIT_FILE:
    pending[req.Path] = &pendingNotify{
        req:      req,
        deadline: time.Now().Add(200 * time.Millisecond),
    }
case bindings.NotifyType_RENAME_FILE:
    if old, exists := pending[req.OldPath]; exists {
        delete(pending, req.OldPath)
        old.req.Path = req.Path  // retarget to new path
        pending[req.Path] = old
    }
    immediate = append(immediate, req)

Timeline with fix:
  t=0s   ADD_FILE incomplete/<hash> size=100MB -> debounced (pending)
  t=0.1s ADD_FILE incomplete/<hash> size=200MB -> debounced (replaces)
  t=0.2s ADD_FILE incomplete/<hash> size=420MB -> debounced (replaces)
  t=0.3s RENAME incomplete/<hash> -> objects/<sha>/<hash> -> sent immediately
         Pending ADD_FILE retargeted: path = objects/<sha>/<hash>
  t=0.5s Debounce fires: ADD_FILE objects/<sha>/<hash> size=420MB -> correct path!
         Peer downloads file at final location with final size

Result: The RENAME arrives first (peer updates maps + renames on disk). Then the retargeted ADD_FILE arrives with the correct final path and size, triggering a prefetch that downloads the complete content. No stale paths, no intermediate sizes, no lost content.

Bug 8: Stale Branch After Checkout (Cache Coherency)

Symptom: Bob clones a repo. Alice sees it, creates a branch good, adds a file, commits. On Bob's side, git branch shows good and * main -- the branch exists but HEAD wasn't updated. After Bob manually runs git checkout good, everything works.

Root cause -- two interacting problems:

First, the lock-file debounce gap. Git writes .git/HEAD using the lock-file pattern: create .git/HEAD.lock, write ref: refs/heads/good\n, rename .git/HEAD.lock to .git/HEAD. The RENAME notification is sent immediately, but the ADD_FILE for .git/HEAD.lock is debounced (200ms). When RENAME arrives at Bob, .git/HEAD.lock was never in Bob's RemoteFiles (the debounced ADD_FILE hasn't arrived yet). So the RENAME handler finds exists = false and skips the re-download check. The retargeted ADD_FILE arrives 200ms later and triggers the correct re-download -- but there's a window where Bob reads stale .git/HEAD content.

Timeline:
  t=0ms   CREATE+WRITE .git/HEAD.lock -> debounced ADD_FILE (pending, 200ms)
  t=1ms   RENAME .git/HEAD.lock -> .git/HEAD -> sent immediately
          Bob: .git/HEAD.lock NOT in RemoteFiles -> RENAME is a no-op
          Bob reads .git/HEAD -> gets old content "ref: refs/heads/main"
  t=200ms Retargeted ADD_FILE .git/HEAD -> AddRemoteFile -> prefetch -> correct

Second, Getattr falsely marking files as locally newer. After every prefetch completes, the local file has a more recent mtime than the remote peer's stat (because the download time is always later than the creation time). Getattr compared mtimes and set LocalNewer = true on every prefetched file. This caused OpenEx to serve from the local path -- usually fine because prefetch wrote correct content, but during the window above, the local file still had stale content.

The fix -- three parts:

First, the RENAME handler now detects the lock-file pattern. When the source path is NOT in RemoteFiles but the target path IS (.git/HEAD.lock unknown, but .git/HEAD is tracked), it triggers an immediate re-download via AddRemoteFile. This eliminates the 200ms stale window:

// Handle lock-file -> final-file renames.
if !exists && req.Attr != nil && req.Attr.Size > 0 {
    if _, targetTracked := RemoteFiles[req.Path]; targetTracked {
        AddRemoteFile(req.Path, req.Attr) // re-download immediately
    }
}

Second, Getattr no longer manages LocalNewer. The mtime comparison was fundamentally flawed for prefetched files -- download time is always newer than creation time. LocalNewer is now managed exclusively by:

Write handler -- sets true (genuine local edit)
AddRemoteFile / EditRemoteFile -- sets false (remote update arrived)

Third, AddRemoteFile and EditRemoteFile now sync AllFileMap to point to the same File object as RemoteFiles. Previously, Getattr could create a separate File in AllFileMap with stale LocalNewer = true, and OpenEx would read from that stale object instead of the updated RemoteFiles entry.
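The second and third parts can be pictured with a stripped-down model. The sketch below borrows the names LocalNewer, RemoteFiles, AllFileMap and AddRemoteFile from the text, but the struct layouts, signatures and the Write method are hypothetical simplifications, not the real KEIBIDROP types.

```go
package main

import "fmt"

// File is a hypothetical, minimal per-file record.
type File struct {
	Size       int64
	LocalNewer bool // true only after a genuine local write
}

// state holds the two maps discussed above; after the fix they share
// the same *File for any path known to the remote peer.
type state struct {
	RemoteFiles map[string]*File
	AllFileMap  map[string]*File
}

func newState() *state {
	return &state{RemoteFiles: map[string]*File{}, AllFileMap: map[string]*File{}}
}

// AddRemoteFile records a remote update: the remote copy is authoritative
// again, and both maps are pointed at the same object.
func (s *state) AddRemoteFile(path string, size int64) {
	f, ok := s.RemoteFiles[path]
	if !ok {
		f = &File{}
		s.RemoteFiles[path] = f
	}
	f.Size = size
	f.LocalNewer = false
	s.AllFileMap[path] = f
}

// Write records a local edit: the only place LocalNewer becomes true.
func (s *state) Write(path string, n int64) {
	f, ok := s.AllFileMap[path]
	if !ok {
		f = &File{}
		s.AllFileMap[path] = f
	}
	f.Size += n
	f.LocalNewer = true
}

func main() {
	s := newState()
	s.AddRemoteFile(".git/HEAD", 21)
	fmt.Println("after remote add:", s.AllFileMap[".git/HEAD"].LocalNewer)

	s.Write(".git/HEAD", 1)
	fmt.Println("after local write:", s.RemoteFiles[".git/HEAD"].LocalNewer)

	s.AddRemoteFile(".git/HEAD", 21)
	fmt.Println("after remote update:", s.AllFileMap[".git/HEAD"].LocalNewer,
		"shared:", s.AllFileMap[".git/HEAD"] == s.RemoteFiles[".git/HEAD"])
}
```

With a single shared object per path there is no second copy whose flag can go stale, so a reader consulting either map sees the same answer.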

Result: Branch checkout on one peer is reflected on the other immediately. No more stale .git/HEAD content.

Bonus: Linux Compatibility

The first seven bugs were found and fixed on macOS (Darwin). Testing on Linux revealed three additional issues that share the same root causes but manifest differently:

Read handler EOF vs EBADF: When a remote file disappears mid-read (git's transient .keep files), the Read handler returned -EBADF which aborted the caller. Returning 0 (EOF) lets callers see an empty file gracefully.
REMOVE_FILE stale cache: Git creates then immediately removes .keep files. Without cancelling the in-flight prefetch and removing the stale cache file, subsequent Opens found zero-byte cached content instead of the actual file.
RENAME_FILE with missing files: If a prefetch was still in progress when RENAME arrived, the file didn't exist locally after rename. The original fix only re-downloaded on size mismatch -- now it also re-downloads when the file is completely missing.

The fixes in the sync and notification code are platform-independent; only the mount-option changes (Bugs 3 and 6) are specific to macFUSE. The same code runs on both macOS and Linux, with platform-specific mount options in specifics_darwin.go and specifics_linux.go.

Bonus: macFUSE Mount Option Tuning

While debugging, we also tuned the macFUSE mount options based on benchmarking and the macFUSE wiki:

| Option | Why |
| --- | --- |
| `noappledouble` | Blocks `._` and `.DS_Store` probes -- eliminates 100K+ getattr calls during large clones |
| `iosize=524288` | 512KB I/O block size -- matches our ChunkSize. Benchmarked: 245 MB/s vs 209 MB/s (1MB) vs 232 MB/s (default 64KB) |
| `defer_permissions` | Standard Unix permission checks -- enables executing git hooks on the mount |
| ~~`negative_vncache`~~ | Removed -- caches ENOENT results, breaks file sync from peers |
| ~~`auto_cache`~~ | Tested, not used -- interfered with in-flight reads during peer sync |

The Final Result

After all eight fixes, the full git workflow works between encrypted P2P FUSE peers:

# Bob clones a repo
$ cd MountBob && git clone git@github.com:user/repo.git
Cloning into 'repo'... done.

# Alice sees it immediately
$ cd MountAlice/repo && git status
On branch main -- nothing to commit, working tree clean

# Alice creates a branch and commits
$ git checkout -b good
$ echo "testeasdasda" > test.txt && git add test.txt && git commit -m "Good"

# Bob sees the branch, the commit, AND the updated HEAD
$ cd MountBob/repo && git branch
  good
* main
$ git checkout good
Switched to branch 'good'
$ git log
commit aff12dd (HEAD -> good)
    Good
commit f2b9b1f (origin/main, origin/HEAD, main)
    Scale image

Tested with:

Small repo: 24 objects, 2MB, 8 files -- instant sync, bidirectional commits
Large repo: 2339 objects, 257MB + 3.71 GiB LFS, 612 files -- completes without hanging
Git operations: clone, status, log, branch, checkout, commit, add -- all work on both peers
Platforms: macOS (Darwin) and Linux -- same codebase, platform-specific mount options only

Lessons Learned

Notification handlers must rename on disk, not just in memory. Our RENAME_FILE handler originally only updated maps, trusting the prefetch goroutine to move files. But goroutines race with notification handlers -- especially for tiny files that download in milliseconds.
The close-to-rename gap is real. Git writes data, closes, then renames. Between close and rename, other processes can modify the file (index-pack appending checksums). Notifications sent at close time carry stale sizes. The rename notification has the truth -- use it.
Kernel caches are invisible bugs. macFUSE's negative_vncache silently caches ENOENT results. Our FUSE handler was never even called -- the kernel answered from cache. These bugs don't show up in FUSE logs because the kernel short-circuits the call. Disable negative caching for any FUSE filesystem where files can appear asynchronously.
Never do unbounded concurrency from a tight loop. 612 files = 612 goroutines = 612 gRPC streams = connection collapse. A bounded channel + batching worker gives you backpressure without blocking the caller.
Hot-path logging is a silent killer. Nearly 400,000 synchronous log writes (about 207,000 getattr traces plus 190,000 ENOENT errors) caused a 30-second hang. The logging was added for debugging and seemed harmless -- until we hit 612 files with macOS probing. Always guard hot-path logs with level checks or keep them commented out.
Debounce per-path, retarget on rename. LFS downloads a 420MB file over seconds, triggering 10+ notifications at intermediate sizes. Per-path debounce (200ms deadline, reset on each update) catches rapid-fire writes. But the tricky part is RENAME: you can't just delete the pending ADD_FILE (the peer never gets the content) and you can't send the stale ADD_FILE after the rename (wrong path). The answer is to retarget the pending notification to the new path. Three attempts to get this right: delete (broke content delivery), send stale (wrong path), retarget (correct).
Mtime lies about who wrote the file. After prefetching a remote file, the local copy has a more recent mtime than the original. Using mtime to decide "local is newer, skip remote" caused stale reads because every prefetched file appeared locally newer. The fix: don't use mtime for this decision. Track authorship explicitly -- only the Write handler (genuine local edits) can mark a file as locally authoritative.
Debounce creates timing windows; immediate handlers must compensate. The 200ms per-path debounce on ADD_FILE notifications is essential for LFS downloads. But RENAME arrives immediately, and the ADD_FILE it depends on hasn't been sent yet. When the RENAME handler can't find the source in its maps, it must check whether the target is tracked and trigger a re-download. Debouncing is not free -- every delayed notification creates a window where the peer can read stale data.
Git is just syscalls. There's nothing git-specific about these bugs. Any application that creates files through one peer's mount and reads them through another would hit the same rename race, size mismatch, cache poisoning, notification flood, and intermediate-size corruption. Git just exercises every edge case because of its atomic .lock-to-rename pattern, large checkouts, and LFS incremental downloads.