mmap for pack files, and FUSE direct_io disables mmap. The solution is per-file direct_io that excludes .git/ paths, but that turned out to be only the first of five problems we had to fix.
The Problem
We wanted KEIBIDROP users to be able to git clone a repository directly into a FUSE-mounted shared directory and have changes sync to their peer in real time. The first attempt crashed immediately:
$ git clone https://github.com/example/repo.git /mnt/keibidrop/repo
Cloning into '/mnt/keibidrop/repo'...
remote: Enumerating objects: 1842, done.
fatal: BUS_ERROR
zsh: bus error git clone https://github.com/example/repo.git /mnt/keibidrop/repo
A bus error (SIGBUS, signal 7 on Linux) means a process tried to access mapped memory that has no backing store. In our case, git was trying to mmap a pack file, and the FUSE filesystem was configured with direct_io globally, which tells the kernel to bypass the page cache entirely. No page cache means no pages to map. The mmap call itself succeeds because mapping is lazy, but the first access to the mapped region triggers SIGBUS.
Why Git Needs mmap
Git stores objects in pack files, which are compressed archives that contain multiple objects (commits, trees, blobs) concatenated together. When git needs to read an object, it does not decompress the entire pack. Instead, it mmaps the pack file and seeks directly to the object's offset using the pack index.
The efficiency matters: a typical .git/objects/pack/ directory might contain a single 500 MB pack file with tens of thousands of objects, and reading the whole file into memory would be wasteful. With mmap, the kernel pages in only the regions git actually touches, and those pages can be shared across multiple git processes.
// What git does internally (simplified):
fd = open(".git/objects/pack/pack-abc123.pack", O_RDONLY);
map = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
// Later, to read object at offset 0x4F20:
obj_data = map + 0x4F20; // triggers page fault, kernel calls FUSE read
decompress(obj_data); // works because pages are backed by FUSE
With direct_io enabled, the kernel never creates those page mappings. The mmap syscall returns a valid pointer (the kernel defers actual allocation), but the first dereference has no backing page to fault in. Hence: SIGBUS.
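The same access pattern can be reproduced from Go. This is a minimal sketch, not git's code: readViaMmap and demo are hypothetical names, the "pack" is a 16-byte temp file, and it assumes a Unix-like system where syscall.Mmap is available. The copy out of the mapped slice is the point where the kernel has to supply a page.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// readViaMmap maps the whole file read-only and copies out n bytes at off.
// (Hypothetical helper for illustration.)
func readViaMmap(path string, off, n int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	size := int(info.Size())
	if off < 0 || n < 0 || off+n > size {
		return nil, fmt.Errorf("range %d+%d outside file of %d bytes", off, n, size)
	}

	mapped, err := syscall.Mmap(int(f.Fd()), 0, size, syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	defer syscall.Munmap(mapped)

	out := make([]byte, n)
	copy(out, mapped[off:off+n]) // first touch of these pages faults them in
	return out, nil
}

// demo writes a small stand-in "pack" to a temp file and reads 4 bytes at offset 8.
func demo() (string, error) {
	f, err := os.CreateTemp("", "pack-*.bin")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("PACK0000DATA1111"); err != nil {
		f.Close()
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	data, err := readViaMmap(f.Name(), 8, 4)
	return string(data), err
}

func main() {
	s, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Pointing readViaMmap at a file on the mount is a quick way to check whether a given path supports mmap before running git against it.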
The Fix: Per-File direct_io
We need direct_io for regular files because KEIBIDROP's content can change on the remote peer at any time. Without direct_io, the kernel caches stale data and serves it to readers even after the remote has updated the file. But git's pack files need mmap, which requires the page cache.
The solution: enable direct_io per file, not globally. Files inside .git/ get normal cached I/O. Everything else gets direct_io.
// shouldUseDirectIo decides whether a file should bypass the page cache.
// Files inside .git/ need mmap support, so they use cached I/O.
// All other files use direct_io to ensure freshness from remote peers.
func shouldUseDirectIo(path string, flags int) bool {
	// .git internals need mmap for pack files. Match .git at any depth:
	// a clone into the mount puts it under a subdirectory, e.g. /repo/.git/.
	if strings.Contains(path, "/.git/") || strings.HasSuffix(path, "/.git") {
		return false
	}
	// Read-only opens of known safe paths can use cache
	if flags&(syscall.O_WRONLY|syscall.O_RDWR) == 0 {
		return false
	}
	return true
}
Source: shouldUseDirectIo
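The path rule is easy to sanity-check in isolation. The sketch below uses a hypothetical helper, insideGitDir, that treats a mount-relative path as git-internal when any path component equals .git. That covers a repository cloned into a subdirectory of the mount (such as /repo/.git/...) without matching look-alikes like .gitignore. It is an illustration of the rule, not KEIBIDROP's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// insideGitDir reports whether any component of a slash-separated,
// mount-relative path is exactly ".git". (Hypothetical helper.)
func insideGitDir(path string) bool {
	for _, part := range strings.Split(path, "/") {
		if part == ".git" {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{
		"/.git",
		"/repo/.git/objects/pack/pack-abc123.pack",
		"/repo/.gitignore",
		"/repo/src/main.go",
	}
	for _, p := range paths {
		fmt.Printf("%-45s cached=%v\n", p, insideGitDir(p))
	}
}
```

Comparing whole components rather than substrings keeps the decision independent of where in the mount the repository lives.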
In the Open handler, we set the direct_io flag on the file handle based on this check:
func (fs *KeibiFS) Open(path string, flags int) (errc int, fh uint64) {
	handle := fs.allocHandle(path, flags)
	if shouldUseDirectIo(path, flags) {
		handle.DirectIo = true
	}
	return 0, handle.ID
}
Source: OpenEx handler
This fixed the SIGBUS crash. Git could now mmap its pack files because they went through the page cache, but four more issues remained.
Additional Issues
Issue 1: The Fsync Race
After fixing mmap, git clone started working but occasionally failed with EBADF (bad file descriptor) during the "resolving deltas" phase. The FUSE log showed something unexpected: Fsync arriving after Release for the same file handle.
In POSIX semantics, fsync should always happen before close. But FUSE does not guarantee this ordering. The kernel's FUSE module can reorder operations across different threads, and git uses multiple threads for delta resolution. One thread closes the file while another thread's fsync is still in the FUSE queue.
// Timeline of the race:
// Thread A: write() -> fsync() [queued in FUSE]
// Thread B: close() -> Release() [processed first]
// Thread A: fsync() arrives -> file handle already freed -> EBADF
The fix: when Fsync receives an invalid file handle, reopen the file by path, sync it, and close it:
func (fs *KeibiFS) Fsync(path string, datasync bool, fh uint64) int {
	handle := fs.getHandle(fh)
	if handle == nil {
		// Handle was already released. Reopen by path to honor the sync.
		f, err := os.OpenFile(fs.localPath(path), os.O_RDONLY, 0)
		if err != nil {
			return -fuse.EIO
		}
		defer f.Close()
		if err := f.Sync(); err != nil {
			return -fuse.EIO
		}
		return 0
	}
	return handle.Fsync(datasync)
}
Source: Write handler | Readdir
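The fallback depends on the handle lookup returning nil once Release has run, even when another thread is looking up the same handle concurrently. Here is a minimal sketch of such a table, with hypothetical names (handleTable, syncTarget) and a mutex guarding the map. KEIBIDROP's real allocHandle and getHandle may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

type handle struct{ path string }

// handleTable maps FUSE file handle IDs to open handles. (Hypothetical type.)
type handleTable struct {
	mu   sync.Mutex
	next uint64
	open map[uint64]*handle
}

func newHandleTable() *handleTable {
	return &handleTable{open: make(map[uint64]*handle)}
}

func (ht *handleTable) alloc(path string) uint64 {
	ht.mu.Lock()
	defer ht.mu.Unlock()
	ht.next++
	ht.open[ht.next] = &handle{path: path}
	return ht.next
}

// get returns nil when the handle was never allocated or already released.
func (ht *handleTable) get(fh uint64) *handle {
	ht.mu.Lock()
	defer ht.mu.Unlock()
	return ht.open[fh]
}

func (ht *handleTable) release(fh uint64) {
	ht.mu.Lock()
	defer ht.mu.Unlock()
	delete(ht.open, fh)
}

// syncTarget describes what a late Fsync would do: use the live handle,
// or fall back to reopening the file by path.
func (ht *handleTable) syncTarget(path string, fh uint64) string {
	if h := ht.get(fh); h != nil {
		return "handle:" + h.path
	}
	return "reopen:" + path
}

func main() {
	tbl := newHandleTable()
	fh := tbl.alloc("/repo/.git/index")
	fmt.Println(tbl.syncTarget("/repo/.git/index", fh)) // handle still live
	tbl.release(fh)                                     // Release wins the race
	fmt.Println(tbl.syncTarget("/repo/.git/index", fh)) // late Fsync falls back
}
```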
Issue 2: macOS fcopyfile After Close
On macOS, git clone would occasionally produce zero-byte files in the working tree. The FUSE logs showed writes arriving after Release had been called. This is not a FUSE bug; it is fcopyfile.
macOS's fcopyfile (used internally by cp, Finder, and apparently git's checkout code on Darwin) can issue writes to a file descriptor that has already been closed from the application's perspective. The kernel keeps the file descriptor alive because fcopyfile operates at a lower level, but the FUSE Release callback fires when the last userspace close() happens.
The fix: buffer writes that arrive after Release and flush them when the next Open of the same path occurs, or after a short timeout:
func (fs *KeibiFS) Write(path string, buff []byte, ofst int64, fh uint64) int {
	handle := fs.getHandle(fh)
	if handle == nil {
		// Post-release write (macOS fcopyfile). Buffer it.
		fs.bufferPendingWrite(path, buff, ofst)
		return len(buff)
	}
	return handle.Write(buff, ofst)
}
Source: Write handler with post-Release buffering
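One possible shape for the buffering side is a per-path queue that copies each chunk and replays it with WriteAt on flush. This is a sketch under assumptions: pendingBuffer, add, and flush are hypothetical names, the timeout path is omitted, and the copy is there on the assumption that the caller may reuse its slice after the handler returns.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

type pendingWrite struct {
	off  int64
	data []byte
}

// pendingBuffer queues writes that arrived after Release, keyed by path.
// (Hypothetical type.)
type pendingBuffer struct {
	mu     sync.Mutex
	byPath map[string][]pendingWrite
}

func newPendingBuffer() *pendingBuffer {
	return &pendingBuffer{byPath: make(map[string][]pendingWrite)}
}

// add stores a private copy of data and reports the number of bytes accepted.
func (p *pendingBuffer) add(path string, data []byte, off int64) int {
	cp := append([]byte(nil), data...)
	p.mu.Lock()
	p.byPath[path] = append(p.byPath[path], pendingWrite{off: off, data: cp})
	p.mu.Unlock()
	return len(data)
}

// flush replays the queued writes for path into f in arrival order.
// The queue is cleared up front, so a failed replay drops the remainder.
func (p *pendingBuffer) flush(path string, f *os.File) error {
	p.mu.Lock()
	queued := p.byPath[path]
	delete(p.byPath, path)
	p.mu.Unlock()
	for _, w := range queued {
		if _, err := f.WriteAt(w.data, w.off); err != nil {
			return err
		}
	}
	return nil
}

// demo queues two late writes, flushes them to a temp file, and reads it back.
func demo() (string, error) {
	f, err := os.CreateTemp("", "late-*.txt")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	pb := newPendingBuffer()
	chunk := []byte("hello ")
	pb.add("/repo/README", chunk, 0)
	copy(chunk, "XXXXXX") // caller reuses its buffer; the queued copy is unaffected
	pb.add("/repo/README", []byte("world"), 6)

	if err := pb.flush("/repo/README", f); err != nil {
		return "", err
	}
	out, err := os.ReadFile(f.Name())
	return string(out), err
}

func main() {
	s, err := demo()
	fmt.Println(s, err)
}
```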
Issue 3: Permission Denied on Reopen
Git creates some files (like pack index files) with mode 0444, meaning read-only. When our Fsync fallback tried to reopen these files, it got EACCES. The file was owned by the current user, but the write bit was not set, and os.OpenFile with O_WRONLY respects permissions.
The fix: before reopening for fsync, ensure the file has owner-write permission, then restore the original mode afterward:
func (fs *KeibiFS) reopenForSync(path string) (*os.File, error) {
	fullPath := fs.localPath(path)
	info, err := os.Stat(fullPath)
	if err != nil {
		return nil, err
	}
	// Temporarily add owner-write if needed. The deferred Chmod restores
	// the original mode once the file has been opened.
	origMode := info.Mode()
	if origMode&0200 == 0 {
		if err := os.Chmod(fullPath, origMode|0200); err != nil {
			return nil, err
		}
		defer os.Chmod(fullPath, origMode)
	}
	return os.OpenFile(fullPath, os.O_WRONLY, 0)
}
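The whole sequence (add owner-write, open, sync, close, restore the mode) can be exercised end to end on a temp file. This is a standalone sketch with hypothetical names (syncWithTempWrite, demo), working on a plain local path rather than going through fs.localPath.

```go
package main

import (
	"fmt"
	"os"
)

// syncWithTempWrite fsyncs a file that may be read-only, adding owner-write
// for the duration of the call and restoring the original permissions after.
// (Hypothetical helper.)
func syncWithTempWrite(fullPath string) error {
	info, err := os.Stat(fullPath)
	if err != nil {
		return err
	}
	orig := info.Mode().Perm()
	if orig&0200 == 0 {
		if err := os.Chmod(fullPath, orig|0200); err != nil {
			return err
		}
		defer os.Chmod(fullPath, orig) // runs after the file is closed
	}
	f, err := os.OpenFile(fullPath, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	return f.Sync()
}

// demo creates a 0444 file, syncs it, and returns the mode seen afterward.
func demo() (os.FileMode, error) {
	f, err := os.CreateTemp("", "pack-*.idx")
	if err != nil {
		return 0, err
	}
	name := f.Name()
	defer os.Remove(name)
	if _, err := f.WriteString("index"); err != nil {
		f.Close()
		return 0, err
	}
	if err := f.Close(); err != nil {
		return 0, err
	}
	if err := os.Chmod(name, 0444); err != nil {
		return 0, err
	}
	if err := syncWithTempWrite(name); err != nil {
		return 0, err
	}
	info, err := os.Stat(name)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	fmt.Printf("mode after sync: %v, err: %v\n", mode, err)
}
```

The file is briefly writable between the two Chmod calls. That is a trade-off of this approach, so it is worth keeping the window as short as possible.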
Issue 4: sudo git Hangs Forever
Running sudo git clone into the FUSE mount would hang indefinitely. The root user could not access the FUSE mount because, by default, FUSE mounts are only accessible to the user who mounted them. Root's open() call would block waiting for a FUSE response that never came, because the FUSE daemon rejected the access.
The fix: mount with -o allow_other so that all users (including root) can access the filesystem:
func (fs *KeibiFS) Mount(mountpoint string) error {
	fs.host.Mount(mountpoint, []string{
		"-o", "allow_other",
		"-o", "default_permissions",
	})
	return nil
}
On Linux, this also requires adding user_allow_other to /etc/fuse.conf. On macOS with macFUSE, it works out of the box.
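Because a missing user_allow_other line makes the mount fail for non-root users, a preflight check before mounting can give a clearer error. The sketch below is an assumption-laden illustration: hasUserAllowOther and confAllowsOther are hypothetical helpers, and the parser only looks for a line that is exactly user_allow_other after trimming whitespace, treating anything else (including commented-out lines) as absent.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// hasUserAllowOther scans fuse.conf-style text for an uncommented
// user_allow_other line. (Hypothetical helper.)
func hasUserAllowOther(r io.Reader) (bool, error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "user_allow_other" {
			return true, nil
		}
	}
	return false, sc.Err()
}

// confAllowsOther is a convenience wrapper for in-memory config text.
func confAllowsOther(conf string) bool {
	ok, err := hasUserAllowOther(strings.NewReader(conf))
	return err == nil && ok
}

func main() {
	f, err := os.Open("/etc/fuse.conf")
	if err != nil {
		fmt.Println("cannot read /etc/fuse.conf:", err)
		return
	}
	defer f.Close()
	ok, err := hasUserAllowOther(f)
	fmt.Println("user_allow_other enabled:", ok, "err:", err)
}
```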
The Result
After all five fixes, git operations work correctly inside a KEIBIDROP FUSE mount:
- git clone works end-to-end: pack files mmap correctly, delta resolution completes without EBADF, and checkout writes arrive in order.
- git checkout rewrites the working tree through FUSE, and all changes sync to the peer.
- git commit handles staging, tree creation, and commit object writing. The new pack data syncs immediately.
- git push and git pull work normally since they go through git's own transport, not the filesystem.
Changes made by one peer appear on the other peer's mount within seconds. You can have two developers with the same repository mounted via KEIBIDROP, each making commits, and see each other's work in real time.
Lessons Learned
- I assumed mmap was limited to databases and memory-mapped I/O libraries, but git, SQLite, Python's mmap module, Java's MappedByteBuffer, and countless other tools depend on it. If your FUSE filesystem uses direct_io globally, you will break more things than you expect.
- The kernel's FUSE module dispatches operations across multiple threads, so operation ordering is not guaranteed. Fsync can arrive after Release; Write can arrive after Release on macOS. Your handlers must be defensive.
- macOS has its own copy semantics. fcopyfile does not behave like read()/write() loops; it can continue writing after the application has closed the file descriptor. If you only test on Linux, you will miss this.
- The instinct is to set direct_io globally or not at all, but per-file granularity gives you the best of both worlds: freshness for synced files and mmap support for tools that need it.