Hunting a 250% CPU Bug in a Vite Monorepo
My MacBook Pro fans rarely spin up. When they did on a quiet afternoon with nothing obvious running, I knew something was off. Activity Monitor showed three Node processes pinned at 236–250% CPU each — all Vite dev servers from my SaaS monorepo.
The obvious answer would have been "Vite just does that, it's a big project." The actual answer turned out to be a four-line config change and a one-line hook interaction I'd never have spotted without methodical evidence-gathering.
This is a story about resisting the urge to guess, and about what agentic debugging actually looks like when the target is a performance problem, not a crash.
The Setup
The monorepo runs four Vite dev servers in parallel via devenv up: a legacy management client, a rewritten management client, a PWA for end users, and an admin console. Each is a separate app in its own workspace, each imports from a handful of shared packages (@hausify/ui, @hausify/api-client, @hausify/utils), all linked via pnpm symlinks.
Running all four at once has always been somewhat heavy — they're four full dev servers, after all. But this was different. The fans were audible. The system load average hit 14 on an 8-core machine. Something was genuinely broken.
Resisting the Obvious "Fix"
The first instinct is to start changing things. Disable plugins. Switch to polling. Add optimizeDeps.exclude. Reinstall fsevents. Each of those might help. Each of those is also a shot in the dark.
I have a rule for debugging: no fixes before root cause. A guess that happens to work teaches you nothing. A guess that doesn't work wastes time and adds complexity to the codebase. The cost of gathering evidence first is almost always less than the cost of three wrong guesses in a row.
So the first step wasn't a fix. It was a measurement.
Gathering Evidence
ps -axo pid,pcpu,pmem,command | grep vite
Three Vite processes at roughly 240% CPU each. One, the admin console, at 0%. That single asymmetry was the most important clue of the entire investigation. The admin console runs the same Vite version, the same plugins, the same shared config. If it were a Vite-level problem, all four would misbehave. Something about the other three was different.
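The hot-versus-idle split can also be checked mechanically instead of by eye. The following is a minimal sketch, assuming the text output of the ps command above is already available as a string; the function names and the threshold parameter are hypothetical, not part of any tool.

```typescript
// Hypothetical helper: parse the output of `ps -axo pid,pcpu,pmem,command`.
interface ProcRow {
  pid: number;
  cpuPercent: number;
  memPercent: number;
  command: string;
}

function parsePsOutput(psOutput: string): ProcRow[] {
  const rows: ProcRow[] = [];
  for (const line of psOutput.split("\n")) {
    // Columns: pid, %cpu, %mem, then the rest of the line is the command.
    const match = line.trim().match(/^(\d+)\s+([\d.]+)\s+([\d.]+)\s+(.*)$/);
    if (!match) continue; // skips the header row and blank lines
    rows.push({
      pid: Number(match[1]),
      cpuPercent: Number(match[2]),
      memPercent: Number(match[3]),
      command: match[4] ?? "",
    });
  }
  return rows;
}

// Split the Vite processes into hot and idle groups so the asymmetry is explicit.
function splitHotAndIdle(rows: ProcRow[], thresholdPercent: number) {
  const vite = rows.filter((row) => row.command.includes("vite"));
  return {
    hot: vite.filter((row) => row.cpuPercent >= thresholdPercent),
    idle: vite.filter((row) => row.cpuPercent < thresholdPercent),
  };
}
```

A non-empty idle group next to a non-empty hot group is the signal worth chasing: the servers are supposed to be near-identical, so whatever separates the two groups is a candidate cause.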
Next: file descriptors. On macOS, a Vite dev server using fsevents correctly should hold one or two native watcher descriptors for the project root. If it's polling directory-by-directory, you see hundreds of DIR and KQUEUE descriptors instead.
lsof -p <pid> | awk '{print $5}' | sort | uniq -c
The hot Vite processes had 253 DIR descriptors and 93 KQUEUE descriptors each. The admin had 38 and 9. That is roughly 6.7x the directory handles and 10x the kqueue handles, which strongly suggested the hot processes were doing far more file-system work, either because they were watching more, or because something was constantly changing underneath them.
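For comparing processes side by side, the same tally can be done in code. This is a sketch under the assumption that the raw lsof output has been captured as text; countDescriptorTypes and descriptorRatio are illustrative names. Unlike the awk pipeline, it skips the header row.

```typescript
// Tally the TYPE column (5th whitespace-separated field) of `lsof -p <pid>` output.
function countDescriptorTypes(lsofOutput: string): Map<string, number> {
  const counts = new Map<string, number>();
  const lines = lsofOutput.split("\n").filter((line) => line.trim() !== "");
  for (const line of lines.slice(1)) {
    // slice(1) drops the "COMMAND PID USER FD TYPE ..." header row
    const type = line.trim().split(/\s+/)[4];
    if (type === undefined) continue; // malformed or truncated row
    counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  return counts;
}

// How many times larger the hot process's count is than the healthy one's.
function descriptorRatio(hotCount: number, healthyCount: number): number {
  return hotCount / healthyCount;
}

// Using the counts reported above:
// descriptorRatio(253, 38) is about 6.7 (DIR)
// descriptorRatio(93, 9) is about 10.3 (KQUEUE)
```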
To confirm, I took a profiler sample:
sample <pid> 3
Fifty-four percent of the main thread was in node::fs::AfterStat — the callback that fires after an async lstat() completes. Another twelve percent was in fs::LStat itself. The process wasn't compiling, wasn't bundling, wasn't serving requests. It was doing a continuous storm of file-system stat calls.
That's when the picture came together.
The Root Cause
Vite's file watcher calls stat() on files when it receives events — that's normal. What's not normal is calling stat tens of thousands of times per minute on an idle dev server. That only happens if something is constantly changing on disk, causing constant events.
And something was constantly changing on disk.
My development environment has a post-edit hook that runs mvn compile whenever a Java source file is saved. Maven writes compiled .class files to api/target/classes/. The Vite dev servers were configured to ignore only **/node_modules/** and **/.git/**, which meant api/target/ was being watched. Every Java recompile fired an avalanche of file events, and every event triggered an lstat call in Vite.
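To see why the original ignore list let Maven output through, it helps to test a path against the patterns. The matcher below is a deliberately simplified stand-in that only understands patterns shaped like `**/<dir>/**`; it is not the glob implementation Vite's watcher actually uses, and the example .class path is hypothetical.

```typescript
// Simplified sketch: true if filePath lies inside a directory named by one of
// the patterns. Only the "**/<dir path>/**" shape is supported.
function isIgnored(filePath: string, patterns: string[]): boolean {
  // Normalize to a single leading slash so "/<dir>/" lookups work at the root too.
  const padded = "/" + filePath.replace(/^\/+/, "");
  return patterns.some((pattern) => {
    const dir = pattern.match(/^\*\*\/(.+)\/\*\*$/)?.[1];
    if (dir === undefined) return false; // unsupported pattern shape in this sketch
    return padded.includes("/" + dir + "/");
  });
}

const originalIgnored = ["**/node_modules/**", "**/.git/**"];
const fixedIgnored = [...originalIgnored, "**/api/target/**"];

// Hypothetical compiled class under the Maven output directory.
const compiledClass = "api/target/classes/com/example/Foo.class";

const watchedBefore = !isIgnored(compiledClass, originalIgnored); // true: still watched
const watchedAfter = !isIgnored(compiledClass, fixedIgnored); // false: now ignored
```

Running candidate paths through a check like this is a cheap way to confirm that a build output directory is really covered before restarting the dev servers.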
The admin console was unaffected because it has fewer files in its source tree and doesn't import from the large shared packages, so its watcher graph was small enough that the noise didn't dominate.
There was a second, related issue: none of the Vite configs had an optimizeDeps.include entry for the workspace packages. In a pnpm monorepo, workspace packages are symlinked rather than installed as CommonJS bundles. Vite treats them as source by default, which is great for HMR on actively-developed packages — but for stable generated code like an API client, it means the dev server crawls into the symlinked source on every cold start, paying a repeated resolution cost.
The Fix
Two changes, in one shared config file:
export const SHARED_WATCH_IGNORED = [
"**/node_modules/**",
"**/.git/**",
"**/build/**",
"**/dist/**",
"**/api/target/**", // the smoking gun
"**/.devenv/**",
"**/.pulumi/**",
"**/e2e/test-results/**",
"**/e2e/playwright-report/**",
];
export const SHARED_OPTIMIZE_DEPS_INCLUDE = [
"@hausify/api-client/client.gen",
"@hausify/api-client/sdk.gen",
"@hausify/api-client/@tanstack/react-query.gen",
"@hausify/utils/image-compression",
];
Each of the four Vite configs imports these constants and wires them into server.watch.ignored and optimizeDeps.include. Critically, @hausify/ui is not in the prebundle list: it's under active development and I want HMR on it. The prebundle list only covers packages that change rarely during a coding session.
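The wiring itself isn't shown above, so here is one way it might look. This is a sketch, not the project's actual code: withSharedDevDefaults is a hypothetical helper, and the local interface only models the two config fields involved (Vite's real UserConfig type is much wider, and its ignored option accepts more than string arrays).

```typescript
// Minimal local shape for the parts of a Vite config this sketch touches.
interface WatchAndPrebundleConfig {
  server?: { port?: number; watch?: { ignored?: string[] } };
  optimizeDeps?: { include?: string[] };
}

// Hypothetical helper: merge the shared lists into an app's config without
// dropping anything the app already set.
function withSharedDevDefaults(
  config: WatchAndPrebundleConfig,
  watchIgnored: string[],
  prebundleInclude: string[],
): WatchAndPrebundleConfig {
  return {
    ...config,
    server: {
      ...config.server,
      watch: {
        ...config.server?.watch,
        // App-specific ignores first, then the shared list.
        ignored: [...(config.server?.watch?.ignored ?? []), ...watchIgnored],
      },
    },
    optimizeDeps: {
      ...config.optimizeDeps,
      include: [...(config.optimizeDeps?.include ?? []), ...prebundleInclude],
    },
  };
}
```

In a real vite.config.ts, each app would pass SHARED_WATCH_IGNORED and SHARED_OPTIMIZE_DEPS_INCLUDE as the two list arguments and hand the merged object to Vite's defineConfig. Merging rather than overwriting means an app can keep its own extra ignores alongside the shared ones.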
Verifying the Fix
Here's what matters about debugging: the fix isn't done when it seems to work. It's done when the measurements confirm it.
| Metric (one Vite dev server) | Before | After | Delta |
|---|---|---|---|
| Cumulative average CPU | ~240% | 0.0% | — |
| CPU time / wall time | ~50% | ~1.5% | ~30x lower |
| DIR file descriptors | 253 | 60 | -76% |
| KQUEUE file descriptors | 93 | 13 | -86% |
| fs::AfterStat samples in a 2s profile | ~1150 | 0 | eliminated |
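The Delta column can be reproduced from the Before and After values. A small sketch of the arithmetic follows; the function names are illustrative.

```typescript
// Relative change from `before` to `after`, as a percentage (negative = reduction).
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// How many times smaller `after` is than `before`.
function reductionFactor(before: number, after: number): number {
  return before / after;
}

const dirDelta = Math.round(percentChange(253, 60)); // -76
const kqueueDelta = Math.round(percentChange(93, 13)); // -86
const cpuFactor = reductionFactor(50, 1.5); // about 33, which the table rounds to ~30x
```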
The systm file descriptor — the one native fsevents watcher — reappeared in the healthy state. fsevents was never broken. It just couldn't keep up with the volume of events being generated by files that shouldn't have been watched in the first place.
System-wide: load average dropped from 14 to under 3. The fans went quiet. HMR kept working across all four servers.
The Lessons
The asymmetry is the clue. Four near-identical Vite servers, one behaving correctly. The question isn't "what's wrong with Vite" — it's "what's different about the three hot ones." Asymmetries in systems that should be symmetric point directly at the cause.
Profile before you fix. sample on macOS (or perf on Linux) takes seconds and tells you where time actually goes. "It's probably the plugin" is not evidence. A call graph with fs::AfterStat at 54% is evidence.
Watchers have blast radius. A file watcher's ignored list isn't a convenience — it's a correctness property. If anything in your build pipeline writes into the watched tree (compiled output, generated code, cache files), the watcher will burn CPU until it's told to stop.
Monorepo symlinks have costs. pnpm workspaces are elegant, but every Vite instance discovers them independently and pays its own resolution bill. optimizeDeps.include for stable workspace packages is the cheap fix.
Agentic debugging works. The entire investigation — from "fans are loud" to "merged fix" — took about thirty minutes of back-and-forth. Not because the agent guessed well, but because the agent ran ps, lsof, sample, read the outputs, formed hypotheses, and tested them against new measurements. The human contribution was mostly "yes, that's the right question to ask next." The machinery of systematic debugging scales beautifully when there's something willing to grind through twelve shell commands in a row without losing the thread.
If your dev server is hot and you don't know why, start with ps and sample. The answer is almost always hiding in the file system.