How we built pitch detection that works in noisy rooms

When we started building Guitar Tunio, the first prototype worked beautifully — in a quiet room. The moment we took it to a rehearsal space with a humming amp, a drummer warming up, and someone talking three feet away, accuracy dropped off a cliff.

This is the story of how we solved that problem.

Why FFT Alone Isn’t Enough

The Fast Fourier Transform (FFT) is the textbook approach to pitch detection. It converts a chunk of audio from the time domain into the frequency domain, giving you a spectrum that shows which frequencies are present and how strong each one is. In theory, you just pick the loudest peak and that’s your note.

In practice, it’s more complicated. A guitar string doesn’t produce a single frequency; it produces a fundamental plus a series of harmonics. The second harmonic (one octave up) is often louder than the fundamental, especially on the lower strings. A naive implementation that simply picks the loudest FFT peak will lock onto that harmonic and frequently report the note an octave too high.
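To make the octave error concrete, here is a minimal sketch, not the app’s actual code. It synthesizes a low E whose second harmonic is louder than its fundamental and then picks the loudest spectral bin. The sample rate, window size, harmonic amplitudes, and function names are all assumptions chosen for illustration, and a direct DFT stands in for an FFT library to keep the example dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed for illustration
WINDOW_SIZE = 1024  # samples per analysis window; assumed


def synth_string(f0, n, sample_rate, fundamental_amp=0.4, harmonic_amp=1.0):
    """Synthetic 'string': a fundamental plus a louder second harmonic."""
    return [
        fundamental_amp * math.sin(2 * math.pi * f0 * i / sample_rate)
        + harmonic_amp * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
        for i in range(n)
    ]


def naive_peak_frequency(samples, sample_rate, max_hz=1000.0):
    """Return the centre frequency of the loudest spectral bin below max_hz."""
    n = len(samples)
    # Hann window to reduce spectral leakage.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    bin_hz = sample_rate / n
    best_bin, best_mag = 1, 0.0
    for k in range(1, int(max_hz / bin_hz) + 1):
        # Direct DFT of bin k (an FFT computes the same values, just faster).
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * bin_hz


low_e = 82.41  # Hz, open low E string
reported = naive_peak_frequency(
    synth_string(low_e, WINDOW_SIZE, SAMPLE_RATE), SAMPLE_RATE
)
print(f"true fundamental: {low_e:.2f} Hz, naive peak: {reported:.2f} Hz")
```

With these assumed amplitudes the loudest bin sits near twice the true fundamental, which is exactly the wrong-octave reading described above.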

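Worse, FFT resolution is limited by your window size: the spacing between frequency bins equals the sample rate divided by the window length. A longer window gives finer frequency resolution but introduces latency, because the whole window has to be recorded before it can be analyzed. For a tuner that needs to feel instant, you can’t afford a 500 ms window.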
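The trade-off is easy to put numbers on. The sketch below assumes a 44.1 kHz sample rate (the post does not state one) and looks only at raw bin spacing, ignoring refinements such as peak interpolation. It compares that spacing with the gap between the low E string and the next semitone up.

```python
def window_tradeoff(sample_rate, window_size):
    """Return (bin spacing in Hz, window duration in ms) for one FFT window."""
    bin_hz = sample_rate / window_size
    window_ms = 1000.0 * window_size / sample_rate
    return bin_hz, window_ms


SAMPLE_RATE = 44100  # Hz; assumed, not stated in the post
LOW_E = 82.41        # Hz, open low E string

# Distance from low E to the next semitone up (equal temperament).
semitone_gap = LOW_E * (2 ** (1 / 12) - 1)
print(f"semitone above low E is {semitone_gap:.2f} Hz away")

for window_size in (1024, 2048, 4096, 22050):
    bin_hz, window_ms = window_tradeoff(SAMPLE_RATE, window_size)
    print(f"{window_size:>6} samples: {bin_hz:6.2f} Hz per bin, {window_ms:6.1f} ms")
```

Under these assumptions, bins only become narrower than the roughly 4.9 Hz semitone gap once the window holds about 9,000 samples, which is already around 200 ms of audio. The 500 ms window corresponds to 22,050 samples and 2 Hz bins.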

The Autocorrelation Method

Our core pitch detection uses autocorrelation: essentially, we compare the audio signal with a time-shifted copy of itself. When the shift equals one period of the fundamental frequency, the correlation peaks. This naturally handles the harmonic problem because every harmonic repeats with the fundamental’s period, so the waveform lines up with itself at that shift regardless of which harmonic is loudest.
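A bare-bones version of the idea might look like the following. This is a simplified sketch under stated assumptions rather than the production algorithm: the sample rate, the 60 to 500 Hz search range, and the function name are all hypothetical. The test signal again has a second harmonic louder than its fundamental.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for illustration


def autocorrelation_pitch(samples, sample_rate, min_hz=60.0, max_hz=500.0):
    """Estimate pitch from the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    # Use the same number of terms for every lag so long lags are not penalised.
    span = len(samples) - max_lag
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(span))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


low_e = 82.41  # Hz
# Fundamental at 0.4, second harmonic at 1.0 (illustrative amplitudes).
signal = [
    0.4 * math.sin(2 * math.pi * low_e * i / SAMPLE_RATE)
    + 1.0 * math.sin(2 * math.pi * 2 * low_e * i / SAMPLE_RATE)
    for i in range(1024)
]
estimate = autocorrelation_pitch(signal, SAMPLE_RATE)
print(f"autocorrelation estimate: {estimate:.2f} Hz")
```

Even though the harmonic dominates the spectrum, the strongest correlation falls at the lag of the fundamental period, so the estimate lands close to 82 Hz rather than 165 Hz. One caveat: autocorrelation also peaks at integer multiples of the period, so this sketch relies on the search range excluding a two-period lag for this note. A real detector needs a more careful peak-picking rule.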

We use a normalized variant, the Normalized Square Difference Function (NSDF), which is more robust to amplitude changes. That matters when a plucked note is decaying.
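For reference, the NSDF as it is commonly defined divides twice the autocorrelation at each lag by the combined energy of the two overlapping segments, which keeps every value between -1 and 1. The sketch below follows that textbook definition. The decaying test note, sample rate, and lag range are assumptions for illustration, not details from the app.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for illustration
MAX_LAG = 100       # longest lag examined, in samples; assumed


def nsdf(samples, max_lag):
    """NSDF: 2*sum(x[j]*x[j+tau]) / sum(x[j]^2 + x[j+tau]^2) for each lag tau."""
    n = len(samples)
    curve = []
    for tau in range(max_lag + 1):
        acf = 0.0
        energy = 0.0
        for j in range(n - tau):
            a, b = samples[j], samples[j + tau]
            acf += a * b
            energy += a * a + b * b
        curve.append(2.0 * acf / energy if energy > 0.0 else 0.0)
    return curve


# A decaying 110 Hz note, plus the same note at 5% of the volume.
note = [
    math.exp(-5.0 * i / SAMPLE_RATE) * math.sin(2 * math.pi * 110.0 * i / SAMPLE_RATE)
    for i in range(1024)
]
quiet = [0.05 * s for s in note]

curve_loud = nsdf(note, MAX_LAG)
curve_quiet = nsdf(quiet, MAX_LAG)

# Search lags 40..100 (200 Hz down to 80 Hz) for the strongest peak.
best_lag = max(range(40, MAX_LAG + 1), key=lambda lag: curve_loud[lag])
print(f"pitch: {SAMPLE_RATE / best_lag:.1f} Hz, peak height: {curve_loud[best_lag]:.3f}")
```

Because the normalization cancels any overall gain, the loud and quiet versions produce the same curve up to rounding, and the peak height stays close to 1 even though the note is decaying.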

The Noise-Cancellation Pipeline

Accurate pitch detection is only half the battle. The other half is figuring out when a valid note is being played versus when the microphone is just picking up room noise.

Our pipeline has three stages:

Onset detection: We look for sudden increases in energy that indicate a string has been plucked. This prevents the tuner from reacting to gradual background noise.
Spectral gating: We compare the current spectrum against a noise profile captured during silence. Frequencies that match the noise floor get suppressed.
Confidence scoring: Each pitch estimate gets a confidence score based on how clean the autocorrelation peak is. Below a threshold, we show “listening” instead of a potentially wrong note.
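
The three stages above can be sketched as small, independent functions. Everything here is hypothetical: the function names, the energy ratio, the gating margin, and the confidence threshold are placeholder values chosen for illustration, not the ones the app uses.

```python
def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)


def is_onset(prev_energy, cur_energy, ratio=4.0, floor=1e-6):
    """Onset detection: flag a sudden jump in energy between frames.

    `ratio` and `floor` are assumed values; gradual rises stay below the ratio.
    """
    return cur_energy > floor and cur_energy > ratio * max(prev_energy, floor)


def spectral_gate(magnitudes, noise_profile, margin=2.0):
    """Spectral gating: zero out bins that do not rise clearly above the noise.

    `noise_profile` holds per-bin magnitudes captured while nothing was played.
    """
    return [m if m > margin * n else 0.0 for m, n in zip(magnitudes, noise_profile)]


def display_text(note_name, confidence, threshold=0.9):
    """Confidence scoring: only show a note when the estimate is trustworthy."""
    return note_name if confidence >= threshold else "listening"


# A pluck after near-silence counts as an onset; slow drift does not.
print(is_onset(prev_energy=0.001, cur_energy=0.1))   # True
print(is_onset(prev_energy=0.1, cur_energy=0.11))    # False

# Only the bin well above the noise floor survives gating.
print(spectral_gate([0.5, 3.0, 0.2], [0.4, 0.4, 0.4]))  # [0.0, 3.0, 0.0]

# A weak correlation peak falls back to the neutral state.
print(display_text("E2", confidence=0.95))  # E2
print(display_text("E2", confidence=0.50))  # listening
```

One natural choice for the confidence value is the height of the normalized autocorrelation peak, since a clean periodic note pushes it toward 1 while noise keeps it low.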

The result is a tuner that stays quiet when you’re not playing and responds accurately when you are — even in a noisy rehearsal room.