A Napkin Films series

The Language of Sound

A documentary series that teaches the whole world of sound, with the lesson living inside the music.

The Language of Sound teaches how music is actually built, using the ChipForge engine, with the warmth of Peter and the Wolf or The Young Person’s Guide to the Orchestra. Each episode is its own track where the teaching lives inside the music: you hear the idea as you learn it.

Everything is synthesized from code, sample by sample, and shown on a neon oscilloscope so you can watch the sound as it is explained. Narrated by the Governor.

6 of 14 episodes released

Season 1 · The Foundations

From a single note to a finished master.

  1. S1E1 · The Note. Waveform, ADSR, filter, resonance, vibrato: the anatomy of one sound. Watch · 3:39
  2. S1E2 · The Instrument. Layering and multi-voice: material plus action plus size, and formants. Watch · 3:04
  3. S1E3 · The Arrangement. Scale, key, chords, progression, voice-leading, motif, tension and release. Watch · 3:05
  4. S1E4 · The Mix. Pan, reverb gradient, EQ real-estate, and curing mud, harsh, and thin. Watch · 3:06
  5. S1E5 · The Master. Limiting, LUFS, multiband, true-peak: measure, do not guess. Watch · 3:07
  6. S1E6 · How the Machine Thinks. EDM, Euclidean rhythm, Markov chains: the mathematics underneath. Watch · 3:04

Season 2 · The Voices of the Machine

Every way the engine makes a sound from nothing.

Season 1 covered the foundations: the note, the instrument, the arrangement, the mix, and the master. Season 2 opens up synthesis itself, every way the engine makes a sound from nothing: subtractive, additive, FM, Karplus-Strong physical modeling, granular clouds, wavetable morphing, the vocal tract, and the Game Boy chip where it all began.

Further seasons are sketched: the mathematics of music (tuning, rhythm, generative melody), the worlds (genre by genre), and the craft (stereo, modulation, reverb, mastering).

Go deeper: The five-altitude map of sound (the essay). The written deep-dive that maps the same territory the series teaches.

The deep-dive essay

The Language of Sound: A Field Guide for Making Music

I could already make sound. I could not name it, and you cannot ask for a thing you cannot name. So I asked my music engine for a map. This is that map: sound built in altitudes, and the words that let you climb it.

I have always been moved by sound. Not as a hobby I picked up later, but as
something closer to the center of me. A chord change can reorganize a room. A low
end you feel in your chest is a different animal from one you only hear. I love
the way music makes me feel, and for a long time that love sat on one side of a
wall. I could feel sound completely. I could not make it.

Then I built ChipForge, my music engine, and the wall moved. ChipForge makes
every instrument from code. No samples, no recordings, just numbers between
negative one and positive one, about forty-four thousand of them a second, sent
to a speaker as vibration. Every effect is arithmetic on that list of numbers.
That is the demystifying truth I keep coming back to: it is all math on a list of
numbers, and I can already make it produce sound.
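
To make "math on a list of numbers" concrete, here is a minimal sketch in plain Python. It is not ChipForge's code: the names `sine` and `gain` are hypothetical, and the 44,100 samples per second rate is an assumption (the common CD rate, which matches "about forty-four thousand").

```python
import math

SAMPLE_RATE = 44_100  # assumption: the standard CD rate


def sine(freq_hz, seconds, amp=1.0, sample_rate=SAMPLE_RATE):
    """Return one float per sample, each between -amp and +amp."""
    n = int(seconds * sample_rate)
    return [amp * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def gain(samples, factor):
    """The simplest 'effect': multiply every number in the list."""
    return [s * factor for s in samples]


tone = sine(440.0, 0.5)      # half a second of A above middle C
quieter = gain(tone, 0.5)    # same sound, half the amplitude
```

Every other sketch in this guide follows the same pattern: take a list of numbers, return a list of numbers.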

So capability was never the thing stopping me. The thing stopping me was that I
could not name what I was hearing. When something sounded amateur I had no word
for which part was wrong, and you cannot ask for a thing you cannot name, cannot
notice when it is off, cannot tell where its edge is. The pros are not smarter
about sound. They have words for it, and the words are hooks that let you pull one
thing at a time out of a wall of noise. I wrote a whole essay recently, Working at
the Frontier, arguing that the real limit on thought is language, the narrow
channel cognition has to pass through to become anything you can share. Sound
turned out to be the same problem wearing different clothes, and the same lesson
runs through Making Complexity Visible: a hard thing stops being a black box once
you can name its parts.

So I asked ChipForge for a map. Not more features, a vocabulary. A field guide
written for my own engine, every word tied to a real knob in the code and to a
sentence I could actually say out loud. What came back is the most useful thing I
have read about music in years, and it is the thing I want to give you here.

Sound is built in altitudes

Here is the whole model, and once you see it you cannot unsee it. Everything in
this engine, and in all of music production, stacks in altitudes:

  Altitude 5 :: THE MASTER       the whole song as one finished object
  Altitude 4 :: THE MIX          how the parts sit together: space, balance, width
  Altitude 3 :: THE ARRANGEMENT  what notes, when, in what role
  Altitude 2 :: THE INSTRUMENT   the character of one voice
  Altitude 1 :: THE NOTE         one sound from start to finish
  Altitude 0 :: THE PHYSICS      what sound actually is
        ↓
       your ear, the feeling

A signal flows up. A note is shaped into an instrument, instruments are arranged,
the arrangement is mixed, the mix is mastered. But when you diagnose a problem you
go down. "This feels harsh" is a feeling. "The highs are too loud" is a guess at
altitude four. "The lead's filter is open too far at the top" is the actual cause,
down at altitude two, where you can reach in and fix it.

That is the entire skill in one sentence. Every problem (mud, harshness, the
vague sense that something sounds cheap) lives at exactly one altitude, and you
fix it there. Learning to ask "which altitude is this?" is most of the game. The
words below are organized that way, climbing from the physics up to the finished
record.

Altitude 0: the physics

Four words, and everything else is a combination of them. Frequency is how fast
the air vibrates, and doubling it moves you up one octave. Amplitude is how big
the vibration is, the loudness, measured in decibels. Timbre is the color of a
sound, the reason a flute and a saw playing the same pitch are instantly
different. Harmonics are where timbre comes from, the quiet higher tones stacked
above the fundamental. Get those four and you have the alphabet. There is nothing
to build down here, only to learn, because the engine already speaks physics
fluently.
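
The arithmetic behind those words is short enough to write down. This is a sketch with hypothetical helper names, not anything from the engine: an octave is a doubling of frequency, decibels are a logarithm of an amplitude ratio, and harmonics are whole-number multiples of the fundamental.

```python
import math


def octave_up(freq_hz, octaves=1):
    """Each octave doubles the frequency."""
    return freq_hz * 2 ** octaves


def amplitude_to_db(amplitude, reference=1.0):
    """Decibels relative to a reference amplitude (1.0 = full scale)."""
    return 20 * math.log10(amplitude / reference)


def db_to_amplitude(db, reference=1.0):
    """Inverse of amplitude_to_db."""
    return reference * 10 ** (db / 20)


def harmonics(fundamental_hz, count):
    """The fundamental and the tones stacked above it: f, 2f, 3f, ..."""
    return [fundamental_hz * k for k in range(1, count + 1)]
```

Halving the amplitude comes out at roughly minus six decibels, which is why "down 6 dB" and "half as big" mean the same thing.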

Altitude 1: the note

A single note has a shape within each cycle, which is the waveform, and a shape
over time, which is the envelope, and then it gets sculpted by a filter. Those
three are the holy trinity of synthesis, the foundation under nearly every synth
sound ever made.

The waveform is the raw color. A sine is pure and hollow, a tuning fork with no
overtones. A sawtooth is buzzy and full, all the harmonics at once, the rasp under
a trumpet or a screaming lead. A square is woody and retro, the Game Boy tone.
Noise is hiss and wind, the raw material of every cymbal and snare.
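
The four raw colors can each be sketched as a function of phase, where phase counts cycles (0.0 to 1.0 is one full cycle). These are the naive textbook formulas with hypothetical names, not ChipForge's oscillators. Naive saw and square waves alias at high pitches, so a real engine would likely band-limit them.

```python
import math
import random


def sine(phase):
    """Pure tone: a single harmonic."""
    return math.sin(2 * math.pi * phase)


def saw(phase):
    """Ramp from -1 up to +1 once per cycle: every harmonic at once."""
    return 2.0 * (phase % 1.0) - 1.0


def square(phase):
    """High for half the cycle, low for the other half."""
    return 1.0 if (phase % 1.0) < 0.5 else -1.0


def noise(phase, rng=random):
    """No pitch at all: a fresh random value every sample."""
    return rng.uniform(-1.0, 1.0)


def render(wave, freq_hz, seconds, sample_rate=44_100):
    """Sample any of the waveforms above at a given pitch."""
    n = int(seconds * sample_rate)
    return [wave(freq_hz * i / sample_rate) for i in range(n)]
```

Calling `render(saw, 220.0, 1.0)` gives one second of a raw, unshaped sawtooth, which is the starting material for the envelope and the filter.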

The envelope is the most important idea for making something sound played instead
of generated, and it has four letters, ADSR. Attack is how fast the note reaches
full volume; fast is a pluck, slow is a swell. Decay is how it settles. Sustain is
the level it holds while you keep the key down. Release is the tail, how long it
rings after you let go. The single most common beginner mistake, said in this
language, is notes that are too short. Real notes breathe.
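
A linear ADSR can be sketched in a few lines. This is one simple way to build it, assuming straight-line segments and times given in seconds; the function names are hypothetical and the engine's own envelope may well be curved rather than linear.

```python
def adsr(held_samples, attack, decay, sustain, release, sample_rate=44_100):
    """Return a list of gains in [0, 1].

    held_samples: how long the key stays down, in samples.
    attack, decay, release: times in seconds. sustain: a level from 0 to 1.
    """
    a = round(attack * sample_rate)
    d = round(decay * sample_rate)
    r = round(release * sample_rate)
    env = []
    for i in range(held_samples):
        if i < a:                      # attack: rise from 0 to full volume
            env.append(i / a)
        elif i < a + d:                # decay: settle down to the sustain level
            env.append(1.0 - (1.0 - sustain) * (i - a) / d)
        else:                          # sustain: hold while the key is down
            env.append(sustain)
    level = env[-1] if env else 0.0
    for j in range(r):                 # release: the tail after letting go
        env.append(level * (1.0 - (j + 1) / r))
    return env


def apply_envelope(samples, env):
    """Shape a raw waveform by multiplying it by the envelope."""
    return [s * g for s, g in zip(samples, env)]
```

With `attack=0.0` the note starts at full volume, which is the pluck. A longer `release` is the part that lets the note breathe.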

The filter is where most of the magic hides. A low pass filter lets the lows
through and removes the highs above a cutoff point, and lowering that cutoff is
the one knob for "make it warmer" or "make it less harsh." Put an envelope on the
cutoff itself and the brightness moves over the life of the note: the swell of a
bowed string, the blat on the front of a brass stab. That moving filter is the
single biggest jump in realism that most beginners skip entirely.
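
Here is a minimal sketch of a low pass filter whose cutoff can move during the note. It uses a one-pole smoother, the simplest low pass there is, chosen for brevity; ChipForge's filter is not described in this guide and is probably steeper and resonant. The names `one_pole_lowpass` and `sweep` are hypothetical.

```python
import math


def one_pole_lowpass(samples, cutoffs_hz, sample_rate=44_100):
    """Low pass with one cutoff value per sample.

    Each output moves a fraction `a` of the way toward the input.
    A low cutoff gives a small `a`, so fast wiggles (highs) get smoothed away.
    """
    out = []
    y = 0.0
    for x, fc in zip(samples, cutoffs_hz):
        a = 1.0 - math.exp(-2.0 * math.pi * fc / sample_rate)
        y += a * (x - y)
        out.append(y)
    return out


def sweep(start_hz, end_hz, n):
    """A linear cutoff envelope: brightness moving over the life of the note."""
    if n == 1:
        return [start_hz]
    return [start_hz + (end_hz - start_hz) * i / (n - 1) for i in range(n)]
```

Passing `sweep(300.0, 4000.0, len(samples))` as the cutoffs opens the filter as the note plays, the swell described above. Reversing the two numbers gives the bright front and darker tail of a brass stab.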

Altitude 2: the instrument

An instrument is a waveform plus an envelope plus a filter, plus the thing that
makes it rich, which is layers. Real sounds are never one oscillator. A piano is a
hammer click on top of a string body on top of a sub. Stack two to four slightly
detuned voices and the tone gets a body and stops sounding thin.
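
Stacking slightly detuned voices can be sketched like this. The detune amounts (seven cents either side) and the function names are assumptions for illustration, not the engine's settings. A cent is one hundredth of a semitone, so 1200 cents is an octave.

```python
def detune(freq_hz, cents):
    """Shift a frequency by a number of cents (1200 cents = one octave)."""
    return freq_hz * 2 ** (cents / 1200)


def layered_saw(freq_hz, seconds, cents_offsets=(-7, 0, 7), sample_rate=44_100):
    """Sum several sawtooth voices, each slightly off pitch, into one tone."""
    n = int(seconds * sample_rate)
    out = [0.0] * n
    for cents in cents_offsets:
        f = detune(freq_hz, cents)
        for i in range(n):
            phase = (f * i / sample_rate) % 1.0
            # divide by the voice count so the sum stays within -1..1
            out[i] += (2.0 * phase - 1.0) / len(cents_offsets)
    return out
```

Because the voices drift in and out of phase with each other, the sum slowly beats, and that movement is what reads as body.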

The best trick I learned here is how to talk about a timbre at all: describe it as
material plus action plus size. A struck glass. A bowed, swelling string. A
plucked, decaying wire. A breathy, hollow tube. That is how producers brief each
other, and it maps cleanly onto the engine, because material is the waveform,
action is the envelope and the filter, and size is the layers and the register.
Once you can say "a struck glass," you can build a struck glass.

Altitude 3: the arrangement

This is where the conversation stops being about physics and starts being about
music. Which notes, in what order, played by which part of the band. The home note
is the key, the note everything wants to return to. A chord is notes stacked, a
progression is chords in sequence, and tension and resolution is the engine under
all of it: dissonance wants to move, consonance is home, and withholding the
resolution is exactly how a track stays brooding instead of letting you off the
hook.
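
Key, chord, and progression reduce to small integer arithmetic once notes are numbered in semitones. The sketch below assumes MIDI-style note numbers (60 is middle C) and a major scale; both are choices made for the example, and the helper names are hypothetical.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitones above the home note


def in_key(note, root, steps=MAJOR_STEPS):
    """True if the note belongs to the scale built on the home note."""
    return (note - root) % 12 in steps


def triad(root, degree, steps=MAJOR_STEPS):
    """Stack every other scale note, starting at a 0-based scale degree."""
    notes = []
    for k in (0, 2, 4):
        octave, index = divmod(degree + k, len(steps))
        notes.append(root + 12 * octave + steps[index])
    return notes


def progression(root, degrees, steps=MAJOR_STEPS):
    """Chords in sequence, one triad per scale degree."""
    return [triad(root, d, steps) for d in degrees]
```

In C major, `progression(60, [0, 3, 4, 0])` walks home, away, tension, home. Ending the list on degree 4 instead withholds the resolution.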

Then there are roles. Kick, snare, and hat are the rhythm section. Bass is the
foundation. The pad is the warm floor. The lead is the voice, the hook. The real
arrangement skill is not adding, it is knowing what to leave out, managing the
density so every part has room. This is also where my engine has its own edge,
because an algorithm can make every single bar unique instead of copying and
pasting, and a machine can try a thousand variations of a groove before you pick
one. That advantage is the whole reason I built the thing from scratch.
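
One small illustration of a machine generating groove variations is the Euclidean rhythm named in the episode list. The sketch below uses a Bresenham-style formula that spreads pulses as evenly as possible across the steps, which yields rotations of the patterns usually called Euclidean. Whether ChipForge computes them this way is not stated here, and the function names are hypothetical.

```python
def euclid(pulses, steps):
    """Spread `pulses` hits as evenly as possible over `steps` slots."""
    return [1 if (i * pulses) % steps < pulses else 0 for i in range(steps)]


def rotate(pattern, shift):
    """Start the same pattern on a different step."""
    shift %= len(pattern)
    return pattern[shift:] + pattern[:shift]


def variations(pulses, steps):
    """Every distinct rotation of one Euclidean pattern."""
    seen = []
    for shift in range(steps):
        candidate = rotate(euclid(pulses, steps), shift)
        if candidate not in seen:
            seen.append(candidate)
    return seen
```

`euclid(4, 16)` is a kick on every beat, and `euclid(3, 8)` is the familiar three-against-eight pattern. Looping over pulse counts and rotations gives a pool of grooves to audition.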

Altitude 4: the mix

If the arrangement is what, the mix is where. A great mix has depth, width, and
clarity, which means you can hear every part and each one sits in its own place.
Pan moves a sound left or right. Reverb pushes it front to back, near or far. EQ
balances the tone, cutting the frequencies that are fighting. Compression tames
the loud parts so the quiet parts survive.
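
Pan is the easiest of these to show as arithmetic. This sketch uses the common equal-power pan law, an assumption on my part since the engine's pan law is not given here: the left and right gains follow a cosine and a sine so the overall loudness stays steady as the sound moves.

```python
import math


def pan(samples, position):
    """Place a mono signal in the stereo field.

    position: -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Returns (left, right) sample lists.
    """
    angle = (position + 1.0) * math.pi / 4.0   # 0 .. pi/2
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right
```

At center both gains are about 0.707 rather than 0.5, which is what keeps a centered sound from dipping in level compared with a hard-panned one.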

The idea that organized all of this for me is frequency real estate. Every sound
wants its own band. Sub bass owns the very bottom, the body sits in the low mids,
presence is the band where melody and vocals cut through, and there is a narrow,
painful band up high where too much energy makes your ears hurt. Mud is too many
sounds crowding the low mids. Harsh is too much in that painful band. Thin is
missing body. Those three words diagnose almost every bad mix I have ever made,
and now I can say which one it is.

Altitude 5: the master

Mastering is the final polish on the whole song summed into one stereo file,
making it loud and cohesive without breaking it. Limiting is a brick wall the
signal cannot cross, so you get volume without clipping. Glue compression makes
the separate parts feel like one performance instead of a pile of tracks.
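
A brick wall can be sketched as a gain that drops instantly whenever a sample would cross the ceiling and then recovers slowly. This is a deliberately simple version with no lookahead and no true-peak detection, and the `ceiling` and `release` defaults are placeholder values, not the engine's.

```python
def limit(samples, ceiling=0.9, release=0.999):
    """Simple peak limiter: instant attack, smooth release.

    ceiling: the level no output sample may exceed.
    release: closer to 1.0 means the gain recovers more slowly.
    """
    gain = 1.0
    out = []
    for s in samples:
        peak = abs(s)
        needed = ceiling / peak if peak > ceiling else 1.0
        if needed < gain:
            gain = needed                               # clamp down at once
        else:
            gain = needed - (needed - gain) * release   # ease back up
        out.append(s * gain)
    return out
```

Unlike hard clipping, which flattens the tops of the waveform, this turns the whole signal down around the loud moment, so the shape of the wave survives.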

But the real lesson of the top altitude is not a tool, it is a habit. Measure, do
not guess. My engine prints five numbers about any finished track: how loud it
peaks, its average energy, how much dynamic punch survives, how wide it is, and
where its tonal center sits. Reading those numbers turns "it sounds bad" into
something specific I can act on. That one habit, measuring instead of guessing, is
the fastest way to stop sounding like a beginner.
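
The five numbers can be approximated with standard measurements. The mapping below is my assumption about what each one could be (peak level, RMS level, crest factor, a mid/side width ratio, and spectral centroid), not ChipForge's actual report, and it does not implement LUFS or true-peak metering. The centroid uses a slow, naive DFT over a short window to stay dependency-free.

```python
import cmath
import math


def db(x):
    """Amplitude to decibels relative to full scale."""
    return 20 * math.log10(x) if x > 0 else float("-inf")


def rms(xs):
    """Root mean square: the average energy of a signal."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def spectral_centroid(xs, sample_rate):
    """Magnitude-weighted average frequency, via a naive DFT."""
    n = len(xs)
    num = den = 0.0
    for k in range(1, n // 2):
        mag = abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                      for i, x in enumerate(xs)))
        num += mag * k * sample_rate / n
        den += mag
    return num / den if den else 0.0


def measure(left, right, sample_rate=44_100, window=512):
    """Five numbers about a stereo track (all definitions are assumptions)."""
    both = list(left) + list(right)
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    peak, level = max(abs(x) for x in both), rms(both)
    mid_rms, side_rms = rms(mid), rms(side)
    total = mid_rms + side_rms
    return {
        "peak_db": db(peak),                                  # how loud it peaks
        "rms_db": db(level),                                  # average energy
        "crest_db": db(peak) - db(level) if level else 0.0,   # dynamic punch
        "width": side_rms / total if total else 0.0,          # 0 mono .. 1 all side
        "centroid_hz": spectral_centroid(mid[:window], sample_rate),
    }
```

A pure sine has a crest factor of about 3 dB. A heavily limited master drifts down toward that figure, which is one way to see "how much dynamic punch survives" as a number.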

Name the altitude, fix it there

Here is how the whole map pays off in practice. Something sounds wrong. Instead of
flailing at random knobs, you name the feeling, the feeling gives you the word,
and the word points at the altitude and the fix.

Boxy and blurry means muddy, which is too much energy in the low mids at altitude
four, so you cut the low mids and thin the arrangement. Painful highs means harsh,
so you tame that narrow upper band. Small and weak means thin, so you layer the
voice and add weight underneath. Flat and robotic means it was never humanized, a
problem up at the arrangement, so you let the timing and the velocities breathe.
Wrong, off notes are almost never a synthesis problem. They are a key problem at
altitude three: notes that wandered outside the scale.
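
The same chain, feeling to word to altitude to fix, can be written as a lookup. The entries come from the paragraph above. Where the paragraph does not name an altitude (harsh and thin) I have assumed altitude four, since the mix section is where those words are defined, and the last fix is my paraphrase.

```python
# feeling -> (word, altitude, fix)
DIAGNOSIS = {
    "boxy and blurry": ("muddy", 4, "cut the low mids and thin the arrangement"),
    "painful highs": ("harsh", 4, "tame the narrow upper band"),           # altitude assumed
    "small and weak": ("thin", 4, "layer the voice and add weight underneath"),  # altitude assumed
    "flat and robotic": ("not humanized", 3, "let the timing and velocities breathe"),
    "wrong, off notes": ("out of key", 3, "bring the notes back inside the scale"),
}


def diagnose(feeling):
    """Turn a feeling into a sentence naming the word, altitude, and fix."""
    key = feeling.strip().lower()
    if key not in DIAGNOSIS:
        return "No word for that yet. Add one to the field guide."
    word, altitude, fix = DIAGNOSIS[key]
    return f"{word}: altitude {altitude}, so {fix}"
```

A dictionary is a deliberate choice here: adding a new word after a session is one more line.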

That table, feeling to word to fix, is the thing I actually wanted all along. Not
a bigger engine. A way to hear precisely, and to say what I hear.

Why this matters to me

This is a map, not a manual, and it is going to keep growing. Every time a session
turns up a useful word I add it to the field guide, so the thing stays alive
instead of going stale. I am publishing it because I suspect a lot of people are
where I was: able to feel sound all the way down, blocked only by not having the
words for it.

The music this engine makes is real and out in the world now. You can hear it on
my music page, the Napkin Films soundtracks and the Plan 9 volumes, all
of it rendered from pure math by ChipForge. Why I build my own engines instead of
reaching for someone else's is its own essay, Why I Build Creative Technology.
And I am now walking each altitude on its own, in video, with the real audio
drawing itself on the screen as it plays. That series is The Language of Sound,
the first season is live, and the details are at the end of this map.

I spent years on the wrong side of a wall, loving sound and unable to make it. It
turned out the wall was made of missing words. Once I had the map, the engine I
had already built stopped being a black box and became an instrument I could
finally play.

Watch the series

I am turning this map into a film series, the same idea walked one altitude at a
time, except now you can see it. No teacher on camera. Everything on the screen is
driven by the real audio, a live oscilloscope tracing each waveform as the Governor
names it. Season One is six episodes, from a single note up to the machine writing
music on its own, and all six are live.

The full series, with every episode and how it is made, lives at The Language of Sound. Or jump straight to the playlist on YouTube. It sits alongside the rest of the studio at Napkin Films.