THE GRAND TOUR: a two-voice film where the picture is the physics

A two-voice music film where the picture is the physics. A small AI crew raps the lecture, Pythagoras and the pure ratios and the golden mean, and then the maker's own voice takes the bloom. The visuals are the math, rendered as light, every pulse read straight from the track.

Watch on YouTube: https://youtu.be/RxXN5KQrLmo

CC BY 4.0 · 2:27 · breath of god · A minor · just intonation · BPM 122

This one started as a test. I sat down to read the words for the first time, on mic, to try out a few new things in our ChipForge engine. The take was rough, but the idea underneath it would not let go, so I kept building until it became a whole film.

The idea was simple. Make the picture out of the same math the music is made of. The score is in just intonation, so every interval is a small whole-number ratio, beat-free and pure. The form is Fibonacci, and the climax sits at the golden ratio of the timeline. So the visuals are not decoration over the sound. They are the same structure drawn as light. A breathing core. Organum rings at the pure ratios. Stacked harmonic waves. A golden spiral that opens at the golden mean. Nothing is a character. The shapes are the laws. And the whole picture is driven straight off the finished track, so the kick pulses the rings, the words flare the core, and the bloom erupts when the voice and the beat and the music all peak at once.

Then it grew a cast. The build-up, the lecture half, is rapped by a small AI crew. The Governor opens it quiet. OG Bobby Johnson carries the myth of Pythagoras and the pure ratios in his low chanting alter ego, Anubis, then drops into his rap-spit self for the double-time, trading bars with Plan 9 and the Governor over an Egyptian chant running underneath. OG Bobby and Anubis are one voice wearing two faces, the chant and the spit, the same character either way. The hard part was the craft difference between a voice that reinforces a human and a voice that replaces one. A reinforcement layer wants to be warped tight onto the take, but a lead that stands in for me has to be left to speak its full words, or it drops consonants and clips the ends of lines. Once I stopped fighting that, the build-up snapped into place.

The bloom is mine. My own recorded voice takes the climax, but I did not want to be up there alone, so the same crew come back as a chorus and join me, full force on "Breathe" and "Alive," and then they fall away so I can close it by myself on the last line: "It was never not you."

It is bookended the Napkin Films way. A Stranger Things red title over an uptuned cricket chirp I built in the engine, and at the end a Plan 9 bunny in a mandala, signing off in Italian. Non è un addio. È un arrivederci. A farewell, but not a goodbye.

It took five passes to get from that rough test read to the film that shipped. Almost all of the work was in the audio and the seam between a human voice and the machine voices around it. Here is how it came together.

1. The score: A minor, just intonation, Fibonacci form

The music is an original ChipForge composition, our own engine, no GPU and no samples. Key of A minor at 122 BPM, and the whole thing is tuned in just intonation rather than equal temperament. That means every interval is a small whole-number ratio, a perfect fifth at exactly 3:2, an octave at 2:1, so the harmony is beat-free and pure rather than the slightly detuned compromise a piano gives you. You can hear it as a kind of stillness in the chords.
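To put numbers on that difference, here is a minimal sketch that compares just and equal-tempered pitches for an A minor scale. The 440 Hz reference and the particular 5-limit ratio set are assumptions for illustration; the write-up only states the fifth (3:2) and the octave (2:1).

```python
import math

A4 = 440.0  # assumed reference pitch; the write-up does not state one

# Assumed 5-limit ratios for a natural minor scale on A (illustrative only).
JUST_MINOR = {
    "A": (1, 1), "B": (9, 8), "C": (6, 5), "D": (4, 3),
    "E": (3, 2), "F": (8, 5), "G": (9, 5), "A'": (2, 1),
}
# Semitone offsets of the same notes in 12-tone equal temperament.
ET_SEMITONES = {"A": 0, "B": 2, "C": 3, "D": 5, "E": 7, "F": 8, "G": 10, "A'": 12}


def just_hz(name, root=A4):
    """Frequency of a scale degree as a whole-number ratio over the root."""
    num, den = JUST_MINOR[name]
    return root * num / den


def equal_hz(name, root=A4):
    """Frequency of the same degree in equal temperament."""
    return root * 2 ** (ET_SEMITONES[name] / 12)


def cents(f1, f2):
    """Signed distance from f2 to f1 in cents (100 cents = 1 ET semitone)."""
    return 1200 * math.log2(f1 / f2)


for name in JUST_MINOR:
    j, e = just_hz(name), equal_hz(name)
    print(f"{name:2s} just {j:8.3f} Hz  equal {e:8.3f} Hz  ET is {cents(e, j):+6.2f} cents off")
```

The equal-tempered fifth comes out roughly 2 cents flat of 3:2. That small offset is what produces the slow beating that a pure fifth does not have.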

The structure is the physics too. The form is Fibonacci, five sections riding bar boundaries: inhale, bones, life, bloom, release, landing at bars 8, 21, 42 and 55. The bloom, the climax, lands at bar 42, which is the golden ratio of the timeline. So the loudest, most radiant moment of the picture is not placed by feel. It is placed by φ. Under all of it: Pythagorean organum, the overtone series stacked as harmony, Euclidean drums, and a slow breath LFO opening the whole field like a lung.
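The golden-ratio placement can be checked with a little arithmetic. This sketch assumes 4/4 time (not stated in the write-up) and takes "the timeline" to be the roughly 135-second two-voice composite from the final sound spec; both are assumptions.

```python
PHI = (1 + 5 ** 0.5) / 2      # golden ratio, about 1.618
BPM = 122                     # stated tempo
BEATS_PER_BAR = 4             # assumption: 4/4
COMPOSITE_SECONDS = 135       # "about 135 seconds" per the final sound spec


def bar_seconds(bpm=BPM, beats=BEATS_PER_BAR):
    """Length of one bar in seconds."""
    return beats * 60.0 / bpm


def golden_bar(total_seconds=COMPOSITE_SECONDS):
    """Bar position that divides the timeline at the golden ratio."""
    total_bars = total_seconds / bar_seconds()
    return total_bars / PHI


print(f"one bar = {bar_seconds():.3f} s")
print(f"golden point = bar {golden_bar():.1f}")
```

Under those assumptions the golden point falls a little past bar 42, in line with where the bloom is placed.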

2. Two voices for two halves

The film hands off at the midpoint, right after the line "started believin'." Everything before is the lecture, the build-up, INHALE into BONES into LIFE, roughly the first 75 seconds. Everything after is the bloom and the release.

The build-up is voiced entirely by a small AI crew. The Governor opens it intimate and quiet. The myth of Pythagoras and the pure ratios is carried in a low chant. Then LIFE drops into double-time and becomes a four-way trade of bars: Plan 9, Anubis, OG Bobby Johnson, the Governor, passing the rhythm hand to hand over an Egyptian chant running underneath. The bloom is mine, my own recorded voice, and the same crew come back around me as a chorus so I am never up there alone.

3. The central lesson: a lead is not a reinforcement

This is the craft thing I will carry into every film after this one. There are two completely different jobs an AI voice can do next to a human, and they need opposite treatment.

A reinforcement voice sits under a human take to thicken it. In the bloom, the Governor doubles me. That voice wants to be dynamic-time-warped onto my take syllable by syllable and envelope-gated so it only speaks where I speak. Warp it tight, lock it to the human, and it disappears into the body of the voice.
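As a rough illustration of the warp-then-gate idea, here is a minimal sketch of dynamic time warping and envelope gating on two coarse loudness envelopes. It works on envelope values only, not audio, and every name and the threshold value are hypothetical; a real pipeline would hand the resulting alignment to a time-stretcher.

```python
def dtw_path(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the list of (index_in_a, index_in_b) pairs on the cheapest path.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Walk back from the end, always taking the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def warp_to(human, double):
    """Resample the doubling envelope onto the human take's frame grid."""
    warped = [0.0] * len(human)
    for h_idx, d_idx in dtw_path(human, double):
        warped[h_idx] = double[d_idx]  # last match for a frame wins
    return warped


def gate(signal, human, threshold=0.1):
    """Let the doubling voice through only where the human is speaking."""
    return [s if h >= threshold else 0.0 for s, h in zip(signal, human)]


human = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]   # toy envelope of the human take
double = [0.0, 1.0, 1.0, 0.0]            # toy envelope of a shorter AI read
print(gate(warp_to(human, double), human))
```

In this toy case the shorter read is stretched so that its loud frames sit exactly under the loud frames of the human take, and the gate silences everything else.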

A lead voice replaces a human. The entire AI build-up has no human under it. My first instinct was to treat it the same way, warp it to a guide, gate it. That was wrong, and it audibly broke the take: it dropped consonants and clipped the ends of lines, the missing "S" in "silence," the cut tail on "prayer." A lead has to be left to speak its full words. The fix was a clean uniform stretch of the natural read to fit the section, no timemap, no gate. The moment I stopped fighting it, the build-up snapped into place. One global tuning move on top: the ensemble read a touch high, so a single pitch-only shift of minus one and a half semitones with rubberband settled it.
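The two numbers behind that fix are simple to compute. This is a sketch only: the durations in the example are made up for illustration, and the function names are hypothetical. The minus 1.5 semitone figure is the one from the text.

```python
def uniform_stretch_ratio(natural_seconds, section_seconds):
    """One global time factor: every word is scaled by the same amount,
    so no syllable is squeezed harder than its neighbours."""
    return section_seconds / natural_seconds


def stretched_time(t, ratio):
    """Where a moment at time t in the natural read lands after stretching."""
    return t * ratio


def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


# Hypothetical durations: a 70 s natural read fitted to a 75 s section.
ratio = uniform_stretch_ratio(70.0, 75.0)
print(f"time ratio {ratio:.4f}")
print(f"pitch ratio for -1.5 st: {semitones_to_ratio(-1.5):.4f}")
```

A shift of minus 1.5 semitones multiplies every frequency by about 0.917, a drop of a little over 8 percent.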

4. OG Bobby Johnson and Anubis are one voice

Worth saying plainly because the film makes it sound like a cast: OG Bobby Johnson and Anubis are the same character. Anubis is OG Bobby's low chanting alter ego, the face he wears to carry the myth and the pure ratios. When he drops into the double-time spit, that is OG Bobby's rap self. One voice, two faces, the chant and the spit.

Under the whole build-up runs an Egyptian chant as atmosphere, the same character pitched deep: Ra, Nun, Maat, Nuk pu Nuk over a vowel drone, dropped three semitones, drenched in a large room reverb, and sat well below the bed so it reads as ground rather than melody.

5. The picture is the physics

There are no characters in the body of the film. Every shape on screen is one of the laws the music is built from, composited as additive light: gaussian sprites stamped into a float buffer and tonemapped at the end so bright things bloom the way real light does.
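A minimal sketch of that compositing idea, in plain Python so it runs anywhere. The tonemap operator here (Reinhard-style x / (1 + x)) is an assumption, since the write-up does not name the one used, and all function names are hypothetical.

```python
import math


def make_buffer(width, height):
    """Float accumulation buffer, one brightness value per pixel."""
    return [[0.0] * width for _ in range(height)]


def stamp_gaussian(buf, cx, cy, sigma, intensity):
    """Add (never overwrite) a soft Gaussian blob centred at (cx, cy)."""
    height, width = len(buf), len(buf[0])
    reach = int(3 * sigma) + 1  # beyond 3 sigma the blob is negligible
    for y in range(max(0, int(cy) - reach), min(height, int(cy) + reach + 1)):
        for x in range(max(0, int(cx) - reach), min(width, int(cx) + reach + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            buf[y][x] += intensity * math.exp(-d2 / (2 * sigma * sigma))


def tonemap(buf):
    """Compress unbounded light into 0..1 (assumed Reinhard-style curve)."""
    return [[v / (1.0 + v) for v in row] for row in buf]


buf = make_buffer(9, 9)
stamp_gaussian(buf, 4, 4, sigma=1.5, intensity=1.0)
stamp_gaussian(buf, 4, 4, sigma=1.5, intensity=1.0)  # overlapping light adds up
image = tonemap(buf)
print(round(buf[4][4], 3), round(image[4][4], 3))
```

Because sprites add in a float buffer, overlapping shapes can exceed 1.0 before the tonemap. The curve then rolls the highlights off smoothly instead of clipping them.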

A breathing core at the center. Organum rings at the pure ratios, cool blue for the root, pale cyan for the fifth, gold for the octave. Stacked harmonic standing waves for the overtone series. Euclidean orbit dots for the drums. A wandering Lissajous figure for the lead. And the golden spiral, opening at the golden angle of 137.5 degrees and erupting at the golden mean.
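The 137.5 degree figure falls straight out of the golden ratio. The sketch below derives it and lays points out in a sunflower-style (Vogel) arrangement, which is one common way to draw a spiral from the golden angle. Whether the film uses this exact layout is an assumption.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
# The golden angle splits a full turn in the golden ratio: about 137.5 degrees.
GOLDEN_ANGLE_DEG = 360.0 * (1 - 1 / PHI)


def spiral_points(count, scale=1.0):
    """Place `count` points, each rotated one golden angle past the last,
    with radius growing as sqrt(k) so the points stay evenly packed."""
    points = []
    for k in range(count):
        theta = math.radians(k * GOLDEN_ANGLE_DEG)
        radius = scale * math.sqrt(k)
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


print(f"golden angle = {GOLDEN_ANGLE_DEG:.4f} degrees")
for x, y in spiral_points(5):
    print(f"({x:+.3f}, {y:+.3f})")
```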

None of it is keyframed by hand. The picture is driven straight off the finished mix. Per-frame envelopes for voice, kick and music energy are baked out to a drive file, and the render reads them: the spoken word flares the core, the kick pulses the rings, the overall energy breathes the brightness. That is why the sync is perfect. The image is not animated to the music. It is reading the music. The render is 854 by 480 at 12 frames a second, 1582 frames in all.
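Baking per-frame envelopes might look something like the sketch below. The RMS measure, the peak normalization, the JSON layout and all names are assumptions for illustration; only the 12 frames a second and the three envelope names come from the text.

```python
import json
import math

FPS = 12  # frame rate stated in the write-up


def frame_rms(samples, sample_rate, fps=FPS):
    """One RMS loudness value per video frame."""
    hop = sample_rate / fps  # audio samples per video frame
    n_frames = math.ceil(len(samples) / hop)
    out = []
    for i in range(n_frames):
        chunk = samples[int(i * hop):int((i + 1) * hop)]
        if chunk:
            out.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
        else:
            out.append(0.0)
    return out


def normalize(env):
    """Scale an envelope so its peak is 1.0 (silence stays silent)."""
    peak = max(env, default=0.0)
    return [v / peak for v in env] if peak > 0 else list(env)


def bake_drive(stems, sample_rate):
    """Hypothetical drive-file layout: fps plus one envelope per stem."""
    drive = {"fps": FPS}
    for name, samples in stems.items():
        drive[name] = normalize(frame_rms(samples, sample_rate))
    return drive


# Toy stems: one second of audio at a made-up 1200 Hz sample rate.
stems = {"voice": [0.5] * 1200, "kick": [0.0] * 1200, "music": [0.25] * 1200}
drive = bake_drive(stems, sample_rate=1200)
print(json.dumps(drive)[:80])
```

The renderer then only needs to index each list by frame number, which is what keeps picture and sound locked without any hand keyframing.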

6. The bloom entry, and other gotchas paid for

The seam into the bloom stumbled at first, a hard sub-impact landing on the downbeat that felt like a trip. The fix was a sub-swell that crescendos in under the entry plus a gaussian dip taming the residual spike, so the climax arrives lifted instead of punched.
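The shape of that fix can be sketched as two gain curves: a ramp for the sub-swell and a Gaussian notch for the dip. All timings and depths below are hypothetical values chosen for illustration, and the linear ramp is an assumption; the write-up does not give the actual curves.

```python
import math


def swell_gain(t, start, end):
    """Crescendo for the sub-swell: 0 before `start`, 1 from `end` onward."""
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)


def gaussian_dip_db(t, center, width, depth_db):
    """Gain change in dB: 0 far from `center`, -depth_db exactly at it."""
    return -depth_db * math.exp(-((t - center) ** 2) / (2 * width ** 2))


def db_to_linear(db):
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (db / 20)


# Hypothetical numbers: swell over the 2 s before the entry, 6 dB dip on it.
ENTRY = 82.6
for t in (ENTRY - 2.0, ENTRY - 1.0, ENTRY, ENTRY + 0.5):
    swell = swell_gain(t, ENTRY - 2.0, ENTRY)
    dip = db_to_linear(gaussian_dip_db(t, ENTRY, 0.05, 6.0))
    print(f"t={t:6.2f}  swell={swell:.2f}  impact gain={dip:.3f}")
```

The dip only touches a few tens of milliseconds around the downbeat, so the impact is softened without thinning the bars on either side.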

Two smaller ones, logged so I never repeat them: never route non-Latin text through a Latin font or you get tofu boxes, and this ffmpeg build has no rubberband filter, so an uptune uses asetrate (pitch and speed together) while a pitch-only move has to shell out to the rubberband CLI.
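For reference, the two routes can be sketched as command builders. The file names and the 44100 Hz rate are placeholders, and the helper names are hypothetical. The commands are only assembled here, not run.

```python
def asetrate_uptune_cmd(src, dst, semitones, sample_rate=44100):
    """ffmpeg route: relabel the sample rate so pitch AND speed both change,
    then resample back to the original rate for playback."""
    factor = 2 ** (semitones / 12)
    new_rate = round(sample_rate * factor)
    audio_filter = f"asetrate={new_rate},aresample={sample_rate}"
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


def rubberband_pitch_cmd(src, dst, semitones):
    """rubberband CLI route: shift pitch by semitones, duration unchanged."""
    return ["rubberband", "--pitch", str(semitones), src, dst]


print(" ".join(asetrate_uptune_cmd("cricket.wav", "cricket_up.wav", 12)))
print(" ".join(rubberband_pitch_cmd("crew.wav", "crew_settled.wav", -1.5)))
```

Either list can be handed to `subprocess.run(cmd, check=True)` once the real paths are filled in.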

7. The bookends

The open is a Stranger Things red title over an uptuned ChipForge cricket and alien chirp, then "A NAPKIN FILMS PRODUCTION." The first cut ran seven seconds and the crickets were too loud and went on too long. The note that fixed it: bring back the attention-grabbing synth notes from an earlier version, blend the notes, the cricket and the music together, turn the crickets down, and cut it to about four and a half seconds led by the notes. The close is a Plan 9 bunny in a golden mandala signing off in Italian, "Non è un addio. È un arrivederci," a farewell but not a goodbye.

8. Five passes from a test read to a film

It really did start as a throwaway. I sat down to read the words on mic for the first time and to try a few new engine ideas. v1 through v5 then chased, in order: the clipped verse-ends (solved by the lead-versus-reinforcement realization in section 3), the slightly high pitch (the minus one and a half semitone settle), the bloom-entry stumble (the sub-swell and dip), a handful of missing or weak sounds around six, sixteen, twenty-eight and thirty-nine seconds (level rides), then the additive layers the film asked for: a chorus so I was not alone in the bloom, and OG Bobby and the Egyptian chant carried throughout. v5 added the bookends and shipped.

Final sound spec

A minor, 122 BPM, just intonation. Music-forward balance: voices sit at or just under the bed and the sidechain is deliberately light (threshold around minus 30 dB, ratio 1.4:1) so the score stays the hero and the words stay intelligible. The two-voice composite runs about 135 seconds; the full film with bookends is 147 seconds.
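To see how light that sidechain is, here is a sketch of a standard hard-knee compressor curve using the threshold and ratio above. The hard knee and the example key levels are assumptions; attack, release and make-up gain are left out.

```python
def sidechain_gain_db(key_level_db, threshold_db=-30.0, ratio=1.4):
    """Gain change (in dB, zero or negative) applied to the ducked signal
    for a given level of the keying signal. Hard knee, static curve."""
    over = key_level_db - threshold_db
    if over <= 0:
        return 0.0  # below threshold: no ducking
    # Above threshold, the overshoot is divided by the ratio.
    return -(over - over / ratio)


for level in (-40.0, -30.0, -20.0, -10.0):
    print(f"key at {level:6.1f} dB -> gain {sidechain_gain_db(level):+.2f} dB")
```

Even with the key 20 dB over the threshold, the duck is under 6 dB, which fits the stated goal of keeping the score in front.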

Related work

This continues the OYM and Plan 9 line. The lead-versus-reinforcement vocal craft grew out of the autotune and voice-clone work in THRONE PROTOCOL and the earlier rap films. The Stranger Things cricket bookend and the procedural-light approach are house techniques first shipped on THRONE PROTOCOL and refined here into a fully audio-reactive, character-free picture.

License

This film is licensed CC BY 4.0 (Creative Commons Attribution 4.0 International). Remix it, repost it, drop it into your own thing. Credit "Napkin Films / Organic Arts LLC" and link CC BY 4.0: https://creativecommons.org/licenses/by/4.0/

Engine code (Napkin Films, ChipForge) is licensed GPL-3.0-or-later. The music is an original ChipForge composition. The lyrics and the lead vocal are my own. ElevenLabs voice audio is licensed content and is not redistributed outside of this film.