How to Run PrismAudio on JarvisLabs
PrismAudio just dropped. It's a 518M-parameter Video-to-Audio model, accepted at ICLR 2026, that generates synchronized audio from silent video. Give it a clip of someone drumming on water bottles, and it produces the sound of tapping and splashing. Its benchmark inference time is 0.63 seconds, faster than both MMAudio (1.30s) and ThinkSound (1.07s).
We ran it on a JarvisLabs A100. Here's how we got it working, the gotchas we hit along the way, and a clean recipe you can follow.