
United States · Startup
An AI-assisted digital audio workstation (DAW) that places Foley and sound effects in real time based on video cues.
Developers of the Gemini family of models, which are trained from the start to be multimodal across text, images, video, and audio.
Developer of the Llama series of open-source LLMs.
Open-source generative AI company, creators of Stable Audio.
A generative AI audio company building models that generate realistic music and speech.
AI music generation platform founded by former Google DeepMind researchers.
Canada · Company
Developer of Wwise, the leading interactive audio middleware for the gaming industry.
United Kingdom · Company
Develops software for sound design, including Weaponiser and Dehumaniser.
Japan · Startup
AI music generator for video creators allowing customization of length, tempo, and mood.
Procedural audio generation suites pair visual scene understanding with diffusion or autoregressive audio models so ambience, Foley, and music can be generated parametrically. They consume metadata such as material tags, camera motion, and emotional arcs, then emit multitrack stems synchronized via SMPTE timecode. Engines such as ElevenLabs' models, Meta's AudioCraft, or proprietary studio models can bake in room impulse responses so output matches a scene's acoustics without manual convolution.
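A minimal sketch of this pipeline, assuming a hypothetical suite API: scene metadata (material tags, camera motion, emotional arc) is packed into a generation request whose in/out points are SMPTE-stamped so returned stems can be aligned on the DAW timeline. The `SceneCue` fields and request keys are illustrative, not any vendor's actual schema; only the frame-to-timecode math is standard.

```python
from dataclasses import dataclass

FPS = 24  # assumed non-drop-frame project frame rate

def frames_to_smpte(frame: int, fps: int = FPS) -> str:
    """Convert a frame index to an HH:MM:SS:FF SMPTE timecode string."""
    hh, rem = divmod(frame, fps * 3600)
    mm, rem = divmod(rem, fps * 60)
    ss, ff = divmod(rem, fps)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

@dataclass
class SceneCue:
    """Metadata a suite might consume for one shot (field names hypothetical)."""
    start_frame: int
    end_frame: int
    material_tags: list   # e.g. ["gravel", "glass"]
    camera_motion: str    # e.g. "slow dolly-in"
    emotional_arc: str    # e.g. "tension -> release"

def build_request(cue: SceneCue) -> dict:
    """Emit a generation request with SMPTE-stamped cue points so the
    returned ambience/Foley/music stems land in sync on the timeline."""
    return {
        "in": frames_to_smpte(cue.start_frame),
        "out": frames_to_smpte(cue.end_frame),
        "materials": cue.material_tags,
        "camera_motion": cue.camera_motion,
        "emotional_arc": cue.emotional_arc,
        "stems": ["ambience", "foley", "music"],
    }

req = build_request(SceneCue(0, 1452, ["gravel", "glass"], "slow dolly-in", "tension"))
print(req["out"])  # frame 1452 at 24 fps -> "00:01:00:12"
```

The timecode stamp, rather than a raw sample offset, is what lets stems from different engines conform to the same picture cut.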
Game studios and streamers lean on these suites to localize shows into dozens of languages overnight, generate adaptive scores that react to gameplay, or propagate consistent Foley across large user-generated libraries. Podcasters and educational creators use them to sonify archival footage, while immersive venues generate scent-plus-audio routines from the same scene graph. Crucially, the suites include rights management, so generated stems carry usage logs for royalty workflows.
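The usage-log idea can be sketched as a per-stem provenance record. Everything here is an assumption about what such a record might hold (the field names and license string are invented); the only real machinery is hashing the prompt so the log can travel with deliverables without exposing its text.

```python
import hashlib
from datetime import datetime, timezone

def log_stem_usage(log: list, stem_id: str, model: str, prompt: str,
                   license_terms: str) -> dict:
    """Append one provenance record per generated stem for royalty accounting.
    The prompt is stored as a SHA-256 hash, not plaintext."""
    record = {
        "stem_id": stem_id,
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "license": license_terms,  # hypothetical terms string
    }
    log.append(record)
    return record

usage_log: list = []
log_stem_usage(usage_log, "foley_017", "studio-model-v2",
               "boots on gravel", "per-stream royalty")
print(usage_log[0]["stem_id"])  # prints "foley_017"
```

Attaching the record at render time, rather than reconstructing it later, is what makes downstream royalty auditing tractable.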
Adoption (TRL 5) hinges on creative control: supervisors need sliders for intensity, instrumentation, and mix balance, not black-box output. Toolmakers are responding with DAW plugins, prompt templates, and guardrails that ensure a unique sonic identity. Standards for watermarking AI-generated audio are emerging alongside Dolby Atmos deliverables. The trajectory points to generative audio sitting alongside human composers rather than replacing them: routine tasks scale, while signature motifs stay under human direction.
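The slider-based control surface the paragraph describes can be sketched as a simple mapping from supervisor-facing values onto engine parameters. The parameter names and ranges below are hypothetical, chosen only to show the pattern: normalized 0..1 sliders, clamped so UI input can never push the model off-spec.

```python
def clamp01(x: float) -> float:
    """Clamp a slider value into the 0..1 range."""
    return max(0.0, min(1.0, x))

def sliders_to_params(intensity: float, instrumentation: float,
                      mix_balance: float) -> dict:
    """Map 0..1 sliders onto hypothetical generation parameters."""
    return {
        # quiet ambient bed at -24 dB up to full-mix level at 0 dB
        "dynamics_db": -24.0 + 24.0 * clamp01(intensity),
        # 1..12 voices, coarse-quantized so small nudges don't re-orchestrate
        "ensemble_size": 1 + round(clamp01(instrumentation) * 11),
        # 0 = all Foley, 1 = all music
        "music_vs_foley": clamp01(mix_balance),
    }

print(sliders_to_params(0.5, 1.0, 0.25))
```

Exposing a handful of named, bounded parameters like these, rather than free-text prompts alone, is what gives supervisors repeatable control over the output.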