
Developers of the Gemini family of models, which are trained from the start to be multimodal across text, images, video, and audio.

United States · University
Research lab hosting Josh Tenenbaum's Computational Cognitive Science group, a leader in probabilistic programming and neuro-symbolic models.
An AI video generation platform that includes a feature to automatically generate sound effects that match the action in the generated video.
Applied AI research company shaping the next era of art, entertainment, and human creativity.
Open source generative AI company, creators of Stable Audio.
Adobe Research
United States · Research Lab
Conducts extensive research on computational photography and light-field processing.
Home to the Centre for Vision, Speech and Signal Processing (CVSSP), which conducts advanced research in audio-visual AI and automated sound synthesis.
United Kingdom · Company
Develops software for sound design, including Weaponiser and Dehumaniser.
Taiwan · Company
Offers MyEdit and PowerDirector, which now feature AI Sound Effect Generators that create audio from text prompts for video projects.
Automated Foley synthesis pipelines pair scene-understanding computer vision with conditional diffusion or autoregressive audio models to generate sound effects that match on-screen motion down to the frame. The system identifies object materials, surfaces, and contact dynamics, then renders multichannel samples that already align with the project’s timecode. Some suites output parametric control data so mixers can tweak intensity or swap alternate takes without regenerating from scratch.
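The end-to-end flow described above can be sketched in miniature: detect contact events from a per-frame motion signal (standing in for the vision model), render a sample per event (standing in for the conditional audio model), and place each sample on a multichannel timeline at the frame-aligned offset. All function names, the decaying-noise "impact," and the thresholded event detector are illustrative assumptions, not any vendor's actual pipeline.

```python
import numpy as np

FPS = 24            # project frame rate (assumed)
SAMPLE_RATE = 48_000  # audio sample rate (assumed)

def detect_contact_events(motion_energy, threshold=0.5):
    """Frames where motion energy crosses the threshold upward --
    a crude stand-in for a vision model flagging surface contacts."""
    above = motion_energy >= threshold
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return rising.tolist()

def render_impact(duration_s=0.15, decay=40.0, seed=0):
    """Exponentially decaying noise burst -- a toy stand-in for a
    diffusion-generated impact sample."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * SAMPLE_RATE)
    t = np.arange(n) / SAMPLE_RATE
    return rng.standard_normal(n) * np.exp(-decay * t)

def place_on_timeline(events, total_frames, channels=2):
    """Drop one rendered sample at each event's frame-aligned
    sample offset, so audio lines up with the video timecode."""
    total = int(total_frames / FPS * SAMPLE_RATE)
    timeline = np.zeros((channels, total))
    for i, frame in enumerate(events):
        start = int(frame / FPS * SAMPLE_RATE)
        clip = render_impact(seed=i)
        end = min(start + len(clip), total)
        timeline[:, start:end] += clip[: end - start]
    return timeline

# Toy motion curve: two bursts of on-screen contact at frames 10 and 50.
motion = np.zeros(96)
motion[[10, 11, 50, 51]] = 1.0
events = detect_contact_events(motion)
track = place_on_timeline(events, total_frames=96)
```

Because the events are kept as frame indices until the final placement step, the same event list could drive the "parametric control" workflow the paragraph mentions: a mixer could re-render with different intensity or alternate seeds without re-running detection.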
Post houses use the tech to fill temp tracks, documentary producers sonify silent archives, and UGC platforms bring cinematic Foley to creators who lack studios. Sports broadcasters layer AI footsteps and cloth swishes for camera angles that lack microphones, and accessibility teams generate descriptive audio cues that mirror visual action. Because the models learn style from reference libraries, a showrunner can ask for “retro noir footsteps” or “anime sword flourishes” and receive cohesive results.
Adoption (around Technology Readiness Level 5) depends on metadata discipline and rights management. Vendors embed provenance tags and watermarking so AI-generated effects remain distinguishable, and unions push for crediting policies that protect human Foley artists. Expect hybrid workflows in which AI handles repetitive footsteps, freeing artisans to craft the hero sounds that define a project's sonic identity.
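A minimal sketch of the provenance-tagging idea: a sidecar record that ties a generated effect to a content hash, the model that produced it, and the prompt. The field names and record shape here are illustrative assumptions, not an existing metadata standard.

```python
import hashlib
import json

def provenance_record(audio_bytes, model_id, prompt):
    """Hypothetical sidecar record for an AI-generated sound effect.
    Field names are illustrative, not taken from any real schema."""
    return {
        # Content hash lets downstream tools verify the file is unmodified.
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "generator": model_id,      # assumed model identifier
        "prompt": prompt,
        "ai_generated": True,       # flag for crediting / disclosure policies
    }

record = provenance_record(b"\x00\x01", "foley-model-v1", "retro noir footsteps")
sidecar = json.dumps(record, indent=2)
```

Shipping provenance as a sidecar (rather than only as an inaudible watermark) keeps the flag machine-readable for asset-management systems, which is what crediting policies would need to enforce.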