T-FOLEY

A controllable waveform-domain diffusion model for temporal-event-guided foley sound synthesis (ICASSP 2024)

T-FOLEY is a diffusion model that synthesizes foley sound effects conditioned on temporal event sequences. Unlike spectrogram-based approaches, it generates audio directly in the waveform domain and gives fine-grained control over when sound events occur in the output.
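To make the idea of a temporal event sequence concrete, here is a minimal sketch of one way such a sequence could be derived from a reference recording. It assumes the timing signal is a frame-level RMS energy envelope; the description above does not say how events are represented. The function names `rms_envelope` and `event_onsets`, the frame sizes, and the threshold are hypothetical and chosen only for illustration; they are not part of any T-FOLEY code.

```python
import math
from typing import List, Sequence


def rms_envelope(waveform: Sequence[float],
                 frame_length: int = 4,
                 hop_length: int = 4) -> List[float]:
    """Return the RMS energy of each frame of a mono waveform.

    Assumption: this envelope stands in for the temporal event
    sequence; the real model may use a different representation.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    envelope = []
    for start in range(0, len(waveform), hop_length):
        frame = waveform[start:start + frame_length]
        # Trailing frames may be shorter; average over the samples present.
        envelope.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return envelope


def event_onsets(envelope: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices where the envelope first rises above threshold."""
    onsets = []
    previous_active = False
    for index, value in enumerate(envelope):
        active = value > threshold
        if active and not previous_active:
            onsets.append(index)
        previous_active = active
    return onsets


# Toy signal: silence, a loud burst, silence, a quieter burst.
signal = [0.0] * 4 + [1.0, -1.0] * 2 + [0.0] * 4 + [0.5, -0.5] * 2
env = rms_envelope(signal, frame_length=4, hop_length=4)
onsets = event_onsets(env, threshold=0.1)
print(env)     # one energy value per frame
print(onsets)  # frames where a sound event starts
```

In this toy case the envelope has one value per four-sample frame, and the two bursts show up as two onsets. A conditioning signal of this kind carries when and how strongly events occur, but not what they sound like, which is the sort of timing control the model description refers to.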