T-FOLEY
A controllable waveform-domain diffusion model for temporal-event-guided foley sound synthesis (ICASSP 2024)
T-FOLEY is a waveform-domain diffusion model that synthesizes foley sound effects conditioned on temporal event sequences. Unlike spectrogram-based approaches, T-FOLEY operates directly in the waveform domain and allows fine-grained control over the timing of sound events in the generated audio.
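The description above does not say how a temporal event sequence is represented or how waveform-domain diffusion corrupts audio. The following is a minimal sketch built on two assumptions that this README does not confirm. First, the event sequence is encoded as a frame-wise RMS energy envelope of a reference sound. Second, the forward process is a standard DDPM-style noising step, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, applied directly to raw samples. The names `rms_envelope` and `noisy_waveform` and the frame/hop sizes are hypothetical illustrations. They are not T-FOLEY's API.

```python
import math
import random
from typing import List


def rms_envelope(wave: List[float], frame: int = 512, hop: int = 256) -> List[float]:
    """Frame-wise RMS energy (hypothetical event encoding): one value per hop,
    high where a sound event is active and near zero in silence."""
    if not wave:
        return []
    env = []
    # Slide a window of `frame` samples forward by `hop` samples at a time.
    for start in range(0, max(len(wave) - frame, 0) + 1, hop):
        chunk = wave[start:start + frame]
        env.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return env


def noisy_waveform(x0: List[float], alpha_bar: float, rng: random.Random) -> List[float]:
    """DDPM-style forward step on raw samples (assumed form):
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * s + b * rng.gauss(0.0, 1.0) for s in x0]


if __name__ == "__main__":
    sr = 16000                   # assumed sample rate
    wave = [0.0] * sr            # one second of silence...
    for i in range(8000, 9600):  # ...with a 0.1 s, 440 Hz burst starting at 0.5 s
        wave[i] = 0.5 * math.sin(2 * math.pi * 440 * i / sr)

    env = rms_envelope(wave)
    peak = max(range(len(env)), key=env.__getitem__)
    print(f"{len(env)} frames, peak energy near t = {peak * 256 / sr:.3f} s")

    x_t = noisy_waveform(wave, alpha_bar=0.5, rng=random.Random(0))
    print(f"noisy waveform has {len(x_t)} samples")
```

Because noise is added to samples rather than to spectrogram bins, the model output is a waveform directly and needs no separate vocoder. In this sketch, the hop size (256 samples, 16 ms at 16 kHz) sets how finely event timing can be specified.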
The model was presented at IEEE ICASSP 2024 (Chung et al., 2024).