NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks

Reconstructed audio comparison between multiple configurations of the DDSP time-varying FIR noise synthesiser [1] and NoiseBandNet. The top row shows the waveform of the entire sound, the middle row its log-magnitude spectrogram, and the bottom row a detail of the transient. The transient is marked with a vertical dashed line in the first and third rows. The left column shows the original training sample: a short metal impact. The middle columns show the reconstructions from five configurations of the DDSP time-varying FIR noise synthesiser, with 128, 512, 1024, 4096 and 8192 taps respectively, all with a hop size of 32 samples. Note the trade-off between time and frequency resolution: as the number of taps increases, the frequency resolution improves while the time resolution degrades, and vice versa. The right column shows the NoiseBandNet reconstruction using 2048 filters and a synthesis window of 32 samples, which maintains good resolution in both time and frequency.

We formally evaluate NoiseBandNet by comparing its reconstruction capabilities against four configurations of the original DDSP time-varying FIR noise synthesiser [1], with 256 (DDSP256taps), 512 (DDSP512taps), 1024 (DDSP1024taps) and 4096 (DDSP4096taps) FIR filter taps. The R code and loss data used to perform the statistical analysis in Section IV-B of the paper can be found here.

Amplitude randomisation: corresponds to Section V-A of the paper.

Loudness transfer: corresponds to Section V-B of the paper. Caution: loud.

Training using user-defined control parameters: corresponds to Section V-C of the paper. Caution: loud.
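The time and frequency trade-off described in the figure caption above can be made concrete with some rough arithmetic. The sketch below is only an illustration, not the authors' code: it assumes a sample rate of 44.1 kHz (the page does not state one) and uses the common rule of thumb that an FIR filter with N taps resolves frequencies about `sample_rate / N` Hz apart while its impulse response spans `N / sample_rate` seconds. The function name `fir_resolution` is hypothetical.

```python
# Rough arithmetic for the time/frequency trade-off of an N-tap FIR filter.
# ASSUMPTION: 44.1 kHz sample rate; the page does not state the actual value.
SAMPLE_RATE = 44_100


def fir_resolution(num_taps, sample_rate=SAMPLE_RATE):
    """Return (approx. frequency spacing in Hz, impulse response length in ms).

    Rule of thumb only: more taps give finer frequency spacing, but the
    impulse response (and therefore the time smearing) gets longer.
    """
    freq_spacing_hz = sample_rate / num_taps
    ir_length_ms = 1000.0 * num_taps / sample_rate
    return freq_spacing_hz, ir_length_ms


if __name__ == "__main__":
    # Tap counts taken from the figure caption.
    for taps in (128, 512, 1024, 4096, 8192):
        hz, ms = fir_resolution(taps)
        print(f"{taps:>5} taps: ~{hz:.1f} Hz spacing, ~{ms:.1f} ms impulse response")
```

Under these assumptions, 128 taps give a spacing of about 345 Hz with an impulse response of about 2.9 ms, whereas 8192 taps give about 5.4 Hz but smear each event over roughly 186 ms, far longer than the 32-sample hop. This matches the blurred transients visible in the high-tap configurations.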
[Paper] | [Code]

Original (training data) | DDSP128taps | DDSP512taps | DDSP1024taps | DDSP4096taps | DDSP8192taps | NoiseBandNet (ours)

Original (training data) | DDSP256taps | DDSP512taps | DDSP1024taps | DDSP4096taps | NoiseBandNet (ours)
Footsteps
Thunderstorm
Pottery
Knocking
Metal

Resynthesis (no randomisation)
Stereo generation
Top-k randomisation I
Top-k randomisation II
Frequency shift randomisation I
Frequency shift randomisation II
Both randomisations I
Both randomisations II

Metal impact (training data)
Wilhelm scream (training data)
Electric drill (training data)
Beatbox (target loudness)
Beatbox to metal
Mix
Beatbox to scream
Mix
Beatbox to drill
Mix
Scribbling (target loudness)
Scribbling to metal
Mix
Scribbling to scream
Mix
Scribbling to drill
Mix
Squeaky toy (target loudness)
Squeaky toy to metal
Mix
Squeaky toy to scream
Mix
Squeaky toy to drill
Mix
Metal impact (training data)
Metal impact control I
Metal impact control II
Metal impact control III
Wilhelm scream (training data)
Wilhelm scream control I
Wilhelm scream control II
Wilhelm scream control III
Electric drill (training data)
Electric drill control I
Electric drill control II
Electric drill control III
Footsteps on metal sounds by: Freesound user "Eelke", licensed under CC BY 4.0: https://freesound.org/people/Eelke/sounds/462599/
Thunderstorm sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/360328/
Pottery sounds by: Freesound user "Tumbleweed3288", licensed under CC0 1.0: https://freesound.org/people/Tumbleweed3288/sounds/381638/ and https://freesound.org/people/Tumbleweed3288/sounds/381548/
Knocking sounds by: Adrián Barahona-Ríos & Sandra Pauletto [2], licensed under CC BY 4.0: https://zenodo.org/record/3668503
Metal sounds by: Freesound user "gokalp_gonen", licensed under CC0 1.0: https://freesound.org/people/gokalp_gonen/sounds/640517/ and https://freesound.org/people/gokalp_gonen/sounds/640518/
Metal impact sounds by: Freesound user "jorickhoofd", licensed under CC BY 4.0: https://freesound.org/people/jorickhoofd/sounds/160045/
Wilhelm scream by: Freesound user "SweetNeo85", licensed under CC Sampling Plus 1.0: https://freesound.org/people/SweetNeo85/sounds/13797/
Drill sounds by: Freesound user "aharri6", licensed under CC Sampling Plus 1.0: https://freesound.org/people/aharri6/sounds/71079/
Beatbox sounds by: Freesound user "VocalPercussion", licensed under CC0 1.0: https://freesound.org/people/VocalPercussion/sounds/245324/
Scribbling sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/398271/
Squeaky toy sounds by: Freesound user "metrostock99", licensed under CC BY 4.0: https://freesound.org/people/metrostock99/sounds/514701/
References
[1] Engel, Jesse, et al. "DDSP: Differentiable Digital Signal Processing." arXiv preprint arXiv:2001.04643 (2020).
[2] Barahona-Ríos, Adrián and Sandra Pauletto. "Synthesising Knocking Sound Effects Using Conditional WaveGAN." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 450-456, Torino, Italy. 2020.
Sound examples
Time and frequency resolution comparison
Reconstruction
Creative experiments
Amplitude randomisation
Loudness transfer
Training using user-defined control parameters
Training sounds attribution