NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks
[Paper] | [Code]
Comparison of reconstructed audio between multiple configurations of the DDSP time-varying FIR noise synthesiser [1] and NoiseBandNet. The top row shows the waveform of the entire sound, the middle row its log-magnitude spectrogram, and the bottom row a detail of the transient. The transient location is marked with a vertical dashed line in the first and third rows. The left column shows the original training sample: a short metal impact. The middle columns show the reconstructions from five configurations of the DDSP time-varying FIR noise synthesiser with 128, 512, 1024, 4096 and 8192 taps respectively, all with a hop size of 32 samples. Note the trade-off between time and frequency resolution in the DDSP synthesiser: as the number of taps increases, the frequency resolution improves while the time resolution degrades, and vice versa. The right column shows the NoiseBandNet reconstruction using 2048 filters and a synthesis window of 32 samples, which maintains both good time and good frequency resolution.
Original (training data) | DDSP128taps | DDSP512taps | DDSP1024taps | DDSP4096taps | DDSP8192taps | NoiseBandNet (ours) |
---|---|---|---|---|---|---|
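The time and frequency trade-off described in the caption above can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from the paper or its code: it assumes a sample rate of 44.1 kHz (not stated on this page) and uses the common rule of thumb that an N-tap FIR filter resolves frequencies about `sample_rate / N` apart while its impulse response lasts N samples. The names `fir_tradeoff`, `SAMPLE_RATE`, `HOP_SIZE` and `TAPS` are hypothetical.

```python
# Rough illustration of the FIR time/frequency trade-off.
# ASSUMPTION: 44.1 kHz sample rate; the page does not state the actual value.
SAMPLE_RATE = 44100
HOP_SIZE = 32  # hop size in samples, as given in the caption
TAPS = [128, 512, 1024, 4096, 8192]  # tap counts listed in the caption


def fir_tradeoff(num_taps, sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Return rule-of-thumb resolution figures for an FIR filter with num_taps taps."""
    return {
        # Approximate spacing between resolvable frequencies, in Hz.
        "bin_spacing_hz": sample_rate / num_taps,
        # Duration of the impulse response, in milliseconds.
        "response_ms": 1000.0 * num_taps / sample_rate,
        # How many hops a single impulse response stretches over.
        "frames_spanned": num_taps / hop_size,
    }


tradeoff = {n: fir_tradeoff(n) for n in TAPS}

for n, t in tradeoff.items():
    print(
        f"{n:5d} taps: ~{t['bin_spacing_hz']:7.2f} Hz spacing, "
        f"~{t['response_ms']:7.2f} ms response, "
        f"{t['frames_spanned']:6.1f} hops spanned"
    )
```

Under these assumptions, more taps give a finer frequency spacing but a longer impulse response, so each control frame is smeared over more hops. This is the behaviour visible in the transient detail of the DDSP columns.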
We formally evaluate NoiseBandNet by comparing its reconstruction capabilities against four configurations of the original DDSP time-varying FIR noise synthesiser [1], with 256 (DDSP256taps), 512 (DDSP512taps), 1024 (DDSP1024taps) and 4096 (DDSP4096taps) FIR filter taps. The R code and loss data used for the statistical analysis in Section IV-B of the paper can be found here.
| | Original (training data) | DDSP256taps | DDSP512taps | DDSP1024taps | DDSP4096taps | NoiseBandNet (ours) |
---|---|---|---|---|---|---|
Footsteps | | | | | | |
Thunderstorm | | | | | | |
Pottery | | | | | | |
Knocking | | | | | | |
Metal | | | | | | |
Corresponding to Section V-A of the paper.
Resynthesis (no randomisation) | Stereo generation | Top-k randomisation I | Top-k randomisation II | Frequency shift randomisation I | Frequency shift randomisation II | Both randomisations I | Both randomisations II |
---|---|---|---|---|---|---|---|
Corresponding to Section V-B of the paper. Caution: loud.
| | Metal impact (training data) | Wilhelm scream (training data) | Electric drill (training data) |
---|---|---|---|
Beatbox (target loudness) | Beatbox to metal Mix | Beatbox to scream Mix | Beatbox to drill Mix |
Scribbling (target loudness) | Scribbling to metal Mix | Scribbling to scream Mix | Scribbling to drill Mix |
Squeaky toy (target loudness) | Squeaky toy to metal Mix | Squeaky toy to scream Mix | Squeaky toy to drill Mix |
Corresponding to Section V-C of the paper. Caution: loud.
Metal impact (training data) | Metal impact control I | Metal impact control II | Metal impact control III |
Wilhelm scream (training data) | Wilhelm scream control I | Wilhelm scream control II | Wilhelm scream control III |
Electric drill (training data) | Electric drill control I | Electric drill control II | Electric drill control III |
Footsteps on metal sounds by: Freesound user "Eelke", licensed under CC BY 4.0: https://freesound.org/people/Eelke/sounds/462599/
Thunderstorm sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/360328/
Pottery sounds by: Freesound user "Tumbleweed3288", licensed under CC0 1.0: https://freesound.org/people/Tumbleweed3288/sounds/381638/ and https://freesound.org/people/Tumbleweed3288/sounds/381548/
Knocking sounds by: Adrián Barahona-Ríos & Sandra Pauletto [2], licensed under CC BY 4.0: https://zenodo.org/record/3668503
Metal sounds by: Freesound user "gokalp_gonen", licensed under CC0 1.0: https://freesound.org/people/gokalp_gonen/sounds/640517/ and https://freesound.org/people/gokalp_gonen/sounds/640518/
Metal impact sounds by: Freesound user "jorickhoofd", licensed under CC BY 4.0: https://freesound.org/people/jorickhoofd/sounds/160045/
Wilhelm scream by: Freesound user "SweetNeo85", licensed under CC Sampling Plus 1.0: https://freesound.org/people/SweetNeo85/sounds/13797/
Drill sounds by: Freesound user "aharri6", licensed under CC Sampling Plus 1.0: https://freesound.org/people/aharri6/sounds/71079/
Beatbox sounds by: Freesound user "VocalPercussion", licensed under CC0 1.0: https://freesound.org/people/VocalPercussion/sounds/245324/
Scribbling sounds by: Freesound user "InspectorJ", licensed under CC BY 4.0: https://freesound.org/people/InspectorJ/sounds/398271/
Squeaky toy sounds by: Freesound user "metrostock99", licensed under CC BY 4.0: https://freesound.org/people/metrostock99/sounds/514701/
References
[1] Engel, Jesse, et al. "DDSP: Differentiable Digital Signal Processing." arXiv preprint arXiv:2001.04643 (2020).
[2] Barahona-Ríos, Adrián and Sandra Pauletto. "Synthesising Knocking Sound Effects Using Conditional WaveGAN." In: Proceedings of the 17th Sound & Music Computing Conference, pp. 450-456, Torino, Italy. 2020.