Joint singing voice separation and F0 estimation with deep U-net architectures
Jansson, A., Bittner, R. M., Ewert, S. & Weyde, T. ORCID: 0000-0001-8028-9905 (2019). Joint singing voice separation and F0 estimation with deep U-net architectures. 2019 27th European Signal Processing Conference (EUSIPCO), 2019-S, doi: 10.23919/EUSIPCO.2019.8902550
Abstract
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
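The stacked, jointly trained configuration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy functions below stand in for the deep U-Nets, and the names `separate_vocals`, `estimate_f0_salience`, `joint_loss`, the weighting parameter `alpha`, and the choice of an L1 separation loss plus a cross-entropy F0 loss are all assumptions made for illustration. The sketch only shows the data flow (mixture, then separated vocals, then F0 salience) and how a single objective can combine both tasks.

```python
import numpy as np

def separate_vocals(mix_mag, mask_logits):
    """Toy separator (stand-in for a U-Net): apply a soft mask in [0, 1]
    to the mixture magnitude spectrogram of shape (n_freq, n_frames)."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return mask * mix_mag

def estimate_f0_salience(vocal_mag, weights):
    """Toy F0 head (stand-in for a second U-Net): map frequency bins to
    pitch bins with a linear layer, then softmax over pitch bins per frame."""
    logits = weights @ vocal_mag                      # (n_pitch, n_frames)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=0, keepdims=True)

def joint_loss(mix_mag, vocal_target, f0_target, mask_logits, weights, alpha=0.5):
    """Stacked pipeline: separation first, F0 estimation on its output.
    Returns (total, separation_loss, f0_loss); alpha is a hypothetical weight."""
    vocal_est = separate_vocals(mix_mag, mask_logits)
    salience = estimate_f0_salience(vocal_est, weights)
    sep_loss = np.mean(np.abs(vocal_est - vocal_target))   # L1 on magnitudes
    f0_loss = -np.mean(np.sum(f0_target * np.log(salience + 1e-9), axis=0))
    return sep_loss + alpha * f0_loss, sep_loss, f0_loss
```

Because the F0 head consumes the separator's output, gradients of the F0 term in a trainable version would also flow back into the separator, which is one way the two tasks can inform each other under joint training.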
Publication Type: | Article |
---|---|
Additional Information: | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Subjects: | M Music and Books on Music; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Departments: | School of Science & Technology > Computer Science |