Jansson, A., Humphrey, E., Montecchio, N., Bittner, R., Kumar, A. and Weyde, T. ORCID: 0000-0001-8028-9905 (2017).
Singing voice separation with deep U-Net convolutional networks.
Paper presented at the 18th International Society for Music Information Retrieval Conference, 23-27 Oct 2017, Suzhou, China.
Abstract
The decomposition of a music audio signal into its vocal and backing track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the U-Net architecture — initially developed for medical imaging — for the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction. Through both quantitative evaluation and subjective assessment, experiments demonstrate that the proposed algorithm achieves state-of-the-art performance.
| Publication Type: | Conference or Workshop Item (Paper) |
|---|---|
| Departments: | School of Arts & Social Sciences > Music |
| Date Deposited: | 19 Mar 2018 10:47 |
| URI: | https://openaccess.city.ac.uk/id/eprint/19289 |
Text
- Accepted Version
Available under License Creative Commons: Attribution International Public License 4.0.