City Research Online

Singing voice separation with deep U-Net convolutional networks

Jansson, A., Humphrey, E., Montecchio, N., Bittner, R., Kumar, A. & Weyde, T. (ORCID: 0000-0001-8028-9905) (2017). Singing voice separation with deep U-Net convolutional networks. Paper presented at the 18th International Society for Music Information Retrieval Conference, 23-27 Oct 2017, Suzhou, China.

Abstract

The decomposition of a music audio signal into its vocal and backing track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the U-Net architecture, initially developed for medical imaging, to the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction. Through both quantitative evaluation and subjective assessment, experiments demonstrate that the proposed algorithm achieves state-of-the-art performance.
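The abstract frames separation as spectrogram-to-spectrogram translation but does not describe the network's output stage. One common way to realize this, stated here as an assumption rather than a detail taken from this record, is for the network to predict a soft mask with values in [0, 1] that scales each time-frequency bin of the mixture magnitude spectrogram, trained with an L1 loss against the isolated vocal spectrogram. The NumPy sketch below shows only that masking step: the U-Net itself is replaced by an oracle ratio mask computed from the known sources, the function names are hypothetical, and the sample rate (8192 Hz), window length (1024) and hop (768) are illustrative choices.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=1024, hop=768):
    """Hann-windowed STFT magnitude, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def apply_soft_mask(mix_mag, mask):
    """Estimate a source by scaling each time-frequency bin of the mixture."""
    return np.clip(mask, 0.0, 1.0) * mix_mag

def l1_loss(estimate, target):
    """Mean absolute difference between two magnitude spectrograms."""
    return float(np.mean(np.abs(estimate - target)))

# Toy two-second signals at 8192 Hz: a 440 Hz "vocal" and a 110 Hz "accompaniment".
sr = 8192
t = np.arange(2 * sr) / sr
vocal = np.sin(2 * np.pi * 440 * t)
accomp = 0.8 * np.sin(2 * np.pi * 110 * t)
mix = vocal + accomp

V = magnitude_spectrogram(vocal)
A = magnitude_spectrogram(accomp)
X = magnitude_spectrogram(mix)

# Hypothetical stand-in for the network output: an oracle ratio mask
# built from the true sources (a trained U-Net would predict this from X alone).
mask = V / (V + A + 1e-8)
vocal_est = apply_soft_mask(X, mask)

print("unmasked L1:", l1_loss(X, V))
print("masked L1:  ", l1_loss(vocal_est, V))
```

Because the mask multiplies the mixture rather than generating a spectrogram from scratch, the estimate can only attenuate bins that are already present, which keeps it anchored to the fine detail of the input.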

Publication Type: Conference or Workshop Item (Paper)
Departments: School of Communication & Creativity > Performing Arts > Music
Text - Accepted Version
Available under the Creative Commons Attribution 4.0 International Public License.
