City Research Online

A Shift-Invariant Latent Variable Model for Automatic Music Transcription

Benetos, E. & Dixon, S. (2012). A Shift-Invariant Latent Variable Model for Automatic Music Transcription. Computer Music Journal, 36(4), pp. 81-94. doi: 10.1162/comj_a_00146


In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. Proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, this method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature, using several error metrics.

Publication Type: Article
Subjects: M Music and Books on Music > M Music
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology > Computer Science
SWORD Depositor:
[thumbnail of benetosdixon_cmj.pdf]
Download (690kB) | Preview


Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email


Downloads per month over past year

View more statistics

Actions (login required)

Admin Login Admin Login