City Research Online

Structuring lute tablature and MIDI data: Machine learning models for voice separation in symbolic music representations

de Valk, R. (2015). Structuring lute tablature and MIDI data: Machine learning models for voice separation in symbolic music representations. (Unpublished Doctoral thesis, City, University of London)


This thesis concerns the design, development, and implementation of machine learning models for voice separation in two forms of symbolic music representations: lute tablature and MIDI data. Three modelling approaches are described: MA1, a note-level classification approach using a neural network; MA2, a chord-level regression approach using a neural network; and MA3, a chord-level probabilistic approach using a hidden Markov model. Furthermore, three model extensions are presented: backward processing, modelling voice and duration simultaneously, and multi-pass processing using an extended (bidirectional) decision context.

Two datasets are created for model evaluation: a tablature dataset, containing a total of 15 three-voice and four-voice intabulations (lute arrangements of polyphonic vocal works) in a custom-made tablature encoding format, tab+, as well as in MIDI format, and a Bach dataset, containing the 45 three-voice and four-voice fugues from Johann Sebastian Bach’s _Das wohltemperirte Clavier_ (BWV 846-893) in MIDI format. The datasets are made available publicly, as is the software used to implement the models and the framework for training and evaluating them.

The models are evaluated on the datasets in four experiments. The first experiment, where the different modelling approaches are compared, shows that MA1 is the most effective and efficient approach. The second experiment shows that the features are effective, and it demonstrates the importance of the type and amount of context information that is encoded in the feature vectors. The third experiment, which concerns model extension, shows that modelling backward and modelling voice and duration simultaneously do not lead to the hypothesised increase in model performance, but that using a multi-pass bidirectional model does. In the last experiment, where the performance of the models is compared with that of existing state-of-the-art systems for voice separation, it is shown that the models described in this thesis can compete with these systems.

Publication Type: Thesis (Doctoral)
Subjects: M Music and Books on Music > M Music; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: Doctoral Theses; School of Science & Technology > School of Science & Technology Doctoral Theses; School of Science & Technology > Computer Science
Text - Accepted Version