City Research Online

Modelling metrical flux: an adaptive frequency neural network for expressive rhythmic perception and prediction

Elmsley (né Lambert), A. (2017). Modelling metrical flux: an adaptive frequency neural network for expressive rhythmic perception and prediction. (Unpublished Doctoral thesis, City, University of London)


Beat induction is the perceptual and cognitive process by which humans listen to music and perceive a steady pulse. Computationally modelling beat induction is important for many Music Information Retrieval (MIR) methods and is in general an open problem, especially when processing expressive timing, e.g. tempo changes or rubato.

A neuro-cognitive model has been proposed, the Gradient Frequency Neural Network (GFNN), which can model the perception of pulse and metre. GFNNs have been applied successfully to a range of ‘difficult’ music perception problems such as polyrhythms and syncopation.

This thesis explores the use of GFNNs for expressive rhythm perception and modelling, addressing the current gap in knowledge for how to deal with varying tempo and expressive timing in automated and interactive music systems. The cannonical oscillators contained in a GFNN have entrainment properties, allowing phase shifts and resulting in changes to the observed frequencies. This makes them good candidates for solving the expressive timing problem.

It is found that modelling a metrical perception with GFNNs can improve a machine learning music model. However, it is also discovered that GFNNs perform poorly when dealing with tempo changes in the stimulus.

Therefore, a novel Adaptive Frequency Neural Network (AFNN) is introduced; extending the GFNN with a Hebbian learning rule on oscillator frequencies. Two new adaptive behaviours (attraction and elasticity) increase entrainment in the oscillators, and increase the computational efficiency of the model by allowing for a great reduction in the size of the network.

The AFNN is evaluated over a series of experiments on sets of symbolic and audio rhythms both from the literature and created specifically for this research. Where previous work with GFNNs has focused on frequency and amplitude responses, this thesis considers phase information as critical for pulse perception. Evaluating the time-based output, it was found that AFNNs behave differently to GFNNs: responses to symbolic stimuli with both steady and varying pulses are significantly improved, and on audio data the AFNNs performance matches the GFNN, despite its lower density.

The thesis argues that AFNNs could replace the linear filtering methods commonly used in beat tracking and tempo estimation systems, and lead to more accurate methods.

Publication Type: Thesis (Doctoral)
Subjects: M Music and Books on Music > M Music
Q Science > QA Mathematics > QA76 Computer software
Departments: Doctoral Theses
School of Science & Technology > School of Science & Technology Doctoral Theses
School of Science & Technology > Computer Science
[thumbnail of Elmsley, Andrew.pdf]
Text - Accepted Version
Download (2MB) | Preview


Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email


Downloads per month over past year

View more statistics

Actions (login required)

Admin Login Admin Login