City Research Online

Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals

Guizzo, E., Weyde, T. ORCID: 0000-0001-8028-9905 and Leveson, J. B. (2020). Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). doi: 10.1109/icassp40776.2020.9053727. ISSN 2379-190X.

Abstract

Robustness against temporal variations is important for emotion recognition from speech audio, since emotion is expressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis depending on speaker and context. To address this problem, and potentially benefit other tasks, we introduce the multi-time-scale (MTS) method, which adds flexibility towards temporal variations when analyzing time-frequency representations of audio data. MTS extends convolutional neural networks with convolution kernels that are scaled and re-sampled along the time axis, increasing temporal flexibility without increasing the number of trainable parameters compared to standard convolutional layers. We evaluate MTS and standard convolutional layers in different architectures for emotion recognition from speech audio, using four datasets of different sizes. The results show that MTS layers consistently improve the generalization of networks of different capacity and depth compared to standard convolution, especially on smaller datasets.
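
The idea of re-using one set of kernel weights at several time scales can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`rescale_kernel_time`, `correlate2d_same`, `mts_conv`), the scale factors, the linear interpolation, and the element-wise maximum used to merge the per-scale responses are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def rescale_kernel_time(kernel, factor):
    """Resample a (freq, time) kernel along the time axis by `factor`.

    Linear interpolation is an assumption; no new weights are created,
    the stretched or compressed kernel is derived from the original one.
    """
    n_freq, n_time = kernel.shape
    new_time = max(1, int(round(n_time * factor)))
    old_pos = np.linspace(0.0, 1.0, n_time)
    new_pos = np.linspace(0.0, 1.0, new_time)
    return np.stack([np.interp(new_pos, old_pos, row) for row in kernel])

def correlate2d_same(x, k):
    """Zero-padded 2D cross-correlation with output the same shape as x."""
    kf, kt = k.shape
    pf, pt = kf // 2, kt // 2
    padded = np.pad(x, ((pf, kf - 1 - pf), (pt, kt - 1 - pt)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kf, kt))
    return np.einsum("ijkl,kl->ij", windows, k)

def mts_conv(x, kernel, factors=(0.5, 1.0, 2.0)):
    """Apply time-rescaled copies of one kernel and merge the responses.

    Merging by element-wise maximum is a hypothetical choice for this sketch.
    """
    responses = [correlate2d_same(x, rescale_kernel_time(kernel, f))
                 for f in factors]
    return np.max(np.stack(responses), axis=0)

rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((16, 40))  # (frequency bins, time frames)
kernel = rng.standard_normal((3, 4))         # the only trainable weights
out = mts_conv(spectrogram, kernel)
print(out.shape, kernel.size)
```

In this sketch the number of learnable values stays at `kernel.size` regardless of how many scale factors are used, which mirrors the claim that temporal flexibility is gained without adding trainable parameters.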

Publication Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher Keywords: Convolutional Neural Network, Scale Invariance, Speech Emotion Recognition
Subjects: P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Mathematics, Computer Science & Engineering > Computer Science
Date Deposited: 05 Jun 2020 15:15
URI: https://openaccess.city.ac.uk/id/eprint/24239
Text - Accepted Version