City Research Online

Quaternion Anti-Transfer Learning for Speech Emotion Recognition

Guizzo, E., Weyde, T. ORCID: 0000-0001-8028-9905, Tarroni, G. ORCID: 0000-0002-0341-6138 & Comminiello, D. (2023). Quaternion Anti-Transfer Learning for Speech Emotion Recognition. In: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 22-25 Oct 2023, New Paltz, NY, USA. doi: 10.1109/WASPAA58266.2023.10248082

Abstract

This study explores the benefits of anti-transfer learning with quaternion neural networks for robust, effective, and efficient speech emotion recognition. Anti-transfer learning selectively promotes task invariance through the introduction of a deep feature loss at training time. It has been shown to improve the performance of speech emotion recognition models by encouraging the independence of emotion predictions from specific uttered words and characteristics of the speaker’s voice. However, the improved accuracy comes at the cost of increased computation time and memory requirements. To reduce the resource demand of anti-transfer learning, we propose to exploit quaternion-valued processing. We design, implement, and evaluate the use of quaternion anti-transfer learning based on the VGG16 architecture and quaternion embeddings on multiple datasets for different speech emotion recognition task setups. The effectiveness of this approach depends on the layer where it is applied, with early layers offering a good compromise between performance gain and resource requirements. Our results show that anti-transfer learning in the quaternion domain can enhance generalisation while reducing the model’s demand for computation and memory.
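As an illustration of why quaternion-valued processing can lower resource demand, the sketch below shows the Hamilton product, the operation on which quaternion layers are generally built, together with a parameter count for a dense layer. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names `hamilton_product`, `real_dense_params`, and `quaternion_dense_params` are hypothetical, biases are ignored, and the quaternion layer is assumed to group its real-valued units into blocks of four that share one quaternion weight.

```python
from typing import Tuple

# A quaternion is stored as (real, i, j, k). This layout is an assumption
# made for illustration only.
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Return the Hamilton product p * q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def real_dense_params(n_in: int, n_out: int) -> int:
    """Weights in a real-valued dense layer (biases ignored)."""
    return n_in * n_out


def quaternion_dense_params(n_in: int, n_out: int) -> int:
    """Weights in a quaternion dense layer with the same real-valued width.

    Hypothetical layout: every 4 real units form one quaternion, and each
    quaternion weight contributes 4 real parameters.
    """
    if n_in % 4 or n_out % 4:
        raise ValueError("n_in and n_out must be multiples of 4")
    return 4 * (n_in // 4) * (n_out // 4)


if __name__ == "__main__":
    i = (0, 1, 0, 0)
    j = (0, 0, 1, 0)
    print(hamilton_product(i, j))  # i * j
    print(hamilton_product(j, i))  # j * i, which differs in sign
    print(real_dense_params(512, 512), quaternion_dense_params(512, 512))
```

Under these assumptions, a quaternion layer with the same real-valued input and output width stores a quarter of the weights of its real-valued counterpart, because each quaternion weight is reused across the four components of the product.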

Publication Type: Conference or Workshop Item (Paper)
Additional Information: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher Keywords: Training, Emotion recognition, Quaternions, Computational modeling, Memory management, Speech recognition, Signal processing
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology > Computer Science
Text - Accepted Version