City Research Online - Quaternion Anti-Transfer Learning for Speech Emotion Recognition

Quaternion Anti-Transfer Learning for Speech Emotion Recognition

Guizzo, E., Weyde, T. ORCID: 0000-0001-8028-9905, Tarroni, G. ORCID: 0000-0002-0341-6138 & Comminiello, D. (2023). Quaternion Anti-Transfer Learning for Speech Emotion Recognition. In: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 22-25 Oct 2023, New Paltz, NY, USA. doi: 10.1109/WASPAA58266.2023.10248082

Abstract

This study explores the benefits of anti-transfer learning with quaternion neural networks for robust, effective, and efficient speech emotion recognition. Anti-transfer learning selectively promotes task invariance through the introduction of a deep feature loss at training time. It has been shown to improve the performance of speech emotion recognition models by encouraging the independence of emotion predictions from specific uttered words and characteristics of the speaker’s voice. However, the improved accuracy comes at a cost of increased computation time and memory requirements. In order to reduce the resource demand of anti-transfer, we propose to exploit quaternion-valued processing. We design, implement, and evaluate the use of quaternion anti-transfer learning on the basis of the VGG16 architecture and quaternion embeddings on multiple datasets for different speech emotion recognition task setups. The effectiveness of this approach depends on the layer where it is applied, with early layers offering a good compromise between performance gain and resource requirements. Our results show that anti-transfer in the quaternion domain can enhance generalisation while reducing the model’s demand for computation and memory.
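
The abstract describes anti-transfer as a deep feature loss added at training time, but it does not specify the form of that loss. As a rough illustration only, the sketch below assumes the penalty is the squared cosine similarity between channel-wise Gram matrices of two feature maps: one from the network being trained and one from a frozen network pretrained on the task to be made invariant to (for example, word or speaker recognition). The function names, the Gram-matrix similarity measure, and the weight `beta` are all hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel Gram matrix of a feature map shaped (channels, height, width)."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1)
    # Normalise by the number of spatial positions.
    return flat @ flat.T / flat.shape[1]

def similarity_penalty(feat_trained, feat_frozen, eps=1e-8):
    """Squared cosine similarity between the two Gram matrices (assumed loss form)."""
    g_trained = gram_matrix(feat_trained).ravel()
    g_frozen = gram_matrix(feat_frozen).ravel()
    denom = np.linalg.norm(g_trained) * np.linalg.norm(g_frozen) + eps
    cos = (g_trained @ g_frozen) / denom
    return cos ** 2

def total_loss(task_loss, feat_trained, feat_frozen, beta=1.0):
    """Task loss plus a weighted penalty for resembling the frozen network's features."""
    return task_loss + beta * similarity_penalty(feat_trained, feat_frozen)

# Toy usage with random arrays standing in for feature maps from one chosen layer.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(4, 5, 5))
feat_b = rng.normal(size=(4, 5, 5))
loss = total_loss(0.5, feat_a, feat_b, beta=1.0)
```

Under this assumed form, minimising the total loss pushes the trained network's features at the chosen layer away from those of the frozen network, which is one way to read "encouraging the independence of emotion predictions" from words and speaker characteristics. The frozen network's extra forward pass also hints at where the added computation and memory cost mentioned in the abstract could come from.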

Publication Type:	Conference or Workshop Item (Paper)
Additional Information:	© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher Keywords:	Training, Emotion recognition, Quaternions, Computational modeling, Memory management, Speech recognition, Signal processing
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments:	School of Science & Technology > Computer Science

Text - Accepted Version

Creators:	Guizzo, E.; Weyde, T. (ORCID: 0000-0001-8028-9905); Tarroni, G. (ORCID: 0000-0002-0341-6138); Comminiello, D.
Event Title:	2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Event Type:	Workshop
Event Location:	New Paltz, NY, USA
Event Dates:	22-25 Oct 2023
Status:	Published
Refereed:	Yes
Journal or Publication Title:	2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Publisher:	IEEE
ISSN:	1931-1168
URI:	https://openaccess.city.ac.uk/id/eprint/31680
Date available in CRO:	07 Nov 2023 11:17
Date deposited:	7 November 2023
Dates:	13 July 2023 (Accepted); 15 September 2023 (Published Online)