Non-Negative Tensor Factorization Applied to Music Genre Classification

Benetos, E. & Kotropoulos, C. (2010). Non-Negative Tensor Factorization Applied to Music Genre Classification. IEEE Transactions on Audio, Speech & Language Processing, 18(8), pp. 1955-1967. doi: 10.1109/TASL.2010.2040784

Abstract

Music genre classification techniques are typically applied to a data matrix whose columns are the feature vectors extracted from music recordings. In this paper, a feature vector is extracted using a texture window of 1 s, which enables the representation of any 30 s long music recording as a time sequence of feature vectors, thus yielding a feature matrix. Consequently, by stacking the feature matrices associated with the recordings of a dataset, a tensor is created, a fact which necessitates studying music genre classification using tensors. First, a novel algorithm for non-negative tensor factorization (NTF) is derived that extends non-negative matrix factorization. Several variants of the NTF algorithm emerge by employing different cost functions from the class of Bregman divergences. Second, a novel supervised NTF classifier is proposed, which trains a basis for each class separately and employs basis orthogonalization. A variety of spectral, temporal, perceptual, energy, and pitch descriptors is extracted from 1000 recordings of the GTZAN dataset, which are distributed across 10 genre classes. The performance of the NTF classifier is compared against that of the multilayer perceptron and support vector machines by applying stratified 10-fold cross-validation. A genre classification accuracy of 78.9% is reported for the NTF classifier, demonstrating the superiority of this multilinear classifier over several state-of-the-art classifiers that operate on data matrices.
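The stratified 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_k_fold`, the integer genre labels, and the assumption of 100 recordings per genre are hypothetical choices made only for illustration. The sketch shows one simple way to split recordings so that every fold contains the same share of each class.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split indices into k folds with near-equal class proportions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group recording indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Assumed layout: 10 genres with 100 recordings each (1000 in total).
labels = [genre for genre in range(10) for _ in range(100)]
folds = stratified_k_fold(labels, k=10)

for test_fold in folds:
    test_set = set(test_fold)
    train_indices = [i for i in range(len(labels)) if i not in test_set]
    # A classifier would be trained on train_indices and scored on
    # test_fold here; that step is omitted from this sketch.
```

Under these assumptions each fold holds 100 test recordings, 10 per genre, and the remaining 900 recordings form the training set for that fold.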

Item Type: Article
Additional Information: © 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Uncontrolled Keywords: Content based retrieval, Cost function, Data mining, Feature extraction, Informatics, Multilayer perceptrons, Music information retrieval, Stacking, Support vector machines, Tensile stress
Subjects: M Music and Books on Music > M Music
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: School of Informatics > Department of Computing
URI: http://openaccess.city.ac.uk/id/eprint/2048
