City Research Online

Adjusting the imbalance ratio by the dimensionality of imbalanced data

Zhu, R. ORCID: 0000-0002-9944-0369, Guo, Y. and Xue, J-H. (2020). Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recognition Letters, doi: 10.1016/j.patrec.2020.03.004

Abstract

Class-imbalance extent metrics measure how imbalanced the data are. In pattern classification, it is usually expected that the higher the imbalance extent, the worse the classification performance, and thus an appropriate imbalance extent metric should show a negative correlation with the classification performance. Existing metrics, such as the popular imbalance ratio (IR), only consider the effect of the sample sizes of different classes. However, we note that the dimensionality of imbalanced data also affects the classification performance. Datasets with the same IR can present distinct classification performances when their dimensionalities are different, which makes IR a suboptimal reflection of the imbalance extent for classification. We also observe that the classification performance improves as the number of discriminative features increases. Inspired by these observations, we propose a new imbalance extent metric, the adjusted IR, which adds a penalty term based on the number of discriminative features, where that number is determined by the Pearson correlation test. The adjusted IR adaptively revises the IR as the number of discriminative features varies. Empirical studies demonstrate the effectiveness of the adjusted IR, in terms of its stronger negative correlation with the classification performance.
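
As a rough illustration of the two ingredients named in the abstract, the sketch below computes the IR of a binary-labelled dataset and counts the features whose Pearson correlation with the class label is statistically significant. It is not the authors' implementation: the function names, the 0.05 significance level, and the toy data are assumptions, and the p-value uses the Fisher z approximation rather than the exact t-based test so that the example needs only the standard library. The penalty term that combines the two quantities into the adjusted IR is not stated in this record, so it is left out.

```python
import math
from collections import Counter
from statistics import NormalDist


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def pearson_r(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (assumed here;
    the exact Pearson correlation test uses a t distribution)."""
    if n <= 3:
        return 1.0
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_discriminative_features(rows, labels, alpha=0.05):
    """Count feature columns significantly correlated with the 0/1 label.
    The level alpha=0.05 is an illustrative assumption."""
    n = len(rows)
    count = 0
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        r = pearson_r(column, labels)
        if correlation_p_value(r, n) < alpha:
            count += 1
    return count


# Toy data (hypothetical): 8 majority and 2 minority samples, two features.
# Feature 0 tracks the label; feature 1 alternates independently of it.
labels = [0] * 8 + [1] * 2
rows = [[y * 5.0 + 0.1 * i, float(i % 2)] for i, y in enumerate(labels)]

ir = imbalance_ratio(labels)
m = count_discriminative_features(rows, labels)
print(f"IR = {ir}, discriminative features = {m}")
```

On this toy dataset the IR alone would be the same however many of the two features carried class information; the feature count is the extra quantity the adjusted IR is described as taking into account.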

Publication Type: Article
Additional Information: © 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Publisher Keywords: Imbalanced data, imbalance extent, imbalanced learning, imbalance ratio, Pearson correlation test
Subjects: H Social Sciences > HF Commerce > HF5601 Accounting; Q Science > QA Mathematics
Departments: Business School > Actuarial Science & Insurance
Date Deposited: 03 Mar 2020 14:07
URI: https://openaccess.city.ac.uk/id/eprint/23834
Text - Accepted Version
This document is not freely accessible due to copyright restrictions.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
