A Survey of Methods for Handling Disk Data Imbalance

Chen, Yuehui; Li, Qiang; Wu, Peng; Yuan, Shuangshuang

Publication date: 13 October 2023

Abstract

Class imbalance arises in many classification problems. Because most classifiers are designed to maximize overall accuracy, imbalance between classes makes classification difficult, and the minority classes often carry the higher misclassification costs. The Backblaze dataset, a widely used hard disk dataset, contains a small amount of failure data and a large amount of healthy data, and therefore exhibits severe class imbalance. This paper provides a comprehensive overview of research on imbalanced data classification. The discussion is organized around three categories: data-level methods, algorithm-level methods, and hybrid methods. For each category, we summarize and analyze the problems addressed, the algorithmic ideas, and the strengths and weaknesses. We also discuss the challenges of imbalanced data classification, along with strategies to address them, so that researchers can choose an appropriate method for their needs.
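
To make the accuracy problem described in the abstract concrete, the following is a minimal sketch, not taken from the paper, of why accuracy misleads on imbalanced data and how one simple data-level method (random oversampling) rebalances the classes. The toy data (990 healthy drives, 10 failed drives), the helper names `accuracy`, `recall`, and `random_oversample`, and the 99:1 ratio are all illustrative assumptions; they do not reflect the actual Backblaze class ratio or any specific method surveyed.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class samples that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Hypothetical toy data: label 0 = healthy drive, label 1 = failed drive.
y = [0] * 990 + [1] * 10
X = [[i] for i in range(len(y))]

# A trivial "classifier" that always predicts the majority class.
always_healthy = [0] * len(y)
acc = accuracy(y, always_healthy)  # high accuracy despite being useless
rec = recall(y, always_healthy)    # no failed drive is ever detected

# Data-level rebalancing: both classes end up the same size.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(acc, rec, dict(balanced_counts))
```

In this sketch the always-healthy predictor scores 99% accuracy while detecting none of the failures, which is why metrics such as recall on the minority class are usually reported alongside accuracy. Random oversampling is only the simplest data-level option; because it duplicates existing samples, it can encourage overfitting to them.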

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.08867

Last updated on 04/01/2024