5 research outputs found

    Heartbeat Anomaly Detection using Adversarial Oversampling

    Cardiovascular diseases are among the most common causes of death in the world. Prevention, knowledge of previous cases in the family, and early detection are the best strategies for reducing this toll. Various machine learning approaches to automatic diagnosis have been proposed for this task. As in most health problems, class imbalance is predominant and degrades the performance of automated solutions. In this paper, we address the classification of heartbeat images into different cardiovascular diseases. We propose a two-dimensional Convolutional Neural Network for classification, trained after an InfoGAN architecture generates synthetic images for the underrepresented classes. We call this proposal Adversarial Oversampling and compare it with classical oversampling methods such as SMOTE, ADASYN, and RandomOversampling. The results show that the proposed approach improves classifier performance on the minority classes without harming performance on the balanced classes.
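As a rough illustration of the simplest baseline named in this abstract, random oversampling, the sketch below duplicates minority-class samples until every class is as large as the largest one. It is not the paper's InfoGAN-based method; the function name `random_oversample`, the toy feature values, and the class labels are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes match the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the existing ones.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: four "normal" beats and one "arrhythmia" beat.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["normal", "normal", "normal", "normal", "arrhythmia"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because the added samples are exact copies, this baseline adds no new information about the minority class, which is the limitation that interpolation-based (SMOTE, ADASYN) and generative approaches try to address.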

    A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets

    In imbalanced learning, resampling methods modify an imbalanced dataset to form a balanced one. Many base classifiers perform better on balanced datasets than on imbalanced ones. This paper proposes a cost-sensitive ensemble method based on a cost-sensitive support vector machine (SVM) and query-by-committee (QBC) to solve imbalanced data classification. The proposed method first divides the majority-class dataset into several subdatasets according to the proportion of imbalanced samples and trains subclassifiers using the AdaBoost method. It then generates candidate training samples with the QBC active learning method and uses the cost-sensitive SVM to learn from them. Experiments on 5 class-imbalanced datasets show that the proposed method achieves higher area under the ROC curve (AUC), F-measure, and G-mean than many existing class-imbalanced learning methods.
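Two of the metrics named in this abstract can be computed directly from a binary confusion matrix. The sketch below assumes the common definitions: F-measure as the harmonic mean of precision and recall (beta = 1, which the abstract does not specify) and G-mean as the geometric mean of sensitivity and specificity. The function name and the example counts are hypothetical.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return (F-measure, G-mean) for a binary confusion matrix.

    The minority class is treated as the positive class.
    """
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


# Hypothetical counts: 10 minority samples, 90 majority samples.
f1, g = imbalance_metrics(tp=8, fn=2, fp=4, tn=86)
print(round(f1, 3), round(g, 3))
```

Unlike plain accuracy, both values drop sharply when the minority class is poorly recognized, which is why they are commonly reported for imbalanced problems.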

    A Survey of Methods for Handling Disk Data Imbalance

    Class imbalance exists in many classification problems. Because classifiers are typically designed to maximize overall accuracy, imbalance between classes makes classification challenging, with a few classes carrying higher misclassification costs. The Backblaze dataset, a widely used dataset on hard disks, contains a small amount of failure data and a large amount of healthy-disk data, and thus exhibits serious class imbalance. This paper provides a comprehensive overview of research in the field of imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithm-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. Additionally, the challenges of imbalanced data classification are discussed, along with strategies to address them. This overview should help researchers choose the appropriate method for their needs.

    A Web Cache Replacement Strategy for Safety-Critical Systems

    A Safety-Critical System (SCS), such as a spacecraft, is usually a complex system that produces a large amount of test data during comprehensive testing. This data is often managed by a comprehensive test data query system, and the primary factor affecting the management experience of such a system is the performance of querying the test data. Managing and maintaining this huge and complex body of test data is a major challenge. To address it, a web cache replacement algorithm that can effectively improve query performance and reduce network latency is needed. However, a general-purpose web cache replacement algorithm usually cannot be applied directly to this type of system because of its low hit rate and low byte hit rate. To improve both, a data stream mining technology is introduced, and a new web cache algorithm, GDSF-DST (Greedy Dual-Size Frequency with Data Stream Technology), is proposed for the SCS based on the original GDSF algorithm. The experimental results show that, compared with state-of-the-art traditional algorithms, GDSF-DST achieves competitive performance and improves the hit rate and byte hit rate by about 20%.
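For context, the baseline GDSF policy that this abstract builds on can be sketched as follows. Each cached object gets a priority of L + frequency × cost / size, the lowest-priority object is evicted first, and the inflation value L is raised to the priority of each evicted object. This is only the original GDSF idea with a uniform cost of 1; the data stream mining component of GDSF-DST is not described in the abstract and is not modeled here. The class name, the linear-scan eviction, and the request trace are assumptions made for the example.

```python
class GDSFCache:
    """Minimal Greedy Dual-Size Frequency cache (baseline GDSF, uniform cost)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0          # the running value L
        self.entries = {}             # key -> [size, frequency, priority]
        self.hits = 0
        self.requests = 0
        self.hit_bytes = 0
        self.request_bytes = 0

    def _priority(self, size, freq, cost=1.0):
        # Small, frequently requested objects receive the highest priority.
        return self.inflation + freq * cost / size

    def access(self, key, size):
        """Record one request; return True on a cache hit."""
        self.requests += 1
        self.request_bytes += size
        if key in self.entries:
            entry = self.entries[key]
            entry[1] += 1
            entry[2] = self._priority(entry[0], entry[1])
            self.hits += 1
            self.hit_bytes += entry[0]
            return True
        if size > self.capacity:
            return False              # object can never fit; do not cache it
        while self.used + size > self.capacity:
            # Evict the lowest-priority object and raise L to its priority.
            victim = min(self.entries, key=lambda k: self.entries[k][2])
            self.inflation = self.entries[victim][2]
            self.used -= self.entries[victim][0]
            del self.entries[victim]
        self.entries[key] = [size, 1, self._priority(size, 1)]
        self.used += size
        return False


# Hypothetical request trace of (key, size in bytes) pairs.
cache = GDSFCache(capacity_bytes=100)
for key, size in [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("a", 40)]:
    cache.access(key, size)

hit_rate = cache.hits / cache.requests
byte_hit_rate = cache.hit_bytes / cache.request_bytes
print(hit_rate, byte_hit_rate)
```

The linear scan for the eviction victim keeps the sketch short; a practical implementation would more likely keep the priorities in a heap. Hit rate counts requests served from the cache, while byte hit rate weights each hit by object size, so the two can diverge when object sizes vary.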

    Developing Algorithms for Quantifying the Super Resolution Microscopic Data: Applications to the Quantification of Protein-Reorganization in Bacteria Responding to Treatment by Silver Ions

    Histone-like nucleoid structuring proteins (HNS) play significant roles in shaping chromosomal DNA, in regulating transcriptional networks in microbes, and in bacterial responses to environmental changes such as temperature fluctuations. In this work, the intracellular organization of HNS proteins in E. coli bacteria was investigated using super-resolution fluorescence microscopy, which surpasses conventional microscopy by 10–20 fold in spatial resolution. More importantly, the changes in the spatial distribution of HNS proteins in E. coli upon addition of silver ions to the growth medium were explored. To quantify the spatial distribution of HNS in bacteria and its changes, an automatic method based on Voronoi diagrams was implemented. The HNS proteins localized by super-resolution fluorescence microscopy were segmented and clustered based on several quantitative parameters, such as molecular areas, molecular densities, and mean inter-molecular distances of the k-th rank, all of which were computed from the Voronoi diagrams. These parameters, together with the associated clustering analysis, allowed us to quantify how the spatial organization of HNS proteins responds to silver, and provided insight into how microbes adapt to new environments.