5 research outputs found
Heartbeat Anomaly Detection using Adversarial Oversampling
Cardiovascular diseases are among the most common causes of death in the
world. Prevention, knowledge of previous cases in the family, and early
detection are the best strategies to reduce mortality. Different machine
learning approaches to automatic diagnosis have been proposed for this task.
As in most health problems, class imbalance is predominant in this problem
and affects the performance of automated solutions. In this paper, we address
the classification of heartbeat images into different cardiovascular
diseases. We propose a two-dimensional Convolutional Neural Network for
classification, after using an InfoGAN architecture to generate synthetic
images for the underrepresented classes. We call this proposal Adversarial
Oversampling and compare it with classical oversampling methods such as
SMOTE, ADASYN, and RandomOversampling. The results show that the proposed
approach improves classifier performance on the minority classes without
harming performance on the balanced classes.
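The InfoGAN generator itself is beyond a short sketch, but the baseline the abstract compares against, RandomOversampling, is simple to illustrate. The sketch below is a minimal pure-Python version (the function name and structure are hypothetical, not the paper's code):

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class (plain random oversampling,
    one of the baselines the paper compares against)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(xs)
        out_y.extend([y] * len(xs))
        for _ in range(target - len(xs)):   # top up the minority class
            out_x.append(rng.choice(xs))
            out_y.append(y)
    return out_x, out_y
```

Adversarial oversampling replaces the `rng.choice(xs)` duplication step with samples drawn from a generator trained on the minority class, so the added examples are novel rather than copies.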
A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets
In imbalanced learning, resampling methods modify an imbalanced dataset to form a balanced one. Balanced datasets perform better than imbalanced ones for many base classifiers. This paper proposes a cost-sensitive ensemble method based on a cost-sensitive support vector machine (SVM) and query-by-committee (QBC) to solve imbalanced data classification. The proposed method first divides the majority-class dataset into several sub-datasets according to the proportion of imbalanced samples and trains sub-classifiers using the AdaBoost method. Then, it generates candidate training samples with the QBC active learning method and uses the cost-sensitive SVM to learn from them. On five class-imbalanced datasets, experimental results show that the proposed method achieves a higher area under the ROC curve (AUC), F-measure, and G-mean than many existing class-imbalanced learning methods.
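The first step the abstract describes, partitioning the majority class into sub-datasets sized by the imbalance ratio so each sub-classifier sees a roughly balanced subset, could be sketched as follows (a hypothetical helper, assuming equal-sized partitions; the paper's exact splitting rule may differ):

```python
import random

def split_majority(majority, minority, seed=0):
    """Partition the majority class into roughly equal sub-datasets,
    each about the size of the minority class, so that each ensemble
    member trains on a balanced subset."""
    rng = random.Random(seed)
    pool = list(majority)
    rng.shuffle(pool)
    k = max(1, round(len(pool) / len(minority)))  # imbalance ratio
    size = (len(pool) + k - 1) // k               # ceiling division
    return [pool[i:i + size] for i in range(0, len(pool), size)]
```

Each returned sub-dataset would then be paired with the full minority class to train one AdaBoost sub-classifier.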
A Survey of Methods for Handling Disk Data Imbalance
Class imbalance exists in many classification problems, and since most
classifiers are optimized for overall accuracy, imbalanced classes can lead
to classification challenges, with the minority classes incurring higher
misclassification costs. The Backblaze dataset, a widely used hard-disk
dataset, contains a small amount of failure data and a large amount of
healthy-drive data, and thus exhibits a serious class imbalance. This paper
provides a comprehensive overview of research in the field of imbalanced
data classification. The discussion is organized into three main aspects:
data-level methods, algorithm-level methods, and hybrid methods. For each
type of method, we summarize and analyze the existing problems, algorithmic
ideas, strengths, and weaknesses. Additionally, the challenges of imbalanced
data classification are discussed, along with strategies to address them.
This overview is intended to help researchers choose the appropriate method
for their needs.
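Of the three categories the survey covers, the simplest algorithm-level remedy is to reweight the loss inversely to class frequency, so rare failure records count more than abundant healthy ones. A minimal sketch of such weighting (not taken from the survey; the formula is the common `n / (k * count)` convention used by several libraries):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency:
    weight(c) = n_samples / (n_classes * count(c)).
    Rare classes (e.g. disk failures) receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}
```

Data-level methods instead change the dataset itself (over/undersampling), and hybrid methods combine both.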
A Web Cache Replacement Strategy for Safety-Critical Systems
A Safety-Critical System (SCS), such as a spacecraft, is usually a complex system. It produces a large amount of test data during a comprehensive testing process, and this data is often managed by a comprehensive test data query system. The primary factor affecting the management experience of such a system is the performance of querying the test data, and it is a big challenge to manage and maintain this huge and complex testing data. To address this challenge, a web cache replacement algorithm that can effectively improve query performance and reduce network latency is needed. However, a general-purpose web cache replacement algorithm usually cannot be directly applied to this type of system due to its low hit rate and low byte hit rate. To improve both rates, a data stream mining technique is introduced, and a new web cache algorithm, GDSF-DST (Greedy Dual-Size Frequency with Data Stream Technology), is proposed for SCSs based on the original GDSF algorithm. The experimental results show that, compared with state-of-the-art traditional algorithms, GDSF-DST achieves competitive performance and improves the hit rate and byte hit rate by about 20%.
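GDSF-DST builds on the classic GDSF policy, which ranks each cached object by a priority H = L + F * C / S (inflation clock L, access frequency F, fetch cost C, size S) and evicts the lowest-priority object. A minimal sketch of that base policy, without the paper's data-stream-mining extension (class and method names are hypothetical):

```python
class GDSFCache:
    """Minimal sketch of the classic GDSF replacement policy that
    GDSF-DST extends. Frequent, costly-to-fetch, small objects get
    higher priority and survive eviction longer."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.clock = 0.0   # inflation value L, rises on each eviction
        self.items = {}    # key -> (priority, size, freq, cost)
        self.used = 0

    def _priority(self, freq, cost, size):
        # H = L + F * C / S
        return self.clock + freq * cost / size

    def access(self, key, size, cost=1.0):
        """Record an access; returns True on a cache hit."""
        if key in self.items:
            _, sz, freq, c = self.items[key]
            freq += 1
            self.items[key] = (self._priority(freq, c, sz), sz, freq, c)
            return True
        # Evict lowest-priority objects until the new one fits.
        while self.used + size > self.capacity and self.items:
            victim = min(self.items, key=lambda k: self.items[k][0])
            self.clock = self.items[victim][0]  # inflate the clock
            self.used -= self.items[victim][1]
            del self.items[victim]
        if size <= self.capacity:
            self.items[key] = (self._priority(1, cost, size), size, 1, cost)
            self.used += size
        return False
```

The clock inflation is what ages out once-popular objects: every eviction raises the floor that newly inserted objects start from.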
Developing Algorithms for Quantifying the Super Resolution Microscopic Data: Applications to the Quantification of Protein-Reorganization in Bacteria Responding to Treatment by Silver Ions
Histone-like nucleoid structuring proteins (HNS) play significant roles in shaping chromosomal DNA, regulating transcriptional networks in microbes, and mediating bacterial responses to environmental changes such as temperature fluctuations. In this work, the intracellular organization of HNS proteins in E. coli bacteria was investigated using super-resolution fluorescence microscopy, which surpasses conventional microscopy by 10–20 fold in spatial resolution. More importantly, the changes in the spatial distribution of HNS proteins in E. coli upon addition of silver ions to the growth medium were explored. To quantify the spatial distribution of HNS in bacteria and its changes, an automatic method based on Voronoi diagrams was implemented. The HNS proteins localized by super-resolution fluorescence microscopy were segmented and clustered based on several quantitative parameters, such as molecular areas, molecular densities, and mean inter-molecular distances of the k-th rank, all of which were computed from the Voronoi diagrams. These parameters, together with the associated clustering analysis, allowed us to quantify how the spatial organization of HNS proteins responds to silver and provided insight into how microbes adapt to new environments.
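One of the descriptors named above, the inter-molecular distance of the k-th rank, is the distance from each localization to its k-th nearest neighbour. A brute-force sketch of that quantity for 2D point data (the paper derives its parameters from the Voronoi diagram itself, e.g. via a library such as scipy.spatial.Voronoi; this simpler stand-in is only illustrative):

```python
import math

def kth_rank_distances(points, k=1):
    """For each localization (x, y), return the distance to its k-th
    nearest neighbour. The mean of these values is one clustering
    descriptor: it shrinks when proteins condense into dense clusters."""
    out = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out
```

Comparing the distribution of these distances before and after silver treatment is one way such an analysis can quantify protein reorganization.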