
    Resampling Algorithms for Multi-Label Classification

    Statistics and Actuarial Science

    Hybrid multi-label classification model for medical applications based on adaptive synthetic data and ensemble learning

    In recent years, multi-label classification has seen growing use in both machine learning and computer vision. Existing research uses SMOTE to balance data, but when SMOTE generates synthetic samples it does not consider that nearby examples may belong to different classes, which can increase class overlap and noise. To avoid this problem, this work presents a technique called Adaptive Synthetic Data-Based Multi-label Classification (ASDMLC). Adaptive Synthetic (ADASYN) sampling is a strategy for learning from imbalanced data sets: it weights minority-class instances by their learning difficulty and creates synthetic data for the hard-to-learn minority cases. Numerical variables are normalized with the Min-Max technique, which rescales each attribute to the range 0 to 1 so that every variable has a comparable magnitude of impact on the outcome. To raise multi-label classification accuracy, Velocity-Equalized Particle Swarm Optimization (VPSO) is used for feature selection; to overcome the premature convergence problem, standard PSO is improved by equalizing the velocity in each dimension of the problem. To expose the inherent label dependencies, a multi-label classification ensemble of an Adaptive Neuro-Fuzzy Inference System (ANFIS), a Probabilistic Neural Network (PNN), and Clustering-Based Decision Tree methods is combined by averaging. Performance is assessed with precision, recall, accuracy, and error rate. The proposed model achieves a multi-label classification accuracy of 90.88%, better than the previous techniques PCT, HOMER, and ML-Forest, which achieve 65.57%, 70.66%, and 82.29%, respectively.
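    The Min-Max normalization and ADASYN weighting described above can be sketched as follows. This is a minimal NumPy sketch for a single binary (minority vs. rest) label, assuming Euclidean distance, k = 5 neighbours and a balance parameter `beta`; the function names `min_max_scale` and `adasyn` are hypothetical and this is not the ASDMLC implementation. Applying it once per label in a multi-label setting, and scaling before oversampling, are also assumptions.

```python
import numpy as np


def min_max_scale(X):
    """Rescale every column of X to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant features
    return (X - col_min) / col_range


def adasyn(X, y, minority=1, k=5, beta=1.0, rng=None):
    """Return synthetic minority samples generated with ADASYN-style weighting."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]
    n_min, n_dim = X_min.shape
    n_maj = len(X) - n_min
    G = int(round((n_maj - n_min) * beta))  # total number of samples to create
    if G <= 0 or n_min < 2:
        return np.empty((0, n_dim))

    # k nearest neighbours of each minority point in the whole data set
    # (column 0 of the sorted order is the point itself, so it is skipped).
    k_all = min(k, len(X) - 1)
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k_all + 1]

    # Learning difficulty r_i = share of majority examples among those neighbours.
    r = (y[nn_all] != minority).sum(axis=1) / k_all
    if r.sum() == 0:
        return np.empty((0, n_dim))  # no minority point is hard to learn
    g = np.rint(r / r.sum() * G).astype(int)  # samples allotted to each point

    # Interpolation partners: nearest neighbours within the minority class only.
    k_min = min(k, n_min - 1)
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k_min + 1]

    synthetic = []
    for i in range(n_min):
        for _ in range(g[i]):
            partner = X_min[rng.choice(nn_min[i])]
            lam = rng.random()  # uniform in [0, 1)
            synthetic.append(X_min[i] + lam * (partner - X_min[i]))
    return np.array(synthetic).reshape(-1, n_dim)


# Usage on a toy imbalanced data set (40 majority, 8 minority points).
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 1.0, (40, 2)), data_rng.normal(1.5, 1.0, (8, 2))])
y = np.array([0] * 40 + [1] * 8)
X_scaled = min_max_scale(X)
X_new = adasyn(X_scaled, y, k=5, rng=0)
print(X_new.shape)  # about n_maj - n_min rows; rounding of g can shift the count
```

    Minority points surrounded mostly by majority neighbours receive a larger share of the synthetic samples, which is the "learning difficulty" weighting the abstract refers to. Because each new point is a convex combination of two scaled minority points, it stays inside the [0, 1] range produced by the Min-Max step.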

    Analysis of Cellular and Subcellular Morphology using Machine Learning in Microscopy Images

    Human cells undergo various morphological changes due to progression in the cell cycle or to environmental factors. Classification of these morphological states is vital for effective clinical decisions. Automated classification systems based on machine learning models are data-driven and efficient, and they help to avoid subjective outcomes. However, the efficacy of these models depends heavily on the feature description and on the amount and nature of the training data. This thesis presents three studies of automated image-based classification of cellular and subcellular morphologies. The first study presents 3D Sorted Random Projections (SRP), which includes a proposed approach to computing 3D plane information for texture description of 3D nuclear images. The proposed 3D SRP is used to classify nuclear morphology and measure changes in heterochromatin, which in turn helps to characterise cellular states. Classification performance evaluated on 3D images of human fibroblast and prostate cancer cell lines shows that 3D SRP provides better classification than other feature descriptors. The second study addresses imbalanced multiclass, single-label classification of blood cell images. The scarcity of minority samples causes a drop in classification performance on minority classes. This study proposes oversampling of minority samples using data augmentation approaches, namely mixup, WGAN-div and a novel nonlinear mixup, along with a minority-class-focussed sampling strategy. Classification performance evaluated using F1-score shows that the proposed deep learning framework outperforms state-of-the-art approaches on publicly available images of human T-lymphocyte cells and red blood cells. The third study is on protein subcellular localisation, which is an imbalanced multiclass and multilabel classification problem. To handle data imbalance, this study proposes an oversampling method that includes synthetic images constructed using nonlinear mixup and geometric/colour transformations. The regularisation capability of nonlinear mixup is further improved for protein images. In addition, an imbalance-aware sampling strategy is proposed to identify minority and medium classes in the dataset and include them during training. Classification performance evaluated on the Human Protein Atlas Kaggle challenge dataset using F1-score shows that the proposed deep learning framework achieves better predictions than existing methods.
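    To illustrate how mixup-based oversampling combines images and multi-label targets, here is a minimal sketch of standard (linear) mixup. It is not the thesis's nonlinear mixup, whose exact formulation is not given here; the function name `mixup_batch`, the Beta(alpha, alpha) mixing coefficient, and the use of multi-hot label vectors as soft targets are assumptions for illustration.

```python
import numpy as np


def mixup_batch(images, labels, alpha=0.4, rng=None):
    """Standard linear mixup for a batch of images with multi-hot label vectors.

    images: array of shape (N, H, W, C); labels: array of shape (N, L) with 0/1 entries.
    Returns mixed images, soft label vectors in [0, 1], and the mixing coefficient.
    """
    rng = np.random.default_rng(rng)
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels, dtype=float)
    lam = rng.beta(alpha, alpha)          # one mixing coefficient for the whole batch
    perm = rng.permutation(len(images))   # pair each example with a random partner
    mixed_x = lam * images + (1.0 - lam) * images[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]  # soft multi-label targets
    return mixed_x, mixed_y, lam


# Usage: a toy batch of four 8x8 RGB images with five possible labels each.
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
labels = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0]])
mixed_x, mixed_y, lam = mixup_batch(images, labels, alpha=0.4, rng=1)
print(round(lam, 3), mixed_y.round(2))
```

    Sharing one coefficient per batch is a simplification; drawing a separate coefficient per example is also common. The soft targets pair naturally with a per-label binary cross-entropy loss, although that loss choice is likewise an assumption here.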

    Multi-label Rule Learning

    Research on multi-label classification is concerned with developing and evaluating algorithms that learn a predictive model for the automatic assignment of data points to a subset of predefined class labels. This is in contrast to traditional classification settings, where individual data points cannot be assigned to more than a single class. As many practical use cases demand a flexible categorization of data, where classes need not be mutually exclusive, multi-label classification has become an established topic of machine learning research. Nowadays, it is used for the assignment of keywords to text documents, the annotation of multimedia files such as images, videos, or audio recordings, and diverse applications in biology, chemistry, social network analysis, and marketing. During the past decade, increasing interest in the topic has resulted in a wide variety of multi-label classification methods. Following the principles of supervised learning, they derive a model from labeled training data, which can afterward be used to obtain predictions for unseen data. Besides complex statistical methods, such as artificial neural networks, symbolic learning approaches have not only been shown to provide state-of-the-art performance in many applications but are also a common choice in safety-critical domains that demand human-interpretable and verifiable machine learning models. In particular, rule learning algorithms have a long history of active research in the scientific community. They are often argued to meet the requirements of interpretable machine learning because they represent learned knowledge in a human-legible form as logical statements. This work presents a modular framework for implementing multi-label rule learning methods. It not only provides a unified view of existing rule-based approaches to multi-label classification but also facilitates the development of new learning algorithms. Two novel instantiations of the framework are investigated to demonstrate its flexibility. Whereas the first relies on traditional rule learning techniques and focuses on interpretability, the second is based on a generalization of the gradient boosting framework and focuses on predictive performance rather than model simplicity. Motivated by the increasing demand for highly scalable learning algorithms that can process large amounts of training data, this work also includes an extensive discussion of algorithmic optimizations and approximation techniques for the efficient induction of rules. As the novel multi-label classification methods presented in this work are instantiations of the same framework, both can benefit from most of these principles. Their effectiveness and efficiency are compared experimentally with existing baselines.
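    To make the idea of a multi-label rule concrete, the following sketch represents a rule as a conjunction of feature conditions (the body) and a head that can address several labels at once. It follows one common convention for boosted rule ensembles, in which each covering rule adds real-valued scores to its head labels and a label is predicted relevant when its total score is positive. The class `Rule`, the functions `predict_additive` and `hamming_loss`, and the example rules are all hypothetical and do not reproduce the framework's API. Hamming loss is chosen here only as a familiar multi-label metric.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# A condition compares one feature with a threshold, e.g. (0, "<=", 2.5).
Condition = Tuple[int, str, float]

_OPS = {"<=": lambda a, b: a <= b, ">": lambda a, b: a > b, "==": lambda a, b: a == b}


@dataclass
class Rule:
    body: List[Condition]                                  # conjunction of conditions
    head: Dict[int, float] = field(default_factory=dict)   # label index -> score

    def covers(self, x: Sequence[float]) -> bool:
        """A rule fires only if every condition in its body holds (logical AND)."""
        return all(_OPS[op](x[f], t) for f, op, t in self.body)


def predict_additive(rules: List[Rule], x: Sequence[float], n_labels: int) -> List[int]:
    """Sum head scores of all covering rules; a label is relevant if its total is positive."""
    scores = [0.0] * n_labels
    for rule in rules:
        if rule.covers(x):
            for label, score in rule.head.items():
                scores[label] += score
    return [1 if s > 0 else 0 for s in scores]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of wrong label assignments, averaged over examples and labels."""
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (len(y_true) * len(y_true[0]))


# Hypothetical rule set over 2 features and 3 labels.
rules = [
    Rule(body=[(0, "<=", 2.5)], head={0: 1.0, 1: 0.5}),         # head covers two labels
    Rule(body=[(0, ">", 2.5), (1, "<=", 1.0)], head={2: 1.0}),
    Rule(body=[(1, ">", 4.0)], head={1: -1.0}),                 # negative score suppresses label 1
]
X = [[1.0, 5.0], [3.0, 0.5], [2.0, 2.0]]
Y_true = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
Y_pred = [predict_additive(rules, x, 3) for x in X]
print(Y_pred)                        # [[1, 0, 0], [0, 0, 1], [1, 1, 0]]
print(hamming_loss(Y_true, Y_pred))  # 1 wrong label out of 9 -> 0.111...
```

    Each rule remains readable as a logical statement ("if feature 0 <= 2.5 then labels 0 and 1"), while the additive scoring lets several rules jointly decide a label, which is where boosting-style rule learners trade some simplicity for predictive performance.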