Analysis of Cellular and Subcellular Morphology using Machine Learning in Microscopy Images

Abstract

Human cells undergo various morphological changes due to progression through the cell cycle or environmental factors. Classification of these morphological states is vital for effective clinical decisions. Automated classification systems based on machine learning models are data-driven and efficient, and they help to avoid subjective outcomes. However, the efficacy of these models is highly dependent on the feature description and on the amount and nature of the training data. This thesis presents three studies of automated image-based classification of cellular and subcellular morphologies. The first study presents 3D Sorted Random Projections (SRP), which includes a proposed approach to compute 3D plane information for texture description of 3D nuclear images. The proposed 3D SRP is used to classify nuclear morphology and measure changes in heterochromatin, which in turn helps to characterise cellular states. Classification performance evaluated on 3D images of human fibroblast and prostate cancer cell lines shows that 3D SRP provides better classification than other feature descriptors. The second study is on imbalanced multiclass, single-label classification of blood cell images. The scarcity of minority samples causes a drop in classification performance on minority classes. This study proposes oversampling of minority samples using data augmentation approaches, namely mixup, WGAN-div and a novel nonlinear mixup, along with a minority-class-focussed sampling strategy. Classification performance evaluated using F1-score shows that the proposed deep learning framework outperforms state-of-the-art approaches on publicly available images of human T-lymphocyte cells and red blood cells. The third study is on protein subcellular localisation, which is an imbalanced multiclass, multilabel classification problem. To handle data imbalance, this study proposes an oversampling method which includes synthetic images constructed using nonlinear mixup and geometric/colour transformations. The regularisation capability of nonlinear mixup is further improved for protein images. In addition, an imbalance-aware sampling strategy is proposed to identify minority and medium classes in the dataset and include them during training. Classification performance evaluated on the Human Protein Atlas Kaggle challenge dataset using F1-score shows that the proposed deep learning framework achieves better predictions than existing methods.
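As an illustration of the oversampling idea mentioned above, the following is a minimal sketch of standard (linear) mixup applied within a single minority class. It is not the nonlinear mixup proposed in the thesis, whose formulation is not given here. The function name `mixup_minority`, its parameters, and the choice to mix only same-class pairs (so the label is unchanged) are assumptions made for this example.

```python
import numpy as np


def mixup_minority(images, labels, minority_class, n_new, alpha=0.4, seed=0):
    """Create n_new synthetic minority samples by linear mixup.

    Hypothetical helper for illustration only: pairs of images from the
    same minority class are blended as lam * x_a + (1 - lam) * x_b, with
    lam drawn from Beta(alpha, alpha). Because both images share a class,
    the synthetic sample keeps that class label.
    """
    rng = np.random.default_rng(seed)

    # Indices of all samples belonging to the minority class.
    idx = np.flatnonzero(labels == minority_class)
    if idx.size < 2:
        raise ValueError("need at least two minority samples to mix")

    # Draw random pairs (with replacement) and one mixing weight per pair.
    a = rng.choice(idx, size=n_new)
    b = rng.choice(idx, size=n_new)
    lam = rng.beta(alpha, alpha, size=n_new)

    # Reshape lam so it broadcasts over the image dimensions.
    lam = lam.reshape(-1, *([1] * (images.ndim - 1)))

    mixed = lam * images[a] + (1.0 - lam) * images[b]
    new_labels = np.full(n_new, minority_class, dtype=labels.dtype)
    return mixed, new_labels
```

The synthetic images would then be appended to the training set so that the minority class is seen more often during training. Each output pixel is a convex combination of two real minority images, so it stays within their intensity range.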
