34 research outputs found

    A multiresolution approach to automated classification of protein subcellular location images

    Background: Fluorescence microscopy is widely used to determine the subcellular location of proteins. Efforts to determine location on a proteome-wide basis create a need for automated methods to analyze the resulting images. Over the past ten years, the feasibility of using machine learning methods to recognize all major subcellular location patterns has been convincingly demonstrated, using diverse feature sets and classifiers. On a well-studied data set of 2D HeLa single-cell images, the best performance to date, 91.5%, was obtained by including a set of multiresolution features. This demonstrates the value of multiresolution approaches to this important problem. Results: We report here a novel approach for the classification of subcellular location patterns by classifying in multiresolution subspaces. Our system is able to work with any feature set and any classifier. It consists of multiresolution (MR) decomposition, followed by feature computation and classification in each MR subspace, yielding local decisions that are then combined into a global decision. With 26 texture features alone and a neural network classifier, we obtained an increase in accuracy on the 2D HeLa data set to 95.3%. Conclusion: We demonstrate that the space-frequency localized information in the multiresolution subspaces adds significantly to the discriminative power of the system. Moreover, we show that a vastly reduced set of features is sufficient, consisting of our novel modified Haralick texture features. Our proposed system is general, allowing for any combinations of sets of features and any combination of classifiers.
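
The step of combining per-subspace local decisions into a global decision can be sketched as follows. The abstract does not say how the combination is done, so this is only one plausible reading: a weighted sum of per-subspace class probabilities. The function name `combine_local_decisions` and the weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def combine_local_decisions(
    local_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Return the class index with the highest weighted score.

    local_probs[k][c] is the probability that the classifier for
    subspace k assigns to class c. weights[k] is the (assumed) weight
    given to subspace k; the paper's actual combination rule may differ.
    """
    if len(local_probs) != len(weights):
        raise ValueError("one weight per subspace is required")
    n_classes = len(local_probs[0])
    scores = [0.0] * n_classes
    for probs, weight in zip(local_probs, weights):
        if len(probs) != n_classes:
            raise ValueError("all subspaces must score the same classes")
        for c, p in enumerate(probs):
            scores[c] += weight * p
    # The global decision is the class with the largest combined score.
    return max(range(n_classes), key=scores.__getitem__)


# Three subspaces, two classes: two subspaces favour class 1.
local = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
print(combine_local_decisions(local, [1.0, 1.0, 1.0]))
```

With equal weights this reduces to averaging the local decisions; giving one subspace a much larger weight lets it dominate the global decision.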

    Many Local Pattern Texture Features: Which Is Better for Image-Based Multilabel Human Protein Subcellular Localization Classification?

    Human protein subcellular location prediction can provide critical knowledge for understanding a protein’s function. Since significant progress has been made in digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and local tetra pattern are more discriminative for describing IHC images than the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary to each other and thus capable of improving the accuracy of classification
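
For readers unfamiliar with the baseline descriptor, the standard local binary pattern on a 3x3 neighbourhood can be sketched as below. This is a minimal illustration of the textbook definition only, not the paper's feature pipeline; the neighbour ordering and the `>=` comparison are conventions assumed here, and the completed local binary pattern and local tetra pattern variants are not shown.

```python
from typing import List, Sequence

# Clockwise neighbour offsets starting at the top-left pixel.
# The starting point and direction are a convention assumed here.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[int]], row: int, col: int) -> int:
    """8-bit LBP code of an interior pixel: bit k is set when the k-th
    neighbour is at least as bright as the centre pixel."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(_OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels,
    usable as a texture feature vector."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The histogram, rather than the individual codes, is what a classifier would typically consume as the texture feature.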

    HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units

    Introduction: Proteins located in subcellular compartments play an indispensable role in the physiological function of eukaryotic organisms. The pattern of protein subcellular localization is conducive to understanding the mechanism and function of proteins, contributing to investigating pathological changes of cells, and providing technical support for targeted drug research on human diseases. Automated systems based on featurization or representation learning and classifier design have attracted interest in predicting the subcellular location of proteins due to a considerable rise in the number of proteins. However, large-scale, fine-grained protein microscopic images are prone to trapping and losing feature information in general deep learning models, and the shallow features derived from statistical methods have weak supervision abilities. Methods: In this work, a novel model called HAR_Locator was developed to predict the subcellular location of proteins by concatenating multi-view abstract features and shallow features. Its advantages are summarized in the following three points. Firstly, to get discriminative abstract feature information on protein subcellular location, an abstract feature extractor called HARnet based on Hybrid Attention modules and Residual units was proposed to relieve gradient dispersion and focus on protein-target regions. Secondly, concatenating abstract features and shallow features not only improves the supervision ability of image information but also enhances the generalization ability of HAR_Locator. 
Finally, a multi-category multi-classifier decision system based on an Artificial Neural Network (ANN) was introduced to obtain the final output results of samples by fitting the most representative result from five subset predictors. Results: To evaluate the model, a collection of 6,778 immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) database was used to present experimental results, and the accuracy, precision, and recall evaluation indicators were significantly increased to 84.73%, 84.77%, and 84.70%, respectively, compared with baseline predictors

    SEGMENTATION AND INFORMATICS IN MULTIDIMENSIONAL FLUORESCENCE OPTICAL MICROSCOPY IMAGES

    Recent advances in the field of optical microscopy have enabled scientists to observe and image complex biological processes across a wide range of spatial and temporal resolution, resulting in an exponential increase in optical microscopy data. Manual analysis of such large volumes of data is extremely time consuming and often impossible if the changes cannot be detected by the human eye. Naturally it is essential to design robust, accurate and high performance image processing and analysis tools to extract biologically significant results. Furthermore, the presentation of the results to the end-user, post analysis, is also an equally challenging issue, especially when the data (and/or the hypothesis) involves several spatial/hierarchical scales (e.g., tissues, cells, (sub)-nuclear components). This dissertation concentrates on a subset of such problems such as robust edge detection, automatic nuclear segmentation and selection in multi-dimensional tissue images, spatial analysis of gene localization within the cell nucleus, information visualization and the development of a computational framework for efficient and high-throughput processing of large datasets. Initially, we have developed 2D nuclear segmentation and selection algorithms which help in the development of an integrated approach for determining the preferential spatial localization of certain genes within the cell nuclei which is emerging as a promising technique for the diagnosis of breast cancer. Quantification requires accurate segmentation of 100 to 200 cell nuclei in each patient tissue sample in order to draw a statistically significant result. Thus, for large scale analysis involving hundreds of patients, manual processing is too time consuming and subjective. 
We have developed an integrated workflow that selects, following 2D automatic segmentation, a sub-population of accurately delineated nuclei for positioning of fluorescence in situ hybridization labeled genes of interest in tissue samples. Application of the method was demonstrated for discriminating normal and cancerous breast tissue sections based on the differential positioning of the HES5 gene. Automatic results agreed with manual analysis in 11 out of 14 cancers, all 4 normal cases and all 5 non-cancerous breast disease cases, thus showing the accuracy and robustness of the proposed approach. As a natural progression from the 2D analysis algorithms to 3D, we first developed a robust and accurate probabilistic edge detection method for 3D tissue samples, since several downstream analysis procedures such as segmentation and tracking rely on the performance of edge detection. The method, based on multiscale and multi-orientation steps, surpasses several other conventional edge detectors in terms of its performance. Subsequently, given an appropriate edge measure, we developed an optimal graphcut-based 3D nuclear segmentation technique for samples where the cell nuclei are volume or surface labeled. It poses the problem as one of finding minimal closure in a directed graph and solves it efficiently using the maxflow-mincut algorithm. Both interactive and automatic versions of the algorithm are developed. The algorithm outperforms, in terms of three metrics that are commonly used to evaluate segmentation algorithms, a recently reported geodesic distance transform-based 3D nuclear segmentation method which in turn was reported to outperform several other popular tools that segment 3D nuclei in tissue samples. 
Finally, to apply some of the aforementioned methods to large microscopic datasets, we have developed a user friendly computing environment called MiPipeline which supports high throughput data analysis, data and process provenance, visual programming and seamlessly integrated information visualization of hierarchical biological data. The computational part of the environment is based on LONI Pipeline distributed computing server and the interactive information visualization makes use of several javascript based libraries to visualize an XML-based backbone file populated with essential meta-data and results

    Image analysis for gene expression based phenotype characterization in yeast cells

    Image analysis of objects at the microscopic scale requires accuracy so that measurements can be used to differentiate between the groups of objects being studied. This thesis deals with measurements in yeast biology that are obtained through microscope images. We study the algorithms and workflow of image analysis of yeast cells in order to understand and improve the measurement accuracy. The Saccharomyces cerevisiae cell is widely used as a model organism in the life sciences. It is essential to study gene and protein behaviour within these cells, which in turn makes it possible to find treatments and solutions for genetic and hereditary diseases. This is possible since many processes that occur at the molecular level in this organism are similar to those in human cells. In the research group Imaging and Bioinformatics, we have developed a framework for analysis of yeast cells. This framework is intended to serve as a support for research in yeast biology. The framework is integrated in one application and presented via a GUI. The application integrates modules and algorithms including segmentation, measurement, analysis and visualization.  Erasmus-Mundus, Raymond-Sackler, LSBSLIACS - OU

    Image Processing and Simulation Toolboxes of Microscopy Images of Bacterial Cells

    Recent advances in microscopy imaging technology have allowed the characterization of the dynamics of cellular processes at the single-cell and single-molecule level. Particularly in bacterial cell studies, and using E. coli as a case study, these techniques have been used to detect and track internal cell structures such as the Nucleoid and the Cell Wall and fluorescently tagged molecular aggregates such as FtsZ proteins, Min system proteins, inclusion bodies and all the different types of RNA molecules. These studies have been performed using multi-modal, multi-process, time-lapse microscopy, producing both morphological and functional images. To facilitate the finding of relationships between cellular processes, from small-scale, such as gene expression, to large-scale, such as cell division, an image processing toolbox was implemented with several automatic and/or manual features such as cell segmentation and tracking, intra-modal and inter-modal image registration, as well as the detection, counting and characterization of several cellular components. Two segmentation algorithms for cellular components were implemented, the first one based on the Gaussian Distribution and the second based on Thresholding and morphological structuring functions. These algorithms were used to perform the segmentation of Nucleoids and to identify the different stages of FtsZ Ring formation (allied with the use of machine learning algorithms), which made it possible to understand how temperature influences the physical properties of the Nucleoid and to correlate those properties with the exclusion of protein aggregates from the center of the cell. Another study used the segmentation algorithms to study how temperature affects the formation of the FtsZ Ring. The validation of the developed image processing methods and techniques has been based on benchmark databases manually produced and curated by experts. 
When dealing with thousands of cells and hundreds of images, these manually generated datasets can become the biggest cost in a research project. To expedite these studies in terms of time and lower the cost of the manual labour, an image simulation was implemented to generate realistic artificial images. The proposed image simulation toolbox can generate biologically inspired objects that mimic the spatial and temporal organization of bacterial cells and their processes, such as cell growth and division and cell motility, and cell morphology (shape, size and cluster organization). The image simulation toolbox was shown to be useful in the validation of three cell tracking algorithms: Simple Nearest-Neighbour, Nearest-Neighbour with Morphology and DBSCAN cluster identification algorithm. It was shown that the Simple Nearest-Neighbour still performed with great reliability when simulating objects with small velocities, while the other algorithms performed better for higher velocities and when there were larger clusters present
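
The simplest of the three tracking strategies named above, nearest-neighbour linking between consecutive frames, can be sketched as follows. This is an illustrative greedy variant written under stated assumptions, not the thesis implementation: the function name `nearest_neighbour_links`, the centroid representation, and the `max_dist` gating threshold are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def nearest_neighbour_links(
    prev: Sequence[Point],
    curr: Sequence[Point],
    max_dist: float,
) -> Dict[int, int]:
    """Link cell centroids in one frame to centroids in the next.

    Candidate pairs are visited from closest to farthest; a pair is
    linked when neither centroid is already linked and the distance
    does not exceed max_dist (an assumed gating threshold).
    Returns a mapping from index in prev to index in curr.
    """
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    links: Dict[int, int] = {}
    used_curr = set()
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # all remaining pairs are farther apart
        if i in links or j in used_curr:
            continue
        links[i] = j
        used_curr.add(j)
    return links


# Two cells that each move by one unit between frames.
frame_a = [(0.0, 0.0), (10.0, 10.0)]
frame_b = [(10.0, 11.0), (1.0, 0.0)]
print(nearest_neighbour_links(frame_a, frame_b, max_dist=3.0))
```

A scheme of this kind depends on displacements being small relative to the spacing between cells, which is consistent with the abstract's observation that the simple method is reliable for small velocities.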

    Computational models and approaches for lung cancer diagnosis

    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results

    Image Analysis Algorithms for Single-Cell Study in Systems Biology

    With the continuous shift of biology from a qualitative toward a quantitative field of research, digital microscopy and image-based measurements are drawing increased interest. Several methods have been developed for acquiring images of cells and intracellular organelles. Traditionally, acquired images are analyzed manually through visual inspection. The increasing volume of data is challenging the scope of manual analysis, and there is a need to develop methods for automated analysis. This thesis examines the development and application of computational methods for acquisition and analysis of images from single-cell assays. The thesis proceeds with three different aspects. First, a study evaluates several methods for focusing microscopes and proposes a novel strategy to perform focusing in time-lapse imaging. The method relies on the nature of the focus-drift and its predictability. The study shows that focus-drift is a dynamical system with a small randomness. Therefore, a prediction-based method is employed to track the focus-drift over time. A prototype implementation of the proposed method is created by extending the Nikon EZ-C1 Version 3.30 (Tokyo, Japan) imaging platform for acquiring images with a Nikon Eclipse (TE2000-U, Nikon, Japan) microscope. Second, a novel method is formulated to segment individual cells from a dense cluster. The method incorporates multi-resolution analysis with maximum-likelihood estimation (MAMLE) for cell detection. The MAMLE performs cell segmentation in two phases. The initial phase relies on a cutting-edge filter, edge detection in multi-resolution with a morphological operator, and threshold decomposition for adaptive thresholding. It estimates morphological features from the initial results. In the next phase, the final segmentation is constructed by boosting the initial results with the estimated parameters. The MAMLE method is evaluated with de novo data sets as well as with benchmark data from public databases. 
An empirical evaluation of the MAMLE method confirms its accuracy. Third, a comparative study is carried out on performance evaluation of state-of-the-art methods for the detection of subcellular organelles. This study includes eleven algorithms developed in different fields for segmentation. The evaluation procedure encompasses a broad set of samples, ranging from benchmark data to synthetic images. The result from this study suggests that no particular method performs better than the others across the test samples. Next, the effect of tetracycline on transcription dynamics of the tetA promoter in Escherichia coli (E. coli) cells is studied. This study measures expressions of RNA by tagging the MS2d-GFP vector with a target gene. The RNAs are observed as intracellular spots in confocal images. The kernel density estimation (KDE) method for detecting the intracellular spots is employed to quantify the individual RNA molecules. The thesis summarizes the results from five publications. Most of the publications are associated with different methods for imaging and analysis of microscopy. Confocal images with E. coli cells are targeted as the primary area of application. However, potential applications beyond the primary target are also made evident. The findings of the research are confirmed empirically

    Deep Representation Learning with Limited Data for Biomedical Image Synthesis, Segmentation, and Detection

    Biomedical imaging requires accurate expert annotation and interpretation that can aid medical staff and clinicians in automating differential diagnosis and solving underlying health conditions. With the advent of Deep learning, it has become a standard for reaching expert-level performance in non-invasive biomedical imaging tasks by training with large image datasets. However, with the need for large publicly available datasets, training a deep learning model to learn intrinsic representations becomes harder. Representation learning with limited data has introduced new learning techniques, such as Generative Adversarial Networks, Semi-supervised Learning, and Self-supervised Learning, that can be applied to various biomedical applications. For example, ophthalmologists use color funduscopy (CF) and fluorescein angiography (FA) to diagnose retinal degenerative diseases. However, fluorescein angiography requires injecting a dye, which can create adverse reactions in the patients. So, to alleviate this, a non-invasive technique needs to be developed that can translate fluorescein angiography from fundus images. Similarly, color funduscopy and optical coherence tomography (OCT) are also utilized to semantically segment the vasculature and fluid build-up in spatial and volumetric retinal imaging, which can help with the future prognosis of diseases. Although many automated techniques have been proposed for medical image segmentation, the main drawback is the model's precision in pixel-wise predictions. Another critical challenge in the biomedical imaging field is accurately segmenting and quantifying dynamic behaviors of calcium signals in cells. Calcium imaging is a widely utilized approach to studying subcellular calcium activity and cell function; however, large datasets have yielded a profound need for fast, accurate, and standardized analyses of calcium signals. 
For example, image sequences from calcium signals in colonic pacemaker cells ICC (Interstitial cells of Cajal) suffer from motion artifacts and high periodic and sensor noise, making it difficult to accurately segment and quantify calcium signal events. Moreover, it is time-consuming and tedious to annotate such a large volume of calcium image stacks or videos and extract their associated spatiotemporal maps. To address these problems, we propose various deep representation learning architectures that utilize limited labels and annotations to address the critical challenges in these biomedical applications. To this end, we detail our proposed semi-supervised, generative adversarial networks and transformer-based architectures for individual learning tasks such as retinal image-to-image translation, vessel and fluid segmentation from fundus and OCT images, breast micro-mass segmentation, and sub-cellular calcium events tracking from videos and spatiotemporal map quantification. We also illustrate two multi-modal multi-task learning frameworks with applications that can be extended to other domains of biomedical applications. The main idea is to incorporate each of these as individual modules to our proposed multi-modal frameworks to solve the existing challenges with 1) Fluorescein angiography synthesis, 2) Retinal vessel and fluid segmentation, 3) Breast micro-mass segmentation, and 4) Dynamic quantification of calcium imaging datasets

    Image analysis and statistical modeling for applications in cytometry and bioprocess control

    Today, signal processing has a central role in many of the advancements in systems biology. Modern signal processing is required to provide efficient computational solutions to unravel complex problems whose solutions are either arduous or impossible to obtain using conventional approaches. For example, imaging-based high-throughput experiments enable cells to be examined at even the subcellular level, yielding huge amounts of image data. Cytometry is an integral part of such experiments and involves measurement of different cell parameters, which requires extraction of quantitative experimental values from cell microscopy images. In order to do that for such a large number of images, fast and accurate automated image analysis methods are required. In another example, modeling of bioprocesses and their scale-up is a challenging task where different scales have different parameters and often there are more variables than the available number of observations, thus requiring special methodology. In many biomedical cell microscopy studies, it is necessary to analyze the images at the single cell or even subcellular level since, owing to the heterogeneity of cell populations, the population-averaged measurements are often inconclusive. Moreover, the emergence of imaging-based high-content screening experiments, especially for drug design, has put single cell analysis at the forefront since it is required to study the dynamics of single-cell gene expressions for tracking and quantification of cell phenotypic variations. The ability to perform single cell analysis depends on the accuracy of image segmentation in detecting individual cells from images. However, clumping of cells at both nuclei and cytoplasm level hinders accurate cell image segmentation. 
Part of this thesis work concentrates on developing accurate automated methods for segmentation of bright field as well as multichannel fluorescence microscopy images of cells with an emphasis on clump splitting so that cells are separated from each other as well as from background. The complexity of bioprocess development and control calls for the use of computational modeling and data analysis approaches for process optimization and scale-up. This is also supported by the fact that obtaining the a priori knowledge needed for the development of traditional scale-up criteria may at times be difficult. Moreover, employment of efficient process modeling may provide the added advantage of automatic identification of influential control parameters. Determination of the values of the identified parameters and the ability to predict them at different scales help in process control and in achieving their scale-up. Bioprocess modeling and control can also benefit from single cell analysis where the latter could add a new dimension to the former once imaging-based in-line sensors allow for monitoring of key variables governing the processes. In this thesis we exploited signal processing techniques for statistical modeling of bioprocesses and their scale-up as well as for development of fully automated methods for biomedical cell microscopy image segmentation, beginning from image pre-processing and initial segmentation to clump splitting and image post-processing, with the goal of facilitating high-throughput analysis. In order to highlight the contribution of this work, we present three application case studies where we applied the developed methods to solve the problems of cell image segmentation and bioprocess modeling and scale-up