5,324 research outputs found
Accurate detection of dysmorphic nuclei using dynamic programming and supervised classification
A vast array of pathologies is typified by the presence of nuclei with an abnormal morphology. Dysmorphic nuclear phenotypes feature dramatic size changes or foldings, but also entail much subtler deviations such as nuclear protrusions called blebs. Due to their unpredictable size, shape and intensity, dysmorphic nuclei are often not accurately detected in standard image analysis routines. To enable accurate detection of dysmorphic nuclei in confocal and widefield fluorescence microscopy images, we have developed an automated segmentation algorithm, called Blebbed Nuclei Detector (BleND), which relies on two-pass thresholding for initial nuclear contour detection, and an optimal path finding algorithm, based on dynamic programming, for refining these contours. Using a robust error metric, we show that our method matches manual segmentation in terms of precision and outperforms state-of-the-art nuclear segmentation methods. Its high performance allowed for building and integrating a robust classifier that recognizes dysmorphic nuclei with an accuracy above 95%. The combined segmentation-classification routine is bound to facilitate nucleus-based diagnostics and enable real-time recognition of dysmorphic nuclei in intelligent microscopy workflows
Tailored for Real-World: A Whole Slide Image Classification System Validated on Uncurated Multi-Site Data Emulating the Prospective Pathology Workload.
Standard of care diagnostic procedure for suspected skin cancer is microscopic examination of hematoxylin & eosin stained tissue by a pathologist. Areas of high inter-pathologist discordance and rising biopsy rates necessitate higher efficiency and diagnostic reproducibility. We present and validate a deep learning system which classifies digitized dermatopathology slides into 4 categories. The system is developed using 5,070 images from a single lab, and tested on an uncurated set of 13,537 images from 3 test labs, using whole slide scanners manufactured by 3 different vendors. The system\u27s use of deep-learning-based confidence scoring as a criterion to consider the result as accurate yields an accuracy of up to 98%, and makes it adoptable in a real-world setting. Without confidence scoring, the system achieved an accuracy of 78%. We anticipate that our deep learning system will serve as a foundation enabling faster diagnosis of skin cancer, identification of cases for specialist review, and targeted diagnostic classifications
Predicting Genetic Regulatory Response Using Classification
We present a novel classification-based method for learning to predict gene
regulatory response. Our approach is motivated by the hypothesis that in simple
organisms such as Saccharomyces cerevisiae, we can learn a decision rule for
predicting whether a gene is up- or down-regulated in a particular experiment
based on (1) the presence of binding site subsequences (``motifs'') in the
gene's regulatory region and (2) the expression levels of regulators such as
transcription factors in the experiment (``parents''). Thus our learning task
integrates two qualitatively different data sources: genome-wide cDNA
microarray data across multiple perturbation and mutant experiments along with
motif profile data from regulatory sequences. We convert the regression task of
predicting real-valued gene expression measurement to a classification task of
predicting +1 and -1 labels, corresponding to up- and down-regulation beyond
the levels of biological and measurement noise in microarray measurements. The
learning algorithm employed is boosting with a margin-based generalization of
decision trees, alternating decision trees. This large-margin classifier is
sufficiently flexible to allow complex logical functions, yet sufficiently
simple to give insight into the combinatorial mechanisms of gene regulation. We
observe encouraging prediction accuracy on experiments based on the Gasch S.
cerevisiae dataset, and we show that we can accurately predict up- and
down-regulation on held-out experiments. Our method thus provides predictive
hypotheses, suggests biological experiments, and provides interpretable insight
into the structure of genetic regulatory networks.Comment: 8 pages, 4 figures, presented at Twelfth International Conference on
Intelligent Systems for Molecular Biology (ISMB 2004), supplemental website:
http://www.cs.columbia.edu/compbio/geneclas
Recommended from our members
Validation of machine learning models to detect amyloid pathologies across institutions.
Semi-quantitative scoring schemes like the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) are the most commonly used method in Alzheimer's disease (AD) neuropathology practice. Computational approaches based on machine learning have recently generated quantitative scores for whole slide images (WSIs) that are highly correlated with human derived semi-quantitative scores, such as those of CERAD, for Alzheimer's disease pathology. However, the robustness of such models have yet to be tested in different cohorts. To validate previously published machine learning algorithms using convolutional neural networks (CNNs) and determine if pathological heterogeneity may alter algorithm derived measures, 40 cases from the Goizueta Emory Alzheimer's Disease Center brain bank displaying an array of pathological diagnoses (including AD with and without Lewy body disease (LBD), and / or TDP-43-positive inclusions) and levels of Aβ pathologies were evaluated. Furthermore, to provide deeper phenotyping, amyloid burden in gray matter vs whole tissue were compared, and quantitative CNN scores for both correlated significantly to CERAD-like scores. Quantitative scores also show clear stratification based on AD pathologies with or without additional diagnoses (including LBD and TDP-43 inclusions) vs cases with no significant neurodegeneration (control cases) as well as NIA Reagan scoring criteria. Specifically, the concomitant diagnosis group of AD + TDP-43 showed significantly greater CNN-score for cored plaques than the AD group. Finally, we report that whole tissue computational scores correlate better with CERAD-like categories than focusing on computational scores from a field of view with densest pathology, which is the standard of practice in neuropathological assessment per CERAD guidelines. Together these findings validate and expand CNN models to be robust to cohort variations and provide additional proof-of-concept for future studies to incorporate machine learning algorithms into neuropathological practice
Assessment of algorithms for mitosis detection in breast cancer histopathology images
The proliferative activity of breast tumors, which is routinely estimated by counting of mitotic figures in hematoxylin and eosin stained histology sections, is considered to be one of the most important prognostic markers. However, mitosis counting is laborious, subjective and may suffer from low inter-observer agreement. With the wider acceptance of whole slide images in pathology labs, automatic image analysis has been proposed as a potential solution for these issues.
In this paper, the results from the Assessment of Mitosis Detection Algorithms 2013 (AMIDA13) challenge are described. The challenge was based on a data set consisting of 12 training and 11 testing subjects, with more than one thousand annotated mitotic figures by multiple observers. Short descriptions and results from the evaluation of eleven methods are presented. The top performing method has an error rate that is comparable to the inter-observer agreement among pathologists
Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets
Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with greater confidence, and place less reliance on a single dataset. However, combining datasets directly can be difficult as experiments are often conducted on different microarray platforms, and in different laboratories leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation
Functional Bipartite Ranking: a Wavelet-Based Filtering Approach
It is the main goal of this article to address the bipartite ranking issue
from the perspective of functional data analysis (FDA). Given a training set of
independent realizations of a (possibly sampled) second-order random function
with a (locally) smooth autocorrelation structure and to which a binary label
is randomly assigned, the objective is to learn a scoring function s with
optimal ROC curve. Based on linear/nonlinear wavelet-based approximations, it
is shown how to select compact finite dimensional representations of the input
curves adaptively, in order to build accurate ranking rules, using recent
advances in the ranking problem for multivariate data with binary feedback.
Beyond theoretical considerations, the performance of the learning methods for
functional bipartite ranking proposed in this paper are illustrated by
numerical experiments
FindFoci: a focus detection algorithm with automated parameter training that closely matches human assignments, reduces human inconsistencies and increases speed of analysis
Accurate and reproducible quantification of the accumulation of proteins into foci in cells is essential for data interpretation and for biological inferences. To improve reproducibility, much emphasis has been placed on the preparation of samples, but less attention has been given to reporting and standardizing the quantification of foci. The current standard to quantitate foci in open-source software is to manually determine a range of parameters based on the outcome of one or a few representative images and then apply the parameter combination to the analysis of a larger dataset. Here, we demonstrate the power and utility of using machine learning to train a new algorithm (FindFoci) to determine optimal parameters. FindFoci closely matches human assignments and allows rapid automated exploration of parameter space. Thus, individuals can train the algorithm to mirror their own assignments and then automate focus counting using the same parameters across a large number of images. Using the training algorithm to match human assignments of foci, we demonstrate that applying an optimal parameter combination from a single image is not broadly applicable to analysis of other images scored by the same experimenter or by other experimenters. Our analysis thus reveals wide variation in human assignment of foci and their quantification. To overcome this, we developed training on multiple images, which reduces the inconsistency of using a single or a few images to set parameters for focus detection. FindFoci is provided as an open-source plugin for ImageJ
- …