589 research outputs found
Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains
BACKGROUND: The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. RESULTS: A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. CONCLUSION: This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request
Molecular Science for Drug Development and Biomedicine
With the avalanche of biological sequences generated in the postgenomic age, molecular science is facing an unprecedented challenge, i.e., how to timely utilize the huge amount of data to benefit human beings. Stimulated by such a challenge, a rapid development has taken place in molecular science, particularly in the areas associated with drug development and biomedicine, both experimental and theoretical. The current thematic issue was launched with the focus on the topic of âMolecular Science for Drug Development and Biomedicineâ, in hopes to further stimulate more useful techniques and findings from various approaches of molecular science for drug development and biomedicine
CLASSIFIERS BASED ON A NEW APPROACH TO ESTIMATE THE FISHER SUBSPACE AND THEIR APPLICATIONS
In this thesis we propose a novel classifier, and its extensions, based on a novel estimation of the Fisher Subspace. The proposed classifiers have been developed to deal with high dimensional and highly unbalanced datasets whose cardinality is low. The efficacy of the proposed techniques has been proved by the results achieved on real and synthetic datasets, and by the comparison with state of the art predictors
Histopathological image analysis : a review
Over the past decade, dramatic increases in computational power and improvement in image analysis algorithms have allowed the development of powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form. Consequently, digitized tissue histopathology has now become amenable to the application of computerized image analysis and machine learning techniques. Analogous to the role of computer-assisted diagnosis (CAD) algorithms in medical imaging to complement the opinion of a radiologist, CAD algorithms have begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state of the art CAD technology for digitized histopathology. This paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology related problems being pursued in the United States and Europe
Recommended from our members
When the machine does not know measuring uncertainty in deep learning models of medical images
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonRecently, Deep learning (DL), which involves powerful black box predictors, has outperformed
human experts in several medical diagnostic problems. However, these methods focus
exclusively on improving the accuracy of point predictions without assessing their outputsâ
quality and ignore the asymmetric cost involved in different types of misclassification errors.
Neural networks also do not deliver confidence in predictions and suffer from over and
under confidence, i.e. are not well calibrated. Knowing how much confidence there is in a
prediction is essential for gaining cliniciansâ trust in the technology.
Calibrated uncertainty quantification is a challenging problem as no ground truth is
available. To address this, we make two observations: (i) cost-sensitive deep neural networks
with Dropweights models better quantify calibrated predictive uncertainty, and (ii) estimated
uncertainty with point predictions in Deep Ensembles Bayesian Neural Networks with
DropWeights can lead to a more informed decision and improve prediction quality.
This dissertation focuses on quantifying uncertainty using concepts from cost-sensitive
neural networks, calibration of confidence, and Dropweights ensemble method. First, we
show how to improve predictive uncertainty by deep ensembles of neural networks with Dropweights
learning an approximate distribution over its weights in medical image segmentation
and its application in active learning. Second, we use the Jackknife resampling technique
to correct bias in quantified uncertainty in image classification and propose metrics to measure
uncertainty performance. The third part of the thesis is motivated by the discrepancy
between the model predictive error and the objective in quantified uncertainty when costs for
misclassification errors or unbalanced datasets are asymmetric. We develop cost-sensitive
modifications of the neural networks in disease detection and propose metrics to measure the
quality of quantified uncertainty. Finally, we leverage an adaptive binning strategy to measure
uncertainty calibration error that directly corresponds to estimated uncertainty performance
and address problematic evaluation methods.
We evaluate the effectiveness of the tools on nuclei images segmentation, multi-class
Brain MRI image classification, multi-level cell type-specific protein expression prediction in
ImmunoHistoChemistry (IHC) images and cost-sensitive classification for Covid-19 detection
from X-Rays and CT image dataset. Our approach is thoroughly validated by measuring the
quality of uncertainty. It produces an equally good or better result and paves the way for the
future that addresses the practical problems at the intersection of deep learning and Bayesian
decision theory.
In conclusion, our study highlights the opportunities and challenges of the application of
estimated uncertainty in deep learning models of medical images, representing the confidence of the modelâs prediction, and the uncertainty quality metrics show a significant improvement
when using Deep Ensembles Bayesian Neural Networks with DropWeights
Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks
[EN] The present PhD dissertation develops and applies Bioinformatic methods and tools to address key current problems in the analysis of human omic data. This PhD has been organised by main objectives into four different chapters focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins.
In the first chapter, we developed and tested a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets, exploring all feature differences possible heterogeneity, through the integration of both data dispersion and predictor-response information in a new statistic parameter called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big data sets. In the third chapter, we describe an analysis of the human interactome integrating two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by a systematic experimental screening based on Yeast-Two-Hybrid technology) and Cell-Atlas (a comprehensive map of subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for the subcellular localization characterization supported by the human protein-protein interactome. In the fourth chapter, we developed a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues assigning each human protein along the evolutionary timeline. In this way, we investigate how old in evolution and how correlated are the different human proteins, and we place all them in a common interaction network.
As main general comment, all the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and enlighten of molecular signatures and biological networks using human omic data. Most of this data corresponds to sample cohorts generated in recent biomedical studies on specific human diseases
A survey on big multimedia data processing and management in smart cities
© 2019 Association for Computing Machinery. All rights reserved. Integration of embedded multimedia devices with powerful computing platforms, e.g., machine learning platforms, helps to build smart cities and transforms the concept of Internet of Things into Internet of Multimedia Things (IoMT). To provide different services to the residents of smart cities, the IoMT technology generates big multimedia data. The management of big multimedia data is a challenging task for IoMT technology. Without proper management, it is hard to maintain consistency, reusability, and reconcilability of generated big multimedia data in smart cities. Various machine learning techniques can be used for automatic classification of raw multimedia data and to allow machines to learn features and perform specific tasks. In this survey, we focus on various machine learning platforms that can be used to process and manage big multimedia data generated by different applications in smart cities. We also highlight various limitations and research challenges that need to be considered when processing big multimedia data in real-time
Probabilistic multiple kernel learning
The integration of multiple and possibly heterogeneous information sources for an overall decision-making process has been an open and unresolved research direction in computing science since its very beginning. This thesis attempts to address parts of that direction by proposing probabilistic data integration algorithms for multiclass decisions where an observation of interest is assigned to one of many categories based on a plurality of information channels
- âŠ