1,498 research outputs found
Inhibition in multiclass classification
The role of inhibition is investigated in a multiclass support vector machine formalism inspired by the brain structure of insects. The so-called mushroom bodies have a set of output neurons, or classification functions, that compete with each other to encode a particular input. Strongly active output neurons depress or inhibit the remaining outputs without knowing which is correct or incorrect. Accordingly, we propose to use a classification function that embodies unselective inhibition and train it in the large margin classifier framework. Inhibition leads to more robust classifiers in the sense that they perform better on larger areas of appropriate hyperparameters when assessed with leave-one-out strategies. We also show that the classifier with inhibition is a tight bound to probabilistic exponential models and is Bayes consistent for 3-class problems. These properties make this approach useful for data sets with a limited number of labeled examples. For larger data sets, there is no significant comparative advantage to other multiclass SVM approaches.
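As an illustration only, unselective inhibition can be sketched as each class score being reduced by a shared term proportional to the summed activity of all outputs. The abstract does not state the functional form, so the form below, the name `inhibited_scores`, and the parameter `mu` are assumptions rather than the authors' definition.

```python
def inhibited_scores(weights, x, mu):
    """Return one score per class: linear activation minus a shared inhibition term.

    Assumed form (not taken from the abstract):
        f_j(x) = <w_j, x> - mu * sum_k <w_k, x>
    """
    # Linear activation of each output neuron (one weight vector per class).
    activations = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    # Unselective inhibition: every output is depressed by the same total activity.
    total = sum(activations)
    return [a - mu * total for a in activations]


# Toy example with three classes and two input features (hypothetical values).
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]
scores = inhibited_scores(weights, x, mu=0.5)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

In this assumed form the shared term shifts all scores equally, so it does not change which class wins at prediction time; any effect would have to come through the margins used during training.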
Unconventional machine learning of genome-wide human cancer data
Recent advances in high-throughput genomic technologies coupled with
exponential increases in computer processing and memory have allowed us to
interrogate the complex aberrant molecular underpinnings of human disease from
a genome-wide perspective. While the deluge of genomic information is expected
to increase, a bottleneck in conventional high-performance computing is rapidly
approaching. Inspired in part by recent advances in physical quantum
processors, we evaluated several unconventional machine learning (ML)
strategies on actual human tumor data. Here we show for the first time the
efficacy of multiple annealing-based ML algorithms for classification of
high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas.
To assess algorithm performance, we compared these classifiers to a variety of
standard ML methods. Our results indicate the feasibility of using
annealing-based ML to provide competitive classification of human cancer types
and associated molecular subtypes and superior performance with smaller
training datasets, thus providing compelling empirical evidence for the
potential future application of unconventional computing architectures in the
biomedical sciences.
Development of Machine Learning Models for Generation and Activity Prediction of the Protein Tyrosine Kinase Inhibitors
The field of computational drug discovery and development continues to grow at a rapid pace, using generative machine learning approaches to present us with solutions to high dimensional and complex problems in drug discovery and design. In this work, we present a platform of Machine Learning based approaches for generation and scoring of novel kinase inhibitor molecules. We utilized a binary Random Forest classification model to develop a Machine Learning based scoring function to evaluate the generated molecules on Kinase Inhibition Likelihood. By training the model on several chemical features of each known kinase inhibitor, we were able to create a metric that captures the differences between a SRC Kinase Inhibitor and a non-SRC Kinase Inhibitor. We implemented the scoring function into a Biased and Unbiased Bayesian Optimization framework to generate molecules based on features of SRC Kinase Inhibitors. We then used similarity metrics such as Tanimoto Similarity to assess their closeness to that of known SRC Kinase Inhibitors. The molecules generated from this experiment demonstrated potential for belonging to the SRC Kinase Inhibitor family though chemical synthesis would be needed to confirm the results. The top molecules generated from the Unbiased and Biased Bayesian Optimization experiments were calculated to respectively have Tanimoto Similarity scores of 0.711 and 0.709 to known SRC Kinase Inhibitors. With calculated Kinase Inhibition Likelihood scores of 0.586 and 0.575, the top molecules generated from the Bayesian Optimization demonstrate a disconnect between the similarity scores to known SRC Kinase Inhibitors and the calculated Kinase Inhibition Likelihood score. It was found that implementing a bias into the Bayesian Optimization process had little effect on the quality of generated molecules. 
In addition, several molecules generated from the Bayesian Optimization process were sent to the School of Pharmacy for chemical synthesis, which gives the experiment more concrete results. The results of this study demonstrated that generating molecules through Bayesian Optimization techniques could aid in the generation of molecules for a specific kinase family, but further expansions of the techniques would be needed for substantial results.
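For readers unfamiliar with the Tanimoto Similarity mentioned above, the following is a minimal sketch of the standard formula on binary fingerprints, |A ∩ B| / |A ∪ B|. The fingerprints are made-up sets of on-bit indices, and the function name `tanimoto` is hypothetical; the abstract does not say which fingerprint type or toolkit the authors used.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    return shared / (len(a) + len(b) - shared)


# Made-up fingerprints: indices of bits set for a generated and a known molecule.
fp_candidate = {1, 4, 7, 9, 12}
fp_reference = {1, 4, 8, 9, 12, 15}
score = tanimoto(fp_candidate, fp_reference)  # 4 shared bits out of 7 distinct bits
```

A score of 1.0 means identical fingerprints and 0.0 means no shared bits, so reported values near 0.71 indicate substantial but incomplete overlap with the reference.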
Computer modelling of metabolic adaptions during mitochondrial dysfunction and machine learning to predict novel mitochondrial disease genes
Mitochondria are organelles found in almost every eukaryote and are primarily responsible for generating chemical energy in the form of adenosine triphosphate. This thesis investigates two main causes of mitochondrial dysfunction: mitochondrial toxicity arising from side-effects of drugs; and mitochondrial diseases arising from defects in nuclear-encoded genes.
Novel chemical entities being developed as drug leads are screened for cellular toxicity in which mitochondrial dysfunction is a major cause. However, our lack of understanding of the metabolic adaptations to mitochondrial dysfunction limits the accurate screening of mitochondrial dysfunction for pharmaceutical companies, thus preventing potentially useful drugs from being developed. To further our understanding of these adaptations, I analysed a large-scale metabolomics data set of rats administered a known mitochondrial complex III inhibitor. The analyses revealed many perturbed pathways which can be exploited as biomarkers of mild mitochondrial dysfunction, a condition which is currently clinically undetectable during the drug development process. To direct future studies on mitochondrial dysfunction, a multi-organ model of mitochondrial metabolism was generated and used to simulate inhibition of the mitochondrial respiratory complexes. The simulations of complex III inhibition accurately predicted many of the metabolite behaviours identified in the metabolomics analyses and provided theories for their significance. Simulations of the other complexes’ inhibitions identified many unique behaviours which can be used to direct future studies, studies which would greatly improve our understanding of the metabolic adaptations and provide higher confidence biomarkers.
Mitochondrial dysfunction is linked to many late onset diseases such as Parkinson’s, and inborn errors of mitochondrial metabolism cause severe neurological and physiological diseases. Patients with suspected mitochondrial disease have their DNA sequenced and analysed. Diagnosis of mitochondrial disease by sequencing requires knowledge of the mitochondrial proteome, which is currently incomplete. A predicted mitochondrial proteome was generated using a support vector machine trained on the abundance of protein localisation data available in the MitoMiner database. The support vector machine identified 442 novel mitochondrial proteins. The success rate of diagnosing mitochondrial disease using sequencing is currently limited by our inability to filter and prioritise a patient’s DNA variants. Patients who do not have a variant in one of the already known mitochondrial disease genes are usually left with hundreds of potential disease-causing variants. A probability of being disease-causing for each gene in the mitochondrial proteome was generated using two trained neural networks. The networks were trained on a large number of different data sources for differentiating mitochondrial disease genes, including protein-protein interaction network metrics, gene tissue expression and protein evolution. The predicted probabilities allow for better filtering and prioritisation of a patient’s variants for candidate disease-causing genes to be experimentally verified. The predicted mitochondrial proteome and the predicted disease-causing probabilities are currently used in an NGS analysis pipeline at the MRC Mitochondrial Biology Unit for diagnosing mitochondrial disease patient samples.
TIMMA-R : an R package for predicting synergistic multi-targeted drug combinations in cancer cell lines or patient-derived samples
Network pharmacology-based prediction of multi-targeted drug combinations is becoming a promising strategy to improve anticancer efficacy and safety. We developed a logic-based network algorithm, called Target Inhibition Interaction using Maximization and Minimization Averaging (TIMMA), which predicts the effects of drug combinations based on their binary drug-target interactions and single-drug sensitivity profiles in a given cancer sample. Here, we report the R implementation of the algorithm (TIMMA-R), which is much faster than the original MATLAB code. The major extensions include modeling of multiclass drug-target profiles and network visualization. We also show that the TIMMA-R predictions are robust to the intrinsic noise in the experimental data, thus making it a promising high-throughput tool to prioritize drug combinations in various cancer types for follow-up experimentation or clinical applications.
Machine learning of visual object categorization: an application of the SUSTAIN model
Formal models of categorization are psychological theories that try to describe the process of categorization in a lawful way, using the language of mathematics. Their mathematical formulation makes it possible for the models to generate precise, quantitative predictions. SUSTAIN (Love, Medin & Gureckis, 2004) is a powerful formal model of categorization that has been used to model a range of human experimental data, describing the process of categorization in terms of an adaptive clustering principle. Love et al. (2004) suggested a possible application of the model in the field of object recognition and categorization. The present study explores this possibility, investigating at the same time the utility of using a formal model of categorization in a typical machine learning task. The image categorization performance of SUSTAIN on a well-known image set is compared with that of a linear Support Vector Machine, confirming the capability of SUSTAIN to perform image categorization with reasonable accuracy, albeit at a rather high computational cost.
Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level
In recent years, a significant increase in the number of available biological experiments
has taken place due to the widespread use of massive sequencing data. Furthermore,
continuous developments in machine learning and high-performance
computing are allowing faster and more efficient analysis and processing of this
type of data. However, biological information about a certain disease is normally scattered
due to the use of different sequencing technologies and different manufacturers, in different
experiments along the years around the world. Thus, nowadays it is of paramount importance
to attain a correct integration of biologically-related data in order to achieve genuine
benefits from them. For this purpose, this work presents an integration of multiple Microarray
and RNA-seq platforms, which has led to the design of a multiclass study by collecting samples
from the main four types of leukemia, quantified at gene expression. Subsequently, in
order to find a set of differentially expressed genes with the highest discernment capability
among different types of leukemia, an innovative parameter referred to as coverage is presented
here. This parameter allows assessing the number of different pathologies that a
certain gene is able to discern. It has been evaluated together with other widely known
parameters under assessment of an ANOVA statistical test which corroborated its filtering
power when the identified genes are subjected to a machine learning process at multiclass
level. The optimal tuning of gene extraction evaluated parameters by means of this statistical
test led to the selection of 42 highly relevant expressed genes. Using the minimum-Redundancy
Maximum-Relevance (mRMR) feature selection algorithm, these genes were
reordered and assessed under the operation of four different classification techniques. Outstanding
results were achieved by taking exclusively the first ten genes of the ranking into
consideration. Finally, specific literature was consulted on this last subset of genes, revealing
the association of practically all of them with biological processes related to leukemia. In light of these results, this study underlines the relevance of considering a new parameter
which facilitates the identification of highly valid expressed genes for simultaneously discerning
multiple types of leukemia.
This work was supported by Project TIN2015-71873-R (Spanish Ministry of Economy and Competitiveness, MINECO, and the European Regional Development Fund, ERDF) and Junta de Andalucía (P12–TIC–2082).
Multiclass methods in the analysis of metabolomic datasets: the example of raspberry cultivar volatile compounds detected by GC-MS and PTR-MS
Multiclass sample classification and marker selection are cutting-edge problems in metabolomics. In the present study we address the classification of 14 raspberry cultivars having different levels of gray mold (Botrytis cinerea) susceptibility. We characterized raspberry cultivars by two headspace analysis methods, namely solid-phase microextraction/gas chromatography–mass spectrometry (SPME/GC–MS) and proton transfer reaction-mass spectrometry (PTR-MS). Given the high number of classes, advanced data mining methods are necessary. Random Forest (RF), Penalized Discriminant Analysis (PDA), Discriminant Partial Least Squares (dPLS) and Support Vector Machine (SVM) have been employed for cultivar classification and Random Forest-Recursive Feature Elimination (RF-RFE) has been used to perform feature selection. In particular the most important GC–MS and PTR-MS variables related to gray mold susceptibility of the selected raspberry cultivars have been investigated. Moving from GC–MS profiling to the more rapid and less invasive PTR-MS fingerprinting leads to a cultivar characterization which is still related to the corresponding Botrytis susceptibility level and therefore marker identification is still possible.
Affiliation: Cappellin, Luca. Fondazione Edmund Mach. Research and Innovation Centre; Italy
Affiliation: Aprea, Eugenio. Fondazione Edmund Mach. Research and Innovation Centre; Italy
Affiliation: Granitto, Pablo Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina
Affiliation: Romano, Andrea. Fondazione Edmund Mach. Research and Innovation Centre; Italy
Affiliation: Gasperi, Flavia. Fondazione Edmund Mach. Research and Innovation Centre; Italy
Affiliation: Biasioli, Franco. Fondazione Edmund Mach. Research and Innovation Centre; Italy