1,498 research outputs found

    Inhibition in multiclass classification

    The role of inhibition is investigated in a multiclass support vector machine formalism inspired by the brain structure of insects. The so-called mushroom bodies have a set of output neurons, or classification functions, that compete with each other to encode a particular input. Strongly active output neurons depress or inhibit the remaining outputs without knowing which is correct or incorrect. Accordingly, we propose to use a classification function that embodies unselective inhibition and train it in the large margin classifier framework. Inhibition leads to more robust classifiers in the sense that they perform better on larger areas of appropriate hyperparameters when assessed with leave-one-out strategies. We also show that the classifier with inhibition is a tight bound to probabilistic exponential models and is Bayes consistent for 3-class problems. These properties make this approach useful for data sets with a limited number of labeled examples. For larger data sets, there is no significant advantage over other multiclass SVM approaches.

    Unconventional machine learning of genome-wide human cancer data

    Recent advances in high-throughput genomic technologies coupled with exponential increases in computer processing and memory have allowed us to interrogate the complex aberrant molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to increase, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired in part by recent advances in physical quantum processors, we evaluated several unconventional machine learning (ML) strategies on actual human tumor data. Here we show for the first time the efficacy of multiple annealing-based ML algorithms for classification of high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas. To assess algorithm performance, we compared these classifiers to a variety of standard ML methods. Our results indicate the feasibility of using annealing-based ML to provide competitive classification of human cancer types and associated molecular subtypes and superior performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing architectures in the biomedical sciences.

    Development of Machine Learning Models for Generation and Activity Prediction of the Protein Tyrosine Kinase Inhibitors

    The field of computational drug discovery and development continues to grow at a rapid pace, using generative machine learning approaches to present us with solutions to high dimensional and complex problems in drug discovery and design. In this work, we present a platform of Machine Learning based approaches for generation and scoring of novel kinase inhibitor molecules. We utilized a binary Random Forest classification model to develop a Machine Learning based scoring function to evaluate the generated molecules on Kinase Inhibition Likelihood. By training the model on several chemical features of each known kinase inhibitor, we were able to create a metric that captures the differences between a SRC Kinase Inhibitor and a non-SRC Kinase Inhibitor. We implemented the scoring function into a Biased and Unbiased Bayesian Optimization framework to generate molecules based on features of SRC Kinase Inhibitors. We then used similarity metrics such as Tanimoto Similarity to assess their closeness to that of known SRC Kinase Inhibitors. The molecules generated from this experiment demonstrated potential for belonging to the SRC Kinase Inhibitor family though chemical synthesis would be needed to confirm the results. The top molecules generated from the Unbiased and Biased Bayesian Optimization experiments were calculated to respectively have Tanimoto Similarity scores of 0.711 and 0.709 to known SRC Kinase Inhibitors. With calculated Kinase Inhibition Likelihood scores of 0.586 and 0.575, the top molecules generated from the Bayesian Optimization demonstrate a disconnect between the similarity scores to known SRC Kinase Inhibitors and the calculated Kinase Inhibition Likelihood score. It was found that implementing a bias into the Bayesian Optimization process had little effect on the quality of generated molecules. 
In addition, several molecules generated from the Bayesian Optimization process were sent to the School of Pharmacy for chemical synthesis, which gives the experiment more concrete results. The results of this study demonstrated that generating molecules through Bayesian Optimization techniques could aid in the generation of molecules for a specific kinase family, but further expansions of the techniques would be needed for substantial results.
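
The Tanimoto Similarity mentioned in this abstract can be illustrated with a minimal sketch. It assumes each molecule is represented as the set of "on" bit positions of a binary fingerprint; the abstract does not say which fingerprint or toolkit the authors used, so the fingerprints and the names `candidate`, `known_inhibitors`, `ref_a` and `ref_b` below are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits.

    Defined as |A & B| / |A | B|, i.e. shared bits over all bits set in either.
    """
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints, for illustration only (not data from the study).
candidate = {1, 4, 7, 9, 12}
known_inhibitors = {
    "ref_a": {1, 4, 7, 9, 15},
    "ref_b": {2, 4, 8},
}

# Score the candidate against every reference and keep the closest one.
scores = {name: tanimoto(candidate, fp) for name, fp in known_inhibitors.items()}
best_name = max(scores, key=scores.get)
best_score = scores[best_name]
print(best_name, round(best_score, 3))
```

A score of 1.0 means the two fingerprints set exactly the same bits, and 0.0 means they share none. Taking the maximum over the reference set is one common way to report closeness to a known family, though the abstract does not state how its scores of 0.711 and 0.709 were aggregated.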

    TIMMA-R : an R package for predicting synergistic multi-targeted drug combinations in cancer cell lines or patient-derived samples

    Network pharmacology-based prediction of multi-targeted drug combinations is becoming a promising strategy to improve anticancer efficacy and safety. We developed a logic-based network algorithm, called Target Inhibition Interaction using Maximization and Minimization Averaging (TIMMA), which predicts the effects of drug combinations based on their binary drug-target interactions and single-drug sensitivity profiles in a given cancer sample. Here, we report the R implementation of the algorithm (TIMMA-R), which is much faster than the original MATLAB code. The major extensions include modeling of multiclass drug-target profiles and network visualization. We also show that the TIMMA-R predictions are robust to the intrinsic noise in the experimental data, thus making it a promising high-throughput tool to prioritize drug combinations in various cancer types for follow-up experimentation or clinical applications. Peer reviewed.

    Machine learning of visual object categorization: an application of the SUSTAIN model

    Formal models of categorization are psychological theories that try to describe the process of categorization in a lawful way, using the language of mathematics. Their mathematical formulation makes it possible for the models to generate precise, quantitative predictions. SUSTAIN (Love, Medin & Gureckis, 2004) is a powerful formal model of categorization that has been used to model a range of human experimental data, describing the process of categorization in terms of an adaptive clustering principle. Love et al. (2004) suggested a possible application of the model in the field of object recognition and categorization. The present study explores this possibility, investigating at the same time the utility of using a formal model of categorization in a typical machine learning task. The image categorization performance of SUSTAIN on a well-known image set is compared with that of a linear Support Vector Machine, confirming the capability of SUSTAIN to perform image categorization with reasonable accuracy, albeit at a rather high computational cost.

    Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level

    In recent years, the number of available biological experiments has increased significantly due to the widespread use of massive sequencing data. Furthermore, continuous developments in machine learning and high-performance computing are allowing faster and more efficient analysis and processing of this type of data. However, biological information about a given disease is normally scattered, owing to the use of different sequencing technologies and different manufacturers in different experiments over the years around the world. Thus, it is now of paramount importance to attain a correct integration of biologically related data in order to achieve genuine benefits from them. For this purpose, this work presents an integration of multiple Microarray and RNA-seq platforms, which has led to the design of a multiclass study by collecting samples from the four main types of leukemia, quantified at the gene expression level. Subsequently, in order to find a set of differentially expressed genes with the highest discernment capability among different types of leukemia, an innovative parameter referred to as coverage is presented here. This parameter allows assessing the number of different pathologies that a given gene is able to discern. It has been evaluated together with other widely known parameters under an ANOVA statistical test, which corroborated its filtering power when the identified genes are subjected to a machine learning process at the multiclass level. The optimal tuning of the evaluated gene extraction parameters by means of this statistical test led to the selection of 42 highly relevant expressed genes. Using the minimum-Redundancy Maximum-Relevance (mRMR) feature selection algorithm, these genes were reordered and assessed with four different classification techniques. Outstanding results were achieved by taking exclusively the first ten genes of the ranking into consideration.
Finally, specific literature was consulted on this last subset of genes, revealing that practically all of them are associated with biological processes related to leukemia. In light of these results, this study underlines the relevance of considering a new parameter which facilitates the identification of highly valid expressed genes for simultaneously discerning multiple types of leukemia. This work was supported by Project TIN2015-71873-R (Spanish Ministry of Economy and Competitiveness, MINECO, and the European Regional Development Fund, ERDF) and Junta de Andalucía (P12–TIC–2082).

    Multiclass methods in the analysis of metabolomic datasets: the example of raspberry cultivar volatile compounds detected by GC-MS and PTR-MS

    Multiclass sample classification and marker selection are cutting-edge problems in metabolomics. In the present study we address the classification of 14 raspberry cultivars having different levels of gray mold (Botrytis cinerea) susceptibility. We characterized raspberry cultivars by two headspace analysis methods, namely solid-phase microextraction/gas chromatography–mass spectrometry (SPME/GC–MS) and proton transfer reaction-mass spectrometry (PTR-MS). Given the high number of classes, advanced data mining methods are necessary. Random Forest (RF), Penalized Discriminant Analysis (PDA), Discriminant Partial Least Squares (dPLS) and Support Vector Machine (SVM) have been employed for cultivar classification, and Random Forest-Recursive Feature Elimination (RF-RFE) has been used to perform feature selection. In particular, the most important GC–MS and PTR-MS variables related to gray mold susceptibility of the selected raspberry cultivars have been investigated. Moving from GC–MS profiling to the more rapid and less invasive PTR-MS fingerprinting leads to a cultivar characterization which is still related to the corresponding Botrytis susceptibility level, and therefore marker identification is still possible. Affiliations: Cappellin, Luca (Fondazione Edmund Mach, Research and Innovation Centre, Italy); Aprea, Eugenio (Fondazione Edmund Mach, Research and Innovation Centre, Italy); Granitto, Pablo Miguel (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Rosario, Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas, Argentina); Romano, Andrea (Fondazione Edmund Mach, Research and Innovation Centre, Italy); Gasperi, Flavia (Fondazione Edmund Mach, Research and Innovation Centre, Italy); Biasioli, Franco (Fondazione Edmund Mach, Research and Innovation Centre, Italy).