A Pan-cancer Somatic Mutation Embedding using Autoencoders
Background: Next-generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large repositories of high-dimensional tumor samples characterized by germline and somatic mutation data require advanced computational modeling for data interpretation. In this work, we propose to analyze these complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better lower-dimensional representations of large somatic mutation data from 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes. Conclusions: The learned latent space maps the original samples to a much lower dimension while keeping the biological signals of the original tumor samples. This pipeline and the resulting embedding allow easier exploration of the heterogeneity within and across tumor types, as well as accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina. Universidad Tecnológica Nacional; Argentina
Affiliation: Beauseroy, Pierre. Université de Technologie de Troyes; France
Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
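To make the embedding step concrete, here is a minimal sketch of a one-hidden-layer autoencoder trained by full-batch gradient descent on a binary samples-by-genes mutation matrix (1 = gene mutated). The architecture (tanh encoder, sigmoid decoder, binary cross-entropy), the function names, the hyperparameters and the toy data are all assumptions for illustration; the abstract does not specify the authors' network, and the SVM classification on the embedding is left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, latent_dim=2, lr=0.02, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder on a binary mutation matrix X
    (samples x genes) with full-batch gradient descent.

    Encoder: H = tanh(X W1 + b1); decoder: X_hat = sigmoid(H W2 + b2).
    Returns an encode() function and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, latent_dim)), np.zeros(latent_dim)
    W2, b2 = rng.normal(0.0, 0.1, (latent_dim, d)), np.zeros(d)
    eps = 1e-9
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # latent embedding
        X_hat = sigmoid(H @ W2 + b2)  # reconstructed mutation probabilities
        # Binary cross-entropy, summed over genes and averaged over samples
        bce = X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps)
        losses.append(-bce.sum() / n)
        # Back-propagation (all gradients computed before any update)
        dZ2 = (X_hat - X) / n
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(X_new):
        return np.tanh(X_new @ W1 + b1)

    return encode, losses


# Toy data (hypothetical): two tumor types with different frequently mutated genes
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 20)
freq_a = np.r_[np.full(15, 0.6), np.full(15, 0.05)]
freq_b = freq_a[::-1]
P = np.where(labels[:, None] == 0, freq_a, freq_b)
X = (rng.random(P.shape) < P).astype(float)

encode, losses = train_autoencoder(X)
Z = encode(X)  # 40 x 2 embedding, e.g. the input to an SVM classifier
```

The low-dimensional `Z` is what downstream steps such as clustering or a kernel classifier would consume; a real pipeline would use a deeper network and a proper optimizer.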
Feature extraction and selection using statistical dependence criteria
Dimensionality reduction using feature extraction and selection approaches is a common stage of many regression and classification tasks.
In recent years there have been significant efforts to reduce the dimension of the feature space without losing information that is relevant for prediction. This objective can be cast as a conditional independence condition between the response or class labels and the transformed features.
Building on this, in this work we use measures of statistical dependence to estimate a lower-dimensional linear subspace of the features that retains the sufficient information. Unlike likelihood-based and many moment-based methods, the proposed approach is semi-parametric and does not require model assumptions on the data. A regularized version that achieves simultaneous variable selection is also presented. Experiments with simulated data show that the proposed method compares favorably with well-known linear dimension reduction techniques.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
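As an illustration of using a dependence measure to find an informative linear projection, the sketch below scores 1-D projections with the Hilbert-Schmidt Independence Criterion (HSIC) and keeps the best one from a grid of directions. HSIC is an assumed choice of dependence measure (the abstract does not name one), as are the Gaussian kernel width, the grid search in 2-D, and the function names; the paper's estimator and regularized variant are not reproduced here.

```python
import numpy as np


def hsic(z, y, sigma=1.0):
    """Biased HSIC estimate between a 1-D projection z and discrete labels y.

    Gaussian kernel on z, delta kernel on y: HSIC = tr(K H L H) / (n - 1)^2.
    """
    n = len(z)
    K = np.exp(-((z[:, None] - z[None, :]) ** 2) / (2 * sigma ** 2))
    L = (y[:, None] == y[None, :]).astype(float)
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    # tr(K H L H) = sum((H K H) * L) because the matrices are symmetric
    return np.sum((H @ K @ H) * L) / (n - 1) ** 2


def best_direction(X, y, n_angles=90):
    """Grid search over unit directions in 2-D for the projection with maximal HSIC."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    scores = [hsic(X @ w, y) for w in directions]
    return directions[int(np.argmax(scores))]


# Toy data (hypothetical): the label depends only on the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
w = best_direction(X, y)  # expected to lie close to (+/-1, 0)
```

A grid search only works in very low dimension; in practice the subspace would be estimated by optimizing the dependence measure over orthonormal projection matrices.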
Hepatocellular Carcinoma tumor stage classification and gene selection using machine learning models
Cancer researchers now have the opportunity to analyze and learn from large quantities of omic profiles of tumor samples. Various omic data are now available in several databases, and bioinformatic data analysis and interpretation are current bottlenecks. In this study, somatic mutation and gene expression data from hepatocellular carcinoma tumor samples are used to discriminate, by kernel learning, between tumor subtypes and between early and late stages. This classification will allow medical doctors to establish an appropriate treatment according to the tumor stage. By building kernel machines, we could discriminate both classes with acceptable classification accuracy. Feature selection has been implemented to select the key genes whose differential expression improves the separability between early-stage and late-stage samples.
Special Issue dedicated to JAIIO 2018 (Jornadas Argentinas de Informática).
Sociedad Argentina de Informática e Investigación Operativa
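A minimal sketch of the "select genes, then classify with a kernel machine" idea is shown below. It assumes a simple filter (ranking genes by the absolute Welch t-statistic between early and late stage) and uses kernel ridge regression on ±1 targets with an RBF kernel as a stand-in for an SVM. The ranking rule, the classifier, the function names and the simulated data are all assumptions; the abstract does not say which selection method the authors used.

```python
import numpy as np


def welch_t(X, y):
    """Per-gene Welch t-statistic between class 1 (late stage) and class 0 (early stage)."""
    a, b = X[y == 1], X[y == 0]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return num / den


def select_genes(X, y, k):
    """Indices of the k genes with the largest |t| (most differentially expressed)."""
    return np.argsort(-np.abs(welch_t(X, y)))[:k]


def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)


def fit_kernel_ridge(X, y, gamma, lam=1.0):
    """Kernel ridge regression on +/-1 targets, a simple stand-in for an SVM."""
    t = np.where(y == 1, 1.0, -1.0)
    alpha = np.linalg.solve(rbf(X, X, gamma) + lam * np.eye(len(X)), t)

    def predict(X_new):
        return (rbf(X_new, X, gamma) @ alpha > 0).astype(int)

    return predict


# Simulated expression data (hypothetical): genes 0-4 are up-regulated in late stage
rng = np.random.default_rng(0)
n, d = 80, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :5] += 2.0
train = np.arange(n) % 2 == 0
test = ~train

genes = select_genes(X[train], y[train], k=5)
predict = fit_kernel_ridge(X[train][:, genes], y[train], gamma=1.0 / 5)
acc = np.mean(predict(X[test][:, genes]) == y[test])
```

Selecting genes on the training split only, as done here, avoids leaking test information into the feature selection.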
Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections
Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis based on selected gene profiles is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense, or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense. Moreover, this classifier takes into account asymmetric penalties that depend on each class and on each wrong or partially correct decision. It is based on the ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm in which the possible decisions are just the individual classes and the loss function is defined by the Bayesian risk. Two experiments are carried out, one in the Bayesian framework and one in the class-selective rejection framework. Five datasets of selected genes are used to assess the performance of the proposed method. Results are discussed, and accuracies are compared with those of the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machine classifiers.
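The class-selective rejection decision itself can be sketched as a Bayes rule over subsets of classes: given posterior class probabilities, choose the non-empty subset that minimizes the expected loss. This sketch assumes the posteriors are already available and uses a hypothetical loss (a per-class cost when the true class is excluded, plus a flat cost for every extra class kept). It does not reproduce the ν-1-SVM or its regularization path.

```python
from itertools import combinations


def decisions(k):
    """All non-empty subsets of the k classes; the full set rejects from every class."""
    return [frozenset(c) for r in range(1, k + 1) for c in combinations(range(k), r)]


def loss(decision, true_class, miss_cost, extra_cost):
    """Hypothetical loss: miss_cost[j] if the true class j is not in the decision,
    otherwise extra_cost for every additional class kept (a partially correct decision)."""
    if true_class not in decision:
        return miss_cost[true_class]
    return extra_cost * (len(decision) - 1)


def bayes_decision(posterior, miss_cost, extra_cost):
    """Subset of classes minimizing the expected loss under the posterior."""
    def risk(d):
        return sum(p * loss(d, j, miss_cost, extra_cost) for j, p in enumerate(posterior))
    return min(decisions(len(posterior)), key=risk)


costs = [1.0, 1.0, 1.0]
print(bayes_decision([0.90, 0.05, 0.05], costs, 0.2))  # frozenset({0})
print(bayes_decision([0.45, 0.45, 0.10], costs, 0.2))  # frozenset({0, 1})
print(bayes_decision([1/3, 1/3, 1/3], costs, 0.2))     # frozenset({0, 1, 2})
```

A confident posterior yields a single class, an ambiguous one yields a partial rejection, and a flat one rejects from all classes. Restricting `decisions` to singletons with a 0-1 loss reduces the rule to picking the most probable class, which matches the remark that standard multiclass algorithms are a particular case.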
Learning Kernels from genetic profiles to discriminate tumor subtypes
Our work aims to perform the feature selection step of Multiple Kernel Learning by optimizing the Kernel Target Alignment (KTA) score. It begins by building feature-wise Gaussian kernel functions. Then, through a constrained linear combination of the feature-wise kernels, we aim to increase the Kernel Target Alignment and obtain a new optimized custom kernel. The linear combination yields a sparse solution in which only a few kernels survive to improve the KTA, and consequently a reduced feature subset is obtained. Considerably reducing the original gene set makes it possible to study the selected genes in more depth for clinical purposes. The higher the KTA obtained, the better the feature selection, since the custom kernels are later used for classification. The final kernel after optimizing the KTA is a linear combination K = Σ_i μ_i K_i of the feature-wise kernels K_i, each associated with a coefficient μ_i. The vector μ is computed during the optimization process.
Sociedad Argentina de Informática e Investigación Operativa
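The weighting step can be sketched as follows. The alignment between a kernel matrix K and the ideal kernel y yᵀ (labels y in {-1, +1}) is ⟨K, y yᵀ⟩_F / (‖K‖_F ‖y yᵀ‖_F). Assuming fixed-width Gaussian kernels on each feature and non-negative weights normalized to sum to one, a simple projected gradient ascent drives weak kernels to zero, giving the kind of sparse μ described above. The optimizer, step size, kernel width and function names are assumptions for illustration, not the authors' constrained solver.

```python
import numpy as np


def gaussian_kernel_1d(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on a single feature (vector x)."""
    return np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * sigma ** 2))


def alignment(K, y):
    """Kernel target alignment between K and the ideal kernel y y^T."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))


def optimize_weights(kernels, y, steps=200, lr=0.5):
    """Projected gradient ascent on KTA over weights mu >= 0 with sum(mu) = 1.
    Returns the best weights seen and their alignment."""
    Y = np.outer(y, y)
    a = np.array([np.sum(K * Y) for K in kernels])                          # <K_i, yy^T>_F
    M = np.array([[np.sum(Ki * Kj) for Kj in kernels] for Ki in kernels])  # <K_i, K_j>_F
    y_norm = np.linalg.norm(Y)

    def kta(mu):
        return a @ mu / (y_norm * np.sqrt(mu @ M @ mu))

    mu = np.full(len(kernels), 1.0 / len(kernels))
    best_mu, best = mu.copy(), kta(mu)
    for _ in range(steps):
        s = np.sqrt(mu @ M @ mu)
        grad = (a / s - (a @ mu) * (M @ mu) / s ** 3) / y_norm
        mu = np.maximum(mu + lr * grad, 0.0)  # projection onto mu >= 0
        if mu.sum() == 0:
            break
        mu /= mu.sum()                        # KTA is scale invariant in mu
        if kta(mu) > best:
            best_mu, best = mu.copy(), kta(mu)
    return best_mu, best


# Toy data (hypothetical): only feature 0 carries the class signal
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
kernels = [gaussian_kernel_1d(X[:, i]) for i in range(X.shape[1])]
mu, score = optimize_weights(kernels, y)
```

Features whose weight stays non-zero form the selected subset, and `sum(m * K for m, K in zip(mu, kernels))` is the custom kernel that a classifier would then use.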
Learning Multiclass Rules with Class-Selective Rejection and Performance Constraints
International audience
Extraction of discriminant features by optimization of parameterized functions
A method is proposed to automatically extract discriminant features for a process described by a database of labeled examples. The features are selected using families of parameterized functions, by determining the optimal parameters with respect to a class separability criterion. The chosen parameterized functions measure characteristics corresponding to the order-0 or order-1 moments of a weighted one- or two-dimensional representation. The continuous nature of the parameterized functions makes it possible to explore an infinite set of features and to avoid dealing with a problem of combinatorial complexity. The criterion measuring class separability is based on scatter matrices and allows the joint selection of features. The design of a linear classifier adapted to the extracted features is proposed. The method is applied to simulated signals described by their time representation.
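A small sketch of the idea: a feature is the order-0 (or order-1) moment of a time representation weighted by a parameterized window, and the window parameter is chosen to maximize a scatter-matrix separability criterion J = trace(S_w⁻¹ S_b). The Gaussian window with fixed width, signal energy as the representation, grid search over the window center (instead of a continuous optimizer), the function names and the simulated signals are all assumptions; the exact criterion and function families used in the paper may differ.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)  # common time axis of the simulated signals


def weighted_moment(r, c, order=0, b=0.05):
    """Order-0 or order-1 moment of representation r(t) weighted by a Gaussian
    window centred at c with width b (a hypothetical parameterized family)."""
    w = np.exp(-((t - c) ** 2) / (2 * b ** 2))
    return np.sum(t ** order * w * r)


def separability(F, y):
    """Scatter-matrix criterion J = trace(Sw^-1 Sb) for features F (n x p)."""
    overall = F.mean(axis=0)
    p = F.shape[1]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for k in np.unique(y):
        Fk = F[y == k]
        mk = Fk.mean(axis=0)
        Sw += (Fk - mk).T @ (Fk - mk)
        Sb += len(Fk) * np.outer(mk - overall, mk - overall)
    return np.trace(np.linalg.solve(Sw, Sb))


def best_center(R, y, centers, order=0):
    """Grid search over the window centre c maximizing class separability."""
    scores = [separability(np.array([[weighted_moment(r, c, order)] for r in R]), y)
              for c in centers]
    i = int(np.argmax(scores))
    return centers[i], scores[i]


# Simulated signals (hypothetical): a burst at t = 0.3 (class 0) or t = 0.7 (class 1)
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
S = np.array([np.exp(-((t - (0.3 if k == 0 else 0.7)) ** 2) / (2 * 0.03 ** 2))
              + 0.3 * rng.normal(size=t.size) for k in y])
R = S ** 2  # signal energy as the time representation
c_best, J = best_center(R, y, np.linspace(0.05, 0.95, 46))
```

With several features stacked as columns of `F`, the same `separability` function scores them jointly, which is what allows the joint selection mentioned in the abstract.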
- …