A transfer-learning approach to feature extraction from cancer transcriptomes with deep autoencoders
Published in Lecture Notes in Computer Science.
The diagnosis and prognosis of cancer are among the most challenging tasks in oncology. To select the most appropriate treatment, current personalized medicine uses data from heterogeneous sources to estimate how a given disease will evolve in a particular patient. In recent years, next-generation sequencing data have boosted cancer prediction by supplying gene-expression information, which has allowed diverse machine learning algorithms to offer valuable solutions to the problem of cancer subtype classification and has contributed to better estimation of patients' responses to diverse treatments. However, the efficacy of these models is seriously affected by the imbalance between the high dimensionality of gene-expression feature sets and the number of samples available for a particular cancer type. To counteract what is known as the curse of dimensionality, feature selection and extraction methods have traditionally been applied to reduce the number of input variables in gene-expression datasets. Although these techniques scale down the input feature space, the prediction performance of traditional machine learning pipelines that use them remains moderate. In this work, we propose using the Pan-Cancer dataset to pre-train deep autoencoder architectures on a subset composed of thousands of gene-expression samples from very diverse tumor types. The resulting architectures are subsequently fine-tuned on a collection of breast cancer samples. This transfer-learning approach aims to combine supervised and unsupervised deep learning models with traditional machine learning classification algorithms to tackle the problem of breast tumor intrinsic-subtype classification.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
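The pre-train, fine-tune, classify workflow described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the data are synthetic stand-ins for the Pan-Cancer and breast cancer sets, the autoencoder has a single hidden layer rather than a deep architecture, fine-tuning uses only the unsupervised reconstruction loss, and a nearest-centroid rule stands in for the traditional classifier. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def init_autoencoder(n_features, n_latent, rng):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder); a stand-in for a deep one."""
    scale = 1.0 / np.sqrt(n_features)
    return {
        "W_enc": rng.normal(0.0, scale, (n_features, n_latent)),
        "b_enc": np.zeros(n_latent),
        "W_dec": rng.normal(0.0, scale, (n_latent, n_features)),
        "b_dec": np.zeros(n_features),
    }

def encode(params, X):
    """Map expression profiles to low-dimensional latent features."""
    return np.tanh(X @ params["W_enc"] + params["b_enc"])

def reconstruction_loss(params, X):
    """Mean squared reconstruction error over all samples and genes."""
    X_hat = encode(params, X) @ params["W_dec"] + params["b_dec"]
    return float(np.mean((X_hat - X) ** 2))

def train(params, X, epochs, lr):
    """Full-batch gradient descent on the reconstruction loss; returns updated copies."""
    p = {name: value.copy() for name, value in params.items()}
    for _ in range(epochs):
        H = np.tanh(X @ p["W_enc"] + p["b_enc"])
        X_hat = H @ p["W_dec"] + p["b_dec"]
        d_out = 2.0 * (X_hat - X) / X.size               # dLoss/dX_hat
        d_hid = (d_out @ p["W_dec"].T) * (1.0 - H ** 2)  # backprop through tanh
        p["W_dec"] -= lr * (H.T @ d_out)
        p["b_dec"] -= lr * d_out.sum(axis=0)
        p["W_enc"] -= lr * (X.T @ d_hid)
        p["b_enc"] -= lr * d_hid.sum(axis=0)
    return p

def nearest_centroid_fit(features, labels):
    """Store one mean feature vector per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, features):
    """Assign each sample to the class with the closest centroid."""
    classes, centroids = model
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def make_synthetic_data(seed=0, n_genes=20, n_factors=3):
    """Synthetic stand-ins (assumption): a large unlabeled set and a small labeled one."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(n_factors, n_genes)) / np.sqrt(n_genes)
    Z_src = rng.normal(size=(300, n_factors))
    X_src = Z_src @ loadings + 0.05 * rng.normal(size=(300, n_genes))
    y_tgt = np.repeat(np.arange(n_factors), 20)  # hypothetical subtype labels
    Z_tgt = 2.0 * np.eye(n_factors)[y_tgt] + 0.3 * rng.normal(size=(len(y_tgt), n_factors))
    X_tgt = Z_tgt @ loadings + 0.05 * rng.normal(size=(len(y_tgt), n_genes))
    return X_src, X_tgt, y_tgt

def run_pipeline(seed=0):
    X_src, X_tgt, y_tgt = make_synthetic_data(seed)
    rng = np.random.default_rng(seed + 1)
    init = init_autoencoder(X_src.shape[1], 4, rng)
    # Step 1: unsupervised pre-training on the large, diverse source set.
    pretrained = train(init, X_src, epochs=500, lr=0.2)
    # Step 2: fine-tuning on half of the small target set, with a lower learning rate.
    is_train = np.arange(len(y_tgt)) % 2 == 0
    finetuned = train(pretrained, X_tgt[is_train], epochs=200, lr=0.1)
    # Step 3: a simple classifier on the learned features, scored on held-out samples.
    model = nearest_centroid_fit(encode(finetuned, X_tgt[is_train]), y_tgt[is_train])
    pred = nearest_centroid_predict(model, encode(finetuned, X_tgt[~is_train]))
    return {
        "source_loss_init": reconstruction_loss(init, X_src),
        "source_loss_pretrained": reconstruction_loss(pretrained, X_src),
        "target_loss_pretrained": reconstruction_loss(pretrained, X_tgt[is_train]),
        "target_loss_finetuned": reconstruction_loss(finetuned, X_tgt[is_train]),
        "accuracy": float(np.mean(pred == y_tgt[~is_train])),
    }

results = run_pipeline(seed=0)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

The sketch keeps the three stages separate so that each can be replaced independently, for example by swapping in a deeper encoder or a different classifier. The printed losses and accuracy describe only the synthetic data and say nothing about results on real transcriptomes.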
Machine Learning and Integrative Analysis of Biomedical Big Data
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
- …