Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM
Motivation: High-throughput data is providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of the state of genome copy number variation, gene expression, DNA methylation and epigenetics of tumor samples and cancer cell lines.
Post-transcriptional knowledge in pathway analysis increases the accuracy of phenotypes classification
Motivation: Prediction of phenotypes from high-dimensional data is a crucial
task in precision biology and medicine. Many technologies employ genomic
biomarkers to characterize phenotypes. However, such elements are not
sufficient to explain the underlying biology. To improve this, pathway analysis
techniques have been proposed. Nevertheless, such methods have shown a lack of
accuracy in phenotype classification. Results: Here we propose a novel
methodology called MITHrIL (Mirna enrIched paTHway Impact anaLysis) for the
analysis of signaling pathways, which builds on the work of Tarca et
al., 2009. MITHrIL extends pathways by adding missing regulatory elements, such
as microRNAs, and their interactions with genes. The method takes as input the
expression values of genes and/or microRNAs and returns a list of pathways
sorted according to their deregulation degree, together with the corresponding
statistical significance (p-values). Our analysis shows that MITHrIL
outperforms its competitors even in the worst case. In addition, our method is
able to correctly classify sets of tumor samples drawn from TCGA. Availability:
MITHrIL is freely available at the following URL:
http://alpha.dmi.unict.it/mithril
Machine Learning and Integrative Analysis of Biomedical Big Data
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling
We present an applied study in cancer genomics for integrating data and
inferences from laboratory experiments on cancer cell lines with observational
data obtained from human breast cancer studies. The biological focus is on
improving understanding of transcriptional responses of tumors to changes in
the pH level of the cellular microenvironment. The statistical focus is on
connecting experimentally defined biomarkers of such responses to clinical
outcome in observational studies of breast cancer patients. Our analysis
exemplifies a general strategy for accomplishing this kind of integration
across contexts. The statistical methodologies employed here draw heavily on
Bayesian sparse factor models for identifying, modularizing and correlating
with clinical outcome these signatures of aggregate changes in gene expression.
By projecting patterns of biological response linked to specific experimental
interventions into observational studies where such responses may be evidenced
via variation in gene expression across samples, we are able to define
biomarkers of clinically relevant physiological states and outcomes that are
rooted in the biology of the original experiment. Through this approach we
identify microenvironment-related prognostic factors capable of predicting long
term survival in two independent breast cancer datasets. These results suggest
possible directions for future laboratory studies, as well as indicate the
potential for therapeutic advances through targeted disruption of specific
pathway components.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS261 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Statistical Methods in Integrative Genomics
Statistical methods in integrative genomics aim to answer important biological questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
A model for gene deregulation detection using expression data
In tumoral cells, gene regulation mechanisms are severely altered, and these
modifications in the regulations may be characteristic of different subtypes of
cancer. However, these alterations do not necessarily induce differential
expression between the subtypes. To detect them, we propose a
statistical methodology to identify the misregulated genes given a reference
network and gene expression data. Our model is based on a regulatory process in
which all genes are allowed to be deregulated. We derive an EM algorithm where
the hidden variables correspond to the status (under/over/normally expressed)
of the genes and where the E-step is solved with a message-passing
algorithm. Our procedure provides posterior probabilities of deregulation in a
given sample for each gene. We assess the performance of our method by
numerical experiments on simulations and on a bladder cancer data set.
Undisclosed, unmet and neglected challenges in multi-omics studies
Multi-omics approaches have become a reality in both large genomics projects and small laboratories. However, the multi-omics research community still faces a number of issues that have either not been sufficiently discussed or for which current solutions are still limited. In this Perspective, we elaborate on these limitations and suggest points of attention for future research. We finally discuss new opportunities and challenges brought to the field by the rapid development of single-cell high-throughput molecular technologies.
This work has been funded by the Spanish Ministry of Science and Innovation with grant number BES-2016-076994 to A.A.-L.
Tarazona, S.; Arzalluz-Luque, Á.; Conesa, A. (2021). Undisclosed, unmet and neglected challenges in multi-omics studies. Nature Computational Science. 1(6):395-402. https://doi.org/10.1038/s43588-021-00086-z