2,827 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Integrative methods for analyzing big data in precision medicine
We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance of technologies capturing molecular and medical data, we have entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant’s sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. 
The resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance
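The idea of counting lines of evidence across network layers can be illustrated with a small sketch. This is a simplified assumption, not the authors' LOE definition: here a gene's score is just the number of layers in which it has at least one edge to a target gene. The layer names, gene names, and the `loe_scores` helper are hypothetical.

```python
from collections import defaultdict


def loe_scores(layers, targets):
    """Count, per gene, the number of layers linking it to any target gene.

    layers:  dict mapping a layer name to a list of (gene, gene) edges
    targets: set of genes already known to be tied to the target function
    """
    scores = defaultdict(int)
    for edges in layers.values():
        linked = set()  # non-target genes touching a target in this layer
        for a, b in edges:
            if a in targets and b not in targets:
                linked.add(b)
            if b in targets and a not in targets:
                linked.add(a)
        for gene in linked:
            scores[gene] += 1  # one line of evidence per layer
    return dict(scores)


# Hypothetical toy data: three layers, two known lignin-related genes.
layers = {
    "co_expression": [("geneA", "lignin1"), ("geneB", "geneC")],
    "co_methylation": [("geneA", "lignin2"), ("geneB", "lignin1")],
    "snp_correlation": [("lignin1", "geneA")],
}
targets = {"lignin1", "lignin2"}
scores = loe_scores(layers, targets)
```

In this toy example `geneA` is supported by all three layers and `geneB` by one, so ranking genes by score would surface `geneA` first as a candidate.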
Transcriptomic data integration for precision medicine in leukemia
This thesis comprises three studies demonstrating the application of different statistical and bioinformatic approaches to address distinct challenges of implementing precision medicine strategies for hematological malignancies. The approaches focus on the analysis of next-generation sequencing data, including both genomic and transcriptomic data, to deconvolute disease biology and the underlying mechanisms of drug sensitivity and resistance. The outcomes of the studies have clinical implications for advancing current diagnosis and treatment paradigms in patients with hematological diseases.
Study I: RNA sequencing has not been widely adopted in clinical diagnostic settings due to continuous development and lack of standardization. Here, the aim was to evaluate the efficiency of two different RNA-seq library preparation protocols applied to cells collected from acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) patients. The poly-A-tailed mRNA selection (PA) and ribo-depletion (RD) based RNA-seq library preparation protocols were compared and evaluated for detection of gene fusions, variant calling and gene expression profiling. Overall, both protocols produced broadly consistent results. However, the PA protocol was more efficient in quantifying expression of leukemia marker genes and drug targets. It also provided higher sensitivity and specificity for expression-based classification of leukemia. In contrast, the RD protocol was more suitable for gene fusion detection and captured a greater number of transcripts. Importantly, high technical variation was observed in samples from two leukemia patient cases, suggesting a need for further development of strategies for transcriptomic quantification and data analysis.
Study II: The BCL-2 inhibitor venetoclax is an approved and effective agent in combination with hypomethylating agents or low-dose cytarabine for AML patients unfit for intensive induction chemotherapy. However, the limited number of patients responding to venetoclax and the development of resistance to the treatment present a challenge for using the drug to benefit the majority of AML patients. The aim was to investigate genomic and transcriptomic biomarkers for venetoclax sensitivity and enable identification of the patients who are most responsive to venetoclax treatment. We found that venetoclax-sensitive samples are enriched with WT1 and IDH1/IDH2 mutations. Intriguingly, HOX family genes, including HOXB9, HOXA5, HOXB3 and HOXB4, were found to be significantly overexpressed in venetoclax-sensitive patients. Thus, these HOX-cluster gene expression biomarkers can be explored in a clinical trial setting to stratify AML patients responding to venetoclax-based therapies.
Study III: Venetoclax treatment does not benefit all AML patients, which demands identifying biomarkers to exclude those patients from venetoclax-based therapies. The aim was to investigate transcriptomic biomarkers for ex vivo venetoclax resistance in AML patients. The correlation of ex vivo venetoclax response with gene expression profiles using a machine learning approach revealed significant overexpression of the S100 family genes S100A8 and S100A9. Moreover, high expression of S100A9 was found to be associated with birabresib (BET inhibitor) sensitivity. The overexpression of S100A8 and S100A9 could potentially be used to detect and monitor venetoclax resistance. The combination of BCL-2 and BET inhibitors may sensitize AML cells to venetoclax upon BET inhibition and block leukemic cell survival.
In this thesis, the aim was to utilize gene expression information for advancing precision medicine outcomes in patients with hematological malignancies. In study I, the contemporary mainstream library preparation protocols used for RNA sequencing, ribo-depletion and poly-A enrichment, were compared in order to select the protocol that best suits the goal of the experiment, especially in patients with acute leukemias. In study II, we applied bioinformatics approaches to identify IDH1/2 mutations and HOX family gene expression correlated with ex vivo sensitivity to the BCL-2 inhibitor venetoclax in acute myeloid leukemia (AML) patients. In study III, statistical and machine learning methods were implemented to identify S100A8/A9 gene expression biomarkers for ex vivo resistance to venetoclax in AML patients. In summary, this thesis addresses the challenges of utilizing gene expression information to stratify patients based on biomarkers to promote precision medicine practice in hematological malignancies.
Next-generation sequencing identifies mechanisms of tumourigenesis caused by loss of SMARCB1 in Malignant Rhabdoid Tumours
PhD Thesis
Introduction: Malignant Rhabdoid Tumours (MRT) are unique malignancies caused by
biallelic inactivation of a single gene (SMARCB1). SMARCB1 encodes for a protein that
is part of the SWI/SNF chromatin remodelling complex, responsible for the regulation
of hundreds of downstream genes/pathways. Despite the simple biology of these
tumours, no studies have identified the critical pathways involved in tumourigenesis.
The understanding of downstream effects is essential to identifying therapeutic targets
that can improve the outcome of MRT patients.
Methods: RNA-seq and 450K-methylation analyses have been performed in MRT
human primary malignancies (n > 39) and in 4 MRT cell lines in which lentivirus was
used to re-express SMARCB1 (G401, A204, CHLA-266, and STA-WT1). The MRT cell
lines were treated with 5-aza-2′-deoxycytidine followed by global gene transcription
analysis (RNA-seq and 450K-methylation) to investigate how changes in methylation
lead to tumourigenesis.
Results: We show that primary Malignant Rhabdoid Tumours present a unique and
distinct expression/methylation profile which confirms that MRT broadly constitute a
single and different tumour type from other paediatric malignancies. However, despite
their common cause, MRT can be sub-grouped by location (i.e. CNS or kidney). We
observe that re-expression of SMARCB1 in MRT cell lines determines
activation/inactivation of specific downstream pathways such as IL-6/TGF beta. We
also observe a direct correlation between alterations in methylation and gene expression
in CD44, GLI2, GLI3, CDKN1A, CDKN2A and JARID after SMARCB1 re-expression.
Loss of SMARCB1 also promotes expression of aberrant isoforms and novel transcripts
and causes genome-wide changes in SWI/SNF binding.
Conclusion: Next generation transcriptome and methylome analysis in primary MRT
and in functional models give us detailed downstream effects of SMARCB1 loss in
Malignant Rhabdoid Tumours. The integration of data from both primary and functional
models has provided, for the first time, a genome-wide catalogue of SMARCB1
tumourigenic changes (validated using systems biology). Here we show how a single
deletion of SMARCB1 is responsible for deregulation of expression, methylation status
and binding at the promoter regions of potent tumour-suppressor genes. The genes,
pathways and biological mechanisms indicated as key in tumour development may
ultimately be targetable therapeutically and will lead to better treatments for what is
currently one of the most lethal paediatric cancers.
NECCR, Children with cancer UK, Brain Trust, Love Oliver,
CCLG, Karen and Iain Wark, The Smiley Ridley Fund, whose financial support made
this project possible
Undisclosed, unmet and neglected challenges in multi-omics studies
Multi-omics approaches have become a reality in both large genomics projects and small laboratories. However, the multi-omics research community still faces a number of issues that have either not been sufficiently discussed or for which current solutions are still limited. In this Perspective, we elaborate on these limitations and suggest points of attention for future research. We finally discuss new opportunities and challenges brought to the field by the rapid development of single-cell high-throughput molecular technologies.
This work has been funded by the Spanish Ministry of Science and Innovation with grant number BES-2016-076994 to A.A.-L.
Tarazona, S.; Arzalluz-Luque, Á.; Conesa, A. (2021). Undisclosed, unmet and neglected challenges in multi-omics studies. Nature Computational Science. 1(6):395-402. https://doi.org/10.1038/s43588-021-00086-z
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future
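One common way kernels integrate heterogeneous data is to compute one kernel per data source and combine them as a non-negative weighted sum, which is again a valid kernel. The sketch below is only an illustration under assumptions: the two toy data sources, the equal weights, and the helper names are hypothetical and are not taken from the chapter.

```python
def linear_kernel(X):
    """Gram matrix of dot products between the rows (samples) of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def normalize(K):
    """Rescale a kernel so that every sample has unit self-similarity."""
    n = len(K)
    return [[K[i][j] / (K[i][i] * K[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def combine(kernels, weights):
    """Weighted sum of kernel matrices computed on the same samples."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: the same three samples measured in two modalities
# with different numbers of features.
expression = [[1.0, 2.0], [2.0, 1.0], [0.5, 3.0]]
methylation = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.4], [0.2, 0.7, 0.5]]

K = combine(
    [normalize(linear_kernel(expression)),
     normalize(linear_kernel(methylation))],
    [0.5, 0.5],  # equal weights, chosen arbitrarily for illustration
)
```

The combined matrix `K` could then be passed to any kernel method that accepts a precomputed kernel. Normalizing each kernel first keeps one data source from dominating purely because of its scale.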
The altered transcriptome and DNA methylation profiles of docetaxel resistance in breast cancer PDX models
Taxanes are standard therapy in clinical practice for metastatic breast cancer; however, primary or acquired chemoresistance is a common cause of mortality. Breast cancer patient-derived xenografts (PDX) are powerful tools for the study of cancer biology and drug treatment response. Specific DNA methylation patterns have been associated with different breast cancer subtypes, but their association with chemoresistance remains unstudied. Aiming to elucidate docetaxel resistance mechanisms, we performed genome-wide DNA methylation analysis in breast cancer PDX models, including luminal and triple-negative breast cancer (TNBC) models sensitive to docetaxel, their matched models after emergence of chemoresistance and residual disease after short-term docetaxel treatment. We found that DNA methylation profiles from breast cancer PDX models maintain the subtype-specific methylation patterns of clinical samples. Two main DNA methylation clusters were found in TNBC PDX and remained stable during the emergence of docetaxel resistance; however, some genes/pathways were differentially methylated according to docetaxel response. A DNA methylation signature of resistance able to segregate TNBC based on chemotherapy response was identified. Transcriptomic profiling of selected sensitive/resistant pairs and integrative analysis with methylation data demonstrated correlation between some differentially methylated and expressed genes in docetaxel-resistant TNBC PDX models. Multiple gene expression changes were found after the emergence of docetaxel resistance in TNBC. DNA methylation and transcriptional changes identified between docetaxel-sensitive and -resistant TNBC PDX models or residual disease may have predictive value for chemotherapy response in TNBC. IMPLICATIONS: Subtype-specific DNA methylation patterns are maintained in breast cancer PDX models. While no global methylation changes were found, we uncovered differentially methylated and expressed genes/pathways associated with the emergence of docetaxel resistance in TNBC.