15,239 research outputs found
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated to quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated to differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and softwares to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.Comment: minor revision with typos corrected; review article; 24 pages, 2
figure
Recommended from our members
The robust selection of predictive genes via a simple classifier
Identifying genes that direct the mechanism of a disease from expression data is extremely useful in understanding how that mechanism works.
This in turn may lead to better diagnoses and potentially can lead to a cure for that disease. This task becomes extremely challenging when the
data are characterised by only a small number of samples and a high number of dimensions, as it is often the case with gene expression data.
Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation. These are the keys for the robust
identification of the most predictive features in such data. Within this framework, we propose a simple selective naĀØıve Bayes classifier discovered using a global search technique, and combine it with data perturbation to
increase its robustness to small sample sizes.
An extensive validation of the method was carried out using two applied datasets from the field of microarrays and a simulated dataset, all
confounded by small sample sizes and high dimensionality. The method has been shown capable of identifying genes previously confirmed or associated with prostate cancer and viral infections
Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark
In many medical studies, an outcome of interest is not only whether an event occurred, but when an event occurred; and an example of this is Alzheimerās disease (AD). Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop Alzheimerās disease (AD) is highly important for AD treatment. Previous studies suggest that not all MCI patients will convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimerās patients have been generated. Building a computational model that can predict conversion form MCI to AD can be highly beneficial for early intervention and treatment planning for AD. This work presents a big data model that contains machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework considers one of the widely used screening assessment for detecting cognitive impairment called Montreal Cognitive Assessment (MoCA). MoCA data set was collected from different centers and integrated into our large data framework storage using a Hadoop Data File System (HDFS); the data was then analyzed using an Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
- ā¦