The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort, as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge.
A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data
Due to rapid technological advances, a wide range of different measurements
can be obtained from a given biological sample including single nucleotide
polymorphisms, copy number variation, gene expression levels, DNA methylation
and proteomic profiles. Each of these distinct measurements provides the means
to characterize a certain aspect of biological diversity, and a fundamental
problem of broad interest concerns the discovery of shared patterns of
variation across different data types. Such data types are heterogeneous in the
sense that they represent measurements taken at very different scales or
described by very different data structures. We propose a distance-based
statistical test, the generalized RV (GRV) test, to assess whether there is a
common and non-random pattern of variability between paired biological
measurements obtained from the same random sample. The measurements enter the
test through distance measures which can be chosen to capture particular
aspects of the data. An approximate null distribution is proposed to compute
p-values in closed-form and without the need to perform costly Monte Carlo
permutation procedures. Compared to the classical Mantel test for association
between distance matrices, the GRV test has been found to be more powerful in a
number of simulation settings. We also report on an application of the GRV test
to detect biological pathways in which genetic variability is associated with
variation in gene expression levels in ovarian cancer samples, and present
results obtained from two independent cohorts.
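As a rough illustration of the distance-based idea in the abstract above, the sketch below computes an RV-type coefficient from two paired distance matrices by double-centring the squared distances (Gower centring) and taking a normalised trace inner product. This is an assumption about the general form of such a statistic, not the authors' implementation. The p-value here comes from a plain permutation procedure, which is exactly the costly step the abstract says the closed-form null approximation avoids; it is swapped in only because it is simple to show. All function names and the toy data are hypothetical.

```python
import math
import random


def pairwise_abs(values):
    """Distance matrix of absolute differences for 1-D measurements (toy choice)."""
    return [[abs(a - b) for b in values] for a in values]


def gower_center(dist):
    """Double-centre squared distances: G = -1/2 * C D^2 C with C = I - 11'/n."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]  # equals column means, D is symmetric
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def trace_inner(a, b):
    """tr(A B) for symmetric matrices, i.e. the sum of elementwise products."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(dx, dy):
    """RV-type coefficient between two distance matrices on the same samples."""
    gx, gy = gower_center(dx), gower_center(dy)
    return trace_inner(gx, gy) / math.sqrt(trace_inner(gx, gx) * trace_inner(gy, gy))


def permutation_pvalue(dx, dy, n_perm=999, seed=0):
    """Permutation p-value (NOT the closed-form approximation from the abstract)."""
    rng = random.Random(seed)
    observed = rv_coefficient(dx, dy)
    idx = list(range(len(dx)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Relabel the samples of the second data type only.
        dy_perm = [[dy[i][j] for j in idx] for i in idx]
        if rv_coefficient(dx, dy_perm) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy paired measurements on five samples (made-up numbers).
geno = [0.0, 1.0, 2.0, 3.0, 4.0]
expr = [0.1, 0.9, 2.2, 2.9, 4.1]
rv, p_value = permutation_pvalue(pairwise_abs(geno), pairwise_abs(expr))
print(rv, p_value)
```

Because each data type enters only through its distance matrix, swapping `pairwise_abs` for a distance suited to the data (for example, an allele-sharing distance for genotypes) would leave the rest of the sketch unchanged.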
Comparative Analysis Association and Prediction of Various Phenotypic Traits of Oryza Sativa
Understanding the genotype-phenotype relationship and accurately predicting breeding values are crucial aspects of crop improvement programs. This paper investigates the genetic basis and association of the phenotypic traits height and yield, and predicts the phenotypic traits of Oryza sativa (rice) through a comprehensive approach encompassing genome-wide association studies (GWAS), phylogenetic analysis, machine learning algorithms, and the development of a graphical user interface (GUI) application. Genotypic and phenotypic data were collected from the RiceVarMap database. The genotypic information consisted of gene variation IDs, while the phenotype data included plant height. Data preprocessing involved the creation of a sequence.fasta file and multiple sequence alignment using the ClustalW tool. A phylogenetic tree was then constructed to analyse the subpopulations of Oryza sativa. Clustering techniques were applied to further explore the genetic relationships among the samples. A GWAS file was generated to identify associations between genotype and phenotype. Subsequently, machine learning algorithms were employed for the classification and prediction of genomic estimated breeding values (GEBV) for height and yield traits. Random Forest emerged as the most accurate algorithm, with 85% accuracy. To facilitate user interaction and data exploration, a GUI application was developed using Flask, allowing users to access the phylogenetic tree, height and yield information, and GWAS results, and to make predictions. We found a strong positive association between the phenotypic traits height and yield.
A Strategy analysis for genetic association studies with known inbreeding
Background: Association studies consist in identifying the genetic variants that are related to a specific disease through the use of statistical multiple hypothesis testing or segregation analysis in pedigrees. This type of study has been very successful in the case of Mendelian monogenic disorders, while it has been less successful in identifying genetic variants related to complex diseases, whose onset depends on the interactions between different genes and the environment. Current technology allows more than a million markers to be genotyped, and this number has been rapidly increasing in recent years with imputation based on template sets and whole-genome sequencing. This type of data introduces a great amount of noise into the statistical analysis and usually requires a large number of samples. Current methods seldom take into account gene-gene and gene-environment interactions, which are fundamental, especially in complex diseases. In this paper we propose to use a non-parametric additive model, which accounts for interactions of unknown order, to detect the genetic variants related to diseases. Although this is not new to the current literature, we show that in an isolated population, where the most related subjects also share most of their genetic code, the use of additive models may be improved if the available genealogical tree is taken into account. Specifically, we form a sample of cases and controls with the highest inbreeding by means of the Hungarian method, and estimate the set of genes and environmental variables associated with the disease by means of Random Forest.
Results: We have evidence, from statistical theory, simulations and two applications, that our procedure eliminates stratification between cases and controls and that it is precise enough to identify genetic variants responsible for a disease. This procedure has been successfully applied to beta-thalassemia, which is a well-known Mendelian disease, and to common asthma, where we have identified candidate genes that underlie susceptibility to asthma. Some of these candidate genes have also been found to be related to common asthma in the current literature.
Conclusions: The data analysis approach, based on selecting the most related cases and controls along with the Random Forest model, is a powerful tool for detecting genetic variants associated with a disease in isolated populations. Moreover, this method also provides a prediction model that accurately estimates the unknown disease status and that can generally be used to build test kits for a wide class of Mendelian diseases.
Motif-informed analysis of phenotype heterogeneity in cancer
The landscape of cancer genomics harbors a wealth of DNA motifs, whose thorough analysis and integration provide a pivotal method to decipher the complex molecular interactions underlying cancer. This dissertation delineates novel computational methodologies for robust DNA motif analysis and data integration, aiming to elucidate the implications of DNA motifs for cancer heterogeneity and clinical outcomes. Chapter 1 lays the groundwork by showing the significance of DNA motifs in the genomic framework and delineating the current biomarkers in cancer. It highlights the opportunity that DNA motif analysis presents in unveiling a nuanced understanding of genomic interactions. It also sets out the motivations and specific aims of the study of both DNA motif quantification and co-localization analysis. In Chapter 2, a foundational marker for quantifying the prevalence of DNA repetitive motifs, termed “Non-B DNA Burden”, is introduced. A user-centric platform is also developed to facilitate the efficient computation and visualization of this metric across various genomic scales. Together, they offer a novel perspective for analyzing DNA motif heterogeneity. Transitioning to Chapter 3, the focus evolves toward an integrated marker approach. By integrating the prevalence analysis of DNA motifs with the frequency of co-localized mutations, novel markers mlTNB (mutation-localized total non-B burden) and nbTMB (non-B informed tumor mutation burden) are proposed. Their potential in predicting cancer prognosis and treatment responses is specifically explored. Chapter 4 broadens the analytical foundation by defining MoCoLo (Motif Co-Localization), a robust statistical framework for testing multi-modal DNA motif co-localization. Through this framework, we are able to explore the complex interplay of genomic features and provide a methodical approach to investigate their co-localization in a multi-modal data integration context. Case studies are employed to showcase the utility of MoCoLo in examining the co-localization of genomic features, thus facilitating the understanding of genomic interactions that are pivotal to cancer biology. Chapter 5 synthesizes the findings from the preceding explorations, outlining the contributions of the developed methodologies to the field of cancer genomics and bioinformatics. It demonstrates the potential impact of DNA motif analysis and data integration on understanding phenotype heterogeneity in cancer and shows the prospective avenues it provides for impactful future research. Overall, this work is structured to contribute to the bioinformatics community by weaving together innovative tools and analyses focused on DNA motif analysis and data integration. It strives to pave a beneficial way forward to a deeper understanding of the cancer genome, thereby enhancing potential diagnostic and therapeutic strategies.
Cellular and Molecular Biolog
The Population Genetic Signature of Polygenic Local Adaptation
Adaptation in response to selection on polygenic phenotypes may occur via
subtle allele frequency shifts at many loci. Current population genomic
techniques are not well posed to identify such signals. In the past decade,
detailed knowledge about the specific loci underlying polygenic traits has
begun to emerge from genome-wide association studies (GWAS). Here we combine
this knowledge from GWAS with robust population genetic modeling to identify
traits that may have been influenced by local adaptation. We exploit the fact
that GWAS provide an estimate of the additive effect size of many loci to
estimate the mean additive genetic value for a given phenotype across many
populations as simple weighted sums of allele frequencies. We first describe a
general model of neutral genetic value drift for an arbitrary number of
populations with an arbitrary relatedness structure. Based on this model we
develop methods for detecting unusually strong correlations between genetic
values and specific environmental variables, as well as a generalization of
$Q_{ST}/F_{ST}$ comparisons to test for over-dispersion of genetic values among
populations. Finally we lay out a framework to identify the individual
populations or groups of populations that contribute to the signal of
overdispersion. These tests have considerably greater power than their single
locus equivalents due to the fact that they look for positive covariance
between like effect alleles, and also significantly outperform methods that do
not account for population structure. We apply our tests to the Human Genome
Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation,
type 2 diabetes, body mass index, and two inflammatory bowel disease datasets.
This analysis uncovers a number of putative signals of local adaptation, and we
discuss the biological interpretation and caveats of these results.
Comment: 42 pages including 8 figures and 3 tables; supplementary figures and
tables not included on this upload, but are mostly unchanged from v
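The abstract above describes estimating the mean additive genetic value of each population as a simple weighted sum of allele frequencies, with GWAS effect sizes as the weights. A minimal sketch of that arithmetic follows. The factor of 2 (diploid individuals carrying two allele copies per locus) is an assumption made here, and the function name, effect sizes, and frequencies are all made up for illustration.

```python
def genetic_values(allele_freqs, effect_sizes, ploidy=2):
    """Estimated mean additive genetic value per population.

    allele_freqs: one list per population, giving the frequency of the
                  effect allele at each GWAS locus.
    effect_sizes: additive effect size of the effect allele at each locus.
    ploidy:       copies per individual (assumed 2 here, i.e. diploid).
    """
    values = []
    for freqs in allele_freqs:
        if len(freqs) != len(effect_sizes):
            raise ValueError("one frequency per locus is required")
        # Weighted sum over loci: ploidy * sum_l alpha_l * p_l
        values.append(ploidy * sum(a * p for a, p in zip(effect_sizes, freqs)))
    return values


# Hypothetical example: two loci, two populations.
effects = [0.5, -0.2]
freqs = [
    [0.1, 0.5],  # population A
    [0.9, 0.5],  # population B
]
z = genetic_values(freqs, effects)
print(z)
```

The tests described in the abstract then ask whether such values vary among populations more than neutral drift and shared ancestry would predict. That step needs a model of population relatedness and is not attempted in this sketch.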