8 research outputs found
Studying alternative splicing regulatory networks through partial correlation analysis
The identification of links between exons and their regulators or targets and between co-spliced exons in human, mouse and rat provides novel insights into the alternative splicing regulatory network
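As a rough illustration of the partial correlation idea named in the title above, the sketch below computes pairwise partial correlations from a covariance matrix by inverting it and rescaling the precision matrix. This is a generic textbook construction, not necessarily the procedure used in the paper; the function name `partial_correlations` and the toy covariance matrix are assumptions made for the example.

```python
import numpy as np

def partial_correlations(cov):
    """Pairwise partial correlations, each conditioned on all other variables.

    Hypothetical helper for illustration: uses the identity
    rho_ij|rest = -P_ij / sqrt(P_ii * P_jj), where P is the inverse covariance.
    """
    cov = np.asarray(cov, dtype=float)
    precision = np.linalg.inv(cov)          # assumes cov is invertible
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)            # a variable is perfectly correlated with itself
    return pcorr

# Toy chain X0 - X1 - X2 (assumed data): X0 and X2 are marginally correlated
# (0.25) only through X1, so their partial correlation should vanish.
rho = 0.5
toy_cov = np.array([[1.0, rho, rho**2],
                    [rho, 1.0, rho],
                    [rho**2, rho, 1.0]])
toy_pcorr = partial_correlations(toy_cov)
```

In this toy case the marginal correlation between the first and third variables is nonzero, while their partial correlation is zero, which is the distinction between direct and indirect links that partial correlation analysis is meant to expose.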
Innovated higher criticism for detecting sparse signals in correlated noise
Higher criticism is a method for detecting signals that are both sparse and
weak. Although first proposed in cases where the noise variables are
independent, higher criticism also has reasonable performance in settings where
those variables are correlated. In this paper we show that, by exploiting the
nature of the correlation, performance can be improved by using a modified
approach which exploits the potential advantages that correlation has to offer.
Indeed, it turns out that the case of independent noise is the most difficult
of all, from a statistical viewpoint, and that more accurate signal detection
(for a given level of signal sparsity and strength) can be obtained when
correlation is present. We characterize the advantages of correlation by
showing how to incorporate them into the definition of an optimal detection
boundary. The boundary has particularly attractive properties when correlation
decays at a polynomial rate or the correlation matrix is Toeplitz.
Comment: Published at http://dx.doi.org/10.1214/09-AOS764 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
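For orientation, the following is a minimal sketch of the standard (uncorrelated-noise) higher criticism statistic computed from p-values, HC = max over i of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))). It is not the innovated variant that the abstract describes; the function names, the `alpha0` cutoff, and the toy p-values are assumptions made for the example.

```python
import math

def one_sided_p(z):
    """Upper-tail p-value of a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def higher_criticism(p_values, alpha0=0.5):
    """Standard higher criticism statistic over the smallest alpha0 * n p-values.

    Illustrative helper only; the innovated version in the paper additionally
    transforms the data using the noise correlation structure.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    p_sorted = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        # Clip to keep the denominator away from zero.
        p = min(max(p_sorted[i - 1], 1e-12), 1 - 1e-12)
        stat = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1 - p))
        best = max(best, stat)
    return best

# Toy inputs (assumed): evenly spread "null" p-values, and the same list with
# five very small p-values standing in for a sparse signal.
n = 100
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
signal_p = [1e-6] * 5 + null_p[5:]
hc_null = higher_criticism(null_p)
hc_signal = higher_criticism(signal_p)
```

The statistic stays small when the p-values look uniform and grows sharply when a handful of them are far smaller than uniformity would predict, which is the sparse-signal setting the abstract refers to.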
Considering dependence among genes and markers for false discovery control in eQTL mapping
Motivation: Multiple comparison adjustment is a significant and challenging statistical issue in large-scale biological studies. In previous studies, dependence among genes is largely ignored. However, such dependence may be strong for some genomic-scale studies such as genetical genomics [also called expression quantitative trait loci (eQTL) mapping] in which thousands of genes are treated as quantitative traits and mapped to different genetical markers. Besides the dependence among markers, the dependence among the expression levels of genes can also have a significant impact on data analysis and interpretation
Design and analysis of genetical genomics studies and their potential applications in livestock research
Quantitative Trait Loci (QTL) mapping has been widely used to identify
genetic loci attributable to the variation observed in complex traits. In recent years,
gene expression phenotypes have emerged as a new type of quantitative trait for
which QTL can be mapped. Locating sequence variation that has an effect on gene
expression (eQTL) is thought to be a promising way to elucidate the genetic
architecture of quantitative traits. This thesis explores a number of methodological
aspects of eQTL mapping (also known as “genetical genomics”) and considers some
practical strategies for applying this approach to livestock populations.
One of the exciting prospects of genetical genomics is that the combination of
expression studies with fine mapping of functional trait loci can guide the
reconstruction of gene networks. The thesis begins with an analysis in which
correlations between gene expression and meat quality traits in pigs are investigated
in relation to a pork meat quality QTL previously identified. The influence on power
due to factors including sample size and records of matched subjects is discussed. An
efficient experimental design for two-colour microarrays is then put forward, and it is
shown to be an effective use of microarrays for mapping additive eQTL in outbred
crosses under simulation. However, designs optimised for detecting both additive
and dominance eQTL are found to be less effective.
Data collected from livestock populations usually have a pedigreed structure.
Many family-based association mapping methods are rather computationally
intensive, hence are time-consuming when analysing very large numbers of traits.
The application of a novel family-based association method is demonstrated; it is
shown to be fast, accurate and flexible for genetical genomics. Furthermore, the
results show that multiple testing correction alone is not sufficient to control type I
errors in genetical genomics and that careful data filtering is essential. While it is
important to limit false positives, it is desirable not to miss many true signals. A
multi-trait analysis based on grouping of functionally related genes is devised to
detect some of the signals overlooked by a univariate analysis. Using an inbred rat
dataset, 13 loci are identified with significant linkage to gene sets of various
functions defined by Gene Ontology. Applying this method to livestock species is
possible, but the current level of annotations is a limiting factor. Finally, the thesis
concludes with some current opinions on the development of genetical genomics and
its impact on livestock genetics research
The 1000 Genomes Toxicity Screening Project: Utilizing the power of human genome variation for population-scale in vitro testing
Incorporation of novel toxicity screening approaches is a crucial tool for tackling the complex contemporary challenges in evaluating the human health hazards of exposure to chemicals. A shift in toxicity testing from in vivo to in vitro methods may efficiently prioritize compounds, reveal new mechanisms, and enable predictive modeling. Quantitative high-throughput screening (qHTS) is a major source of data for computational toxicology. However, current in vitro testing paradigms such as Tox21 or NexGen still have major gaps that need addressing, such as population-based in vitro approaches to qHTS screening. This study evaluated the hypothesis that comparative population genomics with efficient in vitro experimental design can be used for the evaluation of the potential hazard, mode of action, and the extent of population variability in response to chemicals. In Aim 1, we evaluated and assessed the validity of in vitro genetically-anchored population human model system in assessing chemical toxicity and identifying candidate genetic susceptibility. We screened 81 human lymphoblast cell lines with 240 chemicals at 12 different concentrations and assessed the toxic response using different endpoints (cell death and caspase production). We evaluated the toxic responses to a panel of chemicals observed in lymphoblast cell lines, and compared them to other toxic responses seen with different cell lines that originate from different sources. In Aim 2, we expanded our model to include more than one population, to increase statistical power to detect genetic variants associated with toxicological response. 
The goals were to (1) quantitatively assess population-based toxicological hazard to environmental contaminants, (2) determine the extent of human inter-individual variability in chemical toxicity, identify susceptible sub-populations or races, (3) understand the genetic determinants of the inter-individual variability, (4) generate testable hypotheses about toxicity pathways by leveraging genetic and genomic data from 1000 Genomes and HapMap Projects, and (5) use the data obtained from this research to build predictive in silico models. In Aim 3, we addressed some of the remaining challenges in our model, such as the ability to screen chemical mixtures. We explored the potential and efficiency of our model in assessing new challenges such as the evaluation of environmental chemical mixtures in a population in vitro screening, and the extrapolation of the in vitro hazard to an oral equivalent dose. In summary, this research not only will use novel tools to investigate population genetically anchored variability, but it will also offer exceptional methodology for incorporating scientifically-based estimates of uncertainty in risk assessment.
Doctor of Philosoph
Post-Genomic Approaches to Personalized Medicine: Applications in Exome Sequencing, Microbiome, and COPD
Since the completion of the sequencing of the human genome at the turn of the century, genomics has revolutionized the study of biology and medicine by providing high-throughput and quantitative methods for measuring molecular activities. Microarray and next generation sequencing emerged as important inflection points where the rate of data generation skyrocketed. The high-dimensional nature and the rapid growth in the volume of data precipitated a unique computational challenge in massive data analysis and interpretation. Noise and signal structure in the data vary significantly across types of data and technologies; thus, the context of the data generation process itself plays an important role in detecting key and oftentimes subtle signals. In this dissertation, we discuss four areas where contextualizing the data aids discoveries of disease-causing variants, complex relationships in the human microecology, interplay between gene and environment, and genetic regulation of gene expression. These studies, each in its own unique way, have helped make discoveries possible and expanded the horizon of our understanding of the human body, in health and disease