
    Block network mapping approach to quantitative trait locus analysis

    BACKGROUND: Advances in experimental biology have enabled the collection of enormous troves of data on genomic variation in living organisms. The interpretation of these data to extract actionable information is one of the keys to developing novel therapeutic strategies to treat complex diseases. Network organization of biological data overcomes measurement noise in several biological contexts. Does a network approach, combining information about the linear organization of genomic markers with correlative information on these markers in a Bayesian formulation, lead to an analytic method with higher power for detecting quantitative trait loci? RESULTS: Block Network Mapping, which combines Similarity Network Fusion (Wang et al., Nature Methods 11:333-337, 2014) with a Bayesian locus likelihood evaluation, leads to large improvements in area under the receiver operating characteristic curve and in power over interval mapping with expectation maximization. The method has a monotonically decreasing false discovery rate as a function of effect size, unlike interval mapping. CONCLUSIONS: Block Network Mapping provides an alternative data-driven approach to mapping quantitative trait loci that leverages correlations in the sampled genotypes. The evaluation methodology can be combined with existing approaches such as interval mapping. Python scripts are available at http://lbm.niddk.nih.gov/vipulp/. Genotype data are available at http://churchill-lab.jax.org/website/GattiDOQTL. BMC Bioinformatics 2016 Dec 22; 17(1):544
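
    The central idea, combining a similarity network built from genotype correlations with the linear ordering of markers along the chromosome, can be illustrated in a few lines of R. This is a minimal sketch, not the authors' implementation (which is distributed as Python scripts at the URL above): the simple averaging of the two adjacency matrices stands in for the full Similarity Network Fusion step, and all object names and parameters are hypothetical.

        # Toy sketch: fuse marker-correlation similarity with linear (positional) adjacency.
        set.seed(1)
        geno <- matrix(rbinom(200 * 50, 2, 0.5), nrow = 200, ncol = 50)  # 200 individuals x 50 markers

        # Similarity from genotype correlation between markers.
        sim_corr <- abs(cor(geno))

        # Linear adjacency: neighboring markers on the chromosome are connected.
        n <- ncol(geno)
        sim_linear <- diag(n)
        for (i in seq_len(n - 1)) {
          sim_linear[i, i + 1] <- sim_linear[i + 1, i] <- 1
        }

        # Crude stand-in for Similarity Network Fusion: average the two networks.
        fused <- (sim_corr + sim_linear) / 2

        # Blocks of correlated markers can then be obtained by clustering the fused
        # network, here via hierarchical clustering on 1 - similarity as a distance.
        blocks <- cutree(hclust(as.dist(1 - fused)), k = 10)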

    Nine quick tips for efficient bioinformatics curriculum development and training.

    Biomedical research is becoming increasingly data-driven. New technologies that generate large-scale, complex data are continually emerging and evolving. As a result, there is a concurrent need to train researchers to use and understand new computational tools. Here we describe an efficient and effective approach to developing curriculum materials that can be deployed in a research environment to meet this need.

    Cleaning Genotype Data from Diversity Outbred Mice.

    Data cleaning is an important first step in most statistical analyses, including efforts to map the genetic loci that contribute to variation in quantitative traits. Here we illustrate approaches to quality control and cleaning of array-based genotyping data for multiparent populations (experimental crosses derived from more than two founder strains), using MegaMUGA array data from a set of 291 Diversity Outbred (DO) mice. Our approach employs data visualizations that can reveal problems at the level of individual mice or individual SNP markers. We find that the proportion of missing genotypes for each mouse is an effective indicator of sample quality. We use microarray probe intensities for SNPs on the X and Y chromosomes to confirm the sex of each mouse, and we use the proportion of matching SNP genotypes between pairs of mice to detect sample duplicates. We use a hidden Markov model (HMM) reconstruction of the founder haplotype mosaic across each mouse genome to estimate the number of crossovers and to identify potential genotyping errors. For evaluating marker quality, we find that rates of missing data and genotyping error are the most effective diagnostics. We also examine the SNP genotype frequencies with markers grouped according to their minor allele frequency in the founder strains. For markers with high apparent error rates, a scatterplot of the allele-specific probe intensities can reveal the underlying cause of incorrect genotype calls. The decision to include or exclude low-quality samples can have a significant impact on the mapping results for a given study. We find that the impact of low-quality markers on a given study is often minimal, but reporting problematic markers can improve the utility of the genotyping array across many studies.
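
    Two of the sample-level diagnostics described here, the per-mouse proportion of missing genotypes and the proportion of matching genotype calls between pairs of mice, are straightforward to compute. Below is a minimal sketch in R on simulated data; the object names and cutoffs are illustrative and are not taken from the paper's pipeline.

        # g: mice x markers genotype matrix (0/1/2), NA = missing call. Toy data:
        set.seed(1)
        g <- matrix(sample(c(0, 1, 2, NA), 20 * 500, replace = TRUE,
                           prob = c(0.45, 0.10, 0.43, 0.02)),
                    nrow = 20, dimnames = list(paste0("mouse", 1:20), NULL))
        g[5, sample(500, 150)] <- NA   # plant one low-quality sample
        g[7, ] <- g[3, ]               # plant one duplicate pair

        # Proportion of missing genotypes per mouse flags low-quality samples.
        miss_rate <- rowMeans(is.na(g))
        names(which(miss_rate > 0.05))        # illustrative cutoff

        # Proportion of matching genotypes for each pair of mice flags duplicates.
        n <- nrow(g)
        match_prop <- matrix(NA_real_, n, n,
                             dimnames = list(rownames(g), rownames(g)))
        for (i in seq_len(n - 1)) {
          for (j in seq(i + 1, n)) {
            ok <- !is.na(g[i, ]) & !is.na(g[j, ])
            match_prop[i, j] <- mean(g[i, ok] == g[j, ok])
          }
        }
        which(match_prop > 0.9, arr.ind = TRUE)  # duplicates approach 1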

    A Primer on High-Throughput Computing for Genomic Selection

    High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful for devising pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison with the traditional data processing pipeline residing on central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet the increasing computing demands posed by the unprecedented genomic data we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans.
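
    The batch-processing pattern described here, evaluating several traits independently in parallel rather than one after another, can be sketched with base R's parallel package; on a real cluster each element of the loop would instead be submitted as a queued job under a scheduler such as HTCondor or Slurm. The model fit below (a closed-form ridge regression) is an illustrative stand-in for a genomic-selection model, not code from the paper.

        library(parallel)

        # Fit one trait: ridge solution (X'X + lambda I)^-1 X'y on marker matrix X.
        fit_one_trait <- function(y, X, lambda = 1) {
          p <- ncol(X)
          solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
        }

        set.seed(1)
        X <- matrix(rnorm(100 * 500), 100, 500)   # 100 animals x 500 markers
        Y <- replicate(8, drop(X %*% rnorm(500, sd = 0.1)) + rnorm(100))  # 8 traits

        # Sequential: traits evaluated one after another.
        fits_seq <- lapply(seq_len(ncol(Y)), function(k) fit_one_trait(Y[, k], X))

        # Parallel: one worker per trait reduces wall-clock time
        # (mclapply forks processes; on Windows use mc.cores = 1 or parLapply).
        fits_par <- mclapply(seq_len(ncol(Y)),
                             function(k) fit_one_trait(Y[, k], X), mc.cores = 4)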

    Doxorubicin-Induced Cardiotoxicity in Collaborative Cross (CC) Mice Recapitulates Individual Cardiotoxicity in Humans.

    Anthracyclines cause progressive cardiotoxicity whose ultimate severity is individual to the patient. Genetic determinants contributing to this variation are difficult to study using current mouse models. Our objective was to determine whether a spectrum of anthracycline-induced cardiac disease can be elicited across 10 Collaborative Cross mouse strains given the same dose of doxorubicin. Mice from ten distinct strains were given 5 mg/kg of doxorubicin intravenously once weekly for 5 weeks (total 25 mg/kg). Mice were killed at acute or chronic timepoints. Body weight was assessed weekly, followed by terminal complete blood count, pathology, and a panel of biomarkers. Linear models were fit to assess effects of treatment, sex, and sex-by-treatment interactions at each timepoint. Impaired growth and cardiac pathology occurred across all strains. The severity of these effects varied by strain and sex, with greater severity in males. Cardiac troponin I and myosin light chain 3 demonstrated strain- and sex-specific elevations in the acute phase, with subsequent decline despite ongoing progression of cardiac disease. Acute-phase cardiac troponin I levels predicted the ultimate severity of cardiac pathology poorly, whereas myosin light chain 3 levels predicted the extent of chronic cardiac injury in males. Strain- and sex-dependent renal toxicity was evident. Regenerative anemia manifested during the acute period. We confirm that the variable susceptibility to doxorubicin-induced cardiotoxicity observed in humans can be modeled in a panel of CC strains. In addition, we identified a potential predictive biomarker in males. CC strains provide reproducible models to explore mechanisms contributing to individual susceptibility in humans.
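
    The statistical analysis described, linear models with treatment, sex, and sex-by-treatment terms fit separately at each timepoint, corresponds to a standard R model formula. A minimal sketch on toy data follows; the variable names are hypothetical, not the paper's actual columns.

        # Toy acute-timepoint data: biomarker level by treatment and sex.
        set.seed(1)
        acute <- data.frame(
          cTnI      = rlnorm(60),
          treatment = rep(c("doxorubicin", "vehicle"), each = 30),
          sex       = rep(c("M", "F"), times = 30)
        )

        # treatment * sex expands to treatment + sex + treatment:sex (the interaction).
        fit <- lm(cTnI ~ treatment * sex, data = acute)
        summary(fit)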

    Mouse genome-wide association and systems genetics identify Lhfp as a regulator of bone mass.

    Bone mineral density (BMD) is a strong predictor of osteoporotic fracture. It is also one of the most heritable disease-associated quantitative traits. As a result, there has been considerable effort focused on dissecting its genetic basis. Here, we performed a genome-wide association study (GWAS) in a panel of inbred strains to identify associations influencing BMD. This analysis identified a significant (P = 3.1 × 10^-12) BMD locus that replicated in two separate inbred strain panels and overlapped a BMD quantitative trait locus (QTL) previously identified in an F2 intercross. The association mapped to a 300 Kbp region containing four genes: Gm2447, Gm20750, Cog6, and Lhfp. Further analysis found that Lipoma HMGIC Fusion Partner (Lhfp) was highly expressed in bone and osteoblasts. Furthermore, its expression was regulated by a local expression QTL (eQTL), which overlapped the BMD association. A co-expression network analysis revealed that Lhfp was strongly connected to genes involved in osteoblast differentiation. To directly evaluate its role in bone, Lhfp-deficient mice (Lhfp-/-) were created using CRISPR/Cas9. Consistent with genetic and network predictions, bone marrow stromal cells (BMSCs) from Lhfp-/- mice displayed increased osteogenic differentiation. Lhfp-/- mice also had elevated BMD due to increased cortical bone mass. Lastly, we identified SNPs in human LHFP that were associated (P = 1.2 × 10^-5) with heel BMD. In conclusion, we used GWAS and systems genetics to identify Lhfp as a regulator of osteoblast activity and bone mass.
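
    The first step of this workflow, testing each SNP for association with BMD across a panel of strains, reduces in its simplest form to a per-marker regression. The sketch below runs on simulated data and omits the population-structure correction that a real inbred-strain GWAS requires; all names and numbers are illustrative.

        # Simulated strain means: 40 strains x 1000 SNPs coded 0/1.
        set.seed(1)
        n_strain <- 40; n_snp <- 1000
        snps <- matrix(rbinom(n_strain * n_snp, 1, 0.4), n_strain, n_snp)
        bmd  <- snps[, 17] * 0.5 + rnorm(n_strain)   # SNP 17 carries a true effect

        # Per-SNP regression of phenotype on genotype.
        pvals <- apply(snps, 2, function(s) {
          if (var(s) == 0) return(NA_real_)          # skip monomorphic markers
          summary(lm(bmd ~ s))$coefficients[2, 4]    # p-value for the SNP term
        })
        head(order(pvals), 5)   # most strongly associated markers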

    Heading Down the Wrong Pathway: on the Influence of Correlation within Gene Sets

    BACKGROUND: Analysis of microarray experiments often involves testing for the overrepresentation of pre-defined sets of genes among lists of genes deemed individually significant. Most popular gene set testing methods assume the independence of genes within each set, an assumption that is seriously violated, as extensive correlation between genes is a well-documented phenomenon. RESULTS: We conducted a meta-analysis of over 200 datasets from the Gene Expression Omnibus to demonstrate the practical impact of strong gene correlation patterns that are highly consistent across experiments. We show that a common gene set testing procedure based on the independence assumption produces very high false positive rates when applied to datasets in which treatment groups have been randomized, and that gene sets with high internal correlation are more likely to be declared significant. A reanalysis of the same datasets using an array resampling approach properly controls false positive rates, leading to more parsimonious and higher-confidence gene set findings, which should facilitate pathway-based interpretation of microarray data. CONCLUSIONS: These findings call into question many of the gene set testing results in the literature and argue strongly for the adoption of resampling-based gene set testing criteria in the peer-reviewed biomedical literature.
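
    The array-resampling idea, recomputing the gene set statistic after permuting sample labels so that gene-gene correlations are preserved under the null, can be illustrated with a generic permutation test. This sketch is not the specific procedure evaluated in the paper; the set statistic and all names are illustrative.

        # expr: genes x samples matrix (null data); grp: two-group factor.
        set.seed(1)
        expr <- matrix(rnorm(500 * 20), 500, 20)
        grp  <- factor(rep(c("A", "B"), each = 10))
        gene_set <- 1:25   # indices of one hypothetical gene set

        # Set statistic: mean absolute two-sample t-statistic over the set's genes.
        set_stat <- function(e, g) {
          t_per_gene <- apply(e[gene_set, ], 1, function(x) t.test(x ~ g)$statistic)
          mean(abs(t_per_gene))
        }

        obs  <- set_stat(expr, grp)
        # Permuting samples (not genes) keeps the correlation structure intact.
        null <- replicate(200, set_stat(expr, sample(grp)))
        p_value <- mean(null >= obs)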

    Toxicogenetics: population-based testing of drug and chemical safety in mouse models

    The rapid decline in the cost of dense genotyping is paving the way for new DNA sequence-based laboratory tests to move quickly into clinical practice, and ultimately to help realize the promise of ‘personalized’ therapies. These advances are based on the growing appreciation of genetics as an important dimension in science and in the practice of investigative pharmacology and toxicology. On the clinical side, both regulators and the pharmaceutical industry hope that the early identification of individuals prone to adverse drug effects will keep advantageous medicines on the market for the benefit of the vast majority of prospective patients. On the environmental health protection side, there is a clear need for better science to define the range and causes of susceptibility to adverse effects of chemicals in the population, so that appropriate regulatory limits can be established. In both cases, most of the research effort is focused on genome-wide association studies in humans, where de novo genotyping of each subject is required. At the same time, the power of population-based preclinical safety testing in rodent models (e.g., the mouse) remains to be fully exploited. Here, we highlight the approaches available to utilize the knowledge of DNA sequence and genetic diversity of the mouse as a species in mechanistic toxicology research. We posit that appropriate genetically defined mouse models may be combined with the limited data from human studies not only to discover the genetic determinants of susceptibility, but also to understand the molecular underpinnings of toxicity.

    R/qtl2: Software for Mapping Quantitative Trait Loci with High-Dimensional Data and Multiparent Populations.

    R/qtl2 is an interactive software environment for mapping quantitative trait loci (QTL) in experimental populations. The R/qtl2 software expands the scope of the widely used R/qtl package to include multiparent populations derived from more than two founder strains, such as the Collaborative Cross and Diversity Outbred mice, heterogeneous stocks, and MAGIC plant populations. R/qtl2 is designed to handle modern high-density genotyping data and high-dimensional molecular phenotypes, including gene expression and proteomics. R/qtl2 includes the ability to perform genome scans using a linear mixed model to account for population structure, and it also includes features to impute SNPs based on founder strain genomes and to carry out association mapping. The R/qtl2 software provides all of the basic features needed for QTL mapping, including graphical displays and summary reports, and it can be extended through the creation of add-on packages. R/qtl2, which is free and open-source software written in the R and C++ programming languages, comes with a test framework.
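
    The basic workflow is compact. The following short R session uses the example intercross dataset that ships with the package and follows the documented steps: compute genotype probabilities with the package's HMM, build kinship matrices for the linear mixed model, run a genome scan, and summarize the peaks. The step size, error probability, and LOD threshold shown here are conventional illustrative choices, not prescribed values.

        library(qtl2)

        # Load the example cross bundled with the package.
        iron <- read_cross2(system.file("extdata", "iron.zip", package = "qtl2"))

        # Insert pseudomarkers and compute genotype probabilities via the HMM.
        map <- insert_pseudomarkers(iron$gmap, step = 1)
        pr  <- calc_genoprob(iron, map, error_prob = 0.002)

        # Kinship matrices for the mixed model ("loco" = leave one chromosome out).
        kinship <- calc_kinship(pr, "loco")

        # Genome scan with a linear mixed model to account for relatedness.
        out <- scan1(pr, iron$pheno, kinship)

        # Report peaks above a LOD threshold.
        find_peaks(out, map, threshold = 4)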