15 research outputs found

    Profound Perturbation of the Metabolome in Obesity Is Associated with Health Risk.

    Obesity is a heterogeneous phenotype that is crudely measured by body mass index (BMI). There is a need for a more precise yet portable method of phenotyping and categorizing risk in large numbers of people with obesity to advance clinical care and drug development. Here, we used non-targeted metabolomics and whole-genome sequencing to identify metabolic and genetic signatures of obesity. We find that obesity results in profound perturbation of the metabolome; nearly a third of the assayed metabolites associated with changes in BMI. A metabolome signature identifies the healthy obese and lean individuals with abnormal metabolomes; these groups differ in health outcomes and underlying genetic risk. Specifically, an abnormal metabolome associated with a 2- to 5-fold increase in cardiovascular events when comparing individuals who were matched for BMI but had opposing metabolome signatures. Because metabolome profiling identifies clinically meaningful heterogeneity in obesity, this approach could help select patients for clinical trials.

    Profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes

    Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are composed entirely of guanines or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.

    Size matters: finding the most informative set of window lengths

    Event sequences often contain continuous variability at different levels. In other words, their properties and characteristics change at different rates, concurrently. For example, the sales of a product may slowly become more frequent over a period of several weeks, but there may be interesting variation within a week at the same time. To provide an accurate and robust “view” of such multi-level structural behavior, one needs to determine the appropriate levels of granularity for analyzing the underlying sequence. We introduce the novel problem of finding the best set of window lengths for analyzing discrete event sequences. We define suitable criteria for choosing window lengths and propose an efficient method to solve the problem. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from two domains: text and DNA. We find that the optimal sets of window lengths themselves can provide new insight into the data, e.g., the burstiness of events affects the optimal window lengths for measuring the event frequencies.