    A user-friendly tool using systems biology models to infer cell functions from omics

    On the design of clone-based haplotyping

    Background: Haplotypes are important for assessing the genealogy and disease susceptibility of individual genomes, but they are difficult to obtain with routine sequencing approaches. Experimental haplotype reconstruction based on assembling fragments of individual chromosomes is promising, but yields vary because the effects of parameter choices are incompletely understood. Results: We parameterize the clone-based haplotyping problem to provide theoretical and empirical assessments of how different parameters affect haplotype assembly. We confirm the intuition that long clones help link heterozygous variants together and thus increase haplotype length. Furthermore, given the clone length, we address how to choose the remaining parameters, including the number of pools, clone coverage and sequencing coverage, so as to maximize haplotype length. We model the problem theoretically and show empirically the benefits of using larger clones with a moderate number of pools and moderate sequencing coverage. In particular, using 140 kb BAC clones, we construct haplotypes for a personal genome and assemble haplotypes with N50 values greater than 2.6 Mb. These assembled haplotypes are longer than, and at least as accurate as, haplotypes from existing clone-based strategies, whether in vivo or in vitro. Conclusions: Our results provide practical guidelines for designing clone-based methods that achieve long-range, high-resolution and accurate haplotypes.
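The abstract reports haplotype length as an N50 value. As a minimal sketch of that statistic, the function below computes N50 over a list of haplotype block lengths: the largest length L such that blocks of length at least L cover half of the total assembled length. The function name and the block lengths are hypothetical illustrations, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of haplotype block lengths.

    N50 is the largest length L such that blocks of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    total = sum(lengths)
    running = 0
    # Walk blocks from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical haplotype block lengths in base pairs (not data from the study).
blocks = [3_000_000, 2_600_000, 1_200_000, 900_000, 300_000]
print(n50(blocks))
```

Because N50 is weighted by length, a few very long blocks can dominate it, which is why longer clones that link more heterozygous variants tend to raise it.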

    HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data

    BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system; it encodes the cell-membrane major histocompatibility complex (MHC) proteins responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low-throughput, target only a few genes, are labor-intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to call HLA genes accurately and comprehensively from RNA-seq data. RESULTS: We created HLAProfiler ( https://github.com/ExpressionAnalysis/HLAProfiler ), a k-mer profile-based method for HLA calling in RNA-seq data that can identify rare and common HLA alleles with >99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler correctly identifies the allele at two-field precision or its exact coding sequence, a significant advance over existing algorithms. CONCLUSIONS: HLAProfiler enables accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/.
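To illustrate what a k-mer profile is, the sketch below counts overlapping k-mers and scores each candidate allele by the fraction of its distinct k-mers observed in the reads. This is a deliberately simplified k-mer matching scheme, assumed for illustration only; it is not HLAProfiler's actual profiling or calling algorithm, and the allele names, sequences and choice of k are hypothetical toy values.

```python
from collections import Counter


def kmer_profile(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def profile_support(allele_seq, reads, k):
    """Fraction of an allele's distinct k-mers that occur in at least one read.

    A simplified illustration of k-mer matching, not HLAProfiler's scoring.
    """
    read_kmers = set()
    for read in reads:
        read_kmers.update(kmer_profile(read, k))
    allele_kmers = set(kmer_profile(allele_seq, k))
    if not allele_kmers:
        return 0.0
    hits = sum(1 for kmer in allele_kmers if kmer in read_kmers)
    return hits / len(allele_kmers)


# Toy sequences (hypothetical, not real HLA alleles).
alleles = {"allele_A": "ACGTACGGTCA", "allele_B": "ACGTTCGGTCA"}
reads = ["ACGTACG", "TACGGTCA"]
scores = {name: profile_support(seq, reads, k=4) for name, seq in alleles.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here allele_B differs from allele_A at a single base, yet that one substitution removes four of its eight 4-mers from the read set. This illustrates how k-mer profiles can separate closely related alleles.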

    Applicability of Precision Medicine Approaches to Managing Hypertension in Rural Populations

    As part of the Heart Healthy Lenoir Project, we developed a practice-level intervention to improve blood pressure control. The goals of this study were: (i) to determine whether single nucleotide polymorphisms (SNPs) associated with blood pressure variation in large studies are applicable to blood pressure control in subjects from a rural population; (ii) to measure the association of these SNPs with subjects’ responsiveness to the hypertension intervention; and (iii) to identify other SNPs that may help explain patient-specific responses to an intervention. We used a combination of candidate-SNP and genome-wide analyses to test associations with either baseline systolic blood pressure (SBP) or the change in SBP one year after the intervention in two genetically defined ancestral groups: African Americans (AA) and Caucasian Americans (CAU). Of the 48 candidate SNPs, 13 were associated with baseline SBP in our study, and one candidate SNP, rs592582, was also associated with a change in SBP after one year. Using our study data, we identified 4 and 15 additional loci associated with a change in SBP in the AA and CAU groups, respectively. Our analysis of gene-age interactions identified genotypes associated with SBP improvement within different age groups of our populations. Moreover, our integrative analysis identified AQP4-AS1 and PADI2 as genes whose expression levels may contribute to the pleiotropy of complex traits involved in cardiovascular health and in blood pressure regulation in response to an intervention targeting hypertension. In conclusion, the identification of SNPs associated with the success of a hypertension treatment intervention suggests that genetic factors, in combination with age, may contribute to an individual’s success in lowering SBP.
If these findings prove applicable to other populations, using this genetic variation to tailor patient-specific interventions may help providers make decisions that improve patient outcomes. Further investigation is required to determine the role of this genetic variation in the management of hypertension, so that more precise treatment recommendations can be made in the future as part of personalized medicine.
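SNP association tests of this kind are commonly built on an additive genotype encoding, in which each subject is coded by the number of copies (0, 1 or 2) of an effect allele and the outcome is regressed on that dosage. The sketch below shows only that core step, a least-squares slope of change in SBP on dosage. It assumes the additive model and omits the covariates (such as age and ancestry group) that a real analysis would include, so it is not the authors' statistical model. The genotypes and SBP values are hypothetical toy data.

```python
def allele_dosage(genotype, effect_allele):
    """Additive encoding: copies (0, 1 or 2) of the effect allele in a diploid genotype."""
    return sum(1 for allele in genotype if allele == effect_allele)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (no covariates)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("genotype dosage has no variance")
    return sxy / sxx


# Hypothetical toy data: two-letter genotypes and one-year change in SBP (mmHg).
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG"]
delta_sbp = [-2.0, -6.0, -10.0, -6.0, -2.0, -10.0]
dosage = [allele_dosage(g, "G") for g in genotypes]
print(ols_slope(dosage, delta_sbp))
```

A negative slope means that each extra copy of the effect allele is associated with a larger drop in SBP. A gene-age interaction would add an age term and a dosage × age product term to the model.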

    Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery.

    Electronic Health Record (EHR) systems typically define laboratory test results using Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using the Fast Healthcare Interoperability Resources (FHIR) standard. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted via FHIR to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHRs to be reused for deep phenotyping, and it exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
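The conversion described above can be sketched in two steps. First, a numeric result is classified against its reference range as low, normal or high; the letters "L", "N" and "H" mirror FHIR observation-interpretation codes. Second, the (LOINC code, interpretation) pair is looked up in an annotation table to obtain an HPO term. All LOINC codes and HPO identifiers below are hypothetical placeholders, not real identifiers or the published annotations. This sketch leaves normal results unmapped for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabResult:
    loinc_code: str  # placeholder code, not a real LOINC identifier
    value: float
    ref_low: float
    ref_high: float


def interpret(result: LabResult) -> str:
    """Classify a numeric result against its reference range as L, N or H."""
    if result.value < result.ref_low:
        return "L"
    if result.value > result.ref_high:
        return "H"
    return "N"


# Hypothetical annotation table mapping (LOINC code, interpretation) to an HPO term.
# All identifiers are placeholders, not real LOINC codes or HPO IDs.
ANNOTATIONS = {
    ("LOINC-EXAMPLE-1", "L"): "HP:EXAMPLE-LOW",
    ("LOINC-EXAMPLE-1", "H"): "HP:EXAMPLE-HIGH",
}


def result_to_hpo(result: LabResult) -> Optional[str]:
    """Return the annotated HPO term for a result, or None if none is annotated."""
    return ANNOTATIONS.get((result.loinc_code, interpret(result)))


results = [
    LabResult("LOINC-EXAMPLE-1", 6.1, 3.5, 5.1),
    LabResult("LOINC-EXAMPLE-1", 4.0, 3.5, 5.1),
    LabResult("LOINC-EXAMPLE-1", 2.9, 3.5, 5.1),
]
terms = [result_to_hpo(r) for r in results]
print(terms)
```

Keying the table on the interpretation rather than on raw values keeps it independent of units and laboratory-specific reference ranges. This is what lets distinct tests with comparable meanings map to the same HPO term.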

    Model-based assessment of mammalian cell metabolic functionalities using omics data.

    Omics experiments are ubiquitous in biological studies, leading to a deluge of data. However, it is still challenging to connect changes in these data to changes in cell functions because of complex interdependencies between genes, proteins, and metabolites. Here, we present a framework that allows researchers to infer how metabolic functions change on the basis of omics data. To enable this, we curated and standardized lists of metabolic tasks that mammalian cells can accomplish. Genome-scale metabolic networks were used to define gene sets associated with each metabolic task. We further developed a framework to overlay omics data on these sets and predict pathway usage for each metabolic task. Using multiple transcriptomic datasets, we demonstrated how this approach can quantify the metabolic functions of diverse biological samples, from single cells to whole tissues and organs. To facilitate its adoption, we integrated the approach into GenePattern (www.genepattern.org, CellFie).
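To make "overlaying omics data on task gene sets" concrete, the sketch below scores each metabolic task as the fraction of its measured genes whose expression exceeds a threshold. This simple scoring rule is an assumption chosen for illustration and is not CellFie's actual scoring method. The gene names, task names, expression values and threshold are all hypothetical.

```python
def task_scores(expression, task_genes, threshold):
    """Score each task as the fraction of its measured genes expressed above threshold.

    Tasks with no measured genes get None rather than a misleading zero.
    """
    scores = {}
    for task, genes in task_genes.items():
        measured = [g for g in genes if g in expression]
        if not measured:
            scores[task] = None
            continue
        active = sum(1 for g in measured if expression[g] > threshold)
        scores[task] = active / len(measured)
    return scores


# Hypothetical expression values (e.g. normalized counts) and task gene sets.
expression = {"G1": 12.0, "G2": 0.5, "G3": 8.0, "G4": 3.0}
task_genes = {
    "task_1": ["G1", "G2"],
    "task_2": ["G3", "G4"],
    "task_3": ["G9"],  # gene absent from the dataset
}
print(task_scores(expression, task_genes, threshold=2.0))
```

Returning None for tasks with no measured genes separates "no evidence" from "evidence of inactivity", which matters when comparing samples profiled on different platforms.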

    Enabling clinical genomics by reducing false discovery in next-generation sequencing data

    Thesis (Ph.D.)--Boston University. Next-generation sequencing technologies are ushering in the next generation of clinical diagnostics. However, even minute sequencing error rates can produce unwieldy numbers of false positives in single-genome variation analysis, potentially requiring the prioritization and validation of hundreds of errors per patient. To interpret the variation in an individual whole human genome accurately, it is essential to fully characterize the quality of the data being interpreted. Here I present methods for improving the accuracy of next-generation sequencing variant calls and for assessing the specificity, sensitivity and thresholding of those calls. In particular, I present an algorithm for detecting heterozygous deletions that is clinically relevant to neuronal ceroid lipofuscinosis (NCL), the most prevalent neurodegenerative disease of childhood. I describe a platform-independent method for choosing variant-calling thresholds, and I present a toolkit (mkSProC) that calibrates sequencing quality scores by applying this method to genome replicates. I characterize the specificity and sensitivity of variables that influence phasing confidence. This enables targeted experimental phasing and quantifies confidence when experimental phasing is completed computationally. I combine experimental phasing results with expression data to find allele-specifically expressed (ASE) genes. I also describe a feature that I added to UniPROBE, a web server of regulatory-motif binding sites, which can be used, among other things, to find motifs that may explain ASE.
Applying the methods I describe to genomic sequence, expression and phase data will further our understanding of causal variation and reduce experimental costs through targeted validation.
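The general idea behind calibrating quality scores against genome replicates can be sketched as empirical Phred recalibration. Calls are grouped by their reported quality, the observed error rate in each group is estimated, and that rate is converted back with Q = -10 log10(error rate). The sketch assumes that discordance between replicates marks a call as an error and applies add-one smoothing. It is a generic illustration, not mkSProC's implementation, and the call counts are hypothetical.

```python
import math
from collections import defaultdict


def calibrate(calls):
    """Map each reported quality to an empirical Phred-scaled quality.

    `calls` is an iterable of (reported_quality, is_error) pairs; here is_error
    is assumed to come from discordance between genome replicates.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for quality, is_error in calls:
        totals[quality] += 1
        errors[quality] += int(is_error)
    table = {}
    for quality, n in totals.items():
        # Add-one smoothing avoids log10(0) when a bin has no observed errors.
        rate = (errors[quality] + 1) / (n + 2)
        table[quality] = -10 * math.log10(rate)
    return table


# Hypothetical calls: reported Q30 calls are all concordant, while reported
# Q20 calls show 9 discordant calls out of 98.
calls = [(30, False)] * 998 + [(20, True)] * 9 + [(20, False)] * 89
print(calibrate(calls))
```

In this toy example, the reported Q30 bin comes out near an empirical Q30. The reported Q20 bin drops to about Q10, so those calls would be treated as less reliable when choosing variant-calling thresholds.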