147 research outputs found
Phenotype Similarity Regression for Identifying the Genetic Determinants of Rare Diseases.
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically, but association methods that account for the ontological relationships between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases.
This work was supported by NIHR award RG65966 (D.G. and E.T.) and the Medical Research Council programme grant MC UP 0801/1 (D.G. and S.R.). The NIHR BioResource – Rare Diseases projects were approved by Research Ethics Committees in the UK and appropriate national ethics authorities in non-UK enrolment centres (see Supplemental Note). We are grateful to Dr William J Astle for advice on the statistical model and for providing comments on the manuscript. We are particularly thankful to the BPD project members for granting access to detailed HPO terms of patients. This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.ajhg.2016.01.00
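To illustrate what it means for a phenotype comparison to account for the ontological relationships between HPO terms, here is a minimal sketch of one widely used ontology-aware measure: Resnik's information-content similarity, combined with a best-match average over term sets. The toy ontology, its parent links, the add-one smoothing of term frequencies, and all function names are hypothetical. This is not the paper's Bayesian association model; it only shows how a patient coded with a more specific term can still score as similar to a characteristic phenotype.

```python
from math import log

# Hypothetical toy ontology, NOT the real HPO: each term maps to its parent terms.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormal platelet count": ["Phenotypic abnormality"],
    "Thrombocytopenia": ["Abnormal platelet count"],
    "Macrothrombocytopenia": ["Thrombocytopenia"],
    "Abnormal skeletal morphology": ["Phenotypic abnormality"],
    "Short stature": ["Abnormal skeletal morphology"],
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(cohort):
    """IC(t) = log((n + 1) / (c_t + 1)), where c_t counts patients annotated
    with t or any descendant of t (the add-one smoothing is an assumption)."""
    counts = dict.fromkeys(PARENTS, 0)
    for patient_terms in cohort:
        # Annotating a patient with a term implicitly annotates all its ancestors.
        implied = set().union(*(ancestors(t) for t in patient_terms))
        for t in implied:
            counts[t] += 1
    n = len(cohort)
    return {t: log((n + 1) / (c + 1)) for t, c in counts.items()}


def resnik(a, b, ic):
    """Resnik similarity: IC of the most informative common ancestor of a and b."""
    return max(ic[t] for t in ancestors(a) & ancestors(b))


def set_similarity(terms_a, terms_b, ic):
    """Best-match average: for each term in A, take its best Resnik match in B."""
    return sum(max(resnik(a, b, ic) for b in terms_b) for a in terms_a) / len(terms_a)
```

For example, with a hypothetical cohort `[["Macrothrombocytopenia"], ["Thrombocytopenia"], ["Short stature"], ["Short stature"]]`, a patient coded only as `Macrothrombocytopenia` scores log(5/3), about 0.51, against the characteristic term `Thrombocytopenia`, whereas a patient coded as `Short stature` scores 0, because the only ancestor the two share is the root.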
A comparative study of RNA-seq analysis strategies.
Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach relies on curated annotations and assumes that the transcripts in a sample are a subset of those listed in a curated database. A more ambitious approach aligns reads to a reference genome and uses the alignments to infer transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo, without the aid of reference data. We systematically assessed the properties of these three approaches through a simulation study. We found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal in expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, incorrect transcripts present in a curated set do not absorb much signal, so it is preferable for a curated set to have high sensitivity rather than high precision. Software to simulate transcript sets, expression values and sequence reads under a wider range of parameter values, and to compare the sensitivity, precision and signal-to-noise ratios of different methods, is freely available online (https://github.com/boboppie/RSSS) and can be extended by interested parties to include methods other than the exemplars presented in this article.
This work was supported by the Wellcome Trust (WT097679); the Cambridge Biomedical Research Centre; Cancer Research UK (C14303/A10825) and the Medical Research Council (G1002319). This is the final version of the article. It was first available from Oxford University Press via http://dx.doi.org/10.1093/bib/bbv00
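The sensitivity and precision comparison described above can be sketched as follows. The sketch assumes that each transcript is represented by its exact tuple of exon coordinates and that a prediction counts only when its whole structure matches a true transcript. The function names, the matching rule, and the "artefact signal fraction" (the share of estimated expression assigned to transcripts absent from the truth) are illustrative assumptions. They are not the interface or metrics of the RSSS repository.

```python
def compare_transcript_sets(predicted, truth):
    """Return (sensitivity, precision) of a predicted transcript set.

    Each transcript is a tuple of (start, end) exon coordinates; a prediction
    counts as correct only if its full exon structure matches a true transcript.
    """
    pred, true = set(predicted), set(truth)
    tp = len(pred & true)
    sensitivity = tp / len(true) if true else 0.0
    precision = tp / len(pred) if pred else 0.0
    return sensitivity, precision


def artefact_signal_fraction(expression, truth):
    """Share of total estimated expression assigned to transcripts absent from truth."""
    total = sum(expression.values())
    absorbed = sum(v for t, v in expression.items() if t not in truth)
    return absorbed / total if total else 0.0


# Hypothetical example: two of three true transcripts are recovered, plus one
# spurious single-exon artefact that spans the intron of t1.
t1 = ((100, 200), (300, 400))
t2 = ((100, 200), (500, 600))
t3 = ((1000, 1100),)
artefact = ((100, 400),)

sens, prec = compare_transcript_sets({t1, t2, artefact}, {t1, t2, t3})
print(sens, prec)  # sensitivity = precision = 2/3
print(artefact_signal_fraction({t1: 50.0, t2: 30.0, artefact: 20.0}, {t1, t2, t3}))
```

In this toy case the artefact receives 20% of the estimated expression, which mirrors the abstract's point that false transcripts can absorb signal that belongs to real ones.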
Hybrid mice reveal parent-of-origin and cis- and trans- regulatory effects in the retina
A fundamental challenge in genomics is to map DNA sequence variants onto changes in gene expression. Gene expression is regulated by cis-regulatory elements (CREs, i.e., enhancers, promoters, and silencers) and the trans factors (e.g., transcription factors) that act upon them. A powerful approach to dissecting cis and trans effects is to compare F1 hybrids with F0 homozygotes. Using this approach, and taking advantage of the high frequency of polymorphisms in wild-derived inbred CAST/EiJ mice relative to the reference strain C57BL/6J, we conducted allele-specific mRNA-seq analysis in the adult mouse retina, a disease-relevant neural tissue. We found that cis effects account for the bulk of gene regulatory divergence in the retina. Many CREs contained functional (i.e., activating or silencing) cis-regulatory variants mapping onto altered expression of genes, including genes associated with retinal disease. By comparing our retinal data with previously published liver data, we found that most of the cis effects identified were tissue-specific. Lastly, by comparing reciprocal F1 hybrids, we identified evidence of imprinting in the retina for the first time. Our study provides a framework and resource for mapping cis-regulatory variants onto changes in gene expression, and underscores the importance of studying cis-regulatory variants in the context of retinal disease.
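The F1-versus-F0 comparison can be written down in a few lines. This is a minimal sketch that assumes library-size-normalized expression values. The log2 ratio between the two parental strains measures total divergence. The log2 allelic ratio within the F1 hybrid, where both alleles share one trans environment, measures the cis component. The difference between the two is attributed to trans. The function name and inputs are hypothetical. A real analysis would add statistical testing and would use reciprocal crosses to separate parent-of-origin (imprinting) effects.

```python
from math import log2


def cis_trans(parent_a, parent_b, f1_allele_a, f1_allele_b):
    """Decompose expression divergence between two strains into cis and trans parts.

    parent_a, parent_b: normalized expression in the two homozygous (F0) strains.
    f1_allele_a, f1_allele_b: allele-specific expression in the F1 hybrid, where
    both alleles share one trans environment, so their ratio reflects cis effects.
    Returns (total, cis, trans) as log2 ratios.
    """
    total = log2(parent_a / parent_b)
    cis = log2(f1_allele_a / f1_allele_b)
    trans = total - cis  # divergence not explained by cis is attributed to trans
    return total, cis, trans


# Hypothetical gene: 4-fold parental difference, 2-fold allelic imbalance in F1.
print(cis_trans(80.0, 20.0, 40.0, 20.0))  # (2.0, 1.0, 1.0)
```

Here half of the log-scale divergence is cis and half is trans. A gene whose F1 allelic ratio matched the parental ratio would be classed as purely cis.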
Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads
We present a novel pipeline and methodology for simultaneously estimating isoform expression and allelic imbalance in diploid organisms using RNA-seq data. We achieve this by modeling the expression of haplotype-specific isoforms. If the haplotypes are unknown, the two parental isoform sequences can be reconstructed individually. A new statistical method, MMSEQ, deconvolves the mapping of reads to multiple transcripts (isoforms or haplotype-specific isoforms). Our software can take non-uniform read generation into account and works with paired-end reads.
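The core idea of deconvolving reads that map to several haplotype-specific isoforms can be sketched with a plain expectation-maximization (EM) algorithm. This is a swapped-in maximum-likelihood stand-in, not MMSEQ's own estimation procedure. It also assumes uniform read generation along each transcript, whereas MMSEQ can handle non-uniform generation. The function name and inputs are hypothetical.

```python
def em_abundances(read_compat, lengths, n_iter=500):
    """Estimate the fraction of reads from each transcript by EM.

    read_compat: one set per read, listing the transcripts it aligns to.
    lengths: effective length of each transcript (uniform read generation assumed).
    """
    transcripts = list(lengths)
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_compat)
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read among its compatible transcripts in proportion
        # to theta[t] / lengths[t], the probability that t generated that read.
        for compat in read_compat:
            weights = {t: theta[t] / lengths[t] for t in compat}
            z = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / z
        # M-step: the new read fractions are the expected read counts.
        theta = {t: c / n_reads for t, c in counts.items()}
    return theta


# Hypothetical gene with two haplotype-specific isoforms of equal length:
# 60 reads map only to haplotype A, 20 only to B, and 40 map to both.
reads = [{"A"}] * 60 + [{"B"}] * 20 + [{"A", "B"}] * 40
theta = em_abundances(reads, {"A": 1000.0, "B": 1000.0})
print(theta["A"] / (theta["A"] + theta["B"]))  # allelic ratio, converges to 0.75
```

The ambiguous reads are split 3:1, following the evidence from the uniquely mapping reads. Dividing each `theta[t]` by its length gives a quantity proportional to the isoform's expression level.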
Extensive co-operation between the Epstein-Barr virus EBNA3 proteins in the manipulation of host gene expression and epigenetic chromatin modification.
Epstein-Barr virus (EBV) is able to drive the transformation of B-cells, resulting in the generation of lymphoblastoid cell lines (LCLs) in vitro. EBV nuclear proteins EBNA3A and EBNA3C are necessary for efficient transformation, while EBNA3B is dispensable. We describe a transcriptome analysis of BL31 cells infected with a series of EBNA3-knockout EBVs, including one deleted for all three EBNA3 genes. Using Affymetrix Exon 1.0 ST microarrays analysed with the MMBGX algorithm, we have identified over 1000 genes whose regulation by EBV requires one of the EBNA3s. Remarkably, a third of the genes identified require more than one EBNA3 for their regulation, predominantly EBNA3C co-operating with either EBNA3B, EBNA3A or both. The microarray results were validated by real-time PCR, while ChIP analysis of a selection of co-operatively repressed promoters indicates a role for polycomb group complexes. Targets include genes involved in apoptosis, cell migration and B-cell differentiation, and genes involved in mitosis show a highly significant but subtle alteration. In order to assess the relevance of the BL31 system to LCLs, we analysed the transcriptome of a set of EBNA3B knockout (3BKO) LCLs. Around a third of the genes whose expression level in LCLs was altered in the absence of EBNA3B were also altered in 3BKO-BL31 cell lines. Among these are TERT and TCL1A, implying that EBV-induced changes in the expression of these genes are not required for B-cell transformation. We also identify 26 genes that require both EBNA3A and EBNA3B for their regulation in LCLs. Together, these results show the complexity of the interaction between EBV and its host, whereby multiple EBNA3 proteins co-operate to modulate the behaviour of the host cell.
MMBGX: a method for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays
Affymetrix has recently developed whole-transcript GeneChips, the ‘Gene’ and ‘Exon’ arrays, which interrogate exons along the length of each gene. Although each probe on these arrays is intended to hybridize perfectly to only one transcriptional target, many probes match multiple transcripts located in different parts of the genome or alternative isoforms of the same gene. Existing statistical methods for estimating expression do not take this into account and are thus prone to producing inflated estimates. We propose a method, Multi-Mapping Bayesian Gene eXpression (MMBGX), which disaggregates the signal at ‘multi-match’ probes. When applied to Gene arrays, MMBGX removes the upward bias of gene-level expression estimates. When applied to Exon arrays, it can further disaggregate the signal between alternative transcripts of the same gene, providing expression estimates of individual splice variants. We demonstrate the performance of MMBGX on simulated data and a tissue mixture data set. We then show that MMBGX can estimate the expression of alternative isoforms within one experimental condition, confirming our results by RT-PCR. Finally, we show that our method for detecting differential splicing has a lower error rate than standard exon-level approaches on a previously validated colon cancer data set.
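Why ignoring multi-match probes inflates estimates, and how disaggregation corrects this, can be illustrated with a simplified linear model. In this model each probe's signal is the sum of the expression of every target it matches. The sketch below solves the model by non-negative least squares using Lee-Seung multiplicative updates and contrasts the result with a naive per-target average of matching probes. It is a stand-in, not MMBGX's Bayesian model. The additive, noise-free, non-negative signal model and all names are assumptions made for illustration.

```python
def naive_estimate(probe_signal, match, n_targets):
    """Average signal of all probes matching each target, ignoring shared probes."""
    sums, counts = [0.0] * n_targets, [0] * n_targets
    for p, targets in enumerate(match):
        for t in targets:
            sums[t] += probe_signal[p]
            counts[t] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def disaggregate(probe_signal, match, n_targets, n_iter=2000):
    """Split multi-match probe signal among targets by non-negative least squares.

    Assumed model: signal[p] = sum of expr[t] over the targets t matched by probe p.
    Solved with Lee-Seung multiplicative updates, which keep estimates non-negative.
    """
    # A^T b: total signal of the probes matching each target.
    atb = [0.0] * n_targets
    for p, targets in enumerate(match):
        for t in targets:
            atb[t] += probe_signal[p]
    x = [1.0] * n_targets
    for _ in range(n_iter):
        fitted = [sum(x[t] for t in targets) for targets in match]  # A x
        atax = [0.0] * n_targets  # A^T A x
        for p, targets in enumerate(match):
            for t in targets:
                atax[t] += fitted[p]
        x = [x[t] * atb[t] / atax[t] if atax[t] > 0 else 0.0 for t in range(n_targets)]
    return x


# Hypothetical probes for two targets with true expression 100 and 50;
# the last two probes match both targets.
match = [[0], [0], [1], [0, 1], [0, 1]]
signal = [100.0, 100.0, 50.0, 150.0, 150.0]
print(naive_estimate(signal, match, 2))  # target 0 inflated to 125.0
print(disaggregate(signal, match, 2))    # approximately [100.0, 50.0]
```

The naive average credits each shared probe's full signal to every target it matches, which produces the upward bias described above. The least-squares split recovers the true values because the toy data are exact.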
Inherited platelet disorders: toward DNA-based diagnosis.
Variations in platelet number, volume, and function are largely genetically controlled, and many loci associated with platelet traits have been identified by genome-wide association studies (GWASs) (1). The genome also contains a large number of rare variants, only a tiny fraction of which underlie inherited human diseases. Research over the last three decades has led to the discovery of 51 genes harboring variants responsible for inherited platelet disorders (IPDs). However, the majority of patients with an IPD still do not receive a molecular diagnosis. Beyond its scientific interest, a molecular or genetic diagnosis is important for patients. There is increasing recognition that a number of IPDs are associated with severe pathologies, including an increased risk of malignancy, and a definitive diagnosis can inform prognosis and care. In this review, we give an overview of these disorders grouped according to their effect on platelet biology and their clinical characteristics. We also discuss the challenge of identifying candidate genes and causal variants therein, how IPDs have been historically diagnosed, and how this is changing with the introduction of high-throughput sequencing. Finally, we describe how integration of large genomic, epigenomic, and phenotypic datasets, including whole genome sequencing data, GWASs, epigenomic profiling, protein-protein interaction networks, and standardized clinical phenotype coding, will drive the discovery of novel mechanisms of disease in the near future to improve patient diagnosis and management.
The authors thank the members of the BRIDGE-bleeding, thrombotic, and platelet disorders (BPD) and ThromboGenomics Consortia for their contributions. The BRIDGE-BPD and ThromboGenomics studies, including the enrollment of cases, sequencing, and analysis, received support from the National Institute for Health Research (NIHR) BioResource–Rare Diseases. The NIHR BioResource is funded by the NIHR.
C.L. is the recipient of a Clinical Research Training Fellowship award from the MRC, and M.A.L. and C.L. are also supported by the Imperial College London NIHR Biomedical Research Centre. E.T. is supported by the NIHR BioResource, and research in the Ouwehand laboratory receives support from the British Heart Foundation, European Commission, MRC, NHS Blood and Transplant, NIHR and Wellcome Trust. This is the author accepted manuscript. The final version is available from American Society of Hematology via http://dx.doi.org/10.1182/blood-2016-03-378588
A new pedigree with thrombomodulin-associated coagulopathy in which delayed fibrinolysis is partially attenuated by co-inherited TAFI deficiency
Acknowledgements: We thank NIHR BioResource volunteers for their participation, and gratefully acknowledge NIHR BioResource centres, NHS Trusts and staff for their contribution. We thank the National Institute for Health Research and NHS Blood and Transplant. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. S.K.W. was supported during this work by the Medical Research Council (MR/K023489/1) and is now funded through an NIHR-funded Academic Clinical Lectureship. K.D. is supported as an HSST trainee by NHS Health Education England. N.J.M. and C.S.W. are supported by the British Heart Foundation (PG/15/82/31721). J.C.M. is a fellow of the Research Foundation Flanders (FWO Vlaanderen; 1137717N). A.D.M. is supported by the NIHR Biomedical Research Centre at the University Hospitals Bristol National Health Service Foundation Trust and the University of Bristol. We thank Prof Paul Declerck and Prof Ann Gils, University of Leuven, Belgium, for the kind gift of the MA-T12D11 antibody. We acknowledge technical assistance from Dorien Leenaerts, University of Antwerp, Belgium, and Michela Donnarumma, University of Aberdeen, UK. Peer reviewed.
- …