67 research outputs found
Development of computational approaches for whole-genome sequence variation and deep phenotyping
The rare disease pulmonary arterial hypertension (PAH) results in high blood pressure in the lung caused by narrowing of lung arteries. Genes causative in PAH were discovered through family studies and very often harbour rare variants. However, the genetic cause in heritable (31%) and idiopathic (79%) PAH cases is not yet known, although these cases are speculated to be caused by rare variants. Advances in high-throughput sequencing (HTS) technologies have made it possible to detect variants in 98% of the human genome. A drop in sequencing costs made it feasible to sequence 10,000 individuals, including 1,250 subjects diagnosed with PAH and their relatives, as part of the NIHR BioResource – Rare Diseases (BR-RD) study. This large cohort allows the genome-wide identification of rare variants to discover novel causative genes associated with PAH in a case-control study to advance our understanding of the underlying aetiology.
In the first part of my thesis, I establish a phenotype capture system that allows research nurses to record clinical measurements and other patient-related information of PAH patients recruited to the NIHR BR-RD study. The implemented extensions provide programmatic data transfer and an automated data release pipeline for analysis-ready data.
The second part is dedicated to the discovery of novel disease genes in PAH. I focus on one well-characterised PAH disease gene to establish variant filter strategies to enrich for rare disease-causing variants. I apply these filter strategies to all known PAH disease genes and describe the phenotypic differences based on clinically relevant values. Genome-wide results from different filter strategies are tested for association with PAH. I describe the findings of the rare variant association tests and provide a detailed interrogation of two novel disease genes.
The last part describes the data characteristics of variant information and the available non-SQL (NoSQL) implementations, and evaluates the suitability and scalability of distributed compute frameworks to store and analyse population-scale variation data. Based on the evaluation, I implement a variant analysis platform that incrementally merges samples, annotates variants and enables the analysis of 10,000 individuals in minutes. An incremental design for variant merging and annotation has not been described before. Using the framework, I develop a quality score to reduce technical variation and other biases. The result from the rare variant association test is compared with traditional methods.
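The idea of incrementally merging samples can be illustrated with a minimal in-memory sketch. This is not the platform described in the thesis, which targets distributed NoSQL storage; the function names, the variant key layout and the genotype strings below are all assumptions made for illustration. The point it shows is that adding a sample only writes that sample's own calls, so previously merged samples are never reprocessed.

```python
from collections import defaultdict


def merge_sample(store, sample_id, calls):
    """Add one sample's variant calls to the store.

    `store` maps a hypothetical variant key (chrom, pos, ref, alt) to a
    dict of {sample_id: genotype}. Only the new sample's entries are
    written, so earlier samples are left untouched.
    """
    for chrom, pos, ref, alt, genotype in calls:
        store[(chrom, pos, ref, alt)][sample_id] = genotype
    return store


def allele_count(store, key):
    """Count alternate alleles for one variant across all merged samples."""
    # .get avoids creating an empty entry for an unknown variant.
    return sum(gt.count("1") for gt in store.get(key, {}).values())


# Illustrative data only; these are not calls from the study.
store = defaultdict(dict)
merge_sample(store, "S1", [("1", 100, "A", "G", "0/1")])
merge_sample(store, "S2", [("1", 100, "A", "G", "1/1"), ("2", 50, "C", "T", "0/1")])
shared_count = allele_count(store, ("1", 100, "A", "G"))
```

A key-value layout of this kind maps naturally onto column-oriented NoSQL stores, which is one plausible reason such systems are evaluated for this workload.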
ProteinArchitect: Protein Evolution above the Sequence Level
While many authors have discussed models and tools for studying protein evolution at the sequence level, molecular function is usually mediated by complex, higher order features such as independently folding domains and linear motifs that are based on or embedded in a particular arrangement of features such as secondary structure elements, transmembrane domains and regions with intrinsic disorder. This 'protein architecture' can, in its most simplistic representation, be visualized as domain organization cartoons that can be used to compare proteins in terms of the order of their mostly globular domains. Here, we describe a visual approach and a webserver for protein comparison that extend the domain organization cartoon concept. By developing an information-rich, compact visualization of different protein features above the sequence level, potentially related proteins can be compared at the level of propensities for secondary structure, transmembrane domains and intrinsic disorder, in addition to PFAM domains. A public Web server is available at www.proteinarchitect.net, while the code is provided at protarchitect.sourceforge.net. Due to recent advances in sequencing technologies we are now flooded with millions of predicted proteins that await comparative analysis. In many cases, mature tools focused on revealing hits with considerable global or local similarity to well-characterized proteins will not be able to lead us to testable hypotheses about a protein's function, or the function of a particular region. The visual comparison of different types of protein features with ProteinArchitect will be useful when assessing the relevance of similarity search hits, to discover subgroups in protein families and superfamilies, and to understand protein regions with conserved features outside globular regions.
Therefore, this approach is likely to help researchers to develop testable hypotheses about a protein's function even if it is somewhat distant from the more characterized proteins, by facilitating the discovery of features that are conserved above the sequence level for comparison and further experimental investigation.
Curation and expansion of Human Phenotype Ontology for defined groups of inborn errors of immunity
BACKGROUND
Accurate, detailed, and standardized phenotypic descriptions are essential to support diagnostic interpretation of genetic variants and to discover new diseases. The Human Phenotype Ontology (HPO), extensively used in rare disease research, provides a rich collection of vocabulary with standardized phenotypic descriptions in a hierarchical structure. However, to date, the use of HPO has not yet been widely implemented in the field of inborn errors of immunity (IEIs), mainly due to a lack of comprehensive IEI-related terms.
OBJECTIVES
We sought to systematically review available terms in HPO for the depiction of IEIs, to expand HPO, yielding more comprehensive sets of terms, and to reannotate IEIs with HPO terms to provide accurate, standardized phenotypic descriptions.
METHODS
We initiated a collaboration involving expert clinicians, geneticists, researchers working on IEIs, and bioinformaticians. Multiple branches of the HPO tree were restructured and extended on the basis of expert review. Our ontology-guided machine learning coupled with a 2-tier expert review was applied to reannotate defined subgroups of IEIs.
RESULTS
We revised and expanded 4 main branches of the HPO tree. Here, we reannotated 73 diseases from 4 International Union of Immunological Societies-defined IEI disease subgroups with HPO terms. We achieved a 4.7-fold increase in the number of phenotypic terms per disease. Given the new HPO annotations, we demonstrated improved ability to computationally match selected IEI cases to their known diagnosis, and improved phenotype-driven disease classification.
CONCLUSIONS
Our targeted expansion and reannotation presents enhanced precision of disease annotation, will enable superior HPO-based IEI characterization, and hence will benefit both IEI diagnostic and research activities.
Thrombomodulin in patients with mild to moderate bleeding tendency.
INTRODUCTION: A massive increase of soluble thrombomodulin (sTM) due to variants in the thrombomodulin gene (THBD) has recently been identified as a novel bleeding disorder. AIM: To investigate sTM levels and underlying genetic variants as a cause for haemostatic impairment and bleeding in a large number of patients with a mild to moderate bleeding disorder (MBD), including patients with bleeding of unknown cause (BUC). PATIENTS AND METHODS: In 507 MBD patients, sTM levels, thrombin generation and plasma clot formation were measured and compared to 90 age- and sex-matched healthy controls. In patients, genetic analysis of the THBD gene was performed. RESULTS: No difference in sTM levels between patients and controls was found overall (median [IQR] 5.0 [3.8-6.3] vs. 5.1 [3.7-6.4] ng/ml, p = .762) or according to specific diagnoses of MBD or BUC, and high sTM levels (≥95th percentile of healthy controls) were not overrepresented in patients. Soluble TM levels had no impact on bleeding severity or global tests of haemostasis, including thrombin generation or plasma clot formation. In the THBD gene, no known pathogenic or novel disease-causing variants affecting sTM plasma levels were identified in our patient cohort. CONCLUSION: TM-associated coagulopathy appears to be rare, as it was not identified in our large cohort of patients with MBD. Soluble TM did not arise as a risk factor for bleeding or altered haemostasis in these patients.
SvAnna: efficient and accurate pathogenicity prediction of coding and regulatory structural variants in long-read genome sequencing.
Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.
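A figure such as "87% in the top ten ranks" is a top-k metric: for each benchmark case, record the rank the prioritization tool assigned to the known deleterious variant, then report the fraction of cases with rank at most k. The sketch below shows only that calculation. The function name is hypothetical and the ranks are made-up illustrative values, not SvAnna results.

```python
def top_k_fraction(ranks, k=10):
    """Fraction of cases whose known causal variant is ranked within the top k.

    `ranks` holds one 1-based rank per benchmark case.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Made-up ranks for eight hypothetical cases (not data from the paper).
example_ranks = [1, 3, 12, 7, 25, 2, 10, 1]
fraction = top_k_fraction(example_ranks)
```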
HGVA: the Human Genome Variation Archive.
High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic data for key reference projects in a clean, fast and integrated fashion. HGVA provides an efficient and intuitive web-interface for easy data mining, a comprehensive RESTful API and client libraries in Python, Java and JavaScript for fast programmatic access to its knowledge base. HGVA calculates population frequencies for these projects and enriches their data with variant annotation provided by CellBase, a rich and fast annotation solution. HGVA serves as a proof-of-concept of the genome analysis developments being carried out by the University of Cambridge together with UK's 100 000 genomes project and the National Institute for Health Research BioResource Rare-Diseases, in particular, deploying the open-source for Computational Biology (OpenCB) software platform for storing and analyzing massive genomic datasets.
Curation and expansion of the Human Phenotype Ontology for systemic autoinflammatory diseases improves phenotype-driven disease-matching
INTRODUCTION: Accurate and standardized phenotypic descriptions are essential in diagnosing rare diseases and discovering new diseases, and the Human Phenotype Ontology (HPO) system was developed to provide a rich collection of hierarchical phenotypic descriptions. However, although the HPO terms for inborn errors of immunity have been improved and curated, it has not been investigated whether this curation improves the diagnosis of systemic autoinflammatory disease (SAID) patients. Here, we aimed to study whether improved HPO annotation for SAIDs enhanced SAID identification and to demonstrate the potential of phenotype-driven genome diagnostics using curated HPO terms for SAIDs. METHODS: We collected HPO terms from 98 genetically confirmed SAID patients across eight different European SAID expertise centers and used the LIRICAL (Likelihood Ratio Interpretation of Clinical Abnormalities) computational algorithm to estimate the effect of HPO curation on the prioritization of the correct SAID for each patient. RESULTS: Our results show that the percentage of correct diagnoses increased from 66% to 86% and that the number of diagnoses with the highest ranking increased from 38 to 45. In a further pilot study, curation also improved HPO-based whole-exome sequencing (WES) analysis, diagnosing 10/12 patients before and 12/12 after curation. In addition, the average number of candidate diseases that needed to be interpreted decreased from 35 to 2. DISCUSSION: This study demonstrates that curation of HPO terms can increase identification of the correct diagnosis, emphasizing the high potential of HPO-based genome diagnostics for SAIDs.
Plasma Metabolomics Implicates Modified Transfer RNAs and Altered Bioenergetics in the Outcomes of Pulmonary Arterial Hypertension.
BACKGROUND: Pulmonary arterial hypertension (PAH) is a heterogeneous disorder with high mortality. METHODS: We conducted a comprehensive study of plasma metabolites using ultraperformance liquid chromatography mass spectrometry to identify patients at high risk of early death, to identify patients who respond well to treatment, and to provide novel molecular insights into disease pathogenesis. RESULTS: Fifty-three circulating metabolites distinguished well-phenotyped patients with idiopathic or heritable PAH (n=365) from healthy control subjects (n=121) after correction for multiple testing (P<7.3e-5) and confounding factors, including drug therapy, and renal and hepatic impairment. A subset of 20 of 53 metabolites also discriminated patients with PAH from disease control subjects (symptomatic patients without pulmonary hypertension, n=139). Sixty-two metabolites were prognostic in PAH, with 36 of 62 independent of established prognostic markers. Increased levels of tRNA-specific modified nucleosides (N2,N2-dimethylguanosine, N1-methylinosine), tricarboxylic acid cycle intermediates (malate, fumarate), glutamate, fatty acid acylcarnitines, tryptophan, and polyamine metabolites and decreased levels of steroids, sphingomyelins, and phosphatidylcholines distinguished patients from control subjects. The largest differences correlated with increased risk of death, and correction of several metabolites over time was associated with a better outcome. Patients who responded to calcium channel blocker therapy had metabolic profiles similar to those of healthy control subjects. CONCLUSIONS: Metabolic profiles in PAH are strongly related to survival and should be considered part of the deep phenotypic characterization of this disease. Our results support the investigation of targeted therapeutic strategies that seek to address the alterations in translational regulation and energy metabolism that characterize these patients.
Identification of germline monoallelic mutations in IKZF2 in patients with immune dysregulation
Helios, encoded by IKZF2, is a member of the Ikaros family of transcription factors with pivotal roles in T-follicular helper, NK- and T-regulatory cell physiology. Somatic IKZF2 mutations are frequently found in lymphoid malignancies. Although germline mutations in IKZF1 and IKZF3 encoding Ikaros and Aiolos have recently been identified in patients with phenotypically similar immunodeficiency syndromes, the effect of germline mutations in IKZF2 on human hematopoiesis and immunity remains enigmatic. We identified germline IKZF2 mutations (one nonsense variant, p.R291X, and four distinct missense variants) in six patients with systemic lupus erythematosus, immune thrombocytopenia or EBV-associated hemophagocytic lymphohistiocytosis. Patients exhibited hypogammaglobulinemia and decreased numbers of T-follicular helper and NK cells. Single-cell RNA sequencing of PBMCs from the patient carrying the R291X variant revealed upregulation of proinflammatory genes associated with T-cell receptor activation and T-cell exhaustion. Functional assays revealed the inability of HeliosR291X to homodimerize and bind target DNA as dimers. Moreover, proteomic analysis by proximity-dependent Biotin Identification revealed aberrant interaction of 3/5 Helios mutants with core components of the NuRD complex, conveying HELIOS-mediated epigenetic and transcriptional dysregulation.