9 research outputs found
Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease
Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which revealed that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of cases the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process.
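As a rough aid to reading the reported improvement, the sketch below works backwards from the two figures in the abstract (macro F1 of 0.51 and a 132% relative increase) to the baseline score they imply. The baseline value itself is not stated in the abstract, so the result is only an inference, and the function name `implied_baseline` is hypothetical.

```python
def implied_baseline(new_score: float, relative_increase_pct: float) -> float:
    """Return the baseline score implied by a new score and a relative increase.

    If new = baseline * (1 + pct / 100), then baseline = new / (1 + pct / 100).
    """
    return new_score / (1.0 + relative_increase_pct / 100.0)


# Figures taken from the abstract; the derived baseline is an inference only.
baseline_f1 = implied_baseline(0.51, 132)
print(f"Implied baseline macro F1: {baseline_f1:.2f}")
```

Under these assumptions the best bag-of-words baseline would have scored a macro F1 of roughly 0.22.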
An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs)
We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts: genome-wide DNA methylation reference profiles of 13 normal somatic tissues, placenta, sperm, and an immortalized cell line; a visualization tool that has been integrated with the Ensembl genome browser; and a new algorithm for the analysis of immunoprecipitation-based DNA methylation profiles. We demonstrate the utility of our resource by identifying the first comprehensive genome-wide set of tissue-specific differentially methylated regions (tDMRs) that may play a role in cellular identity and the regulation of tissue-specific genome function. We also discuss the implications of our findings with respect to the regulatory potential of regions with varied CpG density, gene expression, transcription factor motifs, gene ontology, and correlation with other epigenetic marks such as histone modifications.
Certain heterozygous variants in the kinase domain of the serine/threonine kinase NEK8 can cause an autosomal dominant form of polycystic kidney disease
Autosomal dominant polycystic kidney disease (ADPKD) resulting from pathogenic variants in PKD1 and PKD2 is the most common form of PKD, but other genetic causes tied to primary cilia function have been identified. Biallelic pathogenic variants in the serine/threonine kinase NEK8 cause a syndromic ciliopathy with extra-kidney manifestations. Here we identify NEK8 as a disease gene for ADPKD in 12 families. Clinical evaluation was combined with functional studies using fibroblasts and tubuloids from affected individuals. Nek8 knockout mouse kidney epithelial (IMCD3) cells transfected with wild type or variant NEK8 were further used to study ciliogenesis, ciliary trafficking, kinase function, and DNA damage responses. Twenty-one affected monoallelic individuals uniformly exhibited cystic kidney disease (mostly neonatal) without consistent extra-kidney manifestations. Recurrent de novo mutations of the NEK8 missense variant p.Arg45Trp, including mosaicism, were seen in ten families. Missense variants elsewhere within the kinase domain (p.Ile150Met and p.Lys157Gln) were also identified. Functional studies demonstrated normal localization of the NEK8 protein to the proximal cilium and no consistent cilia formation defects in patient-derived cells. NEK8-wild type protein and all variant forms of the protein expressed in Nek8 knockout IMCD3 cells were localized to cilia and supported ciliogenesis. However, Nek8 knockout IMCD3 cells expressing NEK8-p.Arg45Trp and NEK8-p.Lys157Gln showed significantly decreased polycystin-2 but normal ANKS6 localization in cilia. Moreover, p.Arg45Trp NEK8 exhibited reduced kinase activity in vitro. In patient-derived tubuloids and IMCD3 cells expressing NEK8-p.Arg45Trp, DNA damage signaling was increased compared to healthy passage-matched controls. Thus, we propose a dominant-negative effect for specific heterozygous missense variants in the NEK8 kinase domain as a new cause of PKD.
Whole genome sequencing for the diagnosis of neurological repeat expansion disorders in the UK: a retrospective diagnostic accuracy and prospective clinical validation study
Background: repeat expansion disorders affect about 1 in 3000 individuals and are clinically heterogeneous diseases caused by expansions of short tandem DNA repeats. Genetic testing is often locus-specific, resulting in underdiagnosis of people who have atypical clinical presentations, especially in paediatric patients without a previous positive family history. Whole genome sequencing is increasingly used as a first-line test for other rare genetic disorders, and we aimed to assess its performance in the diagnosis of patients with neurological repeat expansion disorders. Methods: we retrospectively assessed the diagnostic accuracy of whole genome sequencing to detect the most common repeat expansion loci associated with neurological outcomes (AR, ATN1, ATXN1, ATXN2, ATXN3, ATXN7, C9orf72, CACNA1A, DMPK, FMR1, FXN, HTT, and TBP) using samples obtained within the National Health Service in England from patients who were suspected of having neurological disorders; previous PCR test results were used as the reference standard. The clinical accuracy of whole genome sequencing to detect repeat expansions was prospectively examined in previously genetically tested and undiagnosed patients recruited in 2013–17 to the 100 000 Genomes Project in the UK, who were suspected of having a genetic neurological disorder (familial or early-onset forms of ataxia, neuropathy, spastic paraplegia, dementia, motor neuron disease, parkinsonian movement disorders, intellectual disability, or neuromuscular disorders). If a repeat expansion call was made using whole genome sequencing, PCR was used to confirm the result. Findings: the diagnostic accuracy of whole genome sequencing to detect repeat expansions was evaluated against 793 PCR tests previously performed within the NHS from 404 patients. 
Whole genome sequencing correctly classified 215 of 221 expanded alleles and 1316 of 1321 non-expanded alleles, showing 97·3% sensitivity (95% CI 94·2–99·0) and 99·6% specificity (99·1–99·9) across the 13 disease-associated loci when compared with PCR test results. In samples from 11 631 patients in the 100 000 Genomes Project, whole genome sequencing identified 81 repeat expansions, which were also tested by PCR: 68 were confirmed as repeat expansions in the full pathogenic range, 11 were non-pathogenic intermediate expansions or premutations, and two were non-expanded repeats (16% false discovery rate). Interpretation: in our study, whole genome sequencing for the detection of repeat expansions showed high sensitivity and specificity, and it led to identification of neurological repeat expansion disorders in previously undiagnosed patients. These findings support implementation of whole genome sequencing in clinical laboratories for diagnosis of patients who have a neurological presentation consistent with a repeat expansion disorder. Funding: Medical Research Council, Department of Health and Social Care, National Health Service England, National Institute for Health Research, and Illumina.
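The headline percentages in this abstract follow directly from the allele and expansion counts it reports. The short sketch below reproduces that arithmetic; the helper name `percentage` is hypothetical, and the false discovery rate is computed here on the assumption that the 11 intermediate or premutation calls and the two non-expanded calls together count as false discoveries, which is the reading that matches the stated 16%.

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts taken from the abstract (retrospective comparison against PCR).
sensitivity = percentage(215, 221)    # expanded alleles correctly classified
specificity = percentage(1316, 1321)  # non-expanded alleles correctly classified

# Prospective cohort: 81 calls, of which 68 were confirmed as fully pathogenic.
# Assumption: the remaining 11 + 2 calls are treated as false discoveries.
false_discovery_rate = percentage(11 + 2, 81)

print(f"Sensitivity: {sensitivity:.1f}%")
print(f"Specificity: {specificity:.1f}%")
print(f"False discovery rate: {false_discovery_rate:.0f}%")
```

The confidence intervals quoted in the abstract are not recomputed here, because the abstract does not state which interval method was used.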
CAGI, the critical assessment of genome interpretation, establishes progress and prospects for computational genetic variant interpretation methods
Background: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state of the art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. Results: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. Conclusions: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.