21 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed.
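    For readers unfamiliar with Okapi BM25, one of the baselines named above, the following is a minimal sketch of how such a ranker scores candidate documents against a seed text. It is not the consortium's implementation: the whitespace tokenizer, the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.5` and `b=0.75` (common defaults) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    k1 and b are conventional default values, not taken from the abstract.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus; real use would score titles/abstracts.
    corpus = [
        "protein glycosylation in plasma",
        "human genome sequence finishing",
        "ovarian carcinoma genome analysis",
    ]
    for doc, score in zip(corpus, bm25_scores("genome sequence", corpus)):
        print(f"{score:.3f}  {doc}")
```

    Because rankers like this weight rare shared terms differently from TF-IDF or PubMed Related Articles, their top results can diverge, which is consistent with the abstract's observation that the baselines return distinct sets of recommendations.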

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
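    To put the quoted accuracy in perspective, the short calculation below estimates how many erroneous events the stated rate would imply across the stated sequence length. It is only back-of-the-envelope arithmetic on the two figures given in the abstract; the function name `expected_error_events` is hypothetical, and treating the approximate rate as exact and uniform is an assumption.

```python
def expected_error_events(n_bases, bases_per_error):
    """Expected number of error events, assuming a uniform error rate."""
    return n_bases // bases_per_error


# Figures quoted in the abstract (both approximate).
BUILD35_BASES = 2_850_000_000   # 2.85 billion nucleotides
BASES_PER_ERROR = 100_000       # about 1 event per 100,000 bases

if __name__ == "__main__":
    print(expected_error_events(BUILD35_BASES, BASES_PER_ERROR))  # prints 28500
```

    Under these assumptions the finished sequence would still contain on the order of tens of thousands of residual errors, alongside the 341 gaps the abstract reports.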

    The Molecular Origin and Taxonomy of Mucinous Ovarian Carcinoma

    Mucinous ovarian carcinoma (MOC) is a unique subtype of ovarian cancer with an uncertain etiology, including whether it genuinely arises at the ovary or is metastatic disease from other organs. In addition, the molecular drivers of invasive progression, high-grade and metastatic disease are poorly defined. We perform genetic analysis of MOC across all histological grades, including benign and borderline mucinous ovarian tumors, and compare these to tumors from other potential extra-ovarian sites of origin. Here we show that MOC is distinct from tumors from other sites, and that the data support a progressive model of evolution from borderline precursors to high-grade invasive MOC. Key drivers of progression identified are TP53 mutation and copy number aberrations, including a notable amplicon on 9p13. High copy number aberration burden is associated with worse prognosis in MOC. Our data conclusively demonstrate that MOC arise from benign and borderline precursors at the ovary and are not extra-ovarian metastases.

    Relationship between molecular pathogen detection and clinical disease in febrile children across Europe: a multicentre, prospective observational study

    Background: The PERFORM study aimed to understand causes of febrile childhood illness by comparing molecular pathogen detection with current clinical practice. Methods: Febrile children and controls were recruited on presentation to hospital in 9 European countries between 2016 and 2020. Each child was assigned a standardized diagnostic category based on retrospective review of local clinical and microbiological data. Subsequently, centralised molecular tests (CMTs) for 19 respiratory and 27 blood pathogens were performed. Findings: Of 4611 febrile children, 643 (14%) were classified as definite bacterial infection (DB), 491 (11%) as definite viral infection (DV), and 3477 (75%) had uncertain aetiology. 1061 controls without infection were recruited. CMTs detected blood bacteria more frequently in DB than DV cases for N. meningitidis (OR 3.37, 95% CI 1.92-5.99), S. pneumoniae (OR 3.89, 95% CI 2.07-7.59), Group A streptococcus (OR 2.73, 95% CI 1.13-6.09) and E. coli (OR 2.7, 95% CI 1.02-6.71). Respiratory viruses were more common in febrile children than controls, but only influenza A (OR 0.24, 95% CI 0.11-0.46), influenza B (OR 0.12, 95% CI 0.02-0.37) and RSV (OR 0.16, 95% CI 0.06-0.36) were less common in DB than DV cases. Of 16 blood viruses, enterovirus (OR 0.43, 95% CI 0.23-0.72) and EBV (OR 0.71, 95% CI 0.56-0.90) were detected less often in DB than DV cases. Combined local diagnostics and CMTs respectively detected blood viruses and respiratory viruses in 360 (56%) and 161 (25%) of DB cases, and virus detection ruled out bacterial infection poorly, with predictive values of 0.64 and 0.68 respectively. Interpretation: Most febrile children cannot be conclusively defined as having bacterial or viral infection when molecular tests supplement conventional approaches. Viruses are detected in most patients with bacterial infections, and the clinical value of individual pathogen detection in determining treatment is low. New approaches are needed to help determine which febrile children require antibiotics. Funding: EU Horizon 2020 grant 668303.

    Impact of infection on proteome-wide glycosylation revealed by distinct signatures for bacterial and viral pathogens

    Mechanisms of infection and pathogenesis have predominantly been studied based on differential gene or protein expression. Less is known about posttranslational modifications, which are essential for protein functional diversity. We applied an innovative glycoproteomics method to study the systemic proteome-wide glycosylation in response to infection. The protein site-specific glycosylation was characterized in plasma derived from well-defined controls and patients. We found 3862 unique features, of which we identified 463 distinct intact glycopeptides that could be mapped to more than 30 different proteins. Statistical analyses were used to derive a glycopeptide signature that enabled significant differentiation between patients with a bacterial or viral infection. Furthermore, supported by a machine learning algorithm, we demonstrated the ability to identify the causative pathogens based on the distinctive host blood plasma glycopeptide signatures. These results illustrate that glycoproteomics holds enormous potential as an innovative approach to improve the interpretation of relevant biological changes in response to infection.

    Genomic investigations of unexplained acute hepatitis in children

    Since its first identification in Scotland, over 1,000 cases of unexplained paediatric hepatitis have been reported worldwide, including 278 cases in the UK [1]. Here we report an investigation of 38 cases, 66 age-matched immunocompetent controls and 21 immunocompromised comparator participants, using a combination of genomic, transcriptomic, proteomic and immunohistochemical methods. We detected high levels of adeno-associated virus 2 (AAV2) DNA in the liver, blood, plasma or stool from 27 of 28 cases. We found low levels of adenovirus (HAdV) and human herpesvirus 6B (HHV-6B) in 23 of 31 and 16 of 23, respectively, of the cases tested. By contrast, AAV2 was infrequently detected and at low titre in the blood or the liver from control children with HAdV, even when profoundly immunosuppressed. AAV2, HAdV and HHV-6 phylogeny excluded the emergence of novel strains in cases. Histological analyses of explanted livers showed enrichment for T cells and B lineage cells. Proteomic comparison of liver tissue from cases and healthy controls identified increased expression of HLA class 2, immunoglobulin variable regions and complement proteins. HAdV and AAV2 proteins were not detected in the livers. Instead, we identified AAV2 DNA complexes reflecting both HAdV-mediated and HHV-6B-mediated replication. We hypothesize that high levels of abnormal AAV2 replication products aided by HAdV and, in severe cases, HHV-6B may have triggered immune-mediated hepatic disease in genetically and immunologically predisposed children.