5 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents, covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
    Peer reviewed
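The abstract names Okapi Best Matching 25 (BM25) as one of the three baseline rankers. As an illustration only, the sketch below scores tokenized candidate documents against a tokenized seed query with a textbook BM25 formula. The parameter values k1 = 1.5 and b = 0.75, the Lucene-style IDF with a +1 inside the logarithm, the function name `bm25_scores`, and the toy documents are assumptions for this example, not the consortium's actual configuration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Assumed defaults: k1 = 1.5 (term-frequency saturation) and b = 0.75
    (document-length normalization); these are common textbook values.
    """
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue
            # Lucene-style IDF; the +1 keeps it non-negative for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus of tokenized titles.
docs = [
    ["sodium", "channel", "epilepsy"],
    ["ribosomal", "protein", "anemia"],
    ["epilepsy", "seizure", "epilepsy"],
]
scores = bm25_scores(["epilepsy"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

A candidate that repeats the query term ranks above one that mentions it once, while a candidate without the term scores zero; a TF-IDF baseline would differ mainly in lacking BM25's term-frequency saturation and length normalization.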

    Developmental and epileptic encephalopathy in two siblings with a novel, homozygous missense variant in SCN1B.

    Developmental and epileptic encephalopathies are genetic disorders in which both the developmental disability and the frequent epileptic activity are the effect of a specific gene variant. While heterozygous variants in SCN1B have been described in families with generalized epilepsy with febrile seizures plus, Type 1, only three cases of homozygous missense variants in SCN1B have been reported in association with autosomal recessive inheritance of a severe developmental and epileptic encephalopathy. We present two siblings who are homozygous for a novel missense variant in SCN1B, c.265C>T, predicting p.Arg89Cys. The proband is an 11-year-old female with infantile-onset, fever-induced, intractable generalized tonic-clonic seizures, myoclonic seizures, and developmental slowing and autism spectrum disorder occurring later in the course of the disease. Her 4-year-old brother had a similar epilepsy phenotype but still displays normal development. This variant has not previously been reported in the homozygous state in control databases. The variant was predicted to be damaging and occurred in the vicinity of other epileptic encephalopathy-associated missense variants that are biallelic and located in the extracellular immunoglobulin loop domain of the protein, which mediates interaction of the beta-1 subunit with cellular adhesion molecules. Ours is the first report of siblings homozygous for the p.Arg89Cys variant in SCN1B, and it further implicates biallelic mutations in this gene as a cause of epileptic encephalopathy mimicking Dravet syndrome. Interestingly, the phenotype we observed was milder than that previously described in patients with recessive SCN1B mutations.
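The variant is written in HGVS-style notation: c.265C>T is a C-to-T change at coding position 265, and p.Arg89Cys is its predicted effect on residue 89. The sketch below shows how the two are linked, under the assumption of the standard genetic code: position 265 falls at the first base of codon 89, and enumerating arginine codons shows which reference codons would turn into cysteine after that change. The abstract does not state the reference codon, so the output only lists the codons consistent with both descriptions; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(c_pos):
    """Map a 1-based coding (c.) position to (codon number, base 1-3 within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def consistent_codons(ref_base, alt_base, pos_in_codon, ref_aa, alt_aa):
    """List reference codons where ref_base -> alt_base turns ref_aa into alt_aa."""
    i = pos_in_codon - 1
    matches = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[i] != ref_base:
            continue
        mutant = codon[:i] + alt_base + codon[i + 1:]
        if CODON_TABLE[mutant] == alt_aa:
            matches.append(codon)
    return matches


codon_no, pos = codon_position(265)
print(codon_no, pos)  # 89 1
print(consistent_codons("C", "T", pos, "R", "C"))  # ['CGT', 'CGC']
```

Of the four CGN arginine codons, only CGT and CGC become cysteine codons (TGT, TGC) after a first-base C>T change; CGA and CGG would instead give a stop codon (TGA) or tryptophan (TGG).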

    Ribosomal Protein L5 and L11 Mutations Are Associated with Cleft Palate and Abnormal Thumbs in Diamond-Blackfan Anemia Patients

    Diamond-Blackfan anemia (DBA), a congenital bone-marrow-failure syndrome, is characterized by red blood cell aplasia, macrocytic anemia, clinical heterogeneity, and increased risk of malignancy. Although anemia is the most prominent feature of DBA, the disease is also characterized by growth retardation and congenital anomalies that are present in ∼30%–50% of patients. The disease has been associated with mutations in four ribosomal protein (RP) genes, RPS19, RPS24, RPS17, and RPL35A, in about 30% of patients. However, the genetic basis of the remaining 70% of cases is still unknown. Here, we report the second known mutation in RPS17 and probable pathogenic mutations in three more RP genes, RPL5, RPL11, and RPS7. In addition, we identified rare variants of unknown significance in three other genes, RPL36, RPS15, and RPS27A. Remarkably, careful review of the clinical data showed that mutations in RPL5 are associated with multiple physical abnormalities, including craniofacial, thumb, and heart anomalies, whereas isolated thumb malformations are predominantly present in patients carrying mutations in RPL11. We also demonstrate that mutations of RPL5, RPL11, or RPS7 in DBA cells are associated with diverse defects in the maturation of ribosomal RNAs in the large or the small ribosomal subunit production pathway, expanding the repertoire of ribosomal RNA processing defects associated with DBA.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search


    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents, covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical science. © The Author(s) 2019. Published by Oxford University Press.