15 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

    The epigenetic modifier Fam208a is required to maintain epiblast cell fitness

    Gastrulation initiates with the formation of the primitive streak, during which cells of the epiblast delaminate to form the mesoderm and definitive endoderm. At this stage, the pluripotent cell population of the epiblast undergoes very rapid proliferation and extensive epigenetic programming. Here we show that Fam208a, a new epigenetic modifier, is essential for early post-implantation development. We show that Fam208a mutation leads to impaired primitive streak elongation and delayed epithelial-to-mesenchymal transition. Fam208a mutant epiblasts had increased expression of p53 pathway genes as well as several pluripotency-associated long non-coding RNAs. Fam208a mutants exhibited an increase in p53-driven apoptosis, and complete removal of p53 could partially rescue their gastrulation block. These data demonstrate a new in vivo function of Fam208a in maintaining epiblast fitness, establishing it as an important factor at the onset of gastrulation when cells are exiting pluripotency.

    Data from: New environmental metabarcodes for analysing soil DNA: potential for studying past and present ecosystems

    Metabarcoding approaches use total and typically degraded DNA from environmental samples to analyse biotic assemblages and can potentially be carried out for any kind of organism in an ecosystem. These analyses rely on specific markers, here called metabarcodes, which should be optimized for taxonomic resolution, minimal bias in amplification of the target organism group and short sequence length. Using bioinformatic tools, we developed metabarcodes for several groups of organisms: fungi, bryophytes, enchytraeids, beetles and birds. The ability of these metabarcodes to amplify the target groups was systematically evaluated by (1) in silico PCRs using all standard sequences in the EMBL public database as templates, (2) in vitro PCRs of DNA extracts from surface soil samples from a site in Varanger, northern Norway, and (3) in vitro PCRs of DNA extracts from permanently frozen sediment samples of late-Pleistocene age (~16 000–50 000 yr BP) from two Siberian sites, Duvanny Yar and Main River. Comparison of the results from the in silico PCR with those obtained in vitro showed that the in silico approach offered a reliable estimate of the suitability of a marker. All target groups were detected in the environmental DNA, but we found large variation in the level of detection among the groups and between modern and ancient samples. Success rates for the Pleistocene samples were highest for fungal DNA, whereas bryophyte, beetle and bird sequences could also be retrieved, but to a much lesser degree. The metabarcoding approach has considerable potential for biodiversity screening of modern samples and also as a paleoecological tool.

    Epp_etal_metabarcodes_dryad: This file contains all consensus sequences retrieved from clones of environmental samples and referred to in Table S6 of the Supplementary material.