9 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that each of these methods tends to produce a distinct collection of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed

    Sex determination of baleen whale artefacts: Implications for ancient DNA use in zooarchaeology

    Methods to determine the sex from tissue samples of mammals include the amplification of Y chromosome-specific regions, which should only amplify from males, or amplification of homologous regions of the X and Y chromosomes containing XY-specific SNPs. A disadvantage of the first approach is that PCR failure can be misinterpreted as the identification of a female. The latter approach is proposed to identify PCR failure through non-amplification of the X homologue, which should be present in both sexes. This method is therefore potentially more suitable for molecular sexing of degraded DNA with a high probability of PCR failure, such as ancient DNA samples. Here, we investigate the validity of this assumption regarding the use of XY homologue PCR assays for molecular sexing of ancient DNA. We tested a primer set targeting the ZFX/ZFY alleles using ancient DNA extracts from 100- to 4500-year-old bowhead whale samples and, for comparison, dilution series from modern bowhead whales of known sex. DNA sequencing of PCR products obtained from the ancient material confirmed a higher proportion of successful PCR amplifications of the X homologue than of the Y homologue. This potentially biased sex determination was further assessed by testing highly diluted DNA extracts of modern samples, for which a consistently higher success rate of PCR amplification and a lower PCR cycle threshold were found for the X homologue from females than for either homologue from males. This is most likely due to the higher copy number of the X homologue in females, although other as yet unknown attributes of the protocol may also cause the observed bias. The current case study provides a valuable example of a potential pitfall in molecular sex determination of ancient mammal DNA in zooarchaeology. High-throughput sequencing methods, in which sufficiently large numbers of reads can be unambiguously mapped to X and Y regions, should overcome such biases and be the most robust approach for molecular sex determination using degraded DNA.
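The interpretation rules for the two PCR approaches described in this abstract can be sketched as a small decision table. This is only an illustration of the logic as the abstract states it; the function names and return labels are hypothetical and do not come from the study.

```python
def call_sex_y_only(y_amplified: bool) -> str:
    """Y-specific assay: a missing product cannot distinguish a female from a failed PCR."""
    return "male" if y_amplified else "female or PCR failure"


def call_sex_xy(x_amplified: bool, y_amplified: bool) -> str:
    """XY-homologue assay (e.g. ZFX/ZFY): the X product doubles as an amplification control."""
    if x_amplified and y_amplified:
        return "male"
    if x_amplified:
        # Pitfall reported in the abstract: in degraded DNA the Y homologue may
        # drop out more often than the X homologue, so a male can end up here.
        return "female"
    if y_amplified:
        # Hypothetical label: Y product without the expected X product.
        return "male (X homologue dropout)"
    return "PCR failure"


for x, y in [(True, True), (True, False), (False, True), (False, False)]:
    print(x, y, call_sex_xy(x, y))
```

The second branch is where the bias matters: a call of "female" is only as reliable as the assumption that both homologues amplify equally well, which is the assumption the abstract questions.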

    Spectrogram (down-sampled to 8 kHz, window size 256 samples with 95% overlap, fft size 512 with a factor two spectra interpolation), oscillogram (below) and power spectrum (right, Welch power spectral density estimate with a window size of 256 samples) of a bowhead whale song (A) and a fin whale song note (B) (data from Simon et al. 2010

    [53]). The distance to the bowhead whale making the song note is shown in Figure 2. The song consisted of repetitions of this single note. The frequency of the fundamental ranged from 104 Hz to 1356 Hz (Table 1).
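The analysis settings listed in this caption (8 kHz sampling rate, 256-sample window, 95% overlap, FFT size 512, Welch estimate with a 256-sample window) can be reproduced approximately with SciPy. This is a minimal sketch, not the authors' processing chain: the Hann window, the rounding of the overlap to 243 samples, and the synthetic 500 Hz test tone are assumptions, and zero-padding a 256-sample window to 512 points is only one plausible reading of the "factor two spectra interpolation".

```python
import numpy as np
from scipy import signal

FS = 8000                                # Hz, sampling rate after down-sampling (from the caption)
NPERSEG = 256                            # window length in samples (from the caption)
NOVERLAP = int(round(0.95 * NPERSEG))    # 95% overlap, rounded to 243 samples (assumption)
NFFT = 512                               # FFT size (from the caption)


def analyse(x, fs=FS):
    """Return spectrogram and Welch power spectral density of a 1-D signal."""
    freqs, times, sxx = signal.spectrogram(
        x, fs=fs, window="hann", nperseg=NPERSEG, noverlap=NOVERLAP, nfft=NFFT
    )
    f_psd, psd = signal.welch(x, fs=fs, window="hann", nperseg=NPERSEG)
    return freqs, times, sxx, f_psd, psd


# Synthetic stand-in for a recording: one second of a 500 Hz tone (hypothetical data).
t = np.arange(0, 1.0, 1.0 / FS)
x = np.sin(2 * np.pi * 500.0 * t)

freqs, times, sxx, f_psd, psd = analyse(x)
peak_hz = f_psd[np.argmax(psd)]          # frequency bin with the highest Welch PSD value
print(len(freqs), sxx.shape, peak_hz)
```

With these settings the spectrogram has 257 frequency bins spaced 15.625 Hz apart, while the Welch estimate has 129 bins spaced 31.25 Hz apart.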

    Acoustic parameters of song notes.

    Localized = song notes fulfilling the criteria for source level estimation. Others = song notes of equally high quality that could not be localized. Dur (s) = duration, Fmax (Hz) = maximum frequency, Fmin (Hz) = minimum frequency, Fc (Hz) = centroid frequency, Fpeak (Hz) = peak frequency, BWrms (Hz) = rms bandwidth, R (m) = distance, TL (dB) = transmission loss, RL = received level (rms = root-mean-squared, pp = peak to peak, efd = energy flux density), ASL = apparent source level referenced to 1 m from the source (whale). Standard deviation is given in parentheses.
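The quantities R, TL, RL and ASL in this legend are conventionally linked by ASL = RL + TL. The sketch below assumes spherical spreading, TL = 20 log10(R / 1 m), which the legend does not state; the study may have used a different transmission loss model. The function names and the example values (RL of 100 dB at 1000 m) are hypothetical.

```python
import math


def transmission_loss_db(range_m: float, spreading_coeff: float = 20.0) -> float:
    """Transmission loss relative to 1 m; 20*log10(R) is spherical spreading (assumption)."""
    if range_m <= 0:
        raise ValueError("range_m must be positive")
    return spreading_coeff * math.log10(range_m)


def apparent_source_level_db(received_level_db: float, range_m: float,
                             spreading_coeff: float = 20.0) -> float:
    """Back-calculate the apparent source level at 1 m from a received level."""
    return received_level_db + transmission_loss_db(range_m, spreading_coeff)


# Hypothetical values, not taken from the table.
example_asl = apparent_source_level_db(received_level_db=100.0, range_m=1000.0)
print(round(example_asl, 1))
```

The same relation applies to each received-level measure (rms, pp, efd), since the transmission loss term is added in dB regardless of how the received level was computed.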
