22 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed.

    Biostratigraphy, Depositional Environments, and Diagenesis of the Tamana Formation, Trinidad: a Tectonic Marker Horizon

    The Tamana Formation of the Central Range of Trinidad was studied in order to determine its importance in the stratigraphical and structural development of north‐eastern South America. Biostratigraphical, petrological and mineralogical data, combined with field mapping, show that the Tamana sediments are composed of five distinct lithofacies: inner to outer shelf, burrowed shaley mudstone; outer shelf, Fe‐rich sandy limestone; submarine channel, conglomeratic mudstone; middle shelf to nearshore, algal‐foram packstone/grainstone; and intertidal to nearshore, algal‐stromatolite‐coral boundstone with coral bioherms. Maximum thickness of the Tamana Formation is 244 m. Deposition of the Tamana limestones occurred between the Praeorbulina glomerosa (latest early Miocene) and Globorotalia fohsi robusta (middle part of the middle Miocene) planktonic foraminiferal zones, and in a more continuous trend than is seen in the current outcrop belt. Detailed biostratigraphy shows that the Tamana Formation is a facies equivalent of the shallow‐ and deep‐water shales of the Brasso Formation, and the deep‐water turbidites of the Herrera Member of the Cipero Formation. The early diagenetic history of the Tamana limestones was dominated by the precipitation of authigenic glauconitic smectite, and the dissolution of skeletal grains and carbonate matrix. Late burial diagenesis was dominated by the precipitation of illite and illite/smectite. Comparative mineralogy and textural analyses indicate a minimum range of burial depth for the Tamana Formation at 800–1500 m, with a maximum of 2400 m. Alteration of Fe‐bearing minerals to goethite and late fracturing occurred during post‐Pliocene tectonic uplift and unroofing of the Central Range. The Tamana Formation sediments can be used as a structural and stratigraphical event marker within the Late Tertiary geological history of Trinidad. These units record a phase of the tectonic interaction between the Caribbean and South American plates in the south‐eastern Caribbean, and reflect the onset of contractile deformation in the Central Range.

    Evolutionary changes in nectar sugar composition associated with switches between bird and insect pollination: the Canarian bird-flower element revisited

    1. The bird-flower element of the Canary Islands is a group of endemic plants having traits characteristic of bird pollination, and some are visited by opportunistically nectar-feeding passerine birds. 2. We investigated evolutionary changes in nectar sugar composition in seven Canarian lineages of ornithophilous plant species and their entomophilous relatives. 3. We hypothesized that nectar sugar composition evolved in response to the main pollinator group of a plant. Specialist nectarivores can assimilate sucrose, whereas some opportunistic nectar-feeders digest only the simple hexoses. 4. Sugar composition of nectars was analysed using high pH anion exchange chromatography. 5. Evolution of nectar type was correlated with mode of pollination. Generally, sucrose nectars were associated with insect visitation and hexose nectars with bird visitation. Nectar sugar composition was an evolutionarily labile trait within a lineage. Hence, nectar characteristics may have evolved readily, perhaps in response to opportunistically nectarivorous birds living in the Canary Islands.

    Genome sequence of the progenitor of the wheat D genome Aegilops tauschii

    Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome have until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.