5 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    From E-Business to Social Tool for the Poor - A Study on Internet Applications, Drivers and Impact


    Cosmology Intertwined: A Review of the Particle Physics, Astrophysics, and Cosmology Associated with the Cosmological Tensions and Anomalies

    In this paper we will list a few important goals that need to be addressed in the next decade, also taking into account the current discordances between the different cosmological probes, such as the disagreement in the value of the Hubble constant H0, the σ8–S8 tension, and other less statistically significant anomalies. While these discordances can still be in part the result of systematic errors, their persistence after several years of accurate analysis strongly hints at cracks in the standard cosmological scenario and the necessity for new physics or generalisations beyond the standard model. In this paper, we focus on the 5.0σ tension between the Planck CMB estimate of the Hubble constant H0 and the SH0ES collaboration measurements. After showing the H0 evaluations made from different teams using different methods and geometric calibrations, we list a few interesting new physics models that could alleviate this tension and discuss how the next decade's experiments will be crucial. Moreover, we focus on the tension of the Planck CMB data with weak lensing measurements and redshift surveys, about the value of the matter energy density Ωm, and the amplitude or rate of the growth of structure (σ8, fσ8). We list a few interesting models proposed for alleviating this tension, and we discuss the importance of trying to fit a full array of data with a single model and not just one parameter at a time. Additionally, we present a wide range of other less discussed anomalies at a statistical significance level lower than the H0–S8 tensions which may also constitute hints towards new physics, and we discuss possible generic theoretical approaches that can collectively explain the non-standard nature of these signals. [Abridged] Comment: Contribution to Snowmass 2021. 224 pages, 27 figures. Accepted for publication in JHEA

    Cosmology intertwined: A review of the particle physics, astrophysics, and cosmology associated with the cosmological tensions and anomalies

    The standard Λ Cold Dark Matter (ΛCDM) cosmological model provides a good description of a wide range of astrophysical and cosmological data. However, there are a few big open questions that make the standard model look like an approximation to a more realistic scenario yet to be found. In this paper, we list a few important goals that need to be addressed in the next decade, taking into account the current discordances between the different cosmological probes, such as the disagreement in the value of the Hubble constant H0, the σ8–S8 tension, and other less statistically significant anomalies. While these discordances can still be in part the result of systematic errors, their persistence after several years of accurate analysis strongly hints at cracks in the standard cosmological scenario and the necessity for new physics or generalisations beyond the standard model. In this paper, we focus on the 5.0σ tension between the Planck CMB estimate of the Hubble constant H0 and the SH0ES collaboration measurements. After showing the H0 evaluations made from different teams using different methods and geometric calibrations, we list a few interesting new physics models that could alleviate this tension and discuss how the next decade's experiments will be crucial. Moreover, we focus on the tension of the Planck CMB data with weak lensing measurements and redshift surveys, about the value of the matter energy density Ωm, and the amplitude or rate of the growth of structure (σ8, fσ8). We list a few interesting models proposed for alleviating this tension, and we discuss the importance of trying to fit a full array of data with a single model and not just one parameter at a time. Additionally, we present a wide range of other less discussed anomalies at a statistical significance level lower than the H0–S8 tensions which may also constitute hints towards new physics, and we discuss possible generic theoretical approaches that can collectively explain the non-standard nature of these signals. Finally, we give an overview of upgraded experiments and next-generation space missions and facilities on Earth that will be of crucial importance to address all these open questions.