
    Machine Reading at Scale: A Search Engine for Scientific and Academic Research

    The Internet, much like our universe, is ever-expanding. Information, in the most varied formats, is continuously added to the point of information overload. Consequently, the ability to navigate this ocean of data is crucial in our day-to-day lives, with familiar tools such as search engines carving a path through this unknown. In the research world, articles on a myriad of topics with distinct complexity levels are published daily, requiring specialized tools to facilitate access to, and assessment of, the information within. Recent endeavors in artificial intelligence, and in natural language processing in particular, can be seen as potential solutions for breaking information overload and providing enhanced search mechanisms by means of advanced algorithms. While the advent of transformer-based language models has enabled a more comprehensive analysis of both text-encoded intents and true document semantics, it has also increased the demand for computational resources. Information retrieval methods can act as low-complexity, yet reliable, filters to feed heavier algorithms, thus reducing computational requirements substantially. In this work, a new search engine is proposed, addressing machine reading at scale in the context of scientific and academic research. It combines state-of-the-art algorithms for information retrieval and reading comprehension tasks to extract meaningful answers from a corpus of scientific documents. The solution is then tested on two current and relevant topics, cybersecurity and energy, showing that the system performs competently across distinct knowledge domains. This work has received funding from the following projects: UIDB/00760/2020 and UIDP/00760/2020.
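    The retriever-reader design described above, where a cheap retrieval stage filters candidates before a heavier reading-comprehension model, can be sketched minimally. The snippet below is an illustrative stand-in only: it uses a plain TF-IDF score in pure Python for the retrieval stage (the paper's actual ranking algorithm and transformer reader are not reproduced here), and the corpus and function names are hypothetical.

    ```python
    import math
    from collections import Counter

    def tfidf_retrieve(query, corpus, k=2):
        """Rank documents by a simple TF-IDF dot score.

        A low-complexity filtering stage: the top-k documents returned
        here would then be passed to an expensive reader model."""
        docs = [doc.lower().split() for doc in corpus]
        n = len(docs)
        # document frequency of each term
        df = Counter(t for doc in docs for t in set(doc))
        scores = []
        for i, doc in enumerate(docs):
            tf = Counter(doc)
            score = sum(
                tf[t] * math.log(1 + n / df[t])
                for t in query.lower().split() if t in tf
            )
            scores.append((score, i))
        scores.sort(reverse=True)
        return [i for s, i in scores[:k] if s > 0]

    # Toy corpus echoing the two evaluation domains mentioned above
    corpus = [
        "transformer language models improve reading comprehension",
        "wind and solar energy production forecasts",
        "cybersecurity threats in industrial networks",
    ]
    print(tfidf_retrieve("energy forecasts", corpus, k=1))  # → [1]
    ```

    Only the few documents surviving this filter would be handed to the reader, which is what keeps the overall computational cost manageable at scale.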

    Adaptive distributional extensions to DFR ranking

    Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. In practice, this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and to be comparable in performance to a query likelihood language model (LM).
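    The core idea, selecting the background distribution that best fits the observed term-frequency sample and then scoring a term by its divergence from that background, can be sketched as follows. This is a simplified illustration, not the paper's ADR formulas: model selection here is by maximum likelihood over two candidate families, and the score is the DFR-style first information content -log2 P(tf); the function names are hypothetical.

    ```python
    import math

    def poisson_logpmf(k, lam):
        # log P(K = k) for a Poisson(lam) variable
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    def geometric_logpmf(k, p):
        # "number of failures" form, supported on k = 0, 1, 2, ...
        return k * math.log(1 - p) + math.log(p)

    def best_fit(freqs):
        """Choose Poisson vs geometric for a term-frequency sample by
        maximum likelihood -- a stand-in for ADR's goodness-of-fit step."""
        mean = sum(freqs) / len(freqs)
        lam = max(mean, 1e-9)          # Poisson MLE (guard against all-zero)
        p = 1.0 / (1.0 + lam)          # geometric MLE (failures form)
        ll_pois = sum(poisson_logpmf(k, lam) for k in freqs)
        ll_geom = sum(geometric_logpmf(k, p) for k in freqs)
        if ll_pois >= ll_geom:
            return "poisson", lambda k: poisson_logpmf(k, lam)
        return "geometric", lambda k: geometric_logpmf(k, p)

    def informativeness(tf, logpmf):
        # DFR-style score: -log2 of tf's probability under the background
        return -logpmf(tf) / math.log(2)
    ```

    A term whose frequency is improbable under the fitted non-informative background receives a high score; adapting which background is fitted is what distinguishes ADR from fixing one distribution in advance.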
