LTRo: Learning to Route Queries in Clustered P2P IR
Query routing is a critical step in P2P information retrieval. In this paper, we consider learning-to-rank approaches for query routing in the clustered P2P IR architecture. Our formulation, LTRo, scores resources based on the number of relevant documents for each training query, and uses that information to build a model that then ranks promising peers for a new query. Our empirical analysis over a variety of P2P IR testbeds illustrates the superiority of our method over state-of-the-art query routing methods.
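The abstract describes a pointwise setup: train on (query, peer) pairs labeled with relevant-document counts, then rank peers for new queries by predicted score. A minimal sketch of that idea follows; the features, the tiny gradient-descent fit, and all names are illustrative assumptions, not the paper's actual LTRo model.

```python
# Hypothetical sketch of pointwise learning-to-rank query routing:
# each peer is scored by a model trained on (query, peer) features,
# with the label being the number of relevant documents the peer
# holds for that training query.

def peer_features(query_terms, peer_term_stats, peer_size):
    """Illustrative features: query-term overlap and collection size."""
    overlap = sum(peer_term_stats.get(t, 0) for t in query_terms)
    return [overlap, peer_size]

def train_pointwise(samples):
    """Fit w for score = w . x by stochastic gradient descent on a
    squared-error loss. samples: list of (features, relevant_doc_count).
    A real system would use an off-the-shelf LTR library instead."""
    w = [0.0, 0.0]
    for _ in range(200):
        for x, y in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - 0.01 * err * xi for wi, xi in zip(w, x)]
    return w

def route(query_terms, peers, w):
    """Rank peers by predicted number of relevant documents."""
    scored = []
    for name, (stats, size) in peers.items():
        x = peer_features(query_terms, stats, size)
        scored.append((sum(wi * xi for wi, xi in zip(w, x)), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Toy demo: training labels roughly equal the term overlap.
peers = {"A": ({"ir": 2, "p2p": 1}, 1.0), "B": ({"web": 3}, 1.0)}
w = train_pointwise([([3, 1], 3), ([1, 1], 1), ([0, 1], 0)])
print(route(["ir", "p2p"], peers, w))  # peer A holds more matching terms
```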
Influential users in Twitter: detection and evolution analysis
In this paper, we study how to detect the most influential users in the microblogging platform Twitter and how they evolve over time. To this aim, we consider the Dynamic Retweet Graph (DRG) proposed in Amati et al. (2016) and partially analyzed in Amati et al. (IADIS Int J Comput Sci Inform Syst, 11(2), 2016) and Amati et al. (2016). The model of the evolution of the Twitter social network is based here on the retweet relationship. In a DRG, once a tweet has been retweeted for the last time, we delete all the edges representing that tweet; in this way we model the decay of a tweet's life in the platform. To detect influential users, we consider the central nodes in the network with respect to the following centrality measures: degree, closeness, betweenness, and PageRank centrality. These measures have been widely studied in the static case, and we analyze them on the sequence of DRG temporal graphs, with special regard to the distribution of the 75% most central nodes. We derive the following results: (a) in all cases, the closeness measure yields many nodes with high centrality, so it is of little use for detecting influential users; (b) for all other measures, almost all nodes have null or very low centrality; (c) the vertices with significant centrality are often the same; (d) the above observations also hold for the cumulative retweet graph; and (e) central nodes in the sequence of DRG temporal graphs also have high centrality in the cumulative graph.
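Two of the centrality measures named above can be sketched on a single retweet-graph snapshot with only the standard library; the edge list is made up for illustration and is not the paper's data, and a real analysis would run this per temporal snapshot.

```python
# Minimal stdlib sketch (illustrative data): degree centrality and
# PageRank on one directed retweet-graph snapshot, where an edge a -> b
# means user a retweeted user b.
from collections import defaultdict

edges = [("u1", "hub"), ("u2", "hub"), ("u3", "hub"), ("u2", "u3"), ("u4", "u1")]
nodes = sorted({n for e in edges for n in e})

# Degree centrality: (in + out) degree, normalized by n - 1.
deg = defaultdict(int)
for a, b in edges:
    deg[a] += 1
    deg[b] += 1
degree_centrality = {n: deg[n] / (len(nodes) - 1) for n in nodes}

# PageRank by power iteration, damping d = 0.85.
out = defaultdict(list)
for a, b in edges:
    out[a].append(b)
pr = {n: 1 / len(nodes) for n in nodes}
for _ in range(50):
    new = {n: (1 - 0.85) / len(nodes) for n in nodes}
    for a in nodes:
        if out[a]:
            share = 0.85 * pr[a] / len(out[a])
            for b in out[a]:
                new[b] += share
        else:  # dangling node: spread its mass uniformly
            for b in nodes:
                new[b] += 0.85 * pr[a] / len(nodes)
    pr = new

print(max(pr, key=pr.get))  # the much-retweeted "hub" dominates both measures
```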
Fisher's exact test explains a popular metric in information retrieval
Term frequency-inverse document frequency, or tf-idf for short, is a numerical measure that is widely used in information retrieval to quantify the importance of a term of interest in one out of many documents. While tf-idf was originally proposed as a heuristic, much work has been devoted over the years to placing it on a solid theoretical foundation. Following in this tradition, we here advance the first justification for tf-idf that is grounded in statistical hypothesis testing. More precisely, we first show that the one-tailed version of Fisher's exact test, also known as the hypergeometric test, corresponds well with a common tf-idf variant on selected real-data information retrieval tasks. We then set forth a mathematical argument suggesting that this tf-idf variant approximates the negative logarithm of the one-tailed Fisher's exact test P-value (i.e., a hypergeometric distribution tail probability). The Fisher's exact test interpretation of this common tf-idf variant furnishes the working statistician with a ready explanation of tf-idf's long-established effectiveness.
Comment: 26 pages, 4 figures, 1 table; minor revision.
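The correspondence the abstract describes can be computed directly: a one-tailed hypergeometric tail probability for the observed term count, placed next to a common tf·idf variant. The sketch below uses made-up corpus counts and one plausible tf·idf form (raw count times log inverse document frequency); it illustrates the two quantities being compared, not the paper's exact experimental setup.

```python
# Hedged sketch: a common tf-idf variant alongside the negative log of
# the one-tailed hypergeometric (Fisher's exact) tail probability.
from math import comb, log

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): n draws without
    replacement from a population of N tokens containing K successes."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Toy corpus statistics (illustrative): the term occurs k times in a
# document of n tokens; K times in a corpus of N tokens over D documents.
N, K, n, k = 10_000, 50, 200, 6
D, df = 100, 10                      # documents in total / containing the term

tfidf = k * log(D / df)              # one common tf * idf variant
neg_log_p = -log(hypergeom_sf(k, N, K, n))

print(round(tfidf, 2), round(neg_log_p, 2))
```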
Combining compound and single terms under language model framework
Most existing information retrieval models, including probabilistic and vector space models, are based on the term independence hypothesis. To go beyond this assumption and thereby capture the semantics of documents and queries more accurately, several works have incorporated phrases or other syntactic information in IR; such attempts have shown slight benefit, at best. In language modeling approaches in particular, this extension is achieved through the use of bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper we introduce a new approach to select and weight relevant n-grams associated with a document. Experimental results on three TREC test collections show an improvement over three strong state-of-the-art baselines: the original unigram language model, the Markov Random Field model, and the positional language model.
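The contrast the abstract draws, uniform n-gram weighting versus selecting and weighting a relevant subset, can be sketched as an interpolated language model. The selection rule (keep bigrams occurring more than once), the interpolation weight, and the smoothing below are all assumptions for illustration, not the paper's exact model.

```python
# Illustrative sketch: score a document for a query by interpolating a
# smoothed unigram language model with a model over a *selected*,
# count-weighted subset of the document's bigrams.
from collections import Counter
from math import log

def score(query, doc_tokens, lam=0.7):
    uni = Counter(doc_tokens)
    bi = Counter(zip(doc_tokens, doc_tokens[1:]))
    n = len(doc_tokens)
    # "Relevant" bigrams (assumed rule): those occurring more than once.
    selected = {b: c for b, c in bi.items() if c > 1}
    total_bi = sum(selected.values()) or 1
    s = 0.0
    q = query.split()
    for t in q:                       # add-one smoothed unigram component
        s += lam * log((uni[t] + 1) / (n + len(uni)))
    for b in zip(q, q[1:]):           # selected-bigram component
        s += (1 - lam) * log((selected.get(b, 0) + 1) / (total_bi + len(bi) + 1))
    return s

doc = "the language model combines compound and single terms the language model".split()
print(score("language model", doc) > score("compound single", doc))  # → True
```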
Question answering systems for health professionals at the point of care -- a systematic review
Objective: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement.
Materials and methods: We searched PubMed, IEEE Xplore, ACM Digital Library, ACL Anthology, and forward and backward citations on 7 February 2023. We included peer-reviewed journal and conference papers describing the design and evaluation of biomedical QA systems. Two reviewers screened titles, abstracts, and full-text articles. We conducted a narrative synthesis and risk-of-bias assessment for each study, and assessed the utility of biomedical QA systems.
Results: We included 79 studies and identified themes including question realism, answer reliability, answer utility, clinical specialism, systems usability, and evaluation methods. Clinicians' questions used to train and evaluate QA systems were restricted to certain sources, types, and complexity levels. No system communicated confidence levels in the answers or sources. Many studies suffered from high risks of bias and applicability concerns. Only 8 studies completely satisfied any criterion for clinical utility, and only 7 reported user evaluations. Most systems were built with limited input from clinicians.
Discussion: While machine learning methods have led to increased accuracy, most studies imperfectly reflected real-world healthcare information needs. Key research priorities include developing more realistic healthcare QA datasets and considering the reliability of answer sources, rather than merely focusing on accuracy.
Comment: Accepted to the Journal of the American Medical Informatics Association (JAMIA).
Sidra5: a search system with geographic signatures
Master's thesis in Informatics Engineering, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2007. The dissertation presents the development of a geographic information search system which implements geographic signatures, a novel approach for modeling the geographic information present in documents. The goal of the project was to determine whether the geographically meaningful information present in documents, captured as geographic signatures, contributes to improved search results. Several strategies for computing the similarity between the geographic signatures of queries and documents are proposed and evaluated experimentally. The results show that, in some circumstances, geographic signatures can indeed improve the quality of geographic queries.
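One plausible form such a query-document signature similarity could take is a weighted vector over place names compared by cosine similarity. This representation and its weights are assumptions for illustration; the thesis's actual similarity strategies are not specified in the abstract.

```python
# Hypothetical sketch: a geographic signature as a {place: weight} vector,
# with query-document similarity computed as cosine similarity.
from math import sqrt

def cosine(sig_a, sig_b):
    places = set(sig_a) | set(sig_b)
    dot = sum(sig_a.get(p, 0.0) * sig_b.get(p, 0.0) for p in places)
    na = sqrt(sum(v * v for v in sig_a.values()))
    nb = sqrt(sum(v * v for v in sig_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy signatures: a document mostly about Lisboa, a query about Lisboa.
doc_sig = {"Lisboa": 0.8, "Porto": 0.2}
query_sig = {"Lisboa": 1.0}
print(round(cosine(query_sig, doc_sig), 3))
```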
Smart Search Engine For Information Retrieval
This project addresses a central research problem in information retrieval and semantic search. It proposes the smart search theory, a new theory based on the hypothesis that the semantic meaning of a document can be described by a set of keywords. Two experiments designed and carried out in this project provide positive evidence supporting the smart search theory.
In the proposed theory, smart search aims to determine, for any web document, a set of keywords by which the semantic meaning of the document can be uniquely identified. At the same time, the set of keywords is assumed to be small enough to be easily managed. This is the fundamental assumption for creating the smart semantic search engine. The project discusses the rationale for this assumption and the theory built on it, as well as how the theory can be applied to keyword allocation and to the data model to be generated. It then proposes the design of the smart search engine, as a solution to the efficiency problem of searching the huge and growing amount of information published on the web.
To achieve high efficiency in web searching, statistical methods prove effective and can be interpreted at the semantic level. Based on the frequency of joint keywords, a keyword list can be generated and its entries linked to each other to form a meaning structure. A data model is built once a proper keyword list is obtained, and the model is applied to the design of the smart search engine.
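The "frequency of joint keywords" step above can be sketched as counting keyword co-occurrence across documents and linking pairs whose joint frequency clears a threshold. The documents and the threshold are assumptions for illustration, not the project's data or exact rule.

```python
# Illustrative sketch: build a keyword linking structure from joint
# keyword frequencies across a toy document collection.
from collections import Counter
from itertools import combinations

docs = [
    {"search", "engine", "semantic"},
    {"search", "engine", "index"},
    {"semantic", "keyword", "search"},
]

joint = Counter()
for kws in docs:
    for a, b in combinations(sorted(kws), 2):
        joint[(a, b)] += 1

# Link keyword pairs whose joint frequency exceeds a threshold (assumed = 1).
links = {pair for pair, c in joint.items() if c > 1}
print(sorted(links))  # → [('engine', 'search'), ('search', 'semantic')]
```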
Biomedical Question Answering: A Survey of Approaches and Challenges
Automatic Question Answering (QA) has been successfully applied in various domains such as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables innovative applications to effectively perceive, access, and understand complex biomedical knowledge. There have been tremendous developments in BQA over the past two decades, which we classify into five distinctive approaches: classic, information retrieval, machine reading comprehension, knowledge base, and question entailment approaches. In this survey, we introduce the available datasets and representative methods of each BQA approach in detail. Despite these developments, BQA systems are still immature and rarely used in real-life settings. We identify and characterize several key challenges in BQA that might lead to this issue, and discuss some potential future directions to explore.
Comment: In submission to ACM Computing Surveys.