288 research outputs found

    Using topic information to improve non-exact keyword-based search for mobile applications

    Considering the wide range of mobile applications available nowadays, effective search engines are imperative for a user to find applications that provide a specific desired functionality. Retrieval approaches that leverage topic similarity between queries and applications have shown promising results in previous studies. However, the search engines used by most app stores are based on keyword matching and boosting. In this paper, we explore means to include topic information in such approaches, in order to improve their ability to retrieve relevant applications for non-exact queries without impairing their computational performance. More specifically, we create topic models specialized on application descriptions and explore how the most relevant terms for each topic covered by an application can be used to complement the information provided by its description. Our experiments show that, although these topic keywords are not able to provide all the information of the topic model, they provide a sufficiently informative summary of the topics covered by the descriptions, leading to improved performance.
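The augmentation idea in the abstract above can be sketched roughly as follows: append the top-k terms of each sufficiently relevant topic to an application's description, so a keyword-matching engine can retrieve it for non-exact queries. The topic-word weights, threshold, and example data below are illustrative assumptions, not values from the paper's trained models.

```python
def top_topic_keywords(topic_word_weights, k=3):
    """Return the k highest-weighted terms for each topic."""
    return {
        topic: [w for w, _ in sorted(weights.items(), key=lambda kv: -kv[1])[:k]]
        for topic, weights in topic_word_weights.items()
    }

def augment_description(description, doc_topics, keywords, threshold=0.2):
    """Append keywords of topics whose share in the document exceeds threshold."""
    extra = []
    for topic, share in doc_topics.items():
        if share >= threshold:
            extra.extend(keywords[topic])
    return description + " " + " ".join(extra)

# Hypothetical topic-word distributions and per-document topic shares.
topics = {
    "navigation": {"map": 0.9, "route": 0.8, "gps": 0.7, "traffic": 0.3},
    "messaging": {"chat": 0.9, "message": 0.8, "sticker": 0.4},
}
kw = top_topic_keywords(topics, k=2)
doc = augment_description("Find your way around town.",
                          {"navigation": 0.85, "messaging": 0.05}, kw)
```

After augmentation, a query such as "route planner" would match this document by plain keyword overlap even though "route" never appears in the original description.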

    In the pursuit of a semantic similarity metric based on UMLS annotations for articles in PubMed Central

    Motivation Although full-text articles are provided by publishers in electronic formats, it remains a challenge to find related work beyond the title and abstract context. Identifying related articles based on their abstract is indeed a good starting point; this process is straightforward and does not consume as many resources as full-text based similarity would require. However, further analyses may require in-depth understanding of the full content: two articles with highly related abstracts can be substantially different regarding the full content. How similarity differs when considering title-and-abstract versus full-text, and which semantic similarity metric provides better results when dealing with full-text articles, are the main issues addressed in this manuscript. Methods We have benchmarked three similarity metrics, BM25, PMRA, and Cosine, to determine which one performs best when using concept-based annotations on full-text documents. We also evaluated variations in similarity values based on title-and-abstract against those relying on full-text. Our test dataset comprises the Genomics track article collection from the 2005 Text Retrieval Conference. Initially, we used entity recognition software to semantically annotate titles and abstracts as well as full-text with concepts defined in the Unified Medical Language System (UMLS®). For each article, we created a document profile, i.e., a set of identified concepts, term frequency, and inverse document frequency; we then applied the various similarity metrics to those document profiles. We considered correlation, precision, recall, and F1 in order to determine which similarity metric performs best with concept-based annotations. For those full-text articles available in PubMed Central Open Access (PMC-OA), we also performed dispersion analyses in order to understand how similarity varies when considering full-text articles.
Results We have found that the PubMed Related Articles similarity metric is the most suitable for full-text articles annotated with UMLS concepts. For similarity values above 0.8, all metrics exhibited an F1 around 0.2 and a recall around 0.1; BM25 showed the highest precision, close to 1; in all cases the concept-based metrics performed better than the word-stem-based one. Our experiments show that similarity values vary when considering only title-and-abstract versus full-text similarity. Therefore, analyses based on full-text become useful when a given research question requires going beyond title and abstract, particularly regarding connectivity across articles. Availability Visualization available at ljgarcia.github.io/semsim.benchmark/, data available at http://dx.doi.org/10.5281/zenodo.13323. The authors acknowledge the support from the members of the Temporal Knowledge Bases Group at Universitat Jaume I. Funding: LJGC and AGC are both self-funded; RB is funded by the “Ministerio de Economía y Competitividad” with contract number TIN2011-24147
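The profile-based comparison described above can be illustrated with a minimal cosine similarity over document profiles mapping concept identifiers to TF-IDF weights. The CUI identifiers and weights below are invented for the example; this is a sketch of the general technique, not the paper's benchmarking code.

```python
import math

def cosine(profile_a, profile_b):
    """Cosine similarity between two document profiles
    (dicts mapping concept IDs to TF-IDF weights)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Illustrative UMLS-style profiles; concept IDs and weights are made up.
doc1 = {"C0017337": 0.8, "C0012854": 0.5, "C0033684": 0.2}
doc2 = {"C0017337": 0.6, "C0033684": 0.4}
sim = cosine(doc1, doc2)
```

The same profiles could be fed to BM25- or PMRA-style scorers; only the scoring function changes, the concept-based representation stays the same.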

    CWI at TREC 2011: Session, Web, and Medical


    Learning to Truncate Ranked Lists for Information Retrieval

    Ranked list truncation is of critical importance in a variety of professional information retrieval applications such as patent search or legal search. The goal is to dynamically determine the number of returned documents according to some user-defined objective, in order to strike a balance between the overall utility of the results and user effort. Existing methods formulate this task as a sequential decision problem and optimize some pre-defined loss as a proxy objective, which suffers from the limitations of local decision-making and indirect optimization. In this work, we propose a global-decision-based truncation model named AttnCut, which directly optimizes user-defined objectives for ranked list truncation. Specifically, we adopt the transformer architecture to capture the global dependency within the ranked list for the truncation decision, and employ reward augmented maximum likelihood (RAML) for direct optimization. We consider two types of user-defined objectives of practical use: one is a widely adopted balanced metric such as F1, and the other is the best F1 under some minimal recall constraint, which represents a typical objective in professional search. Empirical results on the Robust04 and MQ2007 datasets demonstrate the effectiveness of our approach compared with state-of-the-art baselines
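The two objectives mentioned above (best F1, and best F1 subject to a minimal recall constraint) can be made concrete with a brute-force oracle that scans every cutoff of a binary relevance list. This sketches the objective being optimized, not the AttnCut model itself, which must predict the cutoff without seeing the relevance labels.

```python
def best_f1_cutoff(relevance, total_relevant=None, min_recall=0.0):
    """Return (k, f1) where truncating the ranked list after position k
    maximizes F1, optionally subject to recall >= min_recall.

    relevance: list of 0/1 labels in rank order (oracle knowledge).
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    best_k, best_f1 = 0, 0.0
    tp = 0
    for k, rel in enumerate(relevance, start=1):
        tp += rel
        precision = tp / k
        recall = tp / total_relevant if total_relevant else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if recall >= min_recall and f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k, best_f1

# Truncating after rank 4 captures all three relevant documents
# at precision 3/4, which is the F1-optimal cutoff here.
k, f1 = best_f1_cutoff([1, 1, 0, 1, 0, 0])
```

Setting `min_recall=1.0` restricts the scan to cutoffs that return every relevant document, mirroring the professional-search objective the abstract describes.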

    Interactive Graph-Based Visualisation and Veracity Prediction of Scientific COVID-19 Claims

    Claims presented as facts and other misinformation have become a growing problem in recent times, especially during the COVID-19 pandemic, which gave rise to a large amount of misinformation deemed the "infodemic". Therefore, giving people a way to determine the veracity of claims they encounter could alleviate the problems caused by misinformation. To this end, a system for visualising relevant information in a graph-based interactive manner to help users determine the veracity of COVID-19 claims is presented. It consists of relevant abstract retrieval, stance detection, rationale extraction, inter-rationale stance detection, and additional metadata extraction. Furthermore, it also contains prediction methods to assist in determining the veracity of a claim. One of these is a graph algorithm that uses a measure of document importance and the degree to which the documents' rationales agree or disagree with each other to make a prediction. This is done in a way that allows users to adjust certain parameters based on what they deem important when deciding the veracity of a claim. Finally, it is shown that the visualisation and associated prediction methods can help a user determine the veracity of a claim
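A minimal sketch of the kind of adjustable, importance-weighted aggregation described above; the weighting scheme, parameter names, and verdict labels are assumptions made for illustration, not the system's actual algorithm.

```python
def predict_veracity(evidence, support_weight=1.0, refute_weight=1.0):
    """Aggregate (importance, stance) pairs into a verdict.

    stance is +1 if a document's rationale supports the claim, -1 if it
    refutes it; support_weight and refute_weight are user-adjustable
    knobs reflecting how much each kind of evidence should count.
    """
    score = sum(importance * (support_weight if stance > 0 else -refute_weight)
                for importance, stance in evidence)
    if score > 0:
        return "supported"
    if score < 0:
        return "refuted"
    return "inconclusive"

# Two important supporting documents outweigh one less important refutation.
verdict = predict_veracity([(0.9, +1), (0.7, +1), (0.4, -1)])
```

Raising `refute_weight` lets a cautious user flip a borderline verdict, which is the kind of user-controlled parameter adjustment the abstract mentions.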