5 research outputs found

    Comparative Analysis of Functionality and Aspects for Hybrid Recommender Systems

    Recommender systems are gradually becoming the backbone of profitable businesses that interact with users mainly on the web stack. These systems have access to large amounts of user interaction data, which is used to improve them. They use machine learning and data mining techniques to determine which products and features to suggest to different users. This is an essential function, since offering the right product at the right time can increase revenue. This paper focuses on the importance of different kinds of hybrid recommenders. It first explains the various types of recommenders in use, then shows the need for hybrid systems and their multiple kinds, and finally gives a comparative analysis of each. Because content-based and collaborative filtering systems are widely used, the research compares how these measure up to hybrid recommender systems.

    Constructing Tree-based Index for Efficient and Effective Dense Retrieval

    Recent studies have shown that Dense Retrieval (DR) techniques can significantly improve the performance of first-stage retrieval in IR systems. Despite their empirical effectiveness, the application of DR is still limited. In contrast to statistical retrieval models that rely on highly efficient inverted index solutions, DR models build dense embeddings that are difficult to pre-process with most existing search indexing systems. To avoid the expensive cost of brute-force search, Approximate Nearest Neighbor (ANN) algorithms and their corresponding indexes are widely applied to speed up the inference process of DR models. Unfortunately, while ANN can improve the efficiency of DR models, it usually comes at a significant cost in retrieval performance. To solve this issue, we propose JTR, which stands for Joint optimization of TRee-based index and query encoding. Specifically, we design a new unified contrastive learning loss to train the tree-based index and the query encoder in an end-to-end manner. A tree-based negative sampling strategy is applied to give the tree the maximum heap property, which supports the effectiveness of beam search. Moreover, we treat the cluster assignment as an optimization problem to update the tree-based index, which allows overlapped clustering. We evaluate JTR on numerous popular retrieval benchmarks. Experimental results show that JTR achieves better retrieval performance while retaining high system efficiency compared with widely adopted baselines. It provides a potential solution to balance efficiency and effectiveness in neural retrieval system designs. Comment: 10 pages, accepted at SIGIR 202
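
    The role of the maximum heap property in beam search can be illustrated with a minimal sketch. This is not the JTR implementation: the `Node` class, the fixed per-node `score` values, and the toy tree are assumptions made for illustration. In a real system the scores would depend on the query embedding. Here each parent's score equals the maximum score of its children, so following the top-scoring nodes at each level leads to the top-scoring leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical relevance score of this node for one fixed query.
    score: float
    # Document ids stored at a leaf (empty for internal nodes).
    doc_ids: list = field(default_factory=list)
    children: list = field(default_factory=list)


def beam_search(root, beam_width):
    """Descend the tree level by level, keeping only the best nodes."""
    beam = [root]
    leaves = []
    while beam:
        candidates = []
        for node in beam:
            if node.children:
                candidates.extend(node.children)
            else:
                leaves.append(node)
        # Keep the beam_width highest-scoring nodes of the next level.
        candidates.sort(key=lambda n: n.score, reverse=True)
        beam = candidates[:beam_width]
    leaves.sort(key=lambda n: n.score, reverse=True)
    return [d for leaf in leaves[:beam_width] for d in leaf.doc_ids]


# Toy tree (assumed data): each parent score is the max of its children.
leaf_a = Node(0.9, doc_ids=["d1"])
leaf_b = Node(0.2, doc_ids=["d2"])
leaf_c = Node(0.7, doc_ids=["d3"])
leaf_d = Node(0.4, doc_ids=["d4"])
left = Node(0.9, children=[leaf_a, leaf_b])
right = Node(0.7, children=[leaf_c, leaf_d])
root = Node(0.9, children=[left, right])

print(beam_search(root, beam_width=2))
```

    With a beam width of 2, the search visits both internal nodes and then keeps the two best leaves, so only a fraction of a larger tree would need to be scored.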

    Efficient self-supervised metric information retrieval: A bibliography based method applied to covid literature

    The literature on coronaviruses counts more than 300,000 publications. Finding relevant papers for arbitrary queries is essential to discovering helpful knowledge. The best current information retrieval (IR) systems use deep learning approaches and need supervised training sets with labeled data, that is, the queries and their corresponding relevant papers must be known a priori. Creating such labeled datasets is time-consuming and requires the effort of prominent experts, resources that are insufficiently available under the time pressure of a pandemic. We present a new self-supervised solution, called SUBLIMER, that does not require labels to learn to search corpora of scientific papers for those most relevant to arbitrary queries. SUBLIMER is a novel, efficient IR engine trained on the unsupervised COVID-19 Open Research Dataset (CORD19) using deep metric learning. The core point of our self-supervised approach is that it uses no labels, but exploits the bibliography citations from papers to create a latent space where spatial proximity is a metric of semantic similarity; for this reason, it can also be applied to paper corpora in other domains. SUBLIMER, despite being self-supervised, outperforms the Precision@5 (P@5) and Bpref of the state-of-the-art competitors on CORD19, which, differently from our approach, require both labeled datasets and a number of trainable parameters that is an order of magnitude higher than our
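
    As a reminder of what the Precision@5 metric mentioned above measures, here is a minimal sketch. The function name, the paper ids, and the relevance judgments are hypothetical and are not taken from SUBLIMER or CORD19.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k, not by len(top_k), so short result lists are penalized.
    return hits / k


# Hypothetical ranking returned for one query, best match first.
ranked = ["p3", "p7", "p1", "p9", "p4", "p2"]
# Hypothetical set of papers judged relevant to that query.
relevant = {"p1", "p3", "p4"}

print(precision_at_k(ranked, relevant, k=5))
```

    Three of the top five papers are relevant in this toy ranking, so the value is 3/5.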

    Design and Development of an Extensible and Configurable Framework for Conversational Search Experiments

    The Conversational Search (CS) paradigm allows for an intuitive interaction between the user and the system through natural language sentences and it is increasingly being adopted in various scenarios. However, its widespread experimentation has led to the birth of a multitude of CS systems with custom implementations and variants of Information Retrieval (IR) models. This exacerbates the reproducibility crisis already observed in several research areas, including IR. To address this issue, we propose DECAF: a modular and extensible Conversational Search framework designed for fast prototyping and development of conversational agents. Our framework integrates all the components that characterize a modern CS system and allows for the seamless integration of Machine Learning (ML) and Large Language Models (LLMs)-based techniques. Furthermore, thanks to its uniform interface, DECAF allows for experiments characterized by a high degree of reproducibility. DECAF contains several state-of-the-art components including query rewriting, search functions under Bag-of-Words (BoW) and dense paradigms, and re-ranking functions. Our framework is tested on two well-known conversational collections: TREC CAsT 2019 and 2020 and the results can be used by future practitioners as baselines. Our contributions include the identification of a series of state-of-the-art components for the CS task and the definition of a modular framework for its implementation.

    Commissioning and First Science Results of the Desert Fireball Network: a Global-Scale Automated Survey for Large Meteoroid Impacts

    This thesis explores the first results from the Desert Fireball Network, a distributed global observatory designed to characterise fireballs caused by meteoroid impacts. To deal with a data influx of more than 50 terabytes per week, innovative data reduction techniques have been developed. The science topics investigated in this work include airbursts caused by large meteoroids impacting the Earth's atmosphere, the recovery of a meteorite and its orbital history, and the structure of a meteor shower.