3,016 research outputs found

    Large Language Models for Information Retrieval: A Survey

    Full text link
    As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have integrated themselves into our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. The trajectory of IR has evolved dynamically from its origins in term-based methods to its integration with advanced neural models. While the neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution requires a combination of both traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding capacity). Meanwhile, the emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has revolutionized natural language processing due to their remarkable language understanding, generation, generalization, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid evolution of this research trajectory, it is necessary to consolidate existing methodologies and provide nuanced insights through a comprehensive overview. In this survey, we delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers. Additionally, we explore promising directions within this expanding field

    Community Interest as An Indicator for Ranking

    Get PDF
    Ranking documents in response to users\u27 information needs is a challenging task, due, in part, to the dynamic nature of users\u27 interests with respect to a query. We hypothesize that the interests of a given user are similar to the interests of the broader community of which he or she is a part and propose an innovative method that uses social media to characterize the interests of the community and use this characterization to improve future rankings. By generating a community interest vector (CIV) and community interest language model (CILM) for a given query, we use community interest to alter the ranking score of individual documents retrieved by the query. The CIV or CILM is based on a continuously updated set of recent (daily or past few hours) user oriented text data. The interest based ranking method is evaluated by using Amazon Turk to against relevance based ranking and search engines\u27 ranking results. Overall, the experiment result shows community interest is an effective indicator for dynamic ranking

    Ontology Based Semantic Web Information Retrieval Enhancing Search Significance

    Get PDF
    The web contain huge amount of structured as well as unstructured data/information. This varying nature of data may yield a retrieval response that is expected to contain relevant response that is expected to contain relevant as well as irrelevant data while directing search. In order to filter out irrelevance in the search result, numerous methodologies have been used to extract more and more relevant search responses in retrieval. This work has adopted semantic search dealing directly with the knowledge base. The approach incorporates Query pattern evolution and semantic keyword matching with final detail to enhance significance of relevant data retrieval. The proposed method is implemented in open source computing tool environment and the result obtained thereof are compared with that of earlier used methodologies

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    Semantic Interaction in Web-based Retrieval Systems : Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data

    Get PDF
    Existing web retrieval models for exploration and interaction with web data do not take into account semantic information, nor do they allow for new forms of interaction by employing meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration process of web content to enable a significantly positive user experience. Therefore, an inherently dynamic view beyond single concepts and models from semantic information processing, information extraction and human-machine interaction is adopted. Essential tasks for semantic interaction such as semantic annotation, semantic mediation and semantic human-computer interaction were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D
    corecore