3,016 research outputs found
Large Language Models for Information Retrieval: A Survey
As a primary means of information acquisition, information retrieval (IR)
systems, such as search engines, have integrated themselves into our daily
lives. These systems also serve as components of dialogue, question-answering,
and recommender systems. The trajectory of IR has evolved dynamically from its
origins in term-based methods to its integration with advanced neural models.
While the neural models excel at capturing complex contextual signals and
semantic nuances, thereby reshaping the IR landscape, they still face
challenges such as data scarcity, interpretability, and the generation of
contextually plausible yet potentially inaccurate responses. This evolution
requires a combination of both traditional methods (such as term-based sparse
retrieval methods with rapid response) and modern neural architectures (such as
language models with powerful language understanding capacity). Meanwhile, the
emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has
revolutionized natural language processing due to their remarkable language
understanding, generation, generalization, and reasoning abilities.
Consequently, recent research has sought to leverage LLMs to improve IR
systems. Given the rapid evolution of this research trajectory, it is necessary
to consolidate existing methodologies and provide nuanced insights through a
comprehensive overview. In this survey, we delve into the confluence of LLMs
and IR systems, including crucial aspects such as query rewriters, retrievers,
rerankers, and readers. Additionally, we explore promising directions within
this expanding field
Community Interest as An Indicator for Ranking
Ranking documents in response to users\u27 information needs is a challenging task, due, in part, to the dynamic nature of users\u27 interests with respect to a query. We hypothesize that the interests of a given user are similar to the interests of the broader community of which he or she is a part and propose an innovative method that uses social media to characterize the interests of the community and use this characterization to improve future rankings. By generating a community interest vector (CIV) and community interest language model (CILM) for a given query, we use community interest to alter the ranking score of individual documents retrieved by the query. The CIV or CILM is based on a continuously updated set of recent (daily or past few hours) user oriented text data. The interest based ranking method is evaluated by using Amazon Turk to against relevance based ranking and search engines\u27 ranking results. Overall, the experiment result shows community interest is an effective indicator for dynamic ranking
Ontology Based Semantic Web Information Retrieval Enhancing Search Significance
The web contain huge amount of structured as well as unstructured data/information. This varying nature of data may yield a retrieval response that is expected to contain relevant response that is expected to contain relevant as well as irrelevant data while directing search. In order to filter out irrelevance in the search result, numerous methodologies have been used to extract more and more relevant search responses in retrieval. This work has adopted semantic search dealing directly with the knowledge base. The approach incorporates Query pattern evolution and semantic keyword matching with final detail to enhance significance of relevant data retrieval. The proposed method is implemented in open source computing tool environment and the result obtained thereof are compared with that of earlier used methodologies
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
Semantic Interaction in Web-based Retrieval Systems : Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data
Existing web retrieval models for exploration and interaction with web data do not take into account semantic information, nor do they allow for new forms of interaction by employing meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration process of web content to enable a significantly positive user experience. Therefore, an inherently dynamic view beyond single concepts and models from semantic information processing, information extraction and human-machine interaction is adopted. Essential tasks for semantic interaction such as semantic annotation, semantic mediation and semantic human-computer interaction were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D
- …