316 research outputs found
Recommended from our members
Neural Approaches to Feedback in Information Retrieval
Relevance feedback on search results indicates users\u27 search intent and preferences. Extensive studies have shown that incorporating relevance feedback (RF) on the top k (usually 10) ranked results significantly improves the performance of re-ranking. However, most existing research on user feedback focuses on words-based retrieval models. Recently, neural retrieval models have shown their efficacy in capturing relevance matching in retrieval but little research has been conducted on neural approaches to feedback. This leads us to study different aspects of feedback with neural approaches in the dissertation.
RF techniques are seldom used in real search scenarios since they can require significant manual efforts to obtain explicit judgments for search results. However, with mobile or voice-based intelligent assistants being more popular nowadays, user feedback of result quality could be collected potentially during their interactions with the assistants. We study both positive and negative RF to refine the re-ranking performance. Positive feedback aims to find more relevant results given some known relevant results while negative feedback targets identifying the first relevant result. In most cases, it is more beneficial to find the first relevant result compared with finding additional relevant results. However, negative feedback is much more challenging than positive feedback since relevant results are usually similar while non-relevant results could vary considerably.
We focus on the tasks of text retrieval and product search to study the different aspects of incorporating feedback for ranking refinement with neural approaches. Our contributions are: (1) we show that iterative relevance feedback (IRF) is more effective than top-k RF on answer passages and we further improve IRF with neural approaches; (2) we propose an effective RF technique based on neural models for product search; (3) we study how to refine re-ranking with negative feedback for conversational product search; (4) we leverage negative feedback in user responses to ask clarifying questions in open-domain conversational search. Our research improves retrieval performance by incorporating feedback in interactive retrieval and approaches multi-turn conversational information-seeking tasks with a focus on positive and negative feedback
The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives
The Archive Query Log (AQL) is a previously unused, comprehensive query log
collected at the Internet Archive over the last 25 years. Its first version
includes 356 million queries, 166 million search result pages, and 1.7 billion
search results across 550 search providers. Although many query logs have been
studied in the literature, the search providers that own them generally do not
publish their logs to protect user privacy and vital business data. Of the few
query logs publicly available, none combines size, scope, and diversity. The
AQL is the first to do so, enabling research on new retrieval models and
(diachronic) search engine analyses. Provided in a privacy-preserving manner,
it promotes open research as well as more transparency and accountability in
the search industry.Comment: SIGIR 2023 resource paper, 13 page
Feature Extraction and Duplicate Detection for Text Mining: A Survey
Text mining, also known as Intelligent Text Analysis is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature Extraction is one of the important techniques in data reduction to discover the most important features. Proce- ssing massive amount of data stored in a unstructured form is a challenging task. Several pre-processing methods and algo- rithms are needed to extract useful features from huge amount of data. The survey covers different text summarization, classi- fication, clustering methods to discover useful features and also discovering query facets which are multiple groups of words or phrases that explain and summarize the content covered by a query thereby reducing time taken by the user. Dealing with collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (remove and replace duplicates).The survey provides existing text mining techniques to extract relevant features, detect duplicates and to replace the duplicate data to get fine grained knowledge to the user
Professional Search in Pharmaceutical Research
In the mid 90s, visiting libraries – as means of retrieving the latest literature – was still a common necessity among professionals. Nowadays, professionals simply access information by ‘googling’. Indeed, the name of the Web search engine market leader “Google” became a synonym for searching and retrieving information. Despite the increased popularity of search as a method for retrieving relevant information, at the workplace search engines still do not deliver satisfying results to professionals.
Search engines for instance ignore that the relevance of answers (the satisfaction of a searcher’s needs) depends not only on the query (the information request) and the document corpus, but also on the working context (the user’s personal needs, education, etc.). In effect, an answer which might be appropriate to one user might not be appropriate to the other user, even though the query and the document corpus are the same for both. Personalization services addressing the context become therefore more and more popular and are an active field of research.
This is only one of several challenges encountered in ‘professional search’: How can the working context of the searcher be incorporated in the ranking process; how can unstructured free-text documents be enriched with semantic information so that the information need can be expressed precisely at query time; how and to which extent can a company’s knowledge be exploited for search purposes; how should data from distributed sources be accessed from into one-single-entry-point.
This thesis is devoted to ‘professional search’, i.e. search at the workplace, especially in industrial research and development. We contribute by compiling and developing several approaches for facing the challenges mentioned above. The approaches are implemented into the prototype YASA (Your Adaptive Search Agent) which provides meta-search, adaptive ranking of search results, guided navigation, and which uses domain knowledge to drive the search processes. YASA is deployed in the pharmaceutical research department of Roche in Penzberg – a major pharmaceutical company – in which the applied methods were empirically evaluated.
Being confronted with mostly unstructured free-text documents and having barely explicit metadata at hand, we faced a serious challenge. Incorporating semantics (i.e. formal knowledge representation) into the search process can only be as good as the underlying data. Nonetheless, we are able to demonstrate that this issue can be largely compensated by incorporating automatic metadata extraction techniques. The metadata we were able to extract automatically was not perfectly accurate, nor did the ontology we applied contain considerably “rich semantics”. Nonetheless, our results show that already the little semantics incorporated into the search process, suffices to achieve a significant improvement in search and retrieval.
We thus contribute to the research field of context-based search by incorporating the working context into the search process – an area which so far has not yet been well studied
Inferring User Needs and Tasks from User Interactions
The need for search often arises from a broad range of complex information needs or tasks (such as booking travel, buying a house, etc.) which lead to lengthy search processes characterised by distinct stages and goals. While existing search systems are adept at handling simple information needs, they offer limited support for tackling complex tasks. Accurate task representations could be useful in aptly placing users in the task-subtask space and enable systems to contextually target the user, provide them better query suggestions, personalization and recommendations and help in gauging satisfaction. The major focus of this thesis is to work towards task based information retrieval systems - search systems which are adept at understanding, identifying and extracting tasks as well as supporting user’s complex search task missions. This thesis focuses on two major themes: (i) developing efficient algorithms for understanding and extracting search tasks from log user and (ii) leveraging the extracted task information to better serve the user via different applications. Based on log analysis on a tera-byte scale data from a real-world search engine, detailed analysis is provided on user interactions with search engines. On the task extraction side, two bayesian non-parametric methods are proposed to extract subtasks from a complex task and to recursively extract hierarchies of tasks and subtasks. A novel coupled matrix-tensor factorization model is proposed that represents user based on their topical interests and task behaviours. Beyond personalization, the thesis demonstrates that task information provides better context to learn from and proposes a novel neural task context embedding architecture to learn query representations. Finally, the thesis examines implicit signals of user interactions and considers the problem of predicting user’s satisfaction when engaged in complex search tasks. A unified multi-view deep sequential model is proposed to make query and task level satisfaction prediction
Feature extraction and duplicate detection for text mining: A survey
Text mining, also known as Intelligent Text Analysis is an important research area. It is very
difficult to focus on the most appropriate information due to the high dimensionality of data. Feature
Extraction is one of the important techniques in data reduction to discover the most important
features. Proce- ssing massive amount of data stored in a unstructured form is a challenging task.
Several pre-processing methods and algo- rithms are needed to extract useful features from huge
amount of data. The survey covers different text summarization, classi- fication, clustering methods to
discover useful features and also discovering query facets which are multiple groups of words or
phrases that explain and summarize the content covered by a query thereby reducing time taken by
the user
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
- …