80,967 research outputs found
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items-such as a file or a hyperlink to an
external document-that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus---without the need for
manual annotations---that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
A study of query expansion methods for patent retrieval
Patent retrieval is a recall-oriented search task where the objective is to find all possible relevant documents. Queries in patent retrieval are typically very long since they take the form of a patent claim or even a full patent application in the case of priorart patent search. Nevertheless, there is generally a significant mismatch between the query and the relevant documents, often leading to low retrieval effectiveness. Some previous work has
tried to address this mismatch through the application of query expansion (QE) techniques which have generally showed
effectiveness for many other retrieval tasks. However, results of QE on patent search have been found to be very disappointing. We present a review of previous investigations of QE in patent retrieval, and explore some of these techniques on a prior-art patent search task. In addition, a novel method for QE using automatically generated synonyms set is presented. While previous QE techniques fail to improve over baseline retrieval, our new approach show statistically better retrieval precision over
the baseline, although not for recall. In addition, it proves to be significantly more efficient than existing techniques. An extensive analysis to the results is presented which seeks to better understand situations where these QE techniques succeed or fail
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing condition, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net.Comment: Accepted to CVPR 2018 as a spotligh
Entity Linking for Queries by Searching Wikipedia Sentences
We present a simple yet effective approach for linking entities in queries.
The key idea is to search sentences similar to a query from Wikipedia articles
and directly use the human-annotated entities in the similar sentences as
candidate entities for the query. Then, we employ a rich set of features, such
as link-probability, context-matching, word embeddings, and relatedness among
candidate entities as well as their related entities, to rank the candidates
under a regression based framework. The advantages of our approach lie in two
aspects, which contribute to the ranking process and final linking result.
First, it can greatly reduce the number of candidate entities by filtering out
irrelevant entities with the words in the query. Second, we can obtain the
query sensitive prior probability in addition to the static link-probability
derived from all Wikipedia articles. We conduct experiments on two benchmark
datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
dataset. Experimental results show that our method outperforms state-of-the-art
systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ
dataset
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
A significant amount of search queries originate from some real world
information need or tasks. In order to improve the search experience of the end
users, it is important to have accurate representations of tasks. As a result,
significant amount of research has been devoted to extracting proper
representations of tasks in order to enable search systems to help users
complete their tasks, as well as providing the end user with better query
suggestions, for better recommendations, for satisfaction prediction, and for
improved personalization in terms of tasks. Most existing task extraction
methodologies focus on representing tasks as flat structures. However, tasks
often tend to have multiple subtasks associated with them and a more
naturalistic representation of tasks would be in terms of a hierarchy, where
each task can be composed of multiple (sub)tasks. To this end, we propose an
efficient Bayesian nonparametric model for extracting hierarchies of such tasks
\& subtasks. We evaluate our method based on real world query log data both
through quantitative and crowdsourced experiments and highlight the importance
of considering task/subtask hierarchies.Comment: 10 pages. Accepted at SIGIR 2017 as a full pape
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information clas-sification and retrieval
problems like e-Discovery or contextual navigation. It can also for-mulated in
the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast
to Conway approach, the objects evolve in a massively multidimensional space.
In order to start evaluating the potential of mARC we have built a mARC-based
Internet search en-gine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search both in terms of
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries
Target Apps Selection: Towards a Unified Search Framework for Mobile Devices
With the recent growth of conversational systems and intelligent assistants
such as Apple Siri and Google Assistant, mobile devices are becoming even more
pervasive in our lives. As a consequence, users are getting engaged with the
mobile apps and frequently search for an information need in their apps.
However, users cannot search within their apps through their intelligent
assistants. This requires a unified mobile search framework that identifies the
target app(s) for the user's query, submits the query to the app(s), and
presents the results to the user. In this paper, we take the first step forward
towards developing unified mobile search. In more detail, we introduce and
study the task of target apps selection, which has various potential real-world
applications. To this aim, we analyze attributes of search queries as well as
user behaviors, while searching with different mobile apps. The analyses are
done based on thousands of queries that we collected through crowdsourcing. We
finally study the performance of state-of-the-art retrieval models for this
task and propose two simple yet effective neural models that significantly
outperform the baselines. Our neural approaches are based on learning
high-dimensional representations for mobile apps. Our analyses and experiments
suggest specific future directions in this research area.Comment: To appear at SIGIR 201
Exploring Topic-based Language Models for Effective Web Information Retrieval
The main obstacle for providing focused search is the relative opaqueness of search request -- searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on TREC8 small Web data collection for ad-hoc search.Our experimental results show that the topic-based model outperforms the standard language model and parsimonious model
- âŠ