21,547 research outputs found
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items-such as a file or a hyperlink to an
external document-that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus---without the need for
manual annotations---that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
A study of query expansion methods for patent retrieval
Patent retrieval is a recall-oriented search task where the objective is to find all possible relevant documents. Queries in patent retrieval are typically very long since they take the form of a patent claim or even a full patent application in the case of priorart patent search. Nevertheless, there is generally a significant mismatch between the query and the relevant documents, often leading to low retrieval effectiveness. Some previous work has
tried to address this mismatch through the application of query expansion (QE) techniques which have generally showed
effectiveness for many other retrieval tasks. However, results of QE on patent search have been found to be very disappointing. We present a review of previous investigations of QE in patent retrieval, and explore some of these techniques on a prior-art patent search task. In addition, a novel method for QE using automatically generated synonyms set is presented. While previous QE techniques fail to improve over baseline retrieval, our new approach show statistically better retrieval precision over
the baseline, although not for recall. In addition, it proves to be significantly more efficient than existing techniques. An extensive analysis to the results is presented which seeks to better understand situations where these QE techniques succeed or fail
Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness
In this paper we present a systematic analysis of document
retrieval using unstructured and structured queries within
the score region algebra (SRA) structured retrieval framework. The behavior of di®erent retrieval models, namely
Boolean, tf.idf, GPX, language models, and Okapi, is tested
using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation.
The analysis is performed on a numerous experiments
evaluated on TREC and CLEF collections, using manually
generated unstructured and structured queries. Unstructured queries range from the short title queries to long title
+ description + narrative queries. For generating structured
queries we exploit the knowledge of the document structure
and the content used to semantically describe or classify
documents. We show that such structured information can
be utilized in retrieval engines to give more precise answers to user queries then when using unstructured queries
Enterprise engineering using semantic technologies
Modern Enterprises are facing unprecedented challenges in every aspect of their businesses: from marketing research, invention of products, prototyping, production, sales to billing. Innovation is the key to enhancing enterprise performances and knowledge is the main driving force in creating innovation. The identification and effective management of valuable knowledge, however, remains an illusive topic. Knowledge management (KM) techniques, such as enterprise process modelling, have long been recognised for their value and practiced as part of normal business. There are plentiful of KM techniques. However, what is still lacking is a holistic KM approach that enables one to fully connect KM efforts with existing business knowledge and practices already in IT systems, such as organisational memories. To address this problem, we present an integrated three-dimensional KM approach that supports innovative semantics technologies. Its automated formal methods allow us to tap into modern business practices and capitalise on existing knowledge. It closes the knowledge management cycle with user feedback loops. Since we are making use of reliable existing knowledge and methods, new knowledge can be extracted with less effort comparing with another method where new information has to be created from scratch
Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
Solving for adversarial examples with projected gradient descent has been
demonstrated to be highly effective in fooling the neural network based
classifiers. However, in the black-box setting, the attacker is limited only to
the query access to the network and solving for a successful adversarial
example becomes much more difficult. To this end, recent methods aim at
estimating the true gradient signal based on the input queries but at the cost
of excessive queries. We propose an efficient discrete surrogate to the
optimization problem which does not require estimating the gradient and
consequently becomes free of the first order update hyperparameters to tune.
Our experiments on Cifar-10 and ImageNet show the state of the art black-box
attack performance with significant reduction in the required queries compared
to a number of recently proposed methods. The source code is available at
https://github.com/snu-mllab/parsimonious-blackbox-attack.Comment: Accepted and to appear at ICML 201
Generating Synthetic Data for Neural Keyword-to-Question Models
Search typically relies on keyword queries, but these are often semantically
ambiguous. We propose to overcome this by offering users natural language
questions, based on their keyword queries, to disambiguate their intent. This
keyword-to-question task may be addressed using neural machine translation
techniques. Neural translation models, however, require massive amounts of
training data (keyword-question pairs), which is unavailable for this task. The
main idea of this paper is to generate large amounts of synthetic training data
from a small seed set of hand-labeled keyword-question pairs. Since natural
language questions are available in large quantities, we develop models to
automatically generate the corresponding keyword queries. Further, we introduce
various filtering mechanisms to ensure that synthetic training data is of high
quality. We demonstrate the feasibility of our approach using both automatic
and manual evaluation. This is an extended version of the article published
with the same title in the Proceedings of ICTIR'18.Comment: Extended version of ICTIR'18 full paper, 11 page
- …