7 research outputs found
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items, such as a file or a hyperlink to an
external document, that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus, without the need for
manual annotations, that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.
Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
Integration of association rules and ontology for semantic-based query expansion
Paper accepted for publication in Data and Knowledge Engineering. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Journal-papers/song_dek_2006.pdf.
The goal of query expansion is to reduce the mismatch
between documents and queries by expanding the query using words or
phrases with a similar meaning or some other statistical relation to the
set of relevant documents. One of the limitations with query expansion
techniques is that a query is often expanded only by the linguistic
features of terms. To tackle this problem, we propose a novel semantic
query expansion technique that combines association rules with
ontologies and Natural Language Processing techniques. Our technique
utilizes the association rule discovery to find good candidate terms to
improve the retrieval performance. These candidate terms are
automatically derived from collections and added to the original query.
Our technique is differentiated from others in that 1) it utilizes the
semantics as well as linguistic properties of unstructured text corpus,
2) it makes use of contextual properties of important terms discovered
by association rules, and 3) ontology entries are added to the query by
disambiguating word senses. Experiments conducted on TREC
collections give encouraging results. We achieve improvements of 13.41% to
32.39% in terms of P@20 and of 8.39% to 14.22% in
terms of F-measure with TREC ad hoc queries. The experimental results
are discussed in detail in the paper.
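The association-rule step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one-to-one term rules mined from document-level co-occurrence, and it omits the ontology lookup and word sense disambiguation the abstract mentions. The function names, the whitespace tokenizer, and the support and confidence thresholds are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_term_rules(docs, min_support=0.3, min_confidence=0.6):
    """Mine one-to-one rules a -> b from term co-occurrence in documents.

    Hypothetical helper: returns {antecedent: [(consequent, confidence), ...]}.
    """
    n = len(docs)
    # Naive whitespace tokenization; each document becomes a set of terms.
    term_sets = [set(d.lower().split()) for d in docs]
    term_count = Counter(t for s in term_sets for t in s)
    pair_count = Counter(
        pair for s in term_sets for pair in combinations(sorted(s), 2)
    )
    rules = {}
    for (a, b), c in pair_count.items():
        if c / n < min_support:  # support = fraction of docs with both terms
            continue
        for x, y in ((a, b), (b, a)):
            conf = c / term_count[x]  # confidence of the rule x -> y
            if conf >= min_confidence:
                rules.setdefault(x, []).append((y, conf))
    return rules


def expand_query(query, rules, max_new_terms=2):
    """Append the highest-confidence rule consequents to the query terms."""
    terms = query.lower().split()
    candidates = {}
    for t in terms:
        for y, conf in rules.get(t, []):
            if y not in terms:
                candidates[y] = max(conf, candidates.get(y, 0.0))
    best = sorted(candidates, key=lambda y: (-candidates[y], y))[:max_new_terms]
    return terms + best


docs = [
    "neural network training",
    "neural network pruning",
    "network routing protocol",
    "neural network inference",
]
rules = mine_term_rules(docs)
expanded = expand_query("neural pruning", rules)
```

In this toy collection only the pair "neural"/"network" clears the support threshold, so a query containing "neural" gains the term "network". A real system would mine rules from a much larger corpus and filter candidates further.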
The Topological Information Retrieval System and the Topological Paradigm: a Unification of the Major Models of Information Retrieval.
There are three topics discussed in this work. The first topic is an investigation of the topological properties of the p-norm model of Salton, Fox, and Wu. It is shown that certain properties of the p-norm model that one would expect to hold, given the topological origin of the model, do not in fact hold. These properties include the ability to change the query by changing p, and the ability to adequately separate documents. Since these properties do hold in the model as actually constructed, it must be that the properties do not follow from the topological origin of the model. The second topic is a search for a usable model with an adequate theoretical basis. In order to construct such a model, the topological paradigm is defined. This paradigm establishes a minimal set of requirements that any system with a topological foundation should have. A particular example of the paradigm, the Topological Information Retrieval System (TIRS), is constructed. It is shown that all of the desired properties of the p-norm model hold for the TIRS model. A discussion of the various query systems that may be used with TIRS is given. These query systems include a natural language interface and a weighted boolean query system, as well as two specialized interfaces. The weighted boolean query system has the property that pairs, when treated as units, have all of the properties of the non-weighted boolean lattice. The run time of the system is estimated, once for an inverted file implementation, and once for an implementation using kd-trees. These run times are much better than for traditional systems. The third topic is a reexamination of the standard models of information retrieval, considered as cases of the topological paradigm. The paradigm is shown to be a unifying model, in that all of the standard models, i.e., the boolean, vector space, fuzzy set theoretic, and probabilistic models, as well as a hierarchical model, are shown to be instances of the paradigm. 
An appendix contains a review of relevant topics from topology and abstract algebra.
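For readers unfamiliar with the p-norm model that this abstract examines, the following sketch shows the extended Boolean similarity functions as they are commonly stated for the model of Salton, Fox, and Wu. It illustrates how changing p changes the interpretation of a query. It is not taken from the work itself, and the function and parameter names are assumptions.

```python
def pnorm_or(doc_weights, query_weights, p):
    """p-norm similarity of a document to an OR query (weights in [0, 1])."""
    num = sum((a * d) ** p for a, d in zip(query_weights, doc_weights))
    den = sum(a ** p for a in query_weights)
    return (num / den) ** (1.0 / p)


def pnorm_and(doc_weights, query_weights, p):
    """p-norm similarity of a document to an AND query (weights in [0, 1])."""
    num = sum((a * (1.0 - d)) ** p for a, d in zip(query_weights, doc_weights))
    den = sum(a ** p for a in query_weights)
    return 1.0 - (num / den) ** (1.0 / p)


# Illustrative document term weights and equal query term weights.
doc = [0.2, 0.8]
query = [1.0, 1.0]

# p = 1: AND and OR coincide (a weighted average, as in a vector-style model).
or_p1 = pnorm_or(doc, query, 1)
and_p1 = pnorm_and(doc, query, 1)

# Large p: OR approaches the maximum weight and AND the minimum weight,
# which is the behavior of strict (fuzzy-set style) Boolean evaluation.
or_p50 = pnorm_or(doc, query, 50)
and_p50 = pnorm_and(doc, query, 50)
```

With p = 1 the two operators are indistinguishable, and as p grows they move toward strict Boolean behavior, so p acts as a dial between the two interpretations.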
NASA RECON: Course Development, Administration, and Evaluation
The R&D activities addressing the development, administration, and evaluation of a set of transportable, college-level courses to educate science and engineering students in the effective use of automated scientific and technical information storage and retrieval systems, and, in particular, in the use of the NASA RECON system, are discussed. The long-range scope and objectives of these contracted activities are overviewed, and the progress made toward these objectives during FY 1983-1984 is highlighted. In addition, the results of a survey of 237 colleges and universities addressing course needs are presented.
Indexing and terminology control in polytechnic higher education libraries in Portugal: the system at the Instituto Politécnico de Portalegre
The main objective of this work was to highlight the difficulty of carrying out
the indexing process in bibliographic databases. The normative and supporting
instruments for this task are scattered and difficult to use for most
professionals. Indexing terms are collected from various sources, which fosters
inconsistency if authority control is not performed and jeopardizes effective
information retrieval by catalogue users.
The research is divided into two distinct parts. The first identifies the use of
normative and supporting instruments for indexing in polytechnic libraries in
Portugal. The second examines the characteristics of the subject indexes in a
group of libraries (Instituto Politécnico de Portalegre). The methodology was
developed according to these two levels of observation: the first through a survey
of the institutions, and the second through the analysis of bibliographic records,
for which three central axes were defined: UNIMARC format, heading syntax, and
terminology.
From the results obtained, it is concluded that in most cases normative
instruments are not used, and the Portuguese indexing system SIPORbase is very
little used. Terms are collected mainly from thesauri, and authority control is
not performed, either for terminology or for heading syntax. The lack of
consistency observed in the subject indexes showed that an indexing policy is
needed, and to that end a model was presented that is suggested for use.
Automatic Query Formulations in Information Retrieval
Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system and in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process in practice.