7 research outputs found

    Reply With: Proactive Recommendation of Email Attachments

    Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversations. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus, without the need for manual annotations, that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program. Comment: CIKM2017. Proceedings of the 26th ACM International Conference on Information and Knowledge Management. 201

    Integration of association rules and ontology for semantic-based query expansion

    Paper accepted for publication in Data and Knowledge Engineering. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Journal-papers/song_dek_2006.pdf. The goal of query expansion is to reduce the mismatch between documents and queries by expanding the query using words or phrases with a similar meaning or some other statistical relation to the set of relevant documents. One of the limitations of query expansion techniques is that a query is often expanded only by the linguistic features of terms. To tackle this problem, we propose a novel semantic query expansion technique that combines association rules with ontologies and Natural Language Processing techniques. Our technique uses association rule discovery to find good candidate terms to improve retrieval performance. These candidate terms are automatically derived from collections and added to the original query. Our technique differs from others in that 1) it utilizes the semantic as well as linguistic properties of an unstructured text corpus, 2) it makes use of contextual properties of important terms discovered by association rules, and 3) ontology entries are added to the query by disambiguating word senses. Experiments conducted on TREC collections give encouraging results. We achieve from 13.41% to 32.39% improvement in terms of P@20 and from 8.39% to 14.22% in terms of F-measure with TREC ad hoc queries. Detailed descriptions of the experimental results are discussed in the paper.
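To make the association-rule step in the abstract above concrete, here is a minimal sketch of how single-term rules could be mined from a tokenized collection and used to expand a query. It is an illustration only, not the authors' method: the function names `mine_term_rules` and `expand_query`, the thresholds, and the restriction to one-term antecedents are all assumptions, and the ontology and word-sense disambiguation steps are left out.

```python
from collections import Counter
from itertools import permutations


def mine_term_rules(docs, min_support=0.5, min_confidence=0.6):
    """Mine rules a -> b between single terms (hypothetical helper).

    support(a -> b)    = fraction of documents containing both a and b
    confidence(a -> b) = df(a and b) / df(a)
    Returns {a: [(b, confidence), ...]} sorted by descending confidence.
    """
    n_docs = len(docs)
    term_sets = [set(doc) for doc in docs]
    term_df = Counter(t for terms in term_sets for t in terms)
    pair_df = Counter()
    for terms in term_sets:
        for a, b in permutations(sorted(terms), 2):
            pair_df[(a, b)] += 1

    rules = {}
    for (a, b), count in pair_df.items():
        support = count / n_docs
        confidence = count / term_df[a]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(a, []).append((b, confidence))
    for a in rules:
        rules[a].sort(key=lambda rule: (-rule[1], rule[0]))
    return rules


def expand_query(query_terms, rules, max_new_terms=2):
    """Append the highest-confidence rule consequents to the query."""
    candidates = []
    for term in query_terms:
        for consequent, confidence in rules.get(term, []):
            candidates.append((confidence, consequent))
    candidates.sort(key=lambda c: (-c[0], c[1]))

    expanded = list(query_terms)
    for _, term in candidates:
        if len(expanded) - len(query_terms) >= max_new_terms:
            break
        if term not in expanded:
            expanded.append(term)
    return expanded


docs = [
    ["query", "expansion", "ontology"],
    ["query", "expansion"],
    ["query", "retrieval"],
    ["ontology", "semantics"],
]
rules = mine_term_rules(docs, min_support=0.5, min_confidence=0.6)
print(expand_query(["expansion"], rules))
```

In this toy collection only the pair "query"/"expansion" co-occurs often enough to pass both thresholds, so those are the only terms that expand each other.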

    The Topological Information Retrieval System and the Topological Paradigm: a Unification of the Major Models of Information Retrieval.

    There are three topics discussed in this work. The first topic is an investigation of the topological properties of the p-norm model of Salton, Fox, and Wu. It is shown that certain properties of the p-norm model that one would expect to hold, given the topological origin of the model, do not in fact hold. These properties include the ability to change the query by changing p, and the ability to adequately separate documents. Since these properties do hold in the model as actually constructed, it must be that the properties do not follow from the topological origin of the model. The second topic is a search for a usable model with an adequate theoretical basis. In order to construct such a model, the topological paradigm is defined. This paradigm establishes a minimal set of requirements that any system with a topological foundation should have. A particular example of the paradigm, the Topological Information Retrieval System (TIRS), is constructed. It is shown that all of the desired properties of the p-norm model hold for the TIRS model. A discussion of the various query systems that may be used with TIRS is given. These query systems include a natural language interface and a weighted boolean query system, as well as two specialized interfaces. The weighted boolean query system has the property that pairs, when treated as units, have all of the properties of the non-weighted boolean lattice. The run time of the system is estimated, once for an inverted file implementation, and once for an implementation using kd-trees. These run times are much better than for traditional systems. The third topic is a reexamination of the standard models of information retrieval, considered as cases of the topological paradigm. The paradigm is shown to be a unifying model, in that all of the standard models, i.e., the boolean, vector space, fuzzy set theoretic, and probabilistic models, as well as a hierarchical model, are shown to be instances of the paradigm. An appendix contains a review of relevant topics from topology and abstract algebra.
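For readers unfamiliar with the p-norm model mentioned above, the sketch below shows its commonly cited form for unweighted query terms, where changing p moves the OR and AND operators between a plain average (p = 1) and strict max/min behaviour (p = infinity). This illustrates the p-norm model only, not TIRS; the function names and the choice to omit query-term weights are assumptions made for brevity.

```python
import math


def _validate(doc_weights, p):
    # Document term weights are assumed to lie in [0, 1], and p >= 1.
    if not doc_weights:
        raise ValueError("at least one term weight is required")
    if any(w < 0.0 or w > 1.0 for w in doc_weights):
        raise ValueError("term weights must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be >= 1")


def pnorm_or(doc_weights, p):
    """Similarity of a document to (t1 OR ... OR tn) under the p-norm model."""
    _validate(doc_weights, p)
    if math.isinf(p):
        return max(doc_weights)  # limiting case: strict Boolean OR
    n = len(doc_weights)
    return (sum(w ** p for w in doc_weights) / n) ** (1.0 / p)


def pnorm_and(doc_weights, p):
    """Similarity of a document to (t1 AND ... AND tn) under the p-norm model."""
    _validate(doc_weights, p)
    if math.isinf(p):
        return min(doc_weights)  # limiting case: strict Boolean AND
    n = len(doc_weights)
    return 1.0 - (sum((1.0 - w) ** p for w in doc_weights) / n) ** (1.0 / p)


weights = [0.2, 0.8]  # hypothetical weights of two query terms in one document
for p in (1, 2, 10, float("inf")):
    print(p, round(pnorm_or(weights, p), 4), round(pnorm_and(weights, p), 4))
```

At p = 1 the two operators coincide, which is the sense in which the query itself changes as p changes.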

    NASA RECON: Course Development, Administration, and Evaluation

    The R&D activities addressing the development, administration, and evaluation of a set of transportable, college-level courses to educate science and engineering students in the effective use of automated scientific and technical information storage and retrieval systems, and, in particular, in the use of the NASA RECON system, are discussed. The long-range scope and objectives of these contracted activities are outlined, and the progress made toward these objectives during FY 1983-1984 is highlighted. In addition, the results of a survey of 237 colleges and universities addressing course needs are presented.

    Indexing and terminology control in polytechnic higher education libraries in Portugal: the system at the Instituto Politécnico de Portalegre

    The main objective of this work was to highlight the difficulty of carrying out the indexing process in bibliographic databases. The normative and supporting instruments for this task are scattered and hard to use for a large share of professionals. Indexing terms are collected from several sources, which fosters inconsistency when authority control is not performed and undermines effective information retrieval by catalogue users. The research is divided into two distinct parts: the first identifies the use of normative and supporting instruments for indexing in polytechnic libraries in Portugal. The second examines the characteristics of subject indexes in a group of libraries (Instituto Politécnico de Portalegre). The methodology followed these two levels of observation: the first through a survey of the institutions, and the second through an analysis of bibliographic records, for which three central axes were defined: UNIMARC format, heading syntax, and terminology. The results show that in most cases normative instruments are not used, and that the Portuguese indexing system SIPORbase is very rarely used. Terms are collected mainly from thesauri, and no authority control is performed, either for terminology or for heading syntax. The lack of consistency observed in the subject indexes showed that an indexing policy is needed, and to that end a model is presented and proposed for use.

    Automatic Query Formulations in Information Retrieval

    Modern information retrieval systems are designed to supply relevant information in response to requests received from the user population. In most retrieval environments the search requests consist of keywords, or index terms, interrelated by appropriate Boolean operators. Since it is difficult for untrained users to generate effective Boolean search requests, trained search intermediaries are normally used to translate original statements of user need into useful Boolean search formulations. Methods are introduced in this study which reduce the role of the search intermediaries by making it possible to generate Boolean search formulations completely automatically from natural language statements provided by the system patrons. Frequency considerations are used automatically to generate appropriate term combinations as well as Boolean connectives relating the terms. Methods are covered to produce automatic query formulations both in a standard Boolean logic system and in an extended Boolean system in which the strict interpretation of the connectives is relaxed. Experimental results are supplied to evaluate the effectiveness of the automatic query formulation process in practice.
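As a rough illustration of what "frequency considerations" could mean in the abstract above, the sketch below builds a Boolean query from a list of terms and their document frequencies. The heuristic is an assumption made for this example and is not taken from the paper: rare terms are treated as selective enough to stand alone, while frequent terms are narrowed by pairing them with AND. The function name `formulate_boolean_query` and the threshold `max_single_df` are hypothetical.

```python
from itertools import combinations


def formulate_boolean_query(terms, doc_freq, max_single_df=10):
    """Build a Boolean query string from terms and document frequencies.

    Assumed heuristic (illustration only):
      - terms with document frequency <= max_single_df become single-term clauses
      - more frequent terms are combined pairwise with AND
      - all clauses are joined with OR
    Terms missing from doc_freq are treated as having frequency 0.
    """
    rare = [t for t in terms if doc_freq.get(t, 0) <= max_single_df]
    common = [t for t in terms if doc_freq.get(t, 0) > max_single_df]

    clauses = list(rare)
    if len(common) == 1:
        # A lone frequent term has no partner to be paired with.
        clauses.append(common[0])
    else:
        clauses.extend(f"({a} AND {b})" for a, b in combinations(common, 2))
    return " OR ".join(clauses)


doc_freq = {"retrieval": 500, "boolean": 120, "pnorm": 3}  # made-up counts
print(formulate_boolean_query(["retrieval", "boolean", "pnorm"], doc_freq))
```

Lowering `max_single_df` pushes more terms into AND pairs and so yields a narrower query; raising it does the opposite.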