63 research outputs found

    Detecting action-items in e-mail

    A pattern mining approach for information filtering systems

    It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues in using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and that this performance is also consistent for adaptive filtering.
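
A minimal sketch of the three-way term split mentioned in this abstract. The set-based rule used here (a term is "general" when it occurs in both positive documents and selected negative offenders) is an assumption made for illustration, not the paper's actual criterion; the function name `classify_terms` and the sample documents are hypothetical.

```python
def classify_terms(positive_docs, negative_docs):
    """Split terms into three categories by where they occur.

    Assumption (not from the paper): membership is decided purely by
    set overlap between positive documents and negative offenders.
    """
    # Collect the vocabulary seen on each side.
    pos_terms = {term for doc in positive_docs for term in doc}
    neg_terms = {term for doc in negative_docs for term in doc}
    return {
        "positive_specific": pos_terms - neg_terms,
        "general": pos_terms & neg_terms,
        "negative_specific": neg_terms - pos_terms,
    }


# Hypothetical tokenized documents.
positive = [["market", "stock", "trade"], ["stock", "price"]]
offenders = [["market", "football"], ["price", "ticket"]]
categories = classify_terms(positive, offenders)
```

Each category could then be given its own revising strategy, for example raising the weights of positive specific terms and lowering those of negative specific terms.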

    A Classification Mechanism To Avoid Useless Data From OSN Walls

    The aim of the present work is to propose and experimentally evaluate an automated system, called Filtered Wall (FW), that is able to filter unwanted messages from OSN user walls. One essential issue in today's Online Social Networks (OSNs) is to give users the ability to control the messages posted on their own private space, so that unwanted content is not displayed. This is achieved through a flexible rule-based system that lets users customize the filtering criteria applied to their walls, and a Machine Learning-based soft classifier that automatically labels messages in support of content-based filtering. The original set of features, derived from endogenous properties of short texts, is enlarged here with exogenous knowledge related to the context from which the messages originate. As far as the learning model is concerned, we confirm in the current paper the use of neural learning, which is today recognized as one of the most efficient solutions in text classification. In particular, we base the overall short text classification strategy on Radial Basis Function Networks (RBFN) for their established ability to act as soft classifiers in managing noisy data and intrinsically vague classes.

    Non-Compositional Term Dependence for Information Retrieval

    Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. For example, "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.

    A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

    Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required, either to identify the relevant semantic units to index and to weight them according to linguistically specific statistical distributions, or as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and the development of a text processing platform, Ogmios, which has been developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains, and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled together.

    Filtrage automatique de courriels : une approche adaptative et multi niveaux (Automatic e-mail filtering: an adaptive, multi-level approach)

    This article proposes a configurable e-mail system with several levels of filtering: simple filtering based on the information contained in the e-mail header; Boolean filtering based on the presence or absence of keywords in the e-mail body; vector filtering based on the contribution weights of the e-mail's keywords; and in-depth filtering based on the linguistic properties that characterize the structure and content of the e-mail. We propose an adaptive solution that lets the system learn from data, revise its knowledge, and adapt to changes in the user's interests and to variation in the nature of e-mails over time. In addition, we use a lexical network to improve the representation of the e-mail by taking its semantic aspect into account.
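
The first three filtering levels listed in this abstract can be sketched as a simple cascade. This is an illustrative reading only: the `Email` class, the blocked senders and keywords, the weights, and the threshold are all hypothetical values chosen for the example, and the fourth (linguistic) level and the adaptive learning step are left out.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str


# Hypothetical configuration; none of these values come from the article.
BLOCKED_SENDERS = {"promo@example.com"}
BLOCKED_KEYWORDS = {"lottery"}
KEYWORD_WEIGHTS = {"free": 0.4, "offer": 0.3, "urgent": 0.5}
WEIGHT_THRESHOLD = 0.7


def filter_email(email):
    """Return the name of the first level that rejects the e-mail, or None."""
    # Level 1: simple filtering on header information (here, the sender).
    if email.sender.lower() in BLOCKED_SENDERS:
        return "header"
    words = email.body.lower().split()
    # Level 2: Boolean filtering on the presence of keywords in the body.
    if BLOCKED_KEYWORDS & set(words):
        return "boolean"
    # Level 3: vector filtering on the summed weights of the keywords.
    score = sum(KEYWORD_WEIGHTS.get(word, 0.0) for word in words)
    if score >= WEIGHT_THRESHOLD:
        return "vector"
    return None
```

Ordering the levels from cheapest to most expensive means most messages are decided before the costlier checks run; an adaptive variant would update the keyword weights from user feedback.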