27 research outputs found

    Machine Learning of Generic and User-Focused Summarization

    A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries. Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98), p. 821-82

    Machine Learning of User Profiles: Representational Issues

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem. Comment: 6 page

    Constructive Induction-based Learning Agents: An Architecture and Preliminary Experiments

    This paper introduces a new type of intelligent agent called a constructive induction-based learning agent (CILA). This agent differs from other adaptive agents because it can not only learn how to assist a user in some task, but also incrementally adapt its knowledge representation space to better fit the given learning task. The agent's ability to autonomously make problem-oriented modifications to the originally given representation space is due to its constructive induction (CI) learning method. Selective induction (SI) learning methods, and agents based on these methods, rely on a good representation space. A good representation space has no misclassification noise, inter-correlated attributes or irrelevant attributes. Our proposed CILA has methods for overcoming all of these problems. In agent domains with poor representations, the CI-based learning agent will learn more accurate rules and be more useful than an SI-based learning agent. This paper gives an archit..

    By

    Machine learning (ML) algorithms are increasingly being pressed into service to help users understand and detect patterns or regularities found in large amounts of data. These tools are needed to help human analysts make sense of the increasing amount of complex data available electronically from domains as diverse as computer vision and world economics. One of th

    Bloedorn E: Exploiting Available Domain Knowledge to Improve Mining Aviation Safety and Network Security Data

    This paper discusses a method for incorporating available domain knowledge into data mining techniques in order to improve the interestingness of the discovered rules. Existing domain knowledge is represented by a simple grammar and is used within the algorithms in order to reduce the search space and generate more interesting results. We implemented the proposed approach in the A-Priori and C4.5 algorithms and applied them to data from aviation safety and intrusion detection domains. Our experiments show promising results.

    Using NLP for Machine Learning of User Profiles

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machin