Machine Learning of Generic and User-Focused Summarization
A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries.
Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98), p. 821-82
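The abstract does not name the features or the learner it uses. As a minimal sketch, assuming a hypothetical feature set (sentence position, sentence length, and overlap with a user query) and a plain perceptron in place of the paper's method, learning a salience function from documents paired with their abstracts could look like this. The helper names `features`, `label`, `train`, and `summarize` are illustrative only.

```python
import re
from typing import AbstractSet, List, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def features(sentence: str, index: int, n: int, query: AbstractSet[str]) -> List[float]:
    """Hypothetical features: bias, position, length, overlap with a user query."""
    words = tokenize(sentence)
    position = 1.0 - index / max(n - 1, 1)   # 1.0 for the first sentence, 0.0 for the last
    length = min(len(words), 30) / 30.0      # capped, normalized sentence length
    overlap = len(set(query) & set(words)) / len(query) if query else 0.0  # user-focused feature
    return [1.0, position, length, overlap]

def label(sentence: str, abstract: str, threshold: float = 0.5) -> int:
    """Positive example if most of the sentence's words also occur in the abstract."""
    words = tokenize(sentence)
    if not words:
        return 0
    abstract_words = set(tokenize(abstract))
    hits = sum(1 for w in words if w in abstract_words)
    return 1 if hits / len(words) >= threshold else 0

def train(corpus: List[Tuple[List[str], str]], query: AbstractSet[str] = frozenset(),
          epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Perceptron over sentences of (document, abstract) pairs; returns feature weights."""
    X, y = [], []
    for sentences, abstract in corpus:
        for i, s in enumerate(sentences):
            X.append(features(s, i, len(sentences), query))
            y.append(label(s, abstract))
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(a * b for a, b in zip(w, xi)) > 0 else 0
            if pred != yi:
                step = lr if yi == 1 else -lr
                w = [wj + step * xj for wj, xj in zip(w, xi)]
    return w

def summarize(sentences: List[str], w: List[float], k: int = 1,
              query: AbstractSet[str] = frozenset()) -> List[str]:
    """Score sentences with the learned weights; keep the top k in document order."""
    n = len(sentences)
    scores = [sum(a * b for a, b in zip(w, features(s, i, n, query)))
              for i, s in enumerate(sentences)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    corpus = [(["Summarization selects salient sentences.",
                "The weather was pleasant today.",
                "Unrelated filler text appears here."],
               "Summarization selects salient sentences from documents.")]
    weights = train(corpus)
    print(weights)
    print(summarize(["Learning salience functions is useful.",
                     "Another sentence follows here.", "Final remark."], weights, k=1))
```

The learned weight vector plays the role of the salience function. The query-overlap feature is what would make a summary user-focused, but in this sketch it only receives a nonzero weight if the training data is built with queries.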
Machine Learning of User Profiles: Representational Issues
As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles that are both predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories that could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem.
Comment: 6 pages
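To illustrate the contrast between weighted term vectors and subject features, the sketch below builds TF-IDF term vectors and, separately, category counts obtained through a thesaurus lookup. `THESAURUS` and its entries are hypothetical stand-ins for whatever thesaurus the IDD News Browser drew on, and the centroid profile with cosine ranking is an assumption made for illustration, not the paper's learner.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]

# Hypothetical thesaurus mapping surface terms to broader subject categories.
THESAURUS = {
    "stocks": "finance", "bonds": "finance", "market": "finance",
    "election": "politics", "senate": "politics", "vote": "politics",
}

def term_vectors(docs: List[List[str]]) -> List[Vector]:
    """TF-IDF weighted term vectors (terms present in every document get weight 0)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()} for d in docs]

def subject_vector(doc: List[str]) -> Vector:
    """Counts of thesaurus categories; terms missing from the thesaurus are ignored."""
    counts = Counter(THESAURUS[t] for t in doc if t in THESAURUS)
    return {category: float(k) for category, k in counts.items()}

def centroid(vectors: List[Vector]) -> Vector:
    """Average of the vectors of documents the user marked as interesting."""
    total: Dict[str, float] = {}
    for v in vectors:
        for key, val in v.items():
            total[key] = total.get(key, 0.0) + val
    return {key: val / len(vectors) for key, val in total.items()}

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(val * b.get(key, 0.0) for key, val in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

if __name__ == "__main__":
    docs = [["stocks", "rally", "market"], ["bonds", "yield", "market"],
            ["senate", "vote", "debate"]]
    terms = term_vectors(docs)
    term_profile = centroid([terms[0]])  # the user liked only the first article
    subject_profile = centroid([subject_vector(docs[0])])
    for i in (1, 2):
        print(i, cosine(term_profile, terms[i]),
              cosine(subject_profile, subject_vector(docs[i])))
```

With a single liked finance article, the term-vector profile shares only the word "market" with a second finance article, while the subject-feature profile matches it fully; this is the kind of generalization a category hierarchy can supply when training examples are scarce.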
Constructive Induction-based Learning Agents: An Architecture and Preliminary Experiments
This paper introduces a new type of intelligent agent called a constructive induction-based learning agent (CILA). This agent differs from other adaptive agents in that it can not only learn how to assist a user with a task but also incrementally adapt its knowledge representation space to better fit the given learning task. The agent's ability to autonomously make problem-oriented modifications to the originally given representation space comes from its constructive induction (CI) learning method. Selective induction (SI) learning methods, and agents based on them, rely on a good representation space: one with no misclassification noise, inter-correlated attributes, or irrelevant attributes. Our proposed CILA has methods for overcoming all of these problems. In agent domains with poor representations, the CI-based learning agent will learn more accurate rules and be more useful than an SI-based learning agent. This paper gives an archit..
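The abstract is cut off before the architecture is described, so the following is only a generic illustration of constructive induction, not CILA's algorithm. It constructs new Boolean attributes from pairs of existing ones (the AND/OR/XOR operators in `OPERATORS` are an assumption) and keeps only attributes with positive information gain, so attributes that are useless on their own can be replaced by a more informative combination.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]

# Assumed construction operators for Boolean (0/1) attributes.
OPERATORS: Dict[str, Callable[[int, int], int]] = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def entropy(labels: List[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows: List[Row], labels: List[int], attr: str) -> float:
    """Reduction in class entropy from splitting on a single attribute."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def construct_and_select(rows: List[Row], labels: List[int],
                         min_gain: float = 1e-9) -> Tuple[List[Row], Dict[str, float]]:
    """Add pairwise constructed attributes, then keep those with positive gain, best first."""
    expanded = [dict(r) for r in rows]
    for a, b in combinations(sorted(rows[0]), 2):
        for name, op in OPERATORS.items():
            for original, new_row in zip(rows, expanded):
                new_row[f"{name}({a},{b})"] = op(original[a], original[b])
    gains = {attr: info_gain(expanded, labels, attr) for attr in expanded[0]}
    kept = {attr: g for attr, g in gains.items() if g > min_gain}
    return expanded, dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))

if __name__ == "__main__":
    # Class is a XOR b; c is an irrelevant constant attribute.
    rows = [{"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 1, "c": 0},
            {"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0}]
    labels = [0, 1, 1, 0]
    _, kept = construct_and_select(rows, labels)
    print(kept)
```

On XOR-labelled data every original attribute has zero information gain, so a selective learner that only ranks the given attributes has nothing useful to split on, while the constructed `xor(a,b)` attribute predicts the class exactly and the irrelevant attribute `c` is dropped.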
Machine learning (ML) algorithms are increasingly being pressed into service to help users understand and detect patterns or regularities in large amounts of data. These tools are needed to help human analysts make sense of the growing amount of complex data available electronically from domains as diverse as computer vision and world economics. One of th
Bloedorn E: Exploiting Available Domain Knowledge to Improve Mining Aviation Safety and Network Security Data
This paper discusses a method for incorporating available domain knowledge into data mining techniques to improve the interestingness of the discovered rules. Existing domain knowledge is represented by a simple grammar and used within the algorithms to reduce the search space and generate more interesting results. We implemented the proposed approach in the Apriori and C4.5 algorithms and applied them to data from the aviation safety and intrusion detection domains. Our experiments show promising results.
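The abstract describes domain knowledge encoded as a simple grammar that prunes the search space. As a hedged sketch of the Apriori side only (C4.5 is not shown), the grammar is replaced here by a Python predicate, and the attribute names, `KNOWN_RELATED` pair, and `allowed` rule are hypothetical examples rather than the paper's knowledge base.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical domain knowledge: rules linking these attributes are already known.
KNOWN_RELATED = {frozenset({"phase", "altitude"})}

def attribute(item: str) -> str:
    """Items are encoded as 'attribute=value' strings."""
    return item.split("=", 1)[0]

def allowed(itemset: Itemset) -> bool:
    """Reject itemsets that repeat an attribute or pair known-related attributes."""
    attrs = [attribute(i) for i in itemset]
    if len(attrs) != len(set(attrs)):
        return False
    return not any(frozenset(pair) in KNOWN_RELATED for pair in combinations(attrs, 2))

def apriori(transactions: List[List[str]], min_support: float,
            constraint: Callable[[Itemset], bool]) -> Dict[Itemset, float]:
    """Level-wise Apriori; candidates failing `constraint` are dropped before counting."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(c: Itemset) -> float:
        return sum(1 for t in tsets if c <= t) / n

    candidates = {frozenset([i]) for t in tsets for i in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # Domain-knowledge pruning: never count support for disallowed candidates.
        counted = {c: support(c) for c in candidates if constraint(c)}
        level = {c for c, s in counted.items() if s >= min_support}
        frequent.update({c: counted[c] for c in level})
        k += 1
        # Join step, then the standard Apriori prune: every (k-1)-subset must be frequent.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

if __name__ == "__main__":
    transactions = [
        ["phase=landing", "altitude=low", "event=runway_incursion"],
        ["phase=landing", "altitude=low", "event=runway_incursion"],
        ["phase=cruise", "altitude=high", "event=turbulence"],
        ["phase=landing", "altitude=low", "event=go_around"],
    ]
    constrained = apriori(transactions, 0.5, allowed)
    unconstrained = apriori(transactions, 0.5, lambda c: True)
    print(len(constrained), len(unconstrained))
```

The predicate is applied before support is counted, which is where the search-space reduction comes from. Pruning this way is consistent with Apriori's level-wise search only when the constraint is anti-monotone (if an itemset is rejected, so is every superset), as the "no known-related attributes together" rule here is.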
Using NLP for Machine Learning of User Profiles
As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles that are both predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories that could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machin