12,035 research outputs found
Recommended from our members
A survey of induction algorithms for machine learning
Central to all systems for machine learning from examples is an induction algorithm. The purpose of the algorithm is to generalize from a finite set of training examples a description consistent with the examples seen, and, hopefully, with the potentially infinite set of examples not seen. This paper surveys four machine learning induction algorithms. The knowledge representation schemes and a PDL description of algorithm control are emphasized. System characteristics that are peculiar to a domain of application are de-emphasized. Finally, a comparative summary of the learning algorithms is presented
Measuring efficiency in high-accuracy, broad-coverage statistical parsing
Very little attention has been paid to the comparison of efficiency between
high accuracy statistical parsers. This paper proposes one machine-independent
metric that is general enough to allow comparisons across very different
parsing architectures. This metric, which we call ``events considered'',
measures the number of ``events'', however they are defined for a particular
parser, for which a probability must be calculated, in order to find the parse.
It is applicable to single-pass or multi-stage parsers. We discuss the
advantages of the metric, and demonstrate its usefulness by using it to compare
two parsers which differ in several fundamental ways.Comment: 8 pages, 4 figures, 2 table
Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component to discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that identifies
similar, but not identical, instances of a substructure and finds an
approximate measure of closeness of two substructures when under computational
constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value.Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this articl
- …