Robust classification with context-sensitive features
This paper addresses the problem of classifying observations when features are context-sensitive, especially when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem; general strategies are then presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on three domains. The first domain is the diagnosis of gas turbine engines. The problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition. The context is given by the identity of the speaker. The problem is to recognize words spoken by a new speaker, not represented in the training set. The third domain is medical prognosis. The problem is to predict whether a patient with hepatitis will live or die. The context is the age of the patient. For all three domains, exploiting context results in substantially more accurate classification.
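The strategies themselves are not spelled out in this abstract; one that fits the setting is contextual normalization, i.e. re-expressing each feature relative to statistics gathered within its own context. A minimal sketch in Python (the function name and the z-score formulation are illustrative, not taken from the paper):

    import numpy as np

    def contextual_normalization(X, contexts):
        # X: (n_samples, n_features) raw feature values.
        # contexts: (n_samples,) context labels, e.g. "warm" / "cold".
        X = np.asarray(X, dtype=float)
        contexts = np.asarray(contexts)
        Xn = np.empty_like(X)
        for c in np.unique(contexts):
            mask = contexts == c
            mu = X[mask].mean(axis=0)
            sigma = X[mask].std(axis=0) + 1e-12  # guard against zero variance
            # Express each feature as a deviation from its own context,
            # so the classifier sees context-free z-scores.
            Xn[mask] = (X[mask] - mu) / sigma
        return Xn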
Data Engineering for the Analysis of Semiconductor Manufacturing Data
We have analyzed manufacturing data from several different semiconductor
manufacturing plants, using decision tree induction software called
Q-YIELD. The software generates rules for predicting when a given product
should be rejected. The rules are intended to help the process engineers
improve the yield of the product, by helping them to discover the causes
of rejection. Experience with Q-YIELD has taught us the importance of
data engineering -- preprocessing the data to enable or facilitate
decision tree induction. This paper discusses some of the data engineering
problems we have encountered with semiconductor manufacturing data.
The paper deals with two broad classes of problems: engineering the features
in a feature vector representation and engineering the definition of the
target concept (the classes). Manufacturing process data present special
problems for feature engineering, since the data have multiple levels of
granularity (detail, resolution). Engineering the target concept is important,
due to our focus on understanding the past, as opposed to the more common
focus in machine learning on predicting the future.
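As a concrete illustration of the granularity problem, per-wafer measurements often have to be summarized up to the lot level at which accept/reject labels exist. A hedged sketch of such an aggregation step (the field names temp and pressure are hypothetical, not from Q-YIELD):

    from statistics import mean

    def aggregate_to_lot(wafer_rows, keys=("temp", "pressure")):
        # wafer_rows: list of per-wafer measurement dicts from one lot.
        # Returns one lot-level feature vector (mean/min/max per key),
        # matching the granularity of the lot's accept/reject label.
        features = {}
        for k in keys:
            values = [row[k] for row in wafer_rows]
            features[k + "_mean"] = mean(values)
            features[k + "_min"] = min(values)
            features[k + "_max"] = max(values)
        return features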
Types of cost in inductive concept learning
Inductive concept learning is the task of learning to assign cases to a discrete set of classes. In real-world applications of concept learning, there are many different types of cost involved. The majority of the machine learning literature ignores all types of cost (unless accuracy is interpreted as a type of cost measure). A few papers have investigated the cost of misclassification errors. Very few papers have examined the many other types of cost. In this paper, we attempt to create a taxonomy of the different types of cost that are involved in inductive concept learning. This taxonomy may help to organize the literature on cost-sensitive learning. We hope that it will inspire researchers to investigate all types of cost in inductive concept learning in more depth.
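For the one cost type the literature has examined, misclassification cost, the standard device is a cost matrix combined with a minimum-expected-cost decision rule. A small illustrative sketch (the cost values are invented for the example):

    import numpy as np

    # Hypothetical cost matrix: COST[true_class, predicted_class].
    # Here, calling a faulty item good (row 1, column 0) costs ten
    # times as much as the reverse error.
    COST = np.array([[0.0,  1.0],
                     [10.0, 0.0]])

    def min_expected_cost_class(class_probs, cost=COST):
        # Expected cost of predicting j is sum_k P(k) * cost[k, j];
        # pick the prediction that minimizes it.
        expected = class_probs @ cost
        return int(np.argmin(expected))

    # With P(good) = 0.7 and P(faulty) = 0.3, accuracy alone favours
    # "good", but expected cost favours "faulty" (0.7*1.0 < 0.3*10.0).
    print(min_expected_cost_class(np.array([0.7, 0.3])))  # -> 1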
Myths and Legends of the Baldwin Effect
This position paper argues that the Baldwin effect is widely
misunderstood by the evolutionary computation community. The
misunderstandings appear to fall into two general categories.
Firstly, it is commonly believed that the Baldwin effect is
concerned with the synergy that results when there is an evolving
population of learning individuals. This is only half of the story.
The full story is more complicated and more interesting. The Baldwin
effect is concerned with the costs and benefits of lifetime
learning by individuals in an evolving population. Several
researchers have focussed exclusively on the benefits, but there
is much to be gained from attention to the costs. This paper explains
the two sides of the story and enumerates ten of the costs and
benefits of lifetime learning by individuals in an evolving population.
Secondly, there is a cluster of misunderstandings about the relationship
between the Baldwin effect and Lamarckian inheritance of acquired
characteristics. The Baldwin effect is not Lamarckian. A Lamarckian
algorithm is not better for most evolutionary computing problems than
a Baldwinian algorithm. Finally, Lamarckian inheritance is not a
better model of memetic (cultural) evolution than the Baldwin effect.
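The Baldwinian/Lamarckian distinction comes down to one line in an evolutionary loop: whether the phenotype improved by lifetime learning is written back into the inherited genome. A toy sketch, not drawn from the paper (the bit-string fitness and hill-climbing learner are placeholders):

    import random

    def fitness(bits):
        return sum(bits)  # toy objective: count the 1s

    def lifetime_learning(genome, steps=10):
        # Hill-climb from the inherited genome: a stand-in for learning.
        phenotype = list(genome)
        for _ in range(steps):
            candidate = list(phenotype)
            i = random.randrange(len(candidate))
            candidate[i] = 1 - candidate[i]
            if fitness(candidate) > fitness(phenotype):
                phenotype = candidate
        return phenotype

    def evaluate(genome, lamarckian=False):
        # Baldwinian: selection sees the learned fitness, but the genome
        # transmitted to offspring is unchanged.
        # Lamarckian: the learned phenotype overwrites the genome.
        phenotype = lifetime_learning(genome)
        score = fitness(phenotype)
        inherited = phenotype if lamarckian else list(genome)
        return score, inherited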
Answering Subcognitive Turing Test Questions: A Reply to French
Robert French has argued that a disembodied computer is incapable of
passing a Turing Test that includes subcognitive questions. Subcognitive
questions are designed to probe the network of cultural and perceptual
associations that humans naturally develop as we live, embodied and
embedded in the world. In this paper, I show how it is possible for a
disembodied computer to answer subcognitive questions appropriately,
contrary to French's claim. My approach to answering subcognitive
questions is to use statistical information extracted from a very large
collection of text. In particular, I show how it is possible to answer a
sample of subcognitive questions taken from French, by issuing queries to
a search engine that indexes about 350 million Web pages. This simple
algorithm may shed light on the nature of human (sub-) cognition, but the
scope of this paper is limited to demonstrating that French is mistaken: a
disembodied computer can answer subcognitive questions.
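The method reduces to comparing co-occurrence counts for the alternative answers. A hedged sketch of the idea (hits() is a stand-in for a real search-engine hit count, and the numbers inside it are illustrative placeholders, not real counts):

    def hits(query):
        # Stand-in for a Web search engine's hit count; these numbers
        # are illustrative placeholders only.
        placeholder_counts = {'"banana" "yellow"': 9200, '"banana" "blue"': 140}
        return placeholder_counts.get(query, 0)

    def preferred_association(noun, candidates):
        # Rank candidate answers by how often they co-occur with the
        # noun on the Web, approximating human cultural associations.
        scores = {c: hits('"%s" "%s"' % (noun, c)) for c in candidates}
        return max(scores, key=scores.get)

    # A subcognitive question such as "Is a banana more yellow or blue?"
    print(preferred_association("banana", ["yellow", "blue"]))  # -> yellow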
Exploiting context when learning to classify
This paper addresses the problem of classifying observations when
features are context-sensitive, specifically when the testing set involves a context
that is different from the training set. The paper begins with a precise definition of
the problem; general strategies are then presented for enhancing the performance
of classification algorithms on this type of problem. These strategies are tested on
two domains. The first domain is the diagnosis of gas turbine engines. The
problem is to diagnose a faulty engine in one context, such as warm weather,
when the fault has previously been seen only in another context, such as cold
weather. The second domain is speech recognition. The problem is to recognize
words spoken by a new speaker, not represented in the training set. For both
domains, exploiting context results in substantially more accurate classification.
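Another strategy in this family is contextual expansion: append the context variables themselves to each feature vector so the classifier can condition on them directly. A minimal sketch (the feature and context names are hypothetical):

    def expand_with_context(features, context):
        # Copy the base features and append explicit context variables,
        # letting the classifier condition on context directly.
        expanded = dict(features)
        for name, value in context.items():
            expanded["ctx_" + name] = value
        return expanded

    # e.g. expand_with_context({"vibration": 0.7}, {"ambient_temp": -5.0})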
Learning to Extract Keyphrases from Text
Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft's Word 97 and Verity's Search 97, includes algorithms that automatically extract keyphrases from documents. In this paper, we approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for this task. The third set of experiments examines the performance of GenEx on the task of metadata generation, relative to the performance of Microsoft's Word 97. The fourth and final set of experiments investigates the performance of GenEx on the task of highlighting, relative to Verity's Search 97. The experimental results support the claim that a specialized learning algorithm (GenEx) can generate better keyphrases than a general-purpose learning algorithm (C4.5) and the non-learning algorithms that are used in commercial software (Word 97 and Search 97).
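The supervised framing can be made concrete: enumerate candidate phrases, then describe each with features a learner such as C4.5 could split on. A rough sketch with a toy stopword list and two illustrative features (frequency and relative position of first occurrence); the actual GenEx features are not reproduced here:

    import re

    STOPWORDS = {"the", "of", "and", "a", "an", "to", "in", "is", "for"}

    def candidate_phrases(text, max_len=3):
        # Treat the document as a bag of candidate phrases: contiguous
        # runs of one to three tokens containing no stopword.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for i in range(len(tokens)):
            for n in range(1, max_len + 1):
                phrase = tokens[i:i + n]
                if len(phrase) == n and not set(phrase) & STOPWORDS:
                    yield " ".join(phrase)

    def phrase_features(text):
        # Two illustrative features per candidate: its frequency and the
        # relative position of its first occurrence (early phrases tend
        # to make better keyphrases).
        phrases = list(candidate_phrases(text))
        return {p: {"freq": phrases.count(p),
                    "first_pos": phrases.index(p) / max(len(phrases), 1)}
                for p in set(phrases)}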
Self-Replicating Machines in Continuous Space with Virtual Physics
JohnnyVon is an implementation of self-replicating machines in
continuous two-dimensional space. Two types of particles drift
about in a virtual liquid. The particles are automata with
discrete internal states but continuous external relationships.
Their internal states are governed by finite state machines but
their external relationships are governed by a simulated physics
that includes Brownian motion, viscosity, and spring-like attractive
and repulsive forces. The particles can be assembled into patterns
that can encode arbitrary strings of bits. We demonstrate that, if
an arbitrary "seed" pattern is put in a "soup" of separate individual
particles, the pattern will replicate by assembling the individual
particles into copies of itself. We also show that, given sufficient
time, a soup of separate individual particles will eventually
spontaneously form self-replicating patterns. We discuss the implications
of JohnnyVon for research in nanotechnology, theoretical biology, and
artificial life.
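The simulated physics amounts to a simple per-particle integration step. A hedged sketch of such a step (all constants are illustrative, not JohnnyVon's actual parameters):

    import random

    DT, VISCOSITY, BROWNIAN, K, REST = 0.1, 0.9, 0.05, 2.0, 1.0

    def step(pos, vel, partner=None):
        # One integration step: spring force toward a bonded partner
        # (Hooke's law), viscous damping, and Brownian jitter.
        fx = fy = 0.0
        if partner is not None:
            dx, dy = partner[0] - pos[0], partner[1] - pos[1]
            dist = (dx * dx + dy * dy) ** 0.5 or 1e-9
            f = K * (dist - REST)  # attractive beyond REST, repulsive inside
            fx, fy = f * dx / dist, f * dy / dist
        vx = VISCOSITY * (vel[0] + fx * DT) + random.gauss(0.0, BROWNIAN)
        vy = VISCOSITY * (vel[1] + fy * DT) + random.gauss(0.0, BROWNIAN)
        return (pos[0] + vx * DT, pos[1] + vy * DT), (vx, vy)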
A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
Recognizing analogies, synonyms, antonyms, and associations appear to be four
distinct tasks, requiring distinct NLP algorithms. In the past, the four
tasks have been treated independently, using a wide variety of algorithms.
These four semantic classes, however, are a tiny sample of the full
range of semantic phenomena, and we cannot afford to create ad hoc algorithms
for each semantic phenomenon; we need to seek a unified approach.
We propose to subsume a broad range of phenomena under analogies.
To limit the scope of this paper, we restrict our attention to the subsumption
of synonyms, antonyms, and associations. We introduce a supervised corpus-based
machine learning algorithm for classifying analogous word pairs, and we
show that it can solve multiple-choice SAT analogy questions, TOEFL
synonym questions, ESL synonym-antonym questions, and similar-associated-both
questions from cognitive psychology.
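In this framing, each word pair becomes a feature vector of corpus frequencies for the patterns that join the two words, and a standard classifier is trained on labeled pairs. A minimal sketch (corpus_count is a stand-in for a corpus frequency lookup, and the pattern set is whatever the corpus supplies):

    def pair_vector(a, b, patterns, corpus_count):
        # Represent the word pair (a, b) as a vector of frequencies of
        # joining patterns, e.g. corpus_count("cat makes meow") for the
        # pattern "makes". corpus_count stands in for a corpus lookup.
        return [corpus_count("%s %s %s" % (a, p, b)) for p in patterns]

    # A classifier trained on such vectors for labeled pairs can then
    # mark new pairs as synonyms, antonyms, or associations, treating
    # each class as a kind of analogy to its training pairs.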
Measuring Semantic Similarity by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks, answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
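The core computation is the cosine between rows of a pair-by-pattern frequency matrix, with SVD smoothing as in LRA's extension (2). A brief sketch (the rank k and the matrix layout are illustrative):

    import numpy as np

    def relational_similarity(M, i, j, k=50):
        # M: (word pairs x patterns) frequency matrix. Smooth it with a
        # rank-k SVD, then take the cosine between the latent rows for
        # pairs i and j; a high cosine marks the pairs as analogous.
        U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float),
                                 full_matrices=False)
        k = min(k, len(s))
        latent = U[:, :k] * s[:k]  # rows of U * Sigma
        u, v = latent[i], latent[j]
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))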
