15,667 research outputs found
Technical note: Bias and the quantification of stability
Research on bias in machine learning algorithms has generally been concerned with the
impact of bias on predictive accuracy. We believe that there are other factors that should
also play a role in the evaluation of bias. One such factor is the stability of the algorithm;
in other words, the repeatability of the results. If we obtain two sets of data from the same
phenomenon, with the same underlying probability distribution, then we would like our
learning algorithm to induce approximately the same concepts from both sets of data. This
paper introduces a method for quantifying stability, based on a measure of the agreement
between concepts. We also discuss the relationships among stability, predictive accuracy,
and bias
Distribution-Based Categorization of Classifier Transfer Learning
Transfer Learning (TL) aims to transfer knowledge acquired in one problem,
the source problem, onto another problem, the target problem, dispensing with
the bottom-up construction of the target model. Due to its relevance, TL has
gained significant interest in the Machine Learning community since it paves
the way to devise intelligent learning models that can easily be tailored to
many different applications. As it is natural in a fast evolving area, a wide
variety of TL methods, settings and nomenclature have been proposed so far.
However, a wide range of works have been reporting different names for the same
concepts. This concept and terminology mixture contribute however to obscure
the TL field, hindering its proper consideration. In this paper we present a
review of the literature on the majority of classification TL methods, and also
a distribution-based categorization of TL with a common nomenclature suitable
to classification problems. Under this perspective three main TL categories are
presented, discussed and illustrated with examples
Are There Good Mistakes? A Theoretical Analysis of CEGIS
Counterexample-guided inductive synthesis CEGIS is used to synthesize
programs from a candidate space of programs. The technique is guaranteed to
terminate and synthesize the correct program if the space of candidate programs
is finite. But the technique may or may not terminate with the correct program
if the candidate space of programs is infinite. In this paper, we perform a
theoretical analysis of counterexample-guided inductive synthesis technique. We
investigate whether the set of candidate spaces for which the correct program
can be synthesized using CEGIS depends on the counterexamples used in inductive
synthesis, that is, whether there are good mistakes which would increase the
synthesis power. We investigate whether the use of minimal counterexamples
instead of arbitrary counterexamples expands the set of candidate spaces of
programs for which inductive synthesis can successfully synthesize a correct
program. We consider two kinds of counterexamples: minimal counterexamples and
history bounded counterexamples. The history bounded counterexample used in any
iteration of CEGIS is bounded by the examples used in previous iterations of
inductive synthesis. We examine the relative change in power of inductive
synthesis in both cases. We show that the synthesis technique using minimal
counterexamples MinCEGIS has the same synthesis power as CEGIS but the
synthesis technique using history bounded counterexamples HCEGIS has different
power than that of CEGIS, but none dominates the other.Comment: In Proceedings SYNT 2014, arXiv:1407.493
Myths and Legends of the Baldwin Effect
This position paper argues that the Baldwin effect is widely
misunderstood by the evolutionary computation community. The
misunderstandings appear to fall into two general categories.
Firstly, it is commonly believed that the Baldwin effect is
concerned with the synergy that results when there is an evolving
population of learning individuals. This is only half of the story.
The full story is more complicated and more interesting. The Baldwin
effect is concerned with the costs and benefits of lifetime
learning by individuals in an evolving population. Several
researchers have focussed exclusively on the benefits, but there
is much to be gained from attention to the costs. This paper explains
the two sides of the story and enumerates ten of the costs and
benefits of lifetime learning by individuals in an evolving population.
Secondly, there is a cluster of misunderstandings about the relationship
between the Baldwin effect and Lamarckian inheritance of acquired
characteristics. The Baldwin effect is not Lamarckian. A Lamarckian
algorithm is not better for most evolutionary computing problems than
a Baldwinian algorithm. Finally, Lamarckian inheritance is not a
better model of memetic (cultural) evolution than the Baldwin effect
Concept Learning By Example Decomposition
For efficient understanding and prediction in natural systems, even in artificially closed ones, we usually need to consider a number of factors that may combine in simple or complex ways. Additionally, many modern scientific disciplines face increasingly large datasets from which to extract knowledge (for example, genomics). Thus to learn all but the most trivial regularities in the natural world, we rely on different ways of simplifying the learning problem. One simplifying technique that is highly pervasive in nature is to break down a large learning problem into smaller ones; to learn the smaller, more manageable problems; and then to recombine them to obtain the larger picture. It is widely accepted in machine learning that it is easier to learn several smaller decomposed concepts than a single large one. Though many machine learning methods exploit it, the process of decomposition of a learning problem has not been studied adequately from a theoretical perspective. Typically such decomposition of concepts is achieved in highly constrained environments, or aided by human experts. In this work, we investigate concept learning by example decomposition in a general probably approximately correct (PAC) setting for Boolean learning. We develop sample complexity bounds for the different steps involved in the process. We formally show that if the cost of example partitioning is kept low then it is highly advantageous to learn by example decomposition. To demonstrate the efficacy of this framework, we interpret the theory in the context of feature extraction. We discover that many vague concepts in feature extraction, starting with what exactly a feature is, can be formalized unambiguously by this new theory of feature extraction. We analyze some existing feature learning algorithms in light of this theory, and finally demonstrate its constructive nature by generating a new learning algorithm from theoretical results
- …