4,598 research outputs found
Unsupervised feature construction for improving data representation and semantics
Attribute-based format is the main data representation format used by machine learning algorithms. When the attributes do not properly describe the initial data, performance starts to degrade. Some algorithms address this problem by internally changing the representation space, but the newly constructed features rarely have any meaning. We seek to construct, in an unsupervised way, new attributes that are more appropriate for describing a given dataset and, at the same time, comprehensible for a human user. We propose two algorithms that construct the new attributes as conjunctions of the initial primitive attributes or their negations. The generated feature sets have reduced correlations between features and succeed in catching some of the hidden relations between individuals in a dataset. For example, a feature like sky \wedge \neg building \wedge panorama would be true for non-urban images and is more informative than simple features expressing the presence or the absence of an object. The notion of Pareto optimality is used to evaluate feature sets and to obtain a balance between total correlation and the complexity of the resulted feature set. Statistical hypothesis testing is employed in order to automatically determine the values of the parameters used for constructing a data-dependent feature set. We experimentally show that our approaches achieve the construction of informative feature sets for multiple datasets. © 2013 Springer Science+Business Media New York
On Algorithms and Complexity for Sets with Cardinality Constraints
Typestate systems ensure many desirable properties of imperative programs,
including initialization of object fields and correct use of stateful library
interfaces. Abstract sets with cardinality constraints naturally generalize
typestate properties: relationships between the typestates of objects can be
expressed as subset and disjointness relations on sets, and elements of sets
can be represented as sets of cardinality one. Motivated by these applications,
this paper presents new algorithms and new complexity results for constraints
on sets and their cardinalities. We study several classes of constraints and
demonstrate a trade-off between their expressive power and their complexity.
Our first result concerns a quantifier-free fragment of Boolean Algebra with
Presburger Arithmetic. We give a nondeterministic polynomial-time algorithm for
reducing the satisfiability of sets with symbolic cardinalities to constraints
on constant cardinalities, and give a polynomial-space algorithm for the
resulting problem.
In a quest for more efficient fragments, we identify several subclasses of
sets with cardinality constraints whose satisfiability is NP-hard. Finally, we
identify a class of constraints that has polynomial-time satisfiability and
entailment problems and can serve as a foundation for efficient program
analysis.Comment: 20 pages. 12 figure
GIMO : A multi-objective anytime rule mining system to ease iterative feedback from domain experts
Data extracted from software repositories is used intensively in Software Engineering research, for example, to predict defects in source code. In our research in this area, with data from open source projects as well as an industrial partner, we noticed several shortcomings of conventional data mining approaches for classification problems: (1) Domain experts’ acceptance is of critical importance, and domain experts can provide valuable input, but it is hard to use this feedback. (2) Evaluating the quality of the model is not a matter of calculating AUC or accuracy. Instead, there are multiple objectives of varying importance with hard to quantify trade-offs. Furthermore, the performance of the model cannot be evaluated on a per-instance level in our case, because it shares aspects with the set cover problem. To overcome these problems, we take a holistic approach and develop a rule mining system that simplifies iterative feedback from domain experts and can incorporate the domain-specific evaluation needs. A central part of the system is a novel multi-objective anytime rule mining algorithm. The algorithm is based on the GRASP-PR meta-heuristic but extends it with ideas from several other approaches. We successfully applied the system in the industrial context. In the current article, we focus on the description of the algorithm and the concepts of the system. We make an implementation of the system available. © 2020 The Author
Recommended from our members
Tools for reformulating logical forms into zero-one mixed integer programs (MIPS)
A systematic procedure for transforming a set of logical statements or logical conditions imposed on a model into an Integer Linear Programming (ILP) formulation or a Mixed Integer Programming (MIP) formulation is presented. A reformulation procedure which uses the extended reverse polish representation of a compound logical form is then described. A prototype user interface by which logical forms can be reformulated and the corresponding MIP constructed and analysed within an existing Mathematical Programming modelling system is illustrated. Finally, the steps to formulate a discrete optimisation model in this way are demonstrated by means of an example
Recommended from our members
Machine learning : techniques and foundations
The field of machine learning studies computational methods for acquiring new knowledge, new skills, and new ways to organize existing knowledge. In this paper we present some of the basic techniques and principles that underlie AI research on learning, including methods for learning from examples, learning in problem solving, learning by analogy, grammar acquisition, and machine discovery. In each case, we illustrate the techniques with paradigmatic examples
Recommended from our members
Transformation of propositional calculus statements into integer and mixed integer programs: An approach towards automatic reformulation
A systematic procedure for transforming a set of logical statements or logical conditions imposed on a model into an Integer Linear Progamming (ILP) formulation Mixed Integer Programming (MIP) formulation is presented. An ILP stated as a system of linear constraints involving integer variables and an objective function, provides a powerful representation of decision problems through a tightly interrelated closed system of choices. It supports direct representation of logical (Boolean or prepositional calculus) expressions. Binary variables (hereafter called logical variables) are first introduced and methods of logically connecting these to other variables are then presented. Simple constraints can be combined to construct logical relationships and the methods of formulating these are discussed. A reformulation procedure which uses the extended reverse polish representation of a compound logical form is then described. These reformulation procedures are illustrated by two examples. A scheme of implementation.ithin an LP modelling system is outlined
- …