Generating artificial data with monotonicity constraints
The monotonicity constraint is a common side condition imposed on
modeling problems as diverse as hedonic pricing, personnel
selection and credit rating. Experience tells us that it is not
trivial to generate artificial data for supervised learning
problems when the monotonicity constraint holds. Two algorithms
are presented in this paper for such learning problems. The first
one can be used to generate random monotone data sets without an
underlying model, and the second can be used to generate monotone
decision tree models. If needed, noise can be added to the
generated data. The second algorithm makes use of the first one.
Both algorithms are illustrated with an example.
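To make the monotonicity constraint concrete, here is a minimal sketch of one way to draw a random monotone data set. It is not the algorithm from the paper, whose details the abstract does not give. The function names, the labelling rule, and all parameters are assumptions made for illustration. The constraint enforced is that x <= y componentwise implies label(x) <= label(y).

```python
import random


def leq(x, y):
    """Return True if x <= y in the componentwise order."""
    return all(a <= b for a, b in zip(x, y))


def random_monotone_dataset(n_points, n_attrs, n_values, n_labels, seed=0):
    """Hypothetical generator: returns {attribute_vector: label}, monotone by construction."""
    if n_points > n_values ** n_attrs:
        raise ValueError("not enough distinct attribute vectors")
    rng = random.Random(seed)
    # Draw distinct vectors, so that equal vectors can never get different labels.
    points = set()
    while len(points) < n_points:
        points.add(tuple(rng.randrange(n_values) for _ in range(n_attrs)))
    labels = {}
    # Sorting by coordinate sum is a linear extension of the componentwise order:
    # every vector strictly below p is labelled before p.
    for p in sorted(points, key=sum):
        lower = max((labels[q] for q in labels if leq(q, p)), default=0)
        labels[p] = rng.randint(lower, n_labels - 1)
    return labels


def is_monotone(labels):
    """Check the monotonicity constraint on every comparable pair."""
    return all(labels[x] <= labels[y]
               for x in labels for y in labels if leq(x, y))


data = random_monotone_dataset(n_points=30, n_attrs=3, n_values=4, n_labels=3, seed=1)
print(is_monotone(data))
```

Because each label is drawn at or above the labels of all dominated points, labels tend to drift upward. A generator meant for experiments would probably need a more balanced rule.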
Achieving New Upper Bounds for the Hypergraph Duality Problem through Logic
The hypergraph duality problem DUAL is defined as follows: given two simple
hypergraphs $\mathcal{G}$ and $\mathcal{H}$, decide whether $\mathcal{H}$
consists precisely of all minimal transversals of $\mathcal{G}$ (in which case
we say that $\mathcal{G}$ is the dual of $\mathcal{H}$). This problem is
equivalent to deciding whether two given non-redundant monotone DNFs are dual.
It is known that non-DUAL, the complementary problem to DUAL, is in
$\mathrm{GC}(\log^2 n, \mathrm{PTIME})$, where $\mathrm{GC}(f(n), \mathcal{C})$
denotes the complexity class of all problems that after a nondeterministic
guess of $O(f(n))$ bits can be decided (checked) within complexity class
$\mathcal{C}$. It was conjectured that non-DUAL is in
$\mathrm{GC}(\log^2 n, \mathrm{LOGSPACE})$. In this paper we prove this conjecture and actually
place the non-DUAL problem into the complexity class $\mathrm{GC}(\log^2 n, \mathrm{TC}^0)$,
which is a subclass of $\mathrm{GC}(\log^2 n, \mathrm{LOGSPACE})$. We here refer to the logtime-uniform version of
$\mathrm{TC}^0$, which corresponds to $\mathrm{FO(COUNT)}$, i.e., first order
logic augmented by counting quantifiers. We achieve the latter bound in two
steps. First, based on existing problem decomposition methods, we develop a new
nondeterministic algorithm for non-DUAL that requires guessing $O(\log^2 n)$
bits. We then proceed by a logical analysis of this algorithm, allowing us to
formulate its deterministic part in $\mathrm{FO(COUNT)}$. From this result, by
the well known inclusion $\mathrm{TC}^0 \subseteq \mathrm{LOGSPACE}$, it follows
that DUAL belongs also to $\mathrm{DSPACE}[\log^2 n]$. Finally, by exploiting
the principles on which the proposed nondeterministic algorithm is based, we
devise a deterministic algorithm that, given two hypergraphs $\mathcal{G}$ and
$\mathcal{H}$, computes in quadratic logspace a transversal of $\mathcal{G}$
missing in $\mathcal{H}$.
Comment: Restructured the presentation in order to be the extended version of
a paper that will shortly appear in SIAM Journal on Computing.
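The definition of DUAL can be illustrated with a brute-force check. The sketch below is only a reading aid for the problem statement: it runs in exponential time and has nothing to do with the low-complexity algorithms of the paper. The function names and the representation of a hypergraph as a list of vertex sets, here called `g` and `h`, are assumptions.

```python
from itertools import combinations


def is_transversal(t, hypergraph):
    """A transversal intersects every edge of the hypergraph."""
    return all(t & e for e in hypergraph)


def minimal_transversals(hypergraph):
    """Enumerate all minimal transversals by brute force (exponential time)."""
    edges = [frozenset(e) for e in hypergraph]
    vertices = sorted(set().union(*edges)) if edges else []
    found = []
    # Candidates are visited by increasing size, so a transversal is minimal
    # exactly when no previously found minimal transversal is contained in it.
    for size in range(len(vertices) + 1):
        for cand in combinations(vertices, size):
            t = frozenset(cand)
            if is_transversal(t, edges) and not any(m <= t for m in found):
                found.append(t)
    return set(found)


def is_dual(g, h):
    """Decide DUAL: does h consist exactly of the minimal transversals of g?"""
    return minimal_transversals(g) == {frozenset(e) for e in h}


def missing_transversals(g, h):
    """Witnesses for non-DUAL: minimal transversals of g that are absent from h."""
    return minimal_transversals(g) - {frozenset(e) for e in h}


g = [{1, 2}, {2, 3}]
print(is_dual(g, [{2}, {1, 3}]))
print(missing_transversals(g, [{2}]))
```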
Counting and enumerating aggregate classifiers
We propose a generic model for the "weighted voting" aggregation step performed by several methods in supervised classification. Further, we construct an algorithm to count the number of distinct aggregate classifiers that arise in this model. When there are only two classes in the classification problem, we show that a class of functions that arises from aggregate classifiers coincides with the class of self-dual positive threshold Boolean functions.
Dualisation, decision lists and identification of monotone discrete functions
Many data-analysis algorithms in machine learning, data mining and a variety of other disciplines essentially operate on discrete multi-attribute data sets. By means of discretisation or binarisation, numerical data sets can also be analysed successfully. Therefore, in this paper we introduce the theory of (partially defined) discrete functions as an important theoretical tool for the analysis of multi-attribute data sets. In particular, we study monotone (partially defined) discrete functions. Compared with the theory of Boolean functions, relatively little is known about (partially defined) monotone discrete functions. It appears that decision lists are useful for the representation of monotone discrete functions. Since dualisation is an important tool in the theory of (monotone) Boolean functions, we study the interpretation and properties of the dual of a (monotone) binary or discrete function. We also introduce the dual of a pseudo-Boolean function. The results are used to investigate extensions of partially defined monotone discrete functions and the identification of monotone discrete functions. In particular, we present a polynomial time algorithm for the identification of so-called stable discrete functions.
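For the Boolean case, the standard definition of the dual is f^d(x) = not f(not x), where the inner negation is taken componentwise. The sketch below illustrates only this Boolean definition. The discrete and pseudo-Boolean duals studied in the paper are not reproduced here, and the helper names are assumptions.

```python
from itertools import product


def dual(f):
    """Return the dual of a Boolean function f: x -> not f(componentwise complement of x)."""
    return lambda x: not f(tuple(not b for b in x))


def truth_table(f, n):
    """Evaluate f on all 2**n Boolean input vectors, in a fixed order."""
    return [bool(f(x)) for x in product((False, True), repeat=n)]


def and3(x):
    return all(x)


def or3(x):
    return any(x)


def maj3(x):
    # Majority of three inputs: a positive threshold function.
    return sum(x) >= 2


print(truth_table(dual(and3), 3) == truth_table(or3, 3))   # AND and OR are dual to each other
print(truth_table(dual(maj3), 3) == truth_table(maj3, 3))  # majority is self-dual
```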
On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, that
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, that measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.
Comment: 18 pages, 2 figures. To be published at ICDT'13. Added missing
acknowledgement.
Monotonicity in Bayesian Networks
For many real-life Bayesian networks, common knowledge dictates that the
output established for the main variable of interest increases with higher
values for the observable variables. We define two concepts of monotonicity to
capture this type of knowledge. We say that a network is isotone in
distribution if the probability distribution computed for the output variable
given specific observations is stochastically dominated by any such
distribution given higher-ordered observations; a network is isotone in mode if
a probability distribution given higher observations has a higher mode. We show
that establishing whether a network exhibits any of these properties of
monotonicity is $\mathrm{coNP}^{\mathrm{PP}}$-complete in general, and remains coNP-complete for
polytrees. We present an approximate algorithm for deciding whether a network
is monotone in distribution and illustrate its application to a real-life
network in oncology.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in
Artificial Intelligence (UAI2004).
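The two monotonicity notions can be checked directly on a small table of conditional distributions. The sketch below assumes the output distributions for each observation vector are already available as lists of probabilities over ordered output values. It performs no Bayesian network inference and is not the approximate algorithm of the paper. All names and the tolerance parameter are assumptions.

```python
from itertools import accumulate


def leq(x, y):
    """Componentwise order on observation vectors."""
    return all(a <= b for a, b in zip(x, y))


def dominated_by(p_low, p_high, tol=1e-12):
    """True if p_high first-order stochastically dominates p_low,
    i.e. the CDF of p_high lies at or below the CDF of p_low everywhere."""
    return all(h <= l + tol
               for l, h in zip(accumulate(p_low), accumulate(p_high)))


def mode(p):
    """Index of the most probable output value (first one on ties)."""
    return max(range(len(p)), key=p.__getitem__)


def isotone_in_distribution(table):
    """table maps observation vectors to output distributions."""
    return all(dominated_by(table[x], table[y])
               for x in table for y in table if leq(x, y))


def isotone_in_mode(table):
    return all(mode(table[x]) <= mode(table[y])
               for x in table for y in table if leq(x, y))


# Toy table: one binary observable, three ordered output values.
table = {(0,): [0.7, 0.2, 0.1], (1,): [0.4, 0.3, 0.3]}
print(isotone_in_distribution(table), isotone_in_mode(table))
```

Checking all comparable pairs is quadratic in the number of observation vectors, which itself grows exponentially with the number of observable variables. That is consistent with the hardness results stated in the abstract.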
Knowledge Discovery and Monotonicity
The monotonicity property is ubiquitous in our lives and it appears in different roles: as domain knowledge, as a requirement, as a property that reduces the complexity of the problem, and so on. It is present in various domains: economics, mathematics, languages, operations research and many others. This thesis is focused on the monotonicity property in knowledge discovery and more specifically in classification, attribute reduction, function decomposition, frequent patterns generation and missing values handling. Four specific problems are addressed within four different methodologies, namely, rough sets theory, monotone decision trees, function decomposition and frequent patterns generation. In the first three parts, the monotonicity is domain knowledge and a requirement for the outcome of the classification process. The three methodologies are extended for dealing with monotone data in order to be able to guarantee that the outcome will also satisfy the monotonicity requirement. In the last part, monotonicity is a property that helps reduce the computation of the process of frequent patterns generation. Here the focus is on two of the best algorithms and their comparison both theoretically and experimentally.
About the Author:
Viara Popova was born in Bourgas, Bulgaria in 1972. She completed her secondary
education at Mathematics High School "Nikola Obreshkov" in Bourgas. In 1996
she finished her higher education at Sofia University, Faculty of Mathematics
and Informatics, where she graduated with a major in Informatics and a specialization
in Information Technologies in Education. She then joined the Department
of Information Technologies, first as an associated member and from 1997 as an
assistant professor. In 1999 she became a PhD student at Erasmus University Rotterdam, Faculty
of Economics, Department of Computer Science. In 2004 she joined the
Artificial Intelligence Group within the Department of Computer Science, Faculty
of Sciences at Vrije Universiteit Amsterdam as a PostDoc researcher.
This thesis is positioned in the area of knowledge discovery with special attention to problems where the property of monotonicity plays an important role. Monotonicity is a ubiquitous property in all areas of life and has therefore been widely studied in mathematics. Monotonicity in knowledge discovery can be treated as available background information that can facilitate and guide the knowledge extraction process. While in some sub-areas methods have already been developed for taking this additional information into account, in most methodologies it has not been extensively studied or has not been addressed at all. This thesis is a contribution to a change in that direction. In the thesis, four specific problems have been examined from different sub-areas of knowledge discovery: the rough sets methodology, monotone decision trees, function decomposition and frequent patterns discovery.
Algorithmic and complexity aspects of simple coalitional games
Simple coalitional games are a fundamental class of cooperative games and voting games which are used to model coalition formation, resource allocation and decision making in computer science, artificial intelligence and multiagent systems. Although simple coalitional games are well studied in the domain of game theory and social choice, their algorithmic and computational complexity aspects have received less attention till recently. The computational aspects of simple coalitional games are of increased importance as these games are used by computer scientists to model distributed settings. This thesis fits in the wider setting of the interplay between economics and computer science which has led to the development of algorithmic game theory and computational social choice. A unified view of the computational aspects of simple coalitional games is presented here for the first time. Certain complexity results also apply to other coalitional games such as skill games and matching games. The following issues are given special consideration: influence of players, limit and complexity of manipulations in the coalitional games and complexity of resource allocation on networks. The complexity of comparison of influence between players in simple games is characterized. The simple games considered are represented by winning coalitions, minimal winning coalitions, weighted voting games or multiple weighted voting games. A comprehensive classification of weighted voting games which can be solved in polynomial time is presented. An efficient algorithm which uses generating functions and interpolation to compute an integer weight vector for target power indices is proposed. Voting theory, especially the Penrose Square Root Law, is used to investigate the fairness of a real life voting model. Computational complexity of manipulation in social choice protocols can determine whether manipulation is computationally feasible or not. 
The computational complexity and bounds of manipulation are considered from various angles including control, false-name manipulation and bribery. Moreover, the computational complexity of computing various cooperative game solutions of simple games in different representations is studied. Certain structural results regarding least core payoffs extend to the general monotone cooperative game. The thesis also studies a coalitional game called the spanning connectivity game. It is proved that whereas computing the Banzhaf values and Shapley-Shubik indices of such games is #P-complete, there is a polynomial time combinatorial algorithm to compute the nucleolus. The results have interesting significance for optimal strategies for the wiretapping game, which is a noncooperative game defined on a network.
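As a small illustration of the power indices mentioned above, the following sketch computes normalized Banzhaf indices of a weighted voting game by enumerating all coalitions. It is a brute-force definition check that is exponential in the number of players, not one of the algorithms from the thesis. The function name and the example game are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def banzhaf_indices(weights, quota):
    """Normalized Banzhaf indices of the weighted voting game [quota; weights]."""
    n = len(weights)
    swings = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                total = sum(weights[j] for j in coalition)
                # Player i is critical if the coalition loses without i and wins with i.
                if total < quota <= total + weights[i]:
                    swings[i] += 1
    total_swings = sum(swings)
    if total_swings == 0:
        return [Fraction(0)] * n
    return [Fraction(s, total_swings) for s in swings]


# Example game [3; 2, 1, 1]: player 0 is critical in three coalitions,
# players 1 and 2 in one each.
print(banzhaf_indices([2, 1, 1], quota=3))
```

Exact fractions are used so that the indices sum to exactly one and can be compared without rounding concerns.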