7 research outputs found
Generating artificial data with monotonicity constraints
The monotonicity constraint is a common side condition imposed on
modeling problems as diverse as hedonic pricing, personnel
selection and credit rating. Experience tells us that it is not
trivial to generate artificial data for supervised learning
problems when the monotonicity constraint holds. Two algorithms
are presented in this paper for such learning problems. The first
one can be used to generate random monotone data sets without an
underlying model, and the second can be used to generate monotone
decision tree models. If needed, noise can be added to the
generated data. The second algorithm makes use of the first one.
Both algorithms are illustrated with an example.
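To make the idea of a model-free monotone data set concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes a simple construction: visit the points in lexicographic order (which never places a point before one it dominates) and draw each label at random between the bounds already fixed by comparable points. The names `dominates`, `label_monotone`, `add_noise` and `is_monotone` are hypothetical, and the labels produced this way are not uniformly distributed over all monotone labellings.

```python
import random


def dominates(a, b):
    """Return True if point a is >= point b in every coordinate."""
    return all(x >= y for x, y in zip(a, b))


def label_monotone(points, n_classes, rng):
    """Assign labels 0..n_classes-1 so that a >= b implies label(a) >= label(b).

    Points are processed in lexicographic order. If a dominates b and a != b,
    then a sorts after b, so every dominated point is labelled first.
    """
    labelled = []
    for p in sorted(points):
        # The label must not drop below any point that p dominates.
        lower = max((lab for q, lab in labelled if dominates(p, q)), default=0)
        # Only exact duplicates of p can already dominate it; match their label.
        upper = min((lab for q, lab in labelled if dominates(q, p)),
                    default=n_classes - 1)
        labelled.append((p, rng.randint(lower, upper)))
    return labelled


def add_noise(data, flip_prob, n_classes, rng):
    """Replace each label by a random class with probability flip_prob."""
    return [(p, rng.randrange(n_classes) if rng.random() < flip_prob else lab)
            for p, lab in data]


def is_monotone(data):
    """Check that no pair of labelled points violates monotonicity."""
    return all(la >= lb
               for a, la in data for b, lb in data if dominates(a, b))


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(20)]
    clean = label_monotone(pts, n_classes=3, rng=rng)
    print(is_monotone(clean))
```

Adding noise with `add_noise` afterwards will in general break monotonicity, which is the point when testing how a learner copes with violations.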
Knowledge Discovery and Monotonicity
The monotonicity property is ubiquitous in our lives and appears in different roles: as domain knowledge, as a requirement, as a property that reduces the complexity of a problem, and so on. It is present in various domains: economics, mathematics, languages, operations research and many others. This thesis focuses on the monotonicity property in knowledge discovery, and more specifically in classification, attribute reduction, function decomposition, frequent pattern generation and missing-value handling. Four specific problems are addressed within four different methodologies, namely rough set theory, monotone decision trees, function decomposition and frequent pattern generation. In the first three parts, monotonicity is domain knowledge and a requirement for the outcome of the classification process. The three methodologies are extended to deal with monotone data so that the outcome is guaranteed to satisfy the monotonicity requirement as well. In the last part, monotonicity is a property that helps reduce the computation required for frequent pattern generation. Here the focus is on two of the best algorithms and their comparison, both theoretically and experimentally.
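As an illustration of how monotonicity can cut the cost of frequent pattern generation, the sketch below uses the classic Apriori-style pruning rule: the support of an itemset can never exceed the support of any of its subsets, so a candidate with an infrequent subset can be discarded without counting it. This is a textbook example chosen for illustration and is not necessarily one of the two algorithms compared in the thesis. The function name `frequent_itemsets` and the sample transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset occurring in at least
    min_support transactions, built level by level."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: support is anti-monotone, so every (k-1)-subset of a
        # frequent k-itemset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count only the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
               {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in frequent_itemsets(baskets, min_support=3).items():
        print(sorted(itemset), count)
```

The prune step is where the saving comes from: candidates are rejected by set lookups alone, before any pass over the transactions.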
About the Author:
Viara Popova was born in Bourgas, Bulgaria in 1972. She completed her secondary
education at the Mathematics High School "Nikola Obreshkov" in Bourgas. In 1996
she finished her higher education at Sofia University, Faculty of Mathematics
and Informatics, where she graduated with a major in Informatics and a specialization
in Information Technologies in Education. She then joined the Department
of Information Technologies, first as an associate member and from 1997 as an
assistant professor.
In 1999 she became a PhD student at Erasmus University Rotterdam, Faculty
of Economics, Department of Computer Science. In 2004 she joined the
Artificial Intelligence Group within the Department of Computer Science, Faculty
of Sciences at Vrije Universiteit Amsterdam as a postdoctoral researcher.
This thesis is positioned in the area of knowledge discovery, with special attention to problems where the property of monotonicity plays an important role. Monotonicity is a ubiquitous property in all areas of life and has therefore been widely studied in mathematics. Monotonicity in knowledge discovery can be treated as available background information that can facilitate and guide the knowledge extraction process. While in some sub-areas methods have already been developed for taking this additional information into account, in most methodologies it has not been studied extensively or has not been addressed at all. This thesis is a contribution toward changing that. In the thesis, four specific problems have been examined from different sub-areas of knowledge discovery: the rough sets methodology, monotone decision trees, function decomposition and frequent pattern discovery. In the first three parts, monotonicity is domain knowledge and a requirement for the outcome of the classification process. The three methodologies are extended to deal with monotone data so that the outcome is guaranteed to satisfy the monotonicity requirement as well. In the last part, monotonicity is a property that helps reduce the computation required for frequent pattern generation. Here the focus is on two of the best algorithms and their comparison, both theoretically and experimentally.