    Some NP-Complete Problems for Attribute Reduction in Consistent Decision Tables

    Over recent years, attribute reduction for general decision systems and, in particular, for consistent decision tables has attracted great attention from the computer science community due to the emergence of big data. It is known that, for a consistent decision table, a single reduct can be found by a polynomial-time algorithm, and redundant attributes can likewise be identified in polynomial time. However, finding all reducts of a consistent decision table requires exponential time. In this paper, we study the complexity of finding a certain class of reducts. In particular, we introduce a new concept of relative reduct in consistent decision tables and present two NP-complete problems related to it, concerning the cardinality constraint and the relative reduct set. On the basis of this result, we show that finding a reduct of smallest cardinality cannot be done in polynomial time unless P = NP.
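    To make the polynomial-time case mentioned in the abstract concrete, here is a minimal Python sketch, not the paper's algorithm: the table format and the function names `is_consistent` and `greedy_reduct` are illustrative assumptions. It finds one reduct greedily by dropping attributes while consistency is preserved; finding a smallest reduct, by contrast, is exactly the problem the paper shows to be hard.

```python
def is_consistent(rows, attrs, decision):
    """A table is consistent w.r.t. `attrs` if rows that agree on all
    attributes in `attrs` also agree on the decision attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row[decision]:
            return False
        seen[key] = row[decision]
    return True

def greedy_reduct(rows, attrs, decision):
    """Drop attributes one at a time, keeping the table consistent.
    Runs in polynomial time but returns *a* reduct, not necessarily
    one of smallest cardinality (that problem is NP-hard)."""
    reduct = list(attrs)
    for a in list(attrs):
        trial = [x for x in reduct if x != a]
        if trial and is_consistent(rows, trial, decision):
            reduct = trial
    return reduct

# Toy consistent decision table: condition attributes c1..c3, decision d.
table = [
    {"c1": 0, "c2": 1, "c3": 0, "d": "yes"},
    {"c1": 1, "c2": 1, "c3": 0, "d": "no"},
    {"c1": 0, "c2": 0, "c3": 1, "d": "yes"},
    {"c1": 1, "c2": 0, "c3": 1, "d": "no"},
]
print(greedy_reduct(table, ["c1", "c2", "c3"], "d"))  # -> ['c1']
```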

    An efficient randomised sphere cover classifier

    This paper describes an efficient randomised sphere cover classifier (aRSC) that reduces the training set size without loss of accuracy compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire for a non-deterministic, fast, instance-based classifier that performs well in isolation but is also well suited to ensembles. We use 24 benchmark datasets from the UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrates the basic benefits of sphere covering. The second set demonstrates that when the a parameter is set through cross validation, the resulting aRSC algorithm outperforms several well-known classifiers under the Friedman rank sum test. Thirdly, we test the usefulness of aRSC when combined with three feature filtering methods on six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decomposition.
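    For readers unfamiliar with sphere covering, the following rough Python sketch conveys the general idea behind this family of classifiers. It is not the authors' aRSC implementation: the growth rule, the pruning threshold `alpha`, and all function names are simplified assumptions. A sphere is grown around a randomly chosen instance out to the nearest opposite-class point, small spheres are pruned, and queries are classified by the closest sphere.

```python
import random
import numpy as np

def build_sphere_cover(X, y, alpha=1, rng=None):
    """Randomised sphere cover: repeatedly pick an uncovered instance,
    grow a sphere around it out to the nearest opposite-class point,
    and keep only spheres covering at least `alpha` instances.
    Returns a list of (centre, radius, label) triples."""
    rng = rng or random.Random(0)
    remaining = list(range(len(X)))
    spheres = []
    while remaining:
        i = rng.choice(remaining)
        # Radius: distance to the closest point with a different class.
        enemies = [np.linalg.norm(X[i] - X[j])
                   for j in range(len(X)) if y[j] != y[i]]
        r = min(enemies) if enemies else float("inf")
        covered = [j for j in remaining if np.linalg.norm(X[i] - X[j]) < r]
        if len(covered) >= alpha:
            spheres.append((X[i], r, y[i]))
        drop = set(covered) | {i}
        remaining = [j for j in remaining if j not in drop]
    return spheres

def predict(spheres, x):
    """Label of the sphere whose boundary is closest to the query."""
    return min(spheres, key=lambda s: np.linalg.norm(x - s[0]) - s[1])[2]

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y = ["a", "a", "b", "b"]
print(predict(build_sphere_cover(X, y), np.array([0.5, 0.5])))  # -> "a"
```

    Because each run draws different sphere centres, the classifier is non-deterministic, which is what makes it a natural ensemble member.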

    Intertemporal Choice of Fuzzy Soft Sets

    This paper merges two noteworthy aspects of choice. On the one hand, soft sets and fuzzy soft sets are popular models that have been widely applied to decision making problems such as real estate valuation, medical diagnosis (glaucoma, prostate cancer, etc.), data mining, and international trade. They provide crisp or fuzzy parameterized descriptions of the universe of alternatives. On the other hand, in many decisions, costs and benefits occur at different points in time. This gives rise to intertemporal choices, which may involve an indefinitely large number of periods. However, the literature provides neither a model nor a solution for the intertemporal problem when the alternatives are described by (fuzzy) parameterizations. In this paper, we propose a novel soft set inspired model for the intertemporal framework, filling an important gap in the development of fuzzy soft set theory. An algorithm selects the optimal option in intertemporal choice problems with an infinite time horizon. We illustrate its application with a numerical example involving alternative portfolios of projects that a public administration may undertake. This allows us to establish a pioneering intertemporal model of choice in the framework of extended fuzzy set theories.
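    The paper's actual model and algorithm are not reproduced here, but the following Python sketch conveys one plausible reading of the setting. All names, the discounting scheme, and the truncation of the infinite horizon are illustrative assumptions: each period carries a fuzzy soft set (a map from parameters to membership degrees over the alternatives), and alternatives are scored by a discounted sum of memberships.

```python
def discounted_score(streams, delta=0.9, horizon=50):
    """Score each alternative by the discounted sum, over periods, of its
    total membership across parameters.

    `streams` is a list indexed by period t; each element maps
    parameter -> {alternative: membership in [0, 1]} (a fuzzy soft set).
    Periods beyond the given stream repeat the last one, a crude
    stand-in for an infinite-horizon treatment."""
    scores = {}
    for t in range(horizon):
        fss = streams[min(t, len(streams) - 1)]
        for param, fuzzy_set in fss.items():
            for alt, mu in fuzzy_set.items():
                scores[alt] = scores.get(alt, 0.0) + (delta ** t) * mu
    return scores

# Two hypothetical project portfolios evaluated on cost and impact
# over two periods, then assumed stationary.
periods = [
    {"low_cost": {"A": 0.8, "B": 0.4}, "impact": {"A": 0.3, "B": 0.9}},
    {"low_cost": {"A": 0.7, "B": 0.5}, "impact": {"A": 0.4, "B": 0.8}},
]
scores = discounted_score(periods)
print(max(scores, key=scores.get))  # -> "B" under these assumptions
```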

    Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

    This paper introduces new algorithms and data structures for fast counting in machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach also applies to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and log-linear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use, together with analytical worst-case bounds for this structure under several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not fully expanding the tree near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss possible uses of ADtrees in other machine learning methods, and compare ADtrees with alternative representations such as kd-trees, R-trees and Frequent Sets.

    Comment: See http://www.jair.org/ for any accompanying files.
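    The ADtree itself is more elaborate than can be shown here, but the cached-sufficient-statistics idea can be sketched in Python with a memoized conjunctive-query counter. This is an illustration, not the paper's data structure: `count` and `contingency_table` are hypothetical names, and `lru_cache` merely stands in for the tree's cached counts.

```python
from functools import lru_cache

# Toy dataset: each record is a tuple of categorical attribute values.
DATA = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "yes"),
    ("rain",  "mild", "yes"),
    ("rain",  "cool", "yes"),
    ("sunny", "hot", "no"),
]

@lru_cache(maxsize=None)
def count(query):
    """Number of records matching a conjunctive query, given as a sorted
    tuple of (attribute_index, value) pairs. The cache plays the role of
    the ADtree's stored counts; a real ADtree additionally deduces the
    counts for each attribute's most common value by subtraction rather
    than storing them."""
    return sum(all(rec[i] == v for i, v in query) for rec in DATA)

def contingency_table(attrs):
    """Contingency table over the given attribute indices, assembled
    from cached conjunctive counts."""
    values = [sorted({rec[i] for rec in DATA}) for i in attrs]
    table = {}
    def fill(prefix, rest):
        if not rest:
            table[tuple(prefix)] = count(tuple(sorted(zip(attrs, prefix))))
            return
        for v in rest[0]:
            fill(prefix + [v], rest[1:])
    fill([], values)
    return table

print(contingency_table((0, 2)))  # joint counts of attributes 0 and 2
```

    Repeated queries over overlapping attribute sets hit the cache instead of rescanning the records, which is the effect the ADtree achieves with bounds that are independent of the number of records.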