107,147 research outputs found

    A Local Approach to Studying the Time and Space Complexity of Deterministic and Nondeterministic Decision Trees

    Full text link
    In this paper, we study arbitrary infinite binary information systems each of which consists of an infinite set called universe and an infinite set of two-valued functions (attributes) defined on the universe. We consider the notion of a problem over information system, which is described by a finite number of attributes and a mapping associating a decision to each tuple of attribute values. As algorithms for problem solving, we investigate deterministic and nondeterministic decision trees that use only attributes from the problem description. Nondeterministic decision trees are representations of decision rule systems that sometimes have less space complexity than the original rule systems. As time and space complexity, we study the depth and the number of nodes in the decision trees. In the worst case, with the growth of the number of attributes in the problem description, (i) the minimum depth of deterministic decision trees grows either as a logarithm or linearly, (ii) the minimum depth of nondeterministic decision trees either is bounded from above by a constant or grows linearly, (iii) the minimum number of nodes in deterministic decision trees has either polynomial or exponential growth, and (iv) the minimum number of nodes in nondeterministic decision trees has either polynomial or exponential growth. Based on these results, we divide the set of all infinite binary information systems into three complexity classes. This allows us to identify nontrivial relationships between deterministic decision trees and decision rules systems represented by nondeterministic decision trees. For each class, we study issues related to time-space trade-off for deterministic and nondeterministic decision trees.Comment: arXiv admin note: substantial text overlap with arXiv:2201.0101

    Hopf Algebras of m-permutations, (m+1)-ary trees, and m-parking functions

    Full text link
    The m-Tamari lattice of F. Bergeron is an analogue of the clasical Tamari order defined on objects counted by Fuss-Catalan numbers, such as m-Dyck paths or (m+1)-ary trees. On another hand, the Tamari order is related to the product in the Loday-Ronco Hopf algebra of planar binary trees. We introduce new combinatorial Hopf algebras based on (m+1)-ary trees, whose structure is described by the m-Tamari lattices. In the same way as planar binary trees can be interpreted as sylvester classes of permutations, we obtain (m+1)-ary trees as sylvester classes of what we call m-permutations. These objects are no longer in bijection with decreasing (m+1)-ary trees, and a finer congruence, called metasylvester, allows us to build Hopf algebras based on these decreasing trees. At the opposite, a coarser congruence, called hyposylvester, leads to Hopf algebras of graded dimensions (m+1)^{n-1}, generalizing noncommutative symmetric functions and quasi-symmetric functions in a natural way. Finally, the algebras of packed words and parking functions also admit such m-analogues, and we present their subalgebras and quotients induced by the various congruences.Comment: 51 page

    The Algebra of Binary Search Trees

    Get PDF
    We introduce a monoid structure on the set of binary search trees, by a process very similar to the construction of the plactic monoid, the Robinson-Schensted insertion being replaced by the binary search tree insertion. This leads to a new construction of the algebra of Planar Binary Trees of Loday-Ronco, defining it in the same way as Non-Commutative Symmetric Functions and Free Symmetric Functions. We briefly explain how the main known properties of the Loday-Ronco algebra can be described and proved with this combinatorial point of view, and then discuss it from a representation theoretical point of view, which in turns leads to new combinatorial properties of binary trees.Comment: 49 page

    Solving Multiclass Learning Problems via Error-Correcting Output Codes

    Full text link
    Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k &gt 2 values (i.e., k ``classes''). The definition is acquired by studying collections of training examples of the form [x_i, f (x_i)]. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that---like the other methods---the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.Comment: See http://www.jair.org/ for any accompanying file

    Commutative combinatorial Hopf algebras

    Full text link
    We propose several constructions of commutative or cocommutative Hopf algebras based on various combinatorial structures, and investigate the relations between them. A commutative Hopf algebra of permutations is obtained by a general construction based on graphs, and its non-commutative dual is realized in three different ways, in particular as the Grossman-Larson algebra of heap ordered trees. Extensions to endofunctions, parking functions, set compositions, set partitions, planar binary trees and rooted forests are discussed. Finally, we introduce one-parameter families interpolating between different structures constructed on the same combinatorial objects.Comment: 29 pages, LaTEX; expanded and updated version of math.CO/050245

    ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

    Full text link
    Hash codes are efficient data representations for coping with the ever growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting `1' for the visited tree leaf, and `0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random forests approach to hashing challenging. To address this, we propose to first randomly group arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem, which can be handled using a light-weight CNN weak learner. Such random class grouping scheme enables code uniqueness by enforcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, while performing at the level of other state-of-the-art image classification techniques while utilizing a more compact and efficient scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of light-weight CNNs, instead of simply going deeper.Comment: Accepted to ECCV 201
    corecore