1,011 research outputs found

    A novel Boolean kernels family for categorical data

    Kernel-based classifiers, such as SVMs, are considered state-of-the-art algorithms and are widely used in many classification tasks. However, methods of this kind are hard to interpret, and for this reason they are often regarded as black-box models. In this paper, we propose a new family of Boolean kernels for categorical data in which features correspond to propositional formulas applied to the input variables. The idea is to create human-readable features that ease the extraction of interpretation rules directly from the embedding space. Experiments on artificial and benchmark datasets show the effectiveness of the proposed family of kernels with respect to established ones, such as RBF, in terms of classification accuracy.
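As a rough illustration of features that correspond to propositional formulas, the sketch below computes one well-known Boolean kernel, the monotone conjunctive kernel, on one-hot encoded categorical records. It is an assumption chosen for illustration and is not necessarily a member of the family proposed in the paper; the function names and toy data are hypothetical.

```python
from math import comb


def one_hot(record, domains):
    """Encode a categorical record as a binary vector, one bit per (variable, value) pair."""
    return [1 if record[i] == v else 0
            for i, values in enumerate(domains)
            for v in values]


def monotone_conjunctive_kernel(x, z, d):
    """Count the conjunctions of d distinct binary features that are true in both x and z."""
    shared = sum(a & b for a, b in zip(x, z))  # features active in both vectors
    return comb(shared, d)


# Hypothetical toy data: three categorical variables.
domains = [("red", "green", "blue"), ("small", "large"), ("round", "square")]
a = one_hot(("red", "small", "round"), domains)
b = one_hot(("red", "small", "square"), domains)

# Each embedding feature is a readable rule such as "colour=red AND size=small".
k_ab = monotone_conjunctive_kernel(a, b, 2)
```

Because every coordinate of the implicit feature space is a conjunction of input literals, a large weight on a coordinate can be read back as a rule, which is the kind of interpretability the abstract refers to.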

    Binary Independent Component Analysis with OR Mixtures

    Independent component analysis (ICA) is a computational method for separating a multivariate signal into subcomponents, assuming the mutual statistical independence of the non-Gaussian source signals. The classical ICA framework usually assumes linear combinations of independent sources over the field of real-valued numbers R. In this paper, we investigate binary ICA for OR mixtures (bICA), which can find applications in many domains including medical diagnosis, multi-cluster assignment, Internet tomography and network resource management. We prove that bICA is uniquely identifiable under the disjunctive generation model, and propose a deterministic iterative algorithm to determine the distribution of the latent random variables and the mixing matrix. The inverse problem of inferring the values of the latent variables is also considered, along with noisy measurements. We conduct an extensive simulation study to verify the effectiveness of the proposed algorithm and present examples of real-world applications where bICA can be applied. Comment: Manuscript submitted to IEEE Transactions on Signal Processing
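The disjunctive generation model can be sketched as follows: each observed bit is the OR of the latent binary sources selected by one row of a binary mixing matrix. This shows only the forward model, not the paper's identification algorithm; the names `G`, `p`, `or_mix` and `prob_zero` and the numeric values are assumptions made for illustration.

```python
import random


def or_mix(G, y):
    """Disjunctive mixing: x_i = OR_j (G[i][j] AND y[j])."""
    return [int(any(g and s for g, s in zip(row, y))) for row in G]


def prob_zero(G, p, i):
    """P(x_i = 0) for independent Bernoulli(p_j) sources: every mixed-in source must be 0."""
    result = 1.0
    for g, pj in zip(G[i], p):
        if g:
            result *= 1.0 - pj
    return result


# Hypothetical mixing matrix (2 observations, 3 sources) and source activation probabilities.
G = [[1, 0, 1],
     [0, 1, 1]]
p = [0.2, 0.5, 0.1]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
samples = [or_mix(G, [int(rng.random() < pj) for pj in p]) for _ in range(20000)]
empirical = sum(1 for x in samples if x[0] == 0) / len(samples)
exact = prob_zero(G, p, 0)  # (1 - 0.2) * (1 - 0.1)
```

Comparing `empirical` with `exact` gives a quick sanity check that sampled observations follow the OR model, which is the starting point for recovering `G` and `p` from data.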

    Using rules of thumb to repair inconsistent knowledge


    Space complexity in polynomial calculus

    During the last decade, an active line of research in proof complexity has been to study space complexity and time-space trade-offs for proofs. Besides being a natural complexity measure of intrinsic interest, space is also an important issue in SAT solving, and so research has mostly focused on weak systems that are used by SAT solvers. There has been a relatively long sequence of papers on space in resolution, which is now reasonably well understood from this point of view. For other natural candidates to study, however, such as polynomial calculus or cutting planes, very little has been known. We are not aware of any nontrivial space lower bounds for cutting planes, and for polynomial calculus the only lower bound has been for CNF formulas of unbounded width in [Alekhnovich et al. ’02], where the space lower bound is smaller than the initial width of the clauses in the formulas. Thus, in particular, it has been consistent with current knowledge that polynomial calculus could be able to refute any k-CNF formula in constant space. In this paper, we prove several new results on space in polynomial calculus (PC), and in the extended proof system polynomial calculus resolution (PCR) studied in [Alekhnovich et al. ’02]: 1. We prove an Ω(n) space lower bound in PC for the canonical 3-CNF version of the pigeonhole principle formulas PHP^m_n with m pigeons and n holes, and show that this is tight. 2. For PCR, we prove an Ω(n) space lower bound for a bitwise encoding of the functional pigeonhole principle. These formulas have width O(log n), and hence this is an exponential improvement over [Alekhnovich et al. ’02] measured in the width of the formulas. 3. We then present another encoding of the pigeonhole principle that has constant width, and prove an Ω(n) space lower bound in PCR for these formulas as well. 4. Finally, we prove that any k-CNF formula can be refuted in PC in simultaneous exponential size and linear space (which holds for resolution and thus for PCR, but was not obviously the case for PC). We also characterize a natural class of CNF formulas for which the space complexity in resolution and PCR does not change when the formula is transformed into 3-CNF in the canonical way, something that we believe can be useful when proving PCR space lower bounds for other well-studied formula families in proof complexity.
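For readers unfamiliar with the pigeonhole formulas, the sketch below builds the standard CNF encoding for m pigeons and n holes and then splits wide clauses into 3-CNF with fresh extension variables. The splitting shown is a common textbook conversion and may differ in detail from the canonical transformation used in the paper; the function names and the DIMACS-style integer literals are assumptions.

```python
from itertools import combinations, count


def pigeonhole_cnf(m, n):
    """Return (clauses, num_vars) for PHP with m pigeons and n holes.

    Variable var(i, j) is true when pigeon i sits in hole j; literals are signed integers.
    """
    def var(i, j):
        return i * n + j + 1

    # Every pigeon gets some hole (one clause of width n per pigeon).
    pigeon = [[var(i, j) for j in range(n)] for i in range(m)]
    # No hole holds two different pigeons.
    hole = [[-var(i, j), -var(k, j)]
            for j in range(n)
            for i, k in combinations(range(m), 2)]
    return pigeon + hole, m * n


def to_3cnf(clauses, num_vars):
    """Split each clause wider than 3 using fresh extension variables."""
    fresh = count(num_vars + 1)
    out = []
    for clause in clauses:
        rest = list(clause)
        while len(rest) > 3:
            z = next(fresh)
            out.append([rest[0], rest[1], z])  # (l1 OR l2 OR z)
            rest = [-z] + rest[2:]             # (NOT z OR l3 OR ...)
        out.append(rest)
    return out


clauses, num_vars = pigeonhole_cnf(5, 4)
three_cnf = to_3cnf(clauses, num_vars)
```

The formula is unsatisfiable whenever m > n, and the conversion keeps it so while bounding the clause width by 3, which is the setting in which the space bounds above are stated.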

    Branching strategies for mixed-integer programs containing logical constraints and decomposable structure

    Decision-making optimisation problems can include discrete selections, e.g. selecting a route, arranging non-overlapping items or designing a network of items. Branch-and-bound (B&B), a widely applied divide-and-conquer framework, often solves such problems by considering a continuous approximation, e.g. replacing discrete variable domains by a continuous superset. Such approximations weaken the logical relations, e.g. for discrete variables corresponding to Boolean variables. Branching in B&B reintroduces logical relations by dividing the search space. This thesis studies the design of B&B branching strategies, i.e. how to divide the search space, for optimisation problems that contain both a logical and a continuous structure. We begin our study with a large-scale, industrially relevant optimisation problem where the objective consists of machine-learnt gradient-boosted trees (GBTs) and convex penalty functions. GBT functions contain if-then queries, which introduce a logical structure to this problem. We propose decomposition-based rigorous bounding strategies and an iterative heuristic that can be embedded into a B&B algorithm. We approach branching with two strategies: a pseudocost initialisation and strong branching that target the structure of the GBT and convex penalty aspects of the optimisation objective, respectively. Computational tests show that our B&B approach outperforms state-of-the-art solvers in deriving rigorous bounds on optimality. Our second project investigates how unsatisfiable cores derived from satisfiability modulo theories (SMT) may be utilised in a B&B context. Unsatisfiable cores are subsets of constraints that explain an infeasible result. We study two-dimensional bin packing (2BP) and develop a B&B algorithm that branches on SMT unsatisfiable cores. We use the unsatisfiable cores to derive cuts that break 2BP symmetries. Computational results show that our B&B algorithm solves 20% more instances when compared with commercial solvers on the tested instances. Finally, we study convex generalized disjunctive programming (GDP), a framework that supports logical variables and operators. Convex GDP includes disjunctions of mathematical constraints, which motivate branching by partitioning the disjunctions. We investigate separation by branching, i.e. eliminating solutions that prevent rigorous bound improvement, and propose a greedy algorithm for building the branches. We propose three scoring methods for selecting the next branching disjunction. We also analyse how to leverage infeasibility to expedite the B&B search. Computational results show that our scoring methods can reduce the number of explored B&B nodes by an order of magnitude when compared with scoring methods proposed in the literature. Our infeasibility analysis further reduces the number of explored nodes. Open Access

    A Graphical Query Interface Based on Aggregation/Generalization Hierarchies

    For automated information systems to be used effectively, they must be easily accessible to a wide range of users after only short training periods. This work proposes a method of organizing documents based on the concepts of aggregation and generalization hierarchies. We propose a graphical user interface to provide a more intuitive form of Boolean query. This design is based on mapping the nodes of the aggregation hierarchy to Boolean intersection operations, mapping the nodes of the generalization hierarchy to Boolean union operations, and providing a concrete, graphical, manipulable representation of both of these node types. Finally, a working prototype interface was constructed and evaluated experimentally against a classical command-line Boolean query interface. In this formative evaluation with sixteen subjects, the graphical interface produced fewer than one-tenth the errors of the textual interface, on average. Significant differences in time spent specifying queries were not found. Observations and comments provide guidance for designers. (Also cross-referenced as CAR-TR-562)
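The mapping described above, aggregation nodes to intersection and generalization nodes to union, can be sketched as a small query-tree evaluator over an inverted index. The tree representation, the node labels `"agg"` and `"gen"`, and the toy index are all assumptions made for illustration, not the prototype's actual data structures.

```python
def evaluate(node, index):
    """Evaluate a query tree: leaves are terms, 'agg' nodes intersect, 'gen' nodes union.

    Inner nodes are (kind, children) tuples and are assumed to have at least one child.
    """
    if isinstance(node, str):
        return set(index.get(node, ()))
    kind, children = node
    results = [evaluate(child, index) for child in children]
    if kind == "agg":
        return set.intersection(*results)  # document must match every part
    if kind == "gen":
        return set.union(*results)         # document may match any specialisation
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical inverted index: term -> ids of documents containing it.
index = {
    "engine": {1, 2, 5},
    "wheel": {2, 3, 5},
    "car": {1, 2},
    "truck": {5, 6},
}

# "A vehicle (car or truck) that has both an engine and wheels".
query = ("agg", [("gen", ["car", "truck"]), "engine", "wheel"])
matches = evaluate(query, index)
```

In textual form the same query would read `(car OR truck) AND engine AND wheel`; the graphical interface lets the user build this tree by manipulating hierarchy nodes instead of typing operators.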