A novel Boolean kernels family for categorical data
Kernel-based classifiers, such as SVMs, are considered state-of-the-art algorithms and are widely used in many classification tasks. However, methods of this kind are hard to interpret, and for this reason they are often regarded as black-box models. In this paper, we propose a new family of Boolean kernels for categorical data where features correspond to propositional formulas applied to the input variables. The idea is to create human-readable features to ease the extraction of interpretation rules directly from the embedding space. Experiments on artificial and benchmark datasets show the effectiveness of the proposed family of kernels with respect to established ones, such as RBF, in terms of classification accuracy.
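As a rough illustration of what a feature space built from propositional formulas can look like, the sketch below computes a monotone conjunctive kernel on binary (for example one-hot encoded) inputs: each implicit feature is a conjunction of `degree` distinct input variables, so the kernel value is the number of such conjunctions true in both inputs. This is one simple Boolean kernel chosen for illustration, not necessarily a member of the family proposed in the paper; the function name and the encoding are assumptions.

```python
from math import comb

def conjunctive_kernel(x, z, degree):
    """Count conjunctions of `degree` distinct variables that hold in both x and z.

    x and z are equal-length sequences of 0/1 values (assumed encoding).
    """
    # Number of variables that are true in both inputs.
    shared = sum(a & b for a, b in zip(x, z))
    # Any `degree` of those shared variables form a conjunction true in both.
    return comb(shared, degree)

x = [1, 1, 0, 1]
z = [1, 1, 1, 0]
print(conjunctive_kernel(x, z, 2))  # 1: only (x0 AND x1) is true in both
```

Because every feature is a named conjunction of input variables, a large weight on a feature can be read back directly as a rule.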
Binary Independent Component Analysis with OR Mixtures
Independent component analysis (ICA) is a computational method for separating
a multivariate signal into subcomponents assuming the mutual statistical
independence of the non-Gaussian source signals. The classical Independent
Components Analysis (ICA) framework usually assumes linear combinations of
independent sources over the field of real numbers R. In this paper, we
investigate binary ICA for OR mixtures (bICA), which can find applications in
many domains including medical diagnosis, multi-cluster assignment, Internet
tomography and network resource management. We prove that bICA is uniquely
identifiable under the disjunctive generation model, and propose a
deterministic iterative algorithm to determine the distribution of the latent
random variables and the mixing matrix. The inverse problem of inferring
the values of the latent variables is also considered, along with noisy
measurements. We conduct an extensive simulation study to verify the
effectiveness of the proposed algorithm and present examples of real-world
applications where bICA can be applied.
Comment: Manuscript submitted to IEEE Transactions on Signal Processing
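The disjunctive generation model mentioned in this abstract can be sketched as follows: each observed bit is the OR of the latent source bits selected by one row of a binary mixing matrix. This is only the forward (generative) direction under assumed names (`or_mixture`, `G`, `y`); it is not the paper's inference algorithm.

```python
def or_mixture(mixing, sources):
    """Return observations x with x[i] = OR_j (mixing[i][j] AND sources[j])."""
    return [int(any(g and y for g, y in zip(row, sources))) for row in mixing]

# Hypothetical 3x3 binary mixing matrix and one draw of the latent sources.
G = [[1, 0, 1],
     [0, 1, 0],
     [0, 1, 1]]
y = [0, 1, 0]
print(or_mixture(G, y))  # [0, 1, 1]
```

Unlike linear ICA, the contributions of active sources do not add up: an observation is 1 as soon as any connected source is 1, which is why the usual linear unmixing techniques do not apply directly.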
Space complexity in polynomial calculus
During the last decade, an active line of research in proof complexity has been to study space
complexity and time-space trade-offs for proofs. Besides being a natural complexity measure of
intrinsic interest, space is also an important issue in SAT solving, and so research has mostly focused
on weak systems that are used by SAT solvers.
There has been a relatively long sequence of papers on space in resolution, which is now reasonably
well understood from this point of view. For other natural candidates to study, however, such as
polynomial calculus or cutting planes, very little has been known. We are not aware of any nontrivial
space lower bounds for cutting planes, and for polynomial calculus the only lower bound has been
for CNF formulas of unbounded width in [Alekhnovich et al. ’02], where the space lower bound is
smaller than the initial width of the clauses in the formulas. Thus, in particular, it has been consistent
with current knowledge that polynomial calculus could be able to refute any k-CNF formula in
constant space.
In this paper, we prove several new results on space in polynomial calculus (PC), and in the
extended proof system polynomial calculus resolution (PCR) studied in [Alekhnovich et al. ’02]:
1. We prove an Ω(n) space lower bound in PC for the canonical 3-CNF version of the pigeonhole
principle formulas $\mathrm{PHP}^m_n$ with m pigeons and n holes, and show that this is tight.
2. For PCR, we prove an Ω(n) space lower bound for a bitwise encoding of the functional pigeonhole
principle. These formulas have width O(log n), and hence this is an exponential
improvement over [Alekhnovich et al. ’02] measured in the width of the formulas.
3. We then present another encoding of the pigeonhole principle that has constant width, and
prove an Ω(n) space lower bound in PCR for these formulas as well.
4. Finally, we prove that any k-CNF formula can be refuted in PC in simultaneous exponential
size and linear space (which holds for resolution and thus for PCR, but was not obviously
the case for PC). We also characterize a natural class of CNF formulas for which the space
complexity in resolution and PCR does not change when the formula is transformed into 3-CNF
in the canonical way, something that we believe can be useful when proving PCR space lower
bounds for other well-studied formula families in proof complexity.
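For readers unfamiliar with the pigeonhole principle formulas discussed above, the following sketch generates the standard (wide-clause) CNF encoding for m pigeons and n holes. It is not the canonical 3-CNF version, the bitwise functional encoding, or the constant-width encoding studied in the paper; the variable numbering and function name are assumptions.

```python
from itertools import combinations

def pigeonhole_cnf(m, n):
    """Clauses as lists of signed ints; variable var(i, j) means pigeon i sits in hole j."""
    def var(i, j):
        return i * n + j + 1  # 1-based variable index, DIMACS style

    # Pigeon axioms: every pigeon is placed in at least one hole (width n).
    clauses = [[var(i, j) for j in range(n)] for i in range(m)]
    # Hole axioms: no two distinct pigeons share the same hole.
    for j in range(n):
        for i1, i2 in combinations(range(m), 2):
            clauses.append([-var(i1, j), -var(i2, j)])
    return clauses

print(len(pigeonhole_cnf(3, 2)))  # 9 clauses: 3 pigeon axioms + 2 * C(3, 2) hole axioms
```

The pigeon axioms have width n, which is why fixed-width variants (such as a 3-CNF conversion with extension variables) are needed to state lower bounds for k-CNF formulas.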
Branching strategies for mixed-integer programs containing logical constraints and decomposable structure
Decision-making optimisation problems can include discrete selections, e.g. selecting a route, arranging non-overlapping items or designing a network of items. Branch-and-bound (B&B), a widely applied divide-and-conquer framework, often solves such problems by considering a continuous approximation, e.g. replacing discrete variable domains by a continuous superset. Such approximations weaken the logical relations, e.g. for discrete variables corresponding to Boolean variables. Branching in B&B reintroduces logical relations by dividing the search space. This thesis studies designing B&B branching strategies, i.e. how to divide the search space, for optimisation problems that contain both a logical and a continuous structure.
We begin our study with a large-scale, industrially relevant optimisation problem where the objective consists of machine-learnt gradient-boosted trees (GBTs) and convex penalty functions. GBT functions contain if-then queries, which introduce a logical structure to this problem. We propose decomposition-based rigorous bounding strategies and an iterative heuristic that can be embedded into a B&B algorithm. We approach branching with two strategies: a pseudocost initialisation and strong branching that target the structure of GBT and convex penalty aspects of the optimisation objective, respectively. Computational tests show that our B&B approach outperforms state-of-the-art solvers in deriving rigorous bounds on optimality.
Our second project investigates how unsatisfiable cores derived from satisfiability modulo theories (SMT) may be utilised in a B&B context. Unsatisfiable cores are subsets of constraints that explain an infeasible result. We study two-dimensional bin packing (2BP) and develop a B&B algorithm that branches on SMT unsatisfiable cores. We use the unsatisfiable cores to derive cuts that break 2BP symmetries. Computational results show that our B&B algorithm solves 20% more instances when compared with commercial solvers on the tested instances.
Finally, we study convex generalized disjunctive programming (GDP), a framework that supports logical variables and operators. Convex GDP includes disjunctions of mathematical constraints, which motivate branching by partitioning the disjunctions. We investigate separation by branching, i.e. eliminating solutions that prevent rigorous bound improvement, and propose a greedy algorithm for building the branches. We propose three scoring methods for selecting the next branching disjunction. We also analyse how to leverage infeasibility to expedite the B&B search. Computational results show that our scoring methods can reduce the number of explored B&B nodes by an order of magnitude when compared with scoring methods proposed in the literature. Our infeasibility analysis further reduces the number of explored nodes.
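The general B&B idea described at the start of this abstract (bound with a continuous relaxation, then branch to restore the discrete logic) can be sketched on a toy 0/1 knapsack problem. This is a textbook illustration under assumed names and positive item weights; it does not implement any of the branching strategies developed in the thesis.

```python
def knapsack_bnb(values, weights, capacity):
    """Maximise total value of chosen items subject to a weight capacity."""
    # Sort by value density so the continuous relaxation can be solved greedily.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def relaxation_bound(k, value, room):
        # Continuous relaxation: items k, k+1, ... may be taken fractionally.
        for v, w in items[k:]:
            if w <= room:
                value, room = value + v, room - w
            else:
                return value + v * room / w
        return value

    def branch(k, value, room):
        nonlocal best
        best = max(best, value)  # fixing all remaining items to 0 is feasible
        if k == len(items) or relaxation_bound(k, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        v, w = items[k]
        if w <= room:
            branch(k + 1, value + v, room - w)  # branch with item k selected
        branch(k + 1, value, room)              # branch with item k rejected

    branch(0, 0, capacity)
    return best

print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))  # 220
```

The choice of which variable to branch on is fixed here (density order); branching strategies such as those studied in the thesis replace that fixed order with problem-specific scoring.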
A Graphical Query Interface Based on Aggregation/Generalization Hierarchies
In order for automated information systems to be used effectively, they must
be made easily accessible to a wide range of users and with short training
periods. This work proposes a method of organizing documents based on the
concepts of aggregation and generalization hierarchies. We propose a
graphical user interface to provide a more intuitive form of Boolean query.
This design is based on mapping the nodes of the aggregation hierarchy to
Boolean intersection operations, mapping the nodes of the generalization
hierarchy to Boolean union operations, and providing a concrete, graphical,
manipulable representation of both of these node types. Finally, a working
prototype interface was constructed and evaluated experimentally against a
classical command-line Boolean query interface. In this formative
evaluation with sixteen subjects, the graphical interface produced, on
average, fewer than one-tenth as many errors as the textual interface. Significant
differences in time spent specifying queries were not found. Observations
and comments provide guidance for designers.
(Also cross-referenced as CAR-TR-562)
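The mapping described in this abstract (aggregation nodes to Boolean intersection, generalization nodes to Boolean union) can be sketched as a small recursive evaluator over document-id sets. The tuple-based node representation, the `evaluate` function, and the sample index are all hypothetical; the original prototype's data model is not given in the abstract.

```python
def evaluate(node, index):
    """Return the set of document ids matching a hierarchy node.

    node is ("term", name) or (kind, [children]) with kind in
    {"aggregation", "generalization"}; children are assumed non-empty.
    """
    kind = node[0]
    if kind == "term":
        return set(index.get(node[1], ()))
    child_sets = [evaluate(child, index) for child in node[1]]
    if kind == "aggregation":      # Boolean intersection (AND)
        return set.intersection(*child_sets)
    if kind == "generalization":   # Boolean union (OR)
        return set.union(*child_sets)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical inverted index: term -> ids of documents containing it.
index = {"car": {1, 2, 3}, "truck": {3, 4}, "engine": {2, 3, 4}}
query = ("aggregation", [
    ("generalization", [("term", "car"), ("term", "truck")]),
    ("term", "engine"),
])
print(sorted(evaluate(query, index)))  # [2, 3, 4]
```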
- …