Clustering Boolean Tensors
Tensor factorizations are computationally hard problems, and in particular,
are often significantly harder than their matrix counterparts. In the case of
Boolean tensor factorizations -- where the input tensor and all the factors are
required to be binary and we use Boolean algebra -- much of that hardness comes
from the possibility of overlapping components. Yet, in many applications we
are perfectly happy to partition at least one of the modes. In this paper we
investigate what consequences this partitioning has on the computational
complexity of the Boolean tensor factorizations and present a new algorithm for
the resulting clustering problem. This algorithm can alternatively be seen as a
particularly regularized clustering algorithm that can handle extremely
high-dimensional observations. We analyse our algorithms with the goal of
maximizing the similarity and argue that this is more meaningful than
minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient
0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm
for Boolean tensor clustering achieves high scalability, high similarity, and
good generalization to unseen data with both synthetic and real-world data
sets.
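To make the similarity objective concrete, here is a minimal Python sketch for a rank-1 binary factorization. It assumes a 0/1 matrix `X` and 0/1 factor vectors `a` and `b`, and scores a factorization by how many entries of `X` agree with the Boolean product a b^T. The helper names (`similarity`, `rank1_alternating`) and the starting vector `b0` are hypothetical. The alternating update is a simple heuristic chosen for illustration. It is not the paper's PTAS or its 0.828-approximation algorithm, and it only guarantees that similarity never decreases.

```python
import numpy as np


def similarity(X, a, b):
    # Hypothetical helper: count entries where X agrees with the Boolean
    # outer product a b^T (for 0/1 vectors the ordinary outer product is Boolean).
    return int(np.sum(X == np.outer(a, b)))


def rank1_alternating(X, b0, max_iter=100):
    # Hypothetical alternating heuristic, NOT the paper's PTAS or
    # 0.828-approximation: set a optimally for fixed b, then b for fixed a.
    X = np.asarray(X, dtype=np.int64)
    b = np.asarray(b0, dtype=np.int64)
    a = np.zeros(X.shape[0], dtype=np.int64)
    best = -1
    for _ in range(max_iter):
        # Given b, row i only interacts with the columns in b's support:
        # a_i = 1 agrees on the ones there, a_i = 0 agrees on the zeros.
        ones = X[:, b == 1].sum(axis=1)
        a = (2 * ones > b.sum()).astype(np.int64)
        # Symmetric optimal update for b given a.
        ones = X[a == 1, :].sum(axis=0)
        b = (2 * ones > a.sum()).astype(np.int64)
        s = similarity(X, a, b)
        # Each update is optimal given the other factor, so s never drops;
        # stop once no strict improvement is made.
        if s <= best:
            break
        best = s
    return a, b, best


if __name__ == "__main__":
    X = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])
    print(rank1_alternating(X, [1, 0, 0]))
```

Similarity and dissimilarity always sum to the number of entries, so both objectives have the same optimal factors. They differ only as targets for approximation guarantees.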
Weakly Submodular Functions
Submodular functions are well-studied in combinatorial optimization, game
theory and economics. The natural diminishing returns property makes them
suitable for many applications. We study an extension of monotone submodular
functions, which we call "weakly submodular functions". Our extension
includes some (mildly) supermodular functions. We show that several natural
functions belong to this class and relate our class to some other recent
submodular function extensions.
We consider the optimization problem of maximizing a weakly submodular
function subject to uniform and general matroid constraints. For a uniform
matroid constraint, the "standard greedy algorithm" achieves a constant
approximation ratio where the constant (experimentally) converges to 5.95 as
the cardinality constraint increases. For a general matroid constraint, a
simple local search algorithm achieves a constant approximation ratio where the
constant (analytically) converges to 10.22 as the rank of the matroid
increases.
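The "standard greedy algorithm" for a uniform matroid (cardinality) constraint can be sketched in Python as shown below. At each of k rounds it adds the element with the largest marginal gain. The coverage function in the example is monotone submodular, so it lies in the class the abstract extends. The name `greedy_max` and the toy data are hypothetical, and ties are broken by input order.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_max(f: Callable[[FrozenSet], float],
               ground: Iterable[Hashable], k: int) -> List[Hashable]:
    # Hypothetical helper: standard greedy under the constraint |S| <= k.
    chosen: List[Hashable] = []
    current: FrozenSet = frozenset()
    remaining = list(ground)
    for _ in range(min(k, len(remaining))):
        best, best_gain = None, float("-inf")
        for e in remaining:  # first element with the maximal gain wins ties
            gain = f(current | {e}) - f(current)
            if gain > best_gain:
                best, best_gain = e, gain
        chosen.append(best)
        current = current | {best}
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
    # Coverage: number of distinct items covered by the chosen sets.
    cover = lambda S: len(set().union(*(sets[x] for x in S)))
    print(greedy_max(cover, ["a", "b", "c", "d"], 2))  # -> ['c', 'a']
```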
The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses
Learning with non-modular losses is an important problem when sets of
predictions are made simultaneously. The main tools for constructing convex
surrogate loss functions for set prediction are margin rescaling and slack
rescaling. In this work, we show that these strategies lead to tight convex
surrogates if and only if the underlying loss function is increasing in the number of
incorrect predictions. However, gradient or cutting-plane computation for these
functions is NP-hard for non-supermodular loss functions. We propose instead a
novel surrogate loss function for submodular losses, the Lovász hinge, which
leads to O(p log p) complexity with O(p) oracle accesses to the loss function
to compute a gradient or cutting-plane. We prove that the Lovász hinge is
convex and yields an extension of the loss function. As a result, we have developed the first
tractable convex surrogates in the literature for submodular losses. We
demonstrate the utility of this novel convex surrogate through several set
prediction tasks, including experiments on the PASCAL VOC and Microsoft COCO datasets.
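The O(p log p) cost with O(p) oracle calls comes from the Lovász extension of a set function. You sort the p coordinates once, then query the loss on each growing prefix of the sorted order. The Python sketch below shows that step. The Lovász extension is convex exactly when the set function is submodular. Assumptions: labels are in {-1, +1}, scores are real, and the loss oracle takes a frozenset of mispredicted indices with value 0 on the empty set. The function names are hypothetical. Applying the extension to the clipped margin violations max(0, 1 - y_i s_i) is a simplification chosen for illustration, and the paper's exact definition of the Lovász hinge may differ.

```python
import numpy as np


def lovasz_extension(loss, s):
    # Hypothetical helper: evaluate the Lovasz extension of a set function
    # at vector s; returns (value, subgradient with respect to s).
    # Assumes loss(frozenset()) == 0.
    s = np.asarray(s, dtype=float)
    order = np.argsort(-s, kind="stable")  # O(p log p): indices by decreasing s
    grad = np.zeros_like(s)
    prefix = set()
    prev = loss(frozenset())
    for i in order:  # p further oracle calls, one per growing prefix
        prefix.add(int(i))
        cur = loss(frozenset(prefix))
        grad[i] = cur - prev  # marginal gain of adding element i
        prev = cur
    return float(grad @ s), grad


def lovasz_hinge(loss, y, scores):
    # Simplified hinge (assumption): Lovasz extension of the loss evaluated at
    # the clipped margin violations max(0, 1 - y_i * scores_i).
    violations = np.maximum(0.0, 1.0 - np.asarray(y, float) * np.asarray(scores, float))
    return lovasz_extension(loss, violations)


if __name__ == "__main__":
    # Loss = 1 if any prediction is wrong; its extension is max(s) for s >= 0.
    print(lovasz_extension(lambda A: min(len(A), 1), [0.2, 0.7, 0.1]))
```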
A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning
The Rashomon effect occurs when many different explanations exist for the
same phenomenon. In machine learning, Leo Breiman used this term to
characterize problems where many accurate-but-different models exist to
describe the same data. In this work, we study how the Rashomon effect can be
useful for understanding the relationship between training and test
performance, and the possibility that simple-yet-accurate models exist for many
problems. We consider the Rashomon set (the set of almost-equally-accurate
models for a given problem) and study its properties and the types of models
it could contain. We present the Rashomon ratio as a new measure related to
simplicity of model classes, which is the ratio of the volume of the set of
accurate models to the volume of the hypothesis space; the Rashomon ratio is
different from standard complexity measures from statistical learning theory.
For a hierarchy of hypothesis spaces, the Rashomon ratio can help modelers to
navigate the trade-off between simplicity and accuracy. In particular, we find
empirically that a plot of empirical risk vs. Rashomon ratio forms a
characteristic Γ-shaped Rashomon curve, whose elbow seems to be a
reliable model selection criterion. When the Rashomon set is large, models that
are accurate, but that also have various other useful properties, can often
be obtained. These models might obey various constraints such as
interpretability, fairness, or monotonicity.
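For a finite hypothesis space with every model weighted equally, "volume" reduces to counting. The Rashomon ratio then becomes the fraction of models whose empirical risk is within a tolerance θ of the best. The Python sketch below illustrates this under several assumptions: θ is passed as the parameter `theta`, the hypothesis space is 1-D threshold classifiers on integer thresholds, and the data are toys. The helper names are hypothetical. The paper's volumes may be defined over continuous or non-uniformly weighted spaces.

```python
import numpy as np


def rashomon_ratio(losses, theta):
    # Hypothetical helper: |Rashomon set| / |hypothesis space| for a finite,
    # uniformly weighted space, with the Rashomon set taken (assumption) as
    # all models whose loss is within theta of the minimum.
    losses = np.asarray(losses, dtype=float)
    in_set = losses <= losses.min() + theta
    return float(in_set.mean())


def threshold_losses(x, y, thresholds):
    # 0-1 empirical risk of each classifier predict(x) = 1 if x >= t else 0.
    x = np.asarray(x)
    y = np.asarray(y)
    return np.array([np.mean((x >= t).astype(int) != y) for t in thresholds])


if __name__ == "__main__":
    x = [1, 2, 3, 6, 7, 9]
    y = [0, 0, 0, 1, 1, 1]
    losses = threshold_losses(x, y, range(11))
    print(rashomon_ratio(losses, 0.0))  # thresholds 4, 5, 6 are perfect: 3/11
    print(rashomon_ratio(losses, 0.2))  # also admits thresholds 3 and 7: 5/11
```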
Image Understanding and Robotics Research at Columbia University
Over the past year, the research investigations of the Vision/Robotics Laboratory at Columbia University have reflected the interests of its four faculty members, two staff programmers, and 16 Ph.D. students. Several of the projects involve other faculty members in the department or the university, or researchers at AT&T, IBM, or Philips. We list below a summary of our interests and results, together with the principal researchers associated with them. Since it is difficult to separate those aspects of robotic research that are purely visual from those that are vision-like (for example, tactile sensing) or vision-related (for example, integrated vision-robotic systems), we have listed all robotic research that is not purely manipulative. Most of our current investigations extend work reported last year; this was the second year of both our basic Image Understanding contract and our Strategic Computing contract. Therefore, the form of this year's report closely resembles last year's. Although there are a few new initiatives, we mainly report the new results we have obtained in the same five basic research areas. Much of this work is summarized on a videotape that is available on request. We also note two service contributions this past year. The Special Issue on Computer Vision of the Proceedings of the IEEE, August 1988, was co-edited by one of us (John Kender [27]). In addition, the upcoming IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1989, is co-program-chaired by one of us (John Kender [23]).
Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences
We give the first subquadratic-time approximation schemes for dynamic time
warping (DTW) and edit distance (ED) of several natural families of point
sequences in $\mathbb{R}^d$, for any fixed $d \ge 1$. In particular, our
algorithms compute $(1+\varepsilon)$-approximations of DTW and ED in time
near-linear for point sequences drawn from $\kappa$-packed or $\kappa$-bounded curves, and
subquadratic for backbone sequences. Roughly speaking, a curve is
$\kappa$-packed if the length of its intersection with any ball of radius
$r$ is at most $\kappa \cdot r$, and a curve is $\kappa$-bounded if the sub-curve
between two curve points does not go too far from the two points compared to
the distance between the two points. In backbone sequences, consecutive points
are spaced approximately equal distances apart, and no two points lie very
close together. Recent results suggest that a subquadratic algorithm for DTW or
ED is unlikely for an arbitrary pair of point sequences even for $d = 1$. Our
algorithms work by constructing a small set of rectangular regions that cover
the entries of the dynamic programming table commonly used for these distance
measures. The weights of entries inside each rectangle are roughly the same, so
we are able to use efficient procedures to approximately compute the cheapest
paths through these rectangles.
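For reference, below is the exact quadratic dynamic program for DTW that the rectangle covering approximates, sketched in Python. Entry D[i][j] holds the cheapest warping cost that aligns the first i points of P with the first j points of Q. Filling the whole (n+1) × (m+1) table takes O(nm) time. This is the standard baseline, not the paper's approximation scheme. Using Euclidean distance as the ground cost and the name `dtw` are assumptions made for illustration.

```python
import math


def dtw(P, Q):
    # Hypothetical helper: exact DTW between point sequences P and Q
    # (each point a tuple of coordinates) via the O(n*m) dynamic program.
    n, m = len(P), len(Q)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(P[i - 1], Q[j - 1])  # Euclidean ground cost
            # Extend the cheapest of the three predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


if __name__ == "__main__":
    print(dtw([(0,), (1,), (2,)], [(0,), (2,)]))  # -> 1.0
```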