Generalized Shortest Path Kernel on Graphs
We consider the problem of classifying graphs using graph kernels. We define
a new graph kernel, called the generalized shortest path kernel, based on the
number and length of shortest paths between nodes. For our example
classification problem, we consider the task of classifying random graphs from
two well-known families, by the number of clusters they contain. We verify
empirically that the generalized shortest path kernel outperforms the original
shortest path kernel on a number of datasets. We give a theoretical analysis
for explaining our experimental results. In particular, we estimate
distributions of the expected feature vectors for the shortest path kernel and
the generalized shortest path kernel, and we show some evidence explaining why
our graph kernel outperforms the shortest path kernel for our graph
classification problem. Comment: Short version presented at Discovery Science 2015 in Banff.
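As an illustration of the idea, here is a minimal sketch (in Python, with
names of our own choosing) of a feature map that records, for every node
pair, both the shortest-path length and the number of shortest paths; the
kernel is then the dot product of two such maps. The paper's exact encoding
and normalization may differ.

```python
from collections import Counter, deque

def gsp_features(adj):
    """Feature map sketch for a generalized shortest-path kernel:
    count node pairs keyed by (shortest-path length, number of
    shortest paths). `adj` maps each node to a set of neighbours."""
    feats = Counter()
    for s in adj:
        # BFS from s, tracking distance and shortest-path multiplicity.
        dist, paths = {s: 0}, {s: 1}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = paths[u]
                    q.append(v)
                elif dist[v] == dist[u] + 1:
                    paths[v] += paths[u]   # another shortest path to v
        for v, d in dist.items():
            if v != s:
                feats[(d, paths[v])] += 1
    return feats

def gsp_kernel(adj1, adj2):
    """Kernel value = dot product of the two feature maps."""
    f1, f2 = gsp_features(adj1), gsp_features(adj2)
    return sum(c * f2[k] for k, c in f1.items())
```

Dropping the path count from the feature key recovers the original
shortest path kernel, which makes the comparison in the paper concrete.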
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account, but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus enabling
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods. Comment: PAKDD 2017, extended version.
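The core loop can be pictured roughly as follows. This is a toy sketch:
LetSIP itself draws patterns via weighted sampling in SAT and uses a more
refined learning-to-rank update; here a naive independent-inclusion sampler
and a multiplicative weight update stand in for both.

```python
import math
import random

def sample_pattern(items, weights, rng):
    """Toy stand-in for weighted pattern sampling: include each item
    independently with probability weights[i] / (1 + weights[i])."""
    return frozenset(i for i in items
                     if rng.random() < weights[i] / (1.0 + weights[i]))

def update_weights(weights, pattern, liked, eta=0.5):
    """Multiplicative feedback update: items in liked patterns are
    boosted, items in disliked patterns are damped."""
    factor = math.exp(eta if liked else -eta)
    for i in pattern:
        weights[i] *= factor

# Hypothetical interactive session with emulated feedback.
rng = random.Random(0)
items = list(range(10))
weights = {i: 1.0 for i in items}
for _ in range(20):
    p = sample_pattern(items, weights, rng)
    liked = 3 in p          # stand-in for real user feedback
    update_weights(weights, p, liked)
```

Because sampling and learning alternate round by round, the exploration can
be interrupted at any time, which is the "anytime" property the abstract
refers to.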
Quantity makes quality: learning with partial views
In many real world applications, the number of examples to learn from is plentiful, but we can only obtain limited information on each individual example. We study the possibilities of efficient, provably correct, large-scale learning in such settings. The main theme we would like to establish is that large amounts of examples can compensate for the lack of full information on each individual example. The type of partial information we consider can be due to inherent noise or from constraints on the type of interaction with the data source. In particular, we describe and analyze algorithms for budgeted learning, in which the learner can only view a few attributes of each training example (Cesa-Bianchi, Shalev-Shwartz, and Shamir 2010a; 2010c), and algorithms for learning kernel-based predictors, when individual examples are corrupted by random noise (Cesa-Bianchi, Shalev-Shwartz, and Shamir 2010b).
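To see how many cheap examples can substitute for full views, consider
budgeted linear regression where only k of d attributes may be inspected per
example. The sketch below is our own simplification, not the estimator from
the cited papers: two independent attribute samples keep the stochastic
gradient of the squared loss unbiased.

```python
import numpy as np

def partial_view_sgd(X, y, k, lr=0.01, epochs=5, seed=0):
    """Budgeted linear regression sketch: per example, look at only
    k of d attributes. Two independent uniform attribute samples give
    an unbiased estimate of the gradient (pred - y) * x."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    scale = d / k  # importance weight for uniformly sampled attributes
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Sample 1: estimate the prediction <w, x> from k attributes.
            s1 = rng.choice(d, size=k, replace=False)
            pred = scale * (w[s1] @ X[i, s1])
            # Sample 2 (independent): estimate the example x itself.
            s2 = rng.choice(d, size=k, replace=False)
            x_hat = np.zeros(d)
            x_hat[s2] = scale * X[i, s2]
            # E[pred] = <w, x> and E[x_hat] = x, independently,
            # so the update is an unbiased gradient step.
            w -= lr * (pred - y[i]) * x_hat
    return w
```

The variance of such estimates grows as the budget k shrinks, which is
exactly where a large number of examples compensates for partial views.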
Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates
In this paper, we provide a novel construction of the linear-sized spectral
sparsifiers of Batson, Spielman and Srivastava [BSS14]. While previous
constructions required $\Omega(n^4)$ running time [BSS14, Zou12], our
sparsification routine can be implemented in almost-quadratic running time
$O(n^{2+\varepsilon})$.
The fundamental conceptual novelty of our work is the leveraging of a strong
connection between sparsification and a regret minimization problem over
density matrices. This connection was known to provide an interpretation of the
randomized sparsifiers of Spielman and Srivastava [SS11] via the application of
matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we
explain how matrix MWU naturally arises as an instance of the
Follow-the-Regularized-Leader framework and generalize this approach to yield a
larger class of updates. This new class allows us to accelerate the
construction of linear-sized spectral sparsifiers, and give novel insights on
the motivation behind Batson, Spielman and Srivastava [BSS14].
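To make the FTRL connection concrete, here is a minimal sketch of the
matrix multiplicative weights update: the density-matrix iterate is the
trace-normalized matrix exponential of the accumulated losses, which is
exactly Follow-the-Regularized-Leader with the negative von Neumann entropy
as regularizer. This is our illustration only; the paper's accelerated
variants arise from other regularizers in the same framework.

```python
import numpy as np

def matrix_mwu(losses, eta=0.1):
    """Matrix MWU as FTRL with entropy regularization. `losses` is a
    list of symmetric loss matrices; returns the density-matrix
    iterates W_t = exp(-eta * sum_{s<t} L_s) / Tr(...)."""
    n = losses[0].shape[0]
    cum = np.zeros((n, n))
    densities = []
    for L in losses:
        # Matrix exponential via eigendecomposition of the symmetric
        # cumulative loss, with a shift for numerical stability.
        vals, vecs = np.linalg.eigh(-eta * cum)
        vals -= vals.max()
        W = (vecs * np.exp(vals)) @ vecs.T
        W /= np.trace(W)          # normalize to a density matrix
        densities.append(W)
        cum += L                  # reveal the loss after predicting
    return densities
```

The first iterate is the maximally mixed state I/n, mirroring the uniform
distribution that scalar MWU starts from.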
The Computational Power of Optimization in Online Learning
We consider the fundamental problem of prediction with expert advice where
the experts are "optimizable": there is a black-box optimization oracle that
can be used to compute, in constant time, the leading expert in retrospect at
any point in time. In this setting, we give a novel online algorithm that
attains vanishing regret with respect to $N$ experts in total
$\widetilde{O}(\sqrt{N})$ computation time. We also give a lower bound showing
that this running time cannot be improved (up to log factors) in the oracle
model, thereby exhibiting a quadratic speedup as compared to the standard,
oracle-free setting where the required time for vanishing regret is
$\widetilde{\Theta}(N)$. These results demonstrate an exponential gap between
the power of optimization in online learning and its power in statistical
learning: in the latter, an optimization oracle (i.e., an efficient empirical
risk minimizer) allows one to learn a finite hypothesis class of size $N$ in
time $O(\log N)$. We also study the implications of our results to learning in
repeated zero-sum games, in a setting where the players have access to oracles
that compute, in constant time, their best-response to any mixed strategy of
their opponent. We show that the runtime required for approximating the
minimax value of the game in this setting is $\widetilde{\Theta}(\sqrt{N})$,
yielding again a quadratic improvement upon the oracle-free setting, where
$\widetilde{\Theta}(N)$ is known to be tight.
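The flavour of the speedup can be illustrated with
Follow-the-Perturbed-Leader: if the oracle could minimize arbitrary
perturbed cumulative loss vectors, a single oracle call per round would
suffice for vanishing regret. This is a simplified illustration, not the
paper's algorithm, which works with the weaker best-expert-in-retrospect
oracle and attains the stated bounds.

```python
import numpy as np

def fpl_with_oracle(loss_stream, oracle, n_experts, eta=0.1, seed=0):
    """Follow-the-Perturbed-Leader sketch: each round, call the oracle
    on an exponentially perturbed cumulative loss vector and play the
    returned expert, then observe that round's losses."""
    rng = np.random.default_rng(seed)
    cum = np.zeros(n_experts)
    picks = []
    for loss in loss_stream:             # loss: per-expert loss vector
        noise = rng.exponential(1.0 / eta, n_experts)
        picks.append(oracle(cum - noise))  # leader on perturbed history
        cum += loss
    return picks

# Brute-force argmin stands in for the constant-time oracle.
losses = [np.random.rand(100) for _ in range(50)]
picks = fpl_with_oracle(losses, np.argmin, n_experts=100)
```

The brute-force `np.argmin` here costs O(N) per round; the whole point of
the oracle model is that this step is treated as constant time.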
Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent
We propose a new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent and would otherwise render it impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve on the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, linear and semidefinite programming.
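For intuition, the following sketch shows one possible serial, non-proximal
instantiation of an accelerated randomized coordinate descent scheme on a
quadratic objective. It is our reconstruction for illustration only: the
actual APPROX method is parallel and proximal and is organized precisely to
avoid these full-dimensional vector operations.

```python
import numpy as np

def approx_serial(A, b, iters=2000, seed=0):
    """Accelerated randomized coordinate descent sketch on
    f(x) = 0.5 x'Ax - b'x with A symmetric positive definite and
    coordinate Lipschitz constants L_i = A[i, i]."""
    rng = np.random.default_rng(seed)
    n = len(b)
    L = np.diag(A)
    x, z = np.zeros(n), np.zeros(n)
    theta = 1.0 / n
    for _ in range(iters):
        y = (1 - theta) * x + theta * z
        i = rng.integers(n)
        grad_i = A[i] @ y - b[i]          # i-th partial derivative
        delta = -grad_i / (n * theta * L[i])
        z[i] += delta                     # coordinate step on z
        x = y                             # x_{k+1} = y_k + n*theta*(z' - z)
        x[i] += n * theta * delta
        # Momentum recursion: theta' solves t^2 = (1 - t) * theta^2.
        theta = (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2
    return x
```

As a quick sanity check, `approx_serial(A, b)` approximates the solution of
Ax = b when A is positive definite.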
Contextual Object Detection with a Few Relevant Neighbors
A natural way to improve the detection of objects is to consider the
contextual constraints imposed by the detection of additional objects in a
given scene. In this work, we exploit the spatial relations between objects in
order to improve detection capacity, as well as analyze various properties of
the contextual object detection problem. To precisely calculate context-based
probabilities of objects, we developed a model that examines the interactions
between objects in an exact probabilistic setting, in contrast to previous
methods that typically utilize approximations based on pairwise interactions.
Such a scheme is facilitated by the realistic assumption that the existence of
an object in any given location is influenced by only few informative locations
in space. Based on this assumption, we suggest a method for identifying these
relevant locations and integrating them into a mostly exact calculation of
probability based on their raw detector responses. This scheme is shown to
improve detection results and provides unique insights about the process of
contextual inference for object detection. We show that it is generally
difficult to learn that a particular object reduces the probability of another,
and that in cases when the context and detector strongly disagree this learning
becomes virtually impossible for the purposes of improving the results of an
object detector. Finally, we demonstrate improved detection results through use
of our approach as applied to the PASCAL VOC and COCO datasets.
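The computational benefit of "only few informative locations" can be
illustrated directly: with K relevant neighbors, exact marginalization over
their 2^K joint states is cheap. The sketch below uses a toy pairwise model
of our own devising, not the paper's, to combine raw detector probabilities
with a context potential.

```python
import itertools
import numpy as np

def contextual_prob(p_raw, neighbor_probs, potential):
    """Posterior that an object exists at a target location, given its
    raw detector probability, raw probabilities for K relevant neighbor
    locations, and a pairwise potential(target_state, neighbor_state).
    Enumerating all 2^K neighbor configurations makes this exact."""
    K = len(neighbor_probs)
    scores = {0: 0.0, 1: 0.0}
    for states in itertools.product((0, 1), repeat=K):
        # Weight of this neighbor configuration under raw detections.
        w = np.prod([p if s else 1 - p
                     for p, s in zip(neighbor_probs, states)])
        for t in (0, 1):
            like = p_raw if t else 1 - p_raw
            ctx = np.prod([potential(t, s) for s in states])
            scores[t] += w * like * ctx
    total = scores[0] + scores[1]
    return scores[1] / total if total > 0 else p_raw

# Example: objects that tend to co-occur reinforce each other.
co_occur = lambda t, s: 1.5 if t == s else 0.7
print(contextual_prob(0.4, [0.9, 0.8], co_occur))
```

Identifying which K locations are the relevant ones is the harder part, and
is where the paper's neighbor-selection method comes in.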