6,249 research outputs found
Clustered Integer 3SUM via Additive Combinatorics
We present a collection of new results on problems related to 3SUM,
including:
1. The first truly subquadratic algorithm for
1a. computing the (min,+) convolution for monotone increasing
sequences with integer values bounded by O(n),
1b. solving 3SUM for monotone sets in 2D with integer coordinates
bounded by O(n), and
1c. preprocessing a binary string for histogram indexing (also
called jumbled indexing).
The running time is O(n^{1.859}) with
randomization, or O(n^{1.864}) deterministically. This greatly improves the
previous n^2 / 2^{Omega(sqrt(log n))} time bound obtained from Williams'
recent result on all-pairs shortest paths [STOC'14], and answers an open
question raised by several researchers studying the histogram indexing problem.
2. The first algorithm for histogram indexing for any constant alphabet size
that achieves truly subquadratic preprocessing time and truly sublinear query
time.
3. A truly subquadratic algorithm for integer 3SUM in the case when the given
set can be partitioned into n^{1-delta} clusters each covered by an interval
of length n, for any constant delta > 0.
4. An algorithm to preprocess any set of n integers so that subsequently
3SUM on any given subset can be solved in O(n^{13/7} polylog n)
time.
All these results are obtained by a surprising new technique, based on the
Balog--Szemer\'edi--Gowers Theorem from additive combinatorics.
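For orientation, the quadratic baseline that result 1a improves on can be sketched in a few lines of Python. This is the naive algorithm, shown only to illustrate the problem definition; it is not the paper's subquadratic method:

```python
def minplus_convolution(a, b):
    """Naive O(n^2) (min,+) convolution: c[k] = min over i+j=k of a[i] + b[j]."""
    n = len(a)
    c = [float("inf")] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            if a[i] + b[j] < c[i + j]:
                c[i + j] = a[i] + b[j]
    return c

# Monotone increasing sequences with small integer values: the regime
# where the paper obtains a truly subquadratic algorithm.
print(minplus_convolution([0, 1, 3], [0, 2, 2]))  # [0, 1, 2, 3, 5]
```

The paper's contribution is precisely that, for monotone inputs with bounded integer values, this min-plus table need not be filled exhaustively.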
A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables
The use of mutual information as a similarity measure in agglomerative
hierarchical clustering (AHC) raises an important issue: some correction needs
to be applied for the dimensionality of variables. In this work, we formulate
the decision of merging dependent multivariate normal variables in an AHC
procedure as a Bayesian model comparison. We found that the Bayesian
formulation naturally shrinks the empirical covariance matrix towards a matrix
set a priori (e.g., the identity), provides an automated stopping rule, and
corrects for dimensionality using a term that scales up the measure as a
function of the dimensionality of the variables. Also, the resulting log Bayes
factor is asymptotically proportional to the plug-in estimate of mutual
information, with an additive correction for dimensionality in agreement with
the Bayesian information criterion. We investigated the behavior of these
Bayesian alternatives (in exact and asymptotic forms) to mutual information on
simulated and real data. An encouraging result was first derived on
simulations: the hierarchical clustering based on the log Bayes factor
outperformed off-the-shelf clustering techniques as well as raw and normalized
mutual information in terms of classification accuracy. On a toy example, we
found that the Bayesian approaches led to results that were similar to those of
mutual information clustering techniques, with the advantage of an automated
thresholding. On real functional magnetic resonance imaging (fMRI) datasets
measuring brain activity, the Bayesian approach identified clusters consistent
with the established outcome of standard procedures. On this application, normalized
mutual information had a highly atypical behavior, in the sense that it
systematically favored very large clusters. These initial experiments suggest
that the proposed Bayesian alternatives to mutual information are a useful new
tool for hierarchical clustering.
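As a concrete illustration of the plug-in quantity that the log Bayes factor is compared against: for two jointly Gaussian scalar variables, mutual information depends only on the correlation coefficient rho, via I(X;Y) = -1/2 log(1 - rho^2). A minimal sketch (the function name is illustrative, not from the paper):

```python
import math

def gaussian_mutual_information(rho):
    """Mutual information (in nats) of two jointly Gaussian scalars
    with correlation coefficient rho: I = -0.5 * log(1 - rho**2)."""
    return -0.5 * math.log(1.0 - rho ** 2)

# Independent variables carry zero information; information grows
# without bound as |rho| -> 1.
print(gaussian_mutual_information(0.0))
print(gaussian_mutual_information(0.9))
```

The abstract's point is that the raw plug-in estimate of such quantities needs a dimensionality correction when the clustered variables are multivariate, which the Bayesian formulation supplies automatically.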
Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines
Tackling pattern recognition problems in areas such as computer vision,
bioinformatics, speech or text recognition is often done best by taking into
account task-specific statistical relations between output variables. In
structured prediction, this internal structure is used to predict multiple
outputs simultaneously, leading to more accurate and coherent predictions.
Structural support vector machines (SSVMs) are nonprobabilistic models that
optimize a joint input-output function through margin-based learning. Because
SSVMs generally disregard the interplay between unary and interaction factors
during the training phase, the final parameters are suboptimal. Moreover, their
factors are often restricted to linear combinations of input features, limiting
their generalization power. To improve prediction accuracy, this paper proposes:
(i) Joint inference and learning by integration of back-propagation and
loss-augmented inference in SSVM subgradient descent; (ii) Extending SSVM
factors to neural networks that form highly nonlinear functions of input
features. Image segmentation benchmark results demonstrate improvements over
conventional SSVM training methods in terms of accuracy, highlighting the
feasibility of end-to-end SSVM training with neural factors.
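The loss-augmented inference plus subgradient step mentioned in proposal (i) can be sketched for the simplest possible case: a linear, unary-only multiclass model with 0/1 task loss. This toy is an assumption-laden stand-in for the paper's structured, neural-factor setting (names and the learning rate are illustrative):

```python
def subgradient_step(W, x, y_true, lr=0.1):
    """One margin-based update for a toy linear multiclass model:
    score(x, y) = W[y] . x. Loss-augmented inference picks the label
    maximizing score + task loss (0/1 here); if it differs from the
    true label, take a subgradient step on the structured hinge loss."""
    n_classes = len(W)

    def score(y):
        return sum(w_i * x_i for w_i, x_i in zip(W[y], x))

    # Loss-augmented inference: argmax_y score(y) + Delta(y, y_true).
    y_hat = max(range(n_classes),
                key=lambda y: score(y) + (0.0 if y == y_true else 1.0))

    if y_hat != y_true:  # nonzero subgradient only when the margin is violated
        for d in range(len(x)):
            W[y_true][d] += lr * x[d]
            W[y_hat][d] -= lr * x[d]
    return W
```

In the paper's setting, the linear score is replaced by a neural network, so the same loss-augmented subgradient step also back-propagates into the factor parameters, which is what makes joint inference and learning possible.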
Segregating Event Streams and Noise with a Markov Renewal Process Model
DS and MP are supported by EPSRC Leadership Fellowship EP/G007144/1
- …