Recurrent kernel machines: computing with infinite echo state networks
Echo state networks (ESNs) are large, random recurrent neural networks with a single trained linear readout layer. Despite the untrained nature of the recurrent weights, they are capable of performing universal computations on temporal input data, which makes them interesting for both theoretical research and practical applications. The key to their success lies in the fact that the network computes a broad set of nonlinear, spatiotemporal mappings of the input data, on which linear regression or classification can easily be performed. One could consider the reservoir as a spatiotemporal kernel, in which the mapping to a high-dimensional space is computed explicitly. In this letter, we build on this idea and extend the concept of ESNs to infinite-sized recurrent neural networks, which can be considered recursive kernels that subsequently can be used to create recursive support vector machines. We present the theoretical framework, provide several practical examples of recursive kernels, and apply them to typical temporal tasks.
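The finite-size starting point of this abstract, an ESN with random recurrent weights and a trained linear readout, can be sketched in a few lines of NumPy. All dimensions, the input signal, and the ridge parameter below are illustrative assumptions, not values from the letter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1-D input, 100-unit reservoir, 500 time steps.
n_in, n_res, T = 1, 100, 500

# Random, UNTRAINED reservoir weights, rescaled so the spectral
# radius is below 1 (a common echo-state condition).
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))

# Drive the reservoir with a toy input sequence and collect the
# nonlinear spatiotemporal features it computes.
u = np.sin(np.arange(T) * 0.1).reshape(T, n_in)
x = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])
    states[t] = x

# Train ONLY the linear readout (here: predict the next input value)
# by ridge regression on the collected states.
y = np.roll(u, -1, axis=0)
ridge = 1e-6
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                        states.T @ y)
pred = states @ W_out
```

The letter's contribution is the limit of this construction: letting the reservoir size go to infinity and working with the induced recursive kernel instead of the explicit state vector.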
Distributed Tree Kernels
In this paper, we propose distributed tree kernels (DTK) as a novel method to reduce the time and space complexity of tree kernels. Using a linear-complexity algorithm to compute vectors for trees, we embed feature spaces of tree fragments in low-dimensional spaces where the kernel computation is done directly with the dot product. We show that DTKs are faster, correlate with tree kernels, and obtain statistically similar performance in two natural language processing tasks.
Comment: ICML 2012
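The core idea, replacing an exact tree kernel by a dot product of low-dimensional tree vectors, can be illustrated with random fragment embeddings. This is a simplified stand-in for the paper's recursive linear-time construction; the fragments, trees, and dimension below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1024  # embedding dimension (an assumption; the paper tunes this)

# Toy "trees" represented as sets of fragment identifiers; the real
# DTK composes fragment vectors recursively over the tree structure.
tree_a = {"S->NP VP", "NP->DT NN", "DT->the", "NN->dog"}
tree_b = {"S->NP VP", "NP->DT NN", "DT->the", "NN->cat"}

cache = {}  # one random unit vector per fragment, shared across trees

def frag_vec(frag):
    if frag not in cache:
        v = rng.standard_normal(d)
        cache[frag] = v / np.linalg.norm(v)
    return cache[frag]

def embed(tree):
    # Tree vector = sum of its fragments' vectors (sorted for
    # deterministic vector assignment in this toy example).
    return sum(frag_vec(f) for f in sorted(tree))

# The dot product of embeddings approximates the exact
# fragment-overlap kernel (here the trees share 3 fragments).
approx = embed(tree_a) @ embed(tree_b)
exact = len(tree_a & tree_b)
```

Because distinct random high-dimensional unit vectors are nearly orthogonal, the cross terms are small and the dot product concentrates around the exact overlap count.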
Probabilistic Polynomials and Hamming Nearest Neighbors
We show how to compute any symmetric Boolean function on n variables over any field (as well as the integers) with a probabilistic polynomial of degree O(√(n·log(1/ε))) and error at most ε. The degree dependence on n and ε is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution.
This polynomial construction is combined with other algebraic ideas to give the first subquadratic time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, suppose we are given a database of n vectors in {0,1}^d and a collection of n query vectors in the same dimension, with d = c·log n. For each query vector u, we wish to compute a database vector v with minimum Hamming distance from u. We solve this problem in randomized time that is "truly subquadratic" for any constant c, and subquadratic for superlogarithmic dimensions. We apply the algorithm to computing pairs with maximum inner product, closest pair in ℓ1 for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.
Comment: 16 pages. To appear in 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015).
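For contrast with the paper's subquadratic algorithm, the naive quadratic baseline it improves on is easy to state in NumPy. The sizes below are toy assumptions; this brute-force scan is not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 256, 16  # toy sizes; the paper targets d on the order of c log n

# Random Boolean database and query vectors.
db = rng.integers(0, 2, (n, d), dtype=np.uint8)
queries = rng.integers(0, 2, (n, d), dtype=np.uint8)

def nearest(q):
    # Hamming distance from q to every database vector: count of
    # disagreeing coordinates. One full scan per query -> O(n^2 d)
    # total work for the batch, the bound the paper beats.
    dists = np.count_nonzero(db != q, axis=1)
    return int(np.argmin(dists))

best = [nearest(q) for q in queries]
```

The paper's algorithm answers the same batch of queries exactly, but replaces the inner scan with a probabilistic-polynomial evaluation that amortizes to subquadratic total time.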
Unscented Orientation Estimation Based on the Bingham Distribution
Orientation estimation for 3D objects is a common problem that is usually
tackled with traditional nonlinear filtering techniques such as the extended
Kalman filter (EKF) or the unscented Kalman filter (UKF). Most of these
techniques assume Gaussian distributions to account for system noise and
uncertain measurements. This distributional assumption does not consider the
periodic nature of pose and orientation uncertainty. We propose a filter that
considers the periodicity of the orientation estimation problem in its
distributional assumption. This is achieved by making use of the Bingham
distribution, which is defined on the hypersphere and thus inherently more
suitable for periodic problems. Furthermore, non-trivial system functions are handled efficiently by deterministic sampling. A deterministic sampling scheme reminiscent of the UKF is proposed for the nonlinear manifold of orientations; it is the first such scheme that truly reflects this manifold.
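The antipodal symmetry that motivates the Bingham distribution can be illustrated with a toy estimator: a quaternion q and its negation -q encode the same rotation, which breaks a naive Gaussian average but not a Bingham-style estimate, whose mode is the principal eigenvector of the sample scatter matrix. All values below are illustrative; this is not the proposed filter:

```python
import numpy as np

rng = np.random.default_rng(3)

# True orientation as a unit quaternion (an assumed example value).
q_true = np.array([0.5, 0.5, 0.5, 0.5])

# Noisy unit-quaternion measurements with random sign flips:
# q and -q represent the same rotation, the periodicity that a
# Gaussian assumption on R^4 fails to capture.
samples = []
for _ in range(200):
    q = q_true + 0.1 * rng.standard_normal(4)
    q /= np.linalg.norm(q)
    samples.append(q * rng.choice([-1.0, 1.0]))
samples = np.array(samples)

# The scatter matrix is invariant to the sign flips, so its
# principal eigenvector recovers the orientation up to sign.
scatter = samples.T @ samples / len(samples)
eigvals, eigvecs = np.linalg.eigh(scatter)  # ascending eigenvalues
q_est = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
```

A Euclidean mean of the same samples would be pulled toward zero by the flipped signs, which is exactly why the filter works on the hypersphere instead.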
Butterfly Factorization
The paper introduces the butterfly factorization as a data-sparse
approximation for the matrices that satisfy a complementary low-rank property.
The factorization can be constructed efficiently if either fast algorithms for
applying the matrix and its adjoint are available or the entries of the matrix
can be sampled individually. For an N × N matrix, the resulting factorization is a product of O(log N) sparse matrices, each with O(N) non-zero entries. Hence, it can be applied rapidly in O(N log N) operations. Numerical results are provided to demonstrate the effectiveness of the butterfly factorization and its construction algorithms.
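The complementary low-rank property can be checked numerically on the DFT matrix, a standard example of a matrix admitting a butterfly factorization. The block sizes, offset, and tolerance below are arbitrary choices for the demonstration:

```python
import numpy as np

N = 4096  # size of the full (never materialized) N x N DFT matrix
rows = np.arange(0, 32)      # 32 contiguous rows
cols = np.arange(512, 544)   # 32 contiguous columns (offset arbitrary)

# One block of the DFT matrix, built entrywise. The complementary
# low-rank property says blocks whose row count times column count
# is about N have low numerical rank; the butterfly factorization
# exploits exactly this structure.
block = np.exp(-2j * np.pi * np.outer(rows, cols) / N)

# Numerical rank via the singular values, at a relative 1e-6 cutoff.
s = np.linalg.svd(block, compute_uv=False)
num_rank = int(np.sum(s > 1e-6 * s[0]))
```

Although the block is 32 x 32, its numerical rank is far below 32, which is what lets each butterfly factor stay sparse.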
Communication-Computation Efficient Gradient Coding
This paper develops coding techniques to reduce the running time of
distributed learning tasks. It characterizes the fundamental tradeoff to
compute gradients (and more generally vector summations) in terms of three
parameters: computation load, straggler tolerance and communication cost. It
further gives an explicit coding scheme that achieves the optimal tradeoff
based on recursive polynomial constructions, coding both across data subsets
and vector components. As a result, the proposed scheme minimizes the running time for gradient computations. Implementations are made on Amazon EC2 clusters using Python with the mpi4py package. Results show that the proposed scheme maintains the same generalization error while reducing the running time compared to both uncoded schemes and prior coded schemes that focus only on stragglers (Tandon et al., ICML 2017).
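The straggler-only baseline cited above (Tandon et al., ICML 2017) can be sketched in its classic 3-worker, 1-straggler form: each worker computes a coded combination of 2 of 3 partial gradients, and the master recovers the full gradient from any 2 workers. The data, model, and coefficients below are the textbook toy instance, not this paper's recursive polynomial scheme (which additionally codes across vector components):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy least-squares problem split into 3 data subsets; the full
# gradient is the sum of the 3 partial gradients.
X = rng.standard_normal((30, 5))
y = rng.standard_normal(30)
w = np.zeros(5)
parts = np.array_split(np.arange(30), 3)

def partial_grad(idx):
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi)

g = [partial_grad(p) for p in parts]

# Coded worker messages: each worker touches only 2 of the 3
# subsets, yet ANY 2 surviving workers suffice to decode.
msg = {
    1: 0.5 * g[0] + g[1],
    2: g[1] - g[2],
    3: 0.5 * g[0] + g[2],
}
decode = {  # decoding coefficients per surviving worker pair
    (1, 2): (2.0, -1.0),
    (1, 3): (1.0, 1.0),
    (2, 3): (1.0, 2.0),
}

def recover(i, j):
    a, b = decode[(i, j)]
    return a * msg[i] + b * msg[j]

full = sum(g)  # ground truth: the uncoded full gradient
```

For instance, 2·msg[1] - msg[2] = g[0] + 2g[1] - g[1] + g[2] = g[0] + g[1] + g[2], so worker 3 may straggle without delaying the update.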