1,065 research outputs found
Distributing the Kalman Filter for Large-Scale Systems
This paper derives a distributed Kalman filter to estimate a sparsely
connected, large-scale, $n$-dimensional, dynamical system monitored by a
network of $N$ sensors. Local Kalman filters are implemented on the
($n_l$-dimensional, where $n_l \ll n$) sub-systems that are obtained after
spatially decomposing the large-scale system. The resulting sub-systems
overlap, which, along with an assimilation procedure on the local Kalman
filters, preserves an $L$th order Gauss-Markovian structure of the centralized
error processes. The information loss due to the $L$th order Gauss-Markovian
approximation is controllable as it can be characterized by a divergence that
decreases as $L$ increases. The order of the approximation, $L$, leads to a lower
bound on the dimension of the sub-systems, hence, providing a criterion for
sub-system selection. The assimilation procedure is carried out on the local
error covariances with a distributed iterate collapse inversion (DICI)
algorithm that we introduce. The DICI algorithm computes the (approximated)
centralized Riccati and Lyapunov equations iteratively with only local
communication and low-order computation. We fuse the observations that are
common among the local Kalman filters using bipartite fusion graphs and
consensus averaging algorithms. The proposed algorithm achieves full
distribution of the Kalman filter that is coherent with the centralized Kalman
filter with an $L$th order Gauss-Markovian structure on the centralized
error processes. Nowhere is storage, communication, or computation of
$n$-dimensional vectors and matrices needed; only $n_l \ll n$ dimensional
vectors and matrices are communicated or used in the computation at the
sensors.
Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures
We propose different implementations of the sparse matrix--dense vector
multiplication (SpMV) for finite fields and rings $\mathbb{Z}/m\mathbb{Z}$. We take
advantage of graphics card processors (GPUs) and multi-core architectures. Our
aim is to improve the speed of SpMV in the LinBox library, and hence
the speed of its black box algorithms. In addition, we use this and a new
parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank
implementation over finite fields.
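As a rough illustration of the operation this abstract refers to, the following is a minimal sketch of an exact sparse matrix-dense vector product over $\mathbb{Z}/m\mathbb{Z}$. It assumes a CSR (compressed sparse row) layout and plain Python integers; the function name `spmv_mod` and the sample data are hypothetical and are not taken from LinBox or from the paper's GPU and multi-core implementations.

```python
def spmv_mod(indptr, indices, data, x, m):
    """Return y = A @ x with every entry reduced modulo m.

    A is stored in CSR form (an assumption of this sketch):
      indptr[r]..indptr[r+1] delimits the nonzeros of row r,
      indices[j] is the column of the j-th nonzero,
      data[j] is its value.
    Python integers are arbitrary precision, so the result is exact.
    """
    y = []
    for row in range(len(indptr) - 1):
        acc = 0
        for j in range(indptr[row], indptr[row + 1]):
            # Reduce after each multiply-add to keep intermediates small.
            acc = (acc + data[j] * x[indices[j]]) % m
        y.append(acc)
    return y


# Hypothetical 3x3 example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1, 2, 3, 4, 5]
x = [1, 2, 3]
y = spmv_mod(indptr, indices, data, x, 7)
```

Reducing after every multiply-add is the simplest choice; delaying the reduction until an accumulator is about to overflow is a common optimization in fixed-width arithmetic, but whether the paper does this is not stated in the abstract.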
Covariance Estimation in High Dimensions via Kronecker Product Expansions
This paper presents a new method for estimating high dimensional covariance
matrices. The method, permuted rank-penalized least-squares (PRLS), is based on
a Kronecker product series expansion of the true covariance matrix. Assuming an
i.i.d. Gaussian random sample, we establish high dimensional rates of
convergence to the true covariance as both the number of samples and the number
of variables go to infinity. For covariance matrices of low separation rank,
our results establish that PRLS has significantly faster convergence than the
standard sample covariance matrix (SCM) estimator. The convergence rate
captures a fundamental tradeoff between estimation error and approximation
error, thus providing a scalable covariance estimation framework in terms of
separation rank, similar to low rank approximation of covariance matrices. The
MSE convergence rates generalize the high dimensional rates recently obtained
for the ML Flip-flop algorithm for Kronecker product covariance estimation. We
show that a class of block Toeplitz covariance matrices is approximatable by
low separation rank and give bounds on the minimal separation rank that
ensures a given level of bias. Simulations are presented to validate the
theoretical bounds. As a real world application, we illustrate the utility of
the proposed Kronecker covariance estimator for spatio-temporal linear least
squares prediction of multivariate wind speed measurements.
Comment: 47 pages, accepted to IEEE Transactions on Signal Processing
Algebraic approaches for coded caching and distributed computing
This dissertation examines the power of algebraic methods in two areas of modern interest: caching for large scale content distribution and straggler mitigation within distributed computation.
Caching is a popular technique for facilitating large scale content delivery over the Internet. Traditionally, caching operates by storing popular content closer to the end users. Recent work within the domain of information theory demonstrates that allowing coding in the cache and coded transmission from the server (referred to as coded caching) to the end users can allow for significant reductions in the number of bits transmitted from the server to the end users. The first part of this dissertation examines problems within coded caching.
The original formulation of the coded caching problem assumes that the server and the end users are connected via a single shared link. In Chapter 2, we consider a more general topology where there is a layer of relay nodes between the server and the users. We propose novel schemes for a class of such networks that satisfy a so-called resolvability property and demonstrate that the performance of our scheme is strictly better than previously proposed schemes. Moreover, the original coded caching scheme requires that each file hosted in the server be partitioned into a large number (i.e., the subpacketization level) of non-overlapping subfiles. From a practical perspective, this is problematic as it means that prior schemes are only applicable when the size of the files is extremely large. In Chapter 3, we propose a novel coded caching scheme that enjoys a significantly lower subpacketization level than prior schemes, while only suffering a marginal increase in the transmission rate. We demonstrate that several schemes with subpacketization levels that are exponentially smaller than the basic scheme can be obtained.
The second half of this dissertation deals with large scale distributed matrix computations. Distributed matrix multiplication is an important problem, especially in domains such as deep learning of neural networks. It is well recognized that the computation times on distributed clusters are often dominated by the slowest workers (called stragglers). Recently, techniques from coding theory have found applications in straggler mitigation in the specific context of matrix-matrix and matrix-vector multiplication. The computation can be completed as long as a certain number of workers (called the recovery threshold) complete their assigned tasks.
In Chapter 4, we consider matrix multiplication under the assumption that the absolute values of the matrix entries are sufficiently small. Under this condition, we present a method with a significantly smaller recovery threshold than prior work. Moreover, prior work suffers from serious numerical issues owing to the condition number of the corresponding real Vandermonde-structured recovery matrices; this condition number grows exponentially in the number of workers. In Chapter 5, we present a novel approach that leverages the properties of circulant permutation matrices and rotation matrices for coded matrix computation. In addition to having an optimal recovery threshold, we demonstrate that an upper bound on the worst-case condition number of our recovery matrices grows polynomially in the number of workers.
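The notion of a recovery threshold can be made concrete with a minimal sketch of coded matrix-vector multiplication. It uses the simplest textbook construction (two data blocks plus one sum-parity block, three workers, recovery threshold two), not the dissertation's circulant-permutation or rotation-matrix schemes; the helper names `matvec`, `encode`, and `decode` are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def encode(A):
    """Split A (even number of rows assumed) into A1, A2 and add A1 + A2."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]  # one task per worker


def decode(results):
    """Recover A @ x from the outputs of any two of the three workers.

    results maps worker id (0, 1, or 2) to that worker's partial product.
    """
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # (A1+A2)x - A1x
    else:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # (A1+A2)x - A2x
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
partial = [matvec(task, x) for task in encode(A)]
# Simulate each worker in turn being the straggler whose result never arrives.
recovered = [
    decode({i: partial[i] for i in range(3) if i != straggler})
    for straggler in range(3)
]
```

Each worker here does half the work of the uncoded product, and the master can finish as soon as any two workers respond, which is the trade that coded computation makes against stragglers.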
Polynomial Chaos Expansion of random coefficients and the solution of stochastic partial differential equations in the Tensor Train format
We apply the Tensor Train (TT) decomposition to construct the tensor product
Polynomial Chaos Expansion (PCE) of a random field, to solve the stochastic
elliptic diffusion PDE with the stochastic Galerkin discretization, and to
compute some quantities of interest (mean, variance, exceedance probabilities).
We assume that the random diffusion coefficient is given as a smooth
transformation of a Gaussian random field. In this case, the PCE is delivered
by a complicated formula, which lacks an analytic TT representation. To
construct its TT approximation numerically, we develop the new block TT cross
algorithm, a method that computes the whole TT decomposition from a few
evaluations of the PCE formula. The new method is conceptually similar to the
adaptive cross approximation in the TT format, but is more efficient when
several tensors must be stored in the same TT representation, which is the case
for the PCE. Besides, we demonstrate how to assemble the stochastic Galerkin
matrix and to compute the solution of the elliptic equation and its
post-processing, staying in the TT format.
We compare our technique with the traditional sparse polynomial chaos and the
Monte Carlo approaches. In the tensor product polynomial chaos, the polynomial
degree is bounded for each random variable independently. This provides higher
accuracy than the sparse polynomial set or the Monte Carlo method, but the
cardinality of the tensor product set grows exponentially with the number of
random variables. However, when the PCE coefficients are implicitly
approximated in the TT format, the computations with the full tensor product
polynomial set become possible. In the numerical experiments, we confirm that
the new methodology is competitive in a wide range of parameters, especially
where high accuracy and high polynomial degrees are required.
Comment: This is a major revision of the manuscript arXiv:1406.2816 with
significantly extended numerical experiments. Some unused material is removed.
Precoded FIR and Redundant V-BLAST Systems for Frequency-Selective MIMO Channels
The vertical Bell Labs layered space-time (V-BLAST) system is a multi-input multi-output (MIMO) system designed to achieve good multiplexing gain. In recent literature, a precoder, which exploits channel information, has been added in the V-BLAST transmitter. This precoder forces each symbol stream to have an identical mean square error (MSE). It can be viewed as an alternative to the bit-loading method. In this paper, this precoded V-BLAST system is extended to the case of frequency-selective MIMO channels. Both the FIR and redundant types of transceivers, which use cyclic-prefixing and zero-padding, are considered. A fast algorithm for computing a cyclic-prefixing-based precoded V-BLAST transceiver is developed. Experiments show that the proposed methods with redundancy have better performance than the SVD-based system with optimal power loading and bit loading for frequency-selective MIMO channels. The gain comes from the fact that the MSE-equalizing precoder has better bit-error rate performance than the optimal bit-loading method.
A Tight Lower Bound for Counting Hamiltonian Cycles via Matrix Rank
For even $k$, the matchings connectivity matrix $\mathbf{M}_k$ encodes which
pairs of perfect matchings on $k$ vertices form a single cycle. Cygan et al.
(STOC 2013) showed that the rank of $\mathbf{M}_k$ over $\mathbb{Z}_2$ is
$\Theta(\sqrt{2}^k)$ and used this to give an $O^*((2+\sqrt{2})^{\mathsf{pw}})$
time algorithm for counting Hamiltonian cycles modulo $2$ on graphs of
pathwidth $\mathsf{pw}$. The same authors complemented their algorithm by an
essentially tight lower bound under the Strong Exponential Time Hypothesis
(SETH). This bound crucially relied on a large permutation submatrix within
$\mathbf{M}_k$, which enabled a "pattern propagation" commonly used in previous
related lower bounds, as initiated by Lokshtanov et al. (SODA 2011).
We present a new technique for a similar pattern propagation when only a
black-box lower bound on the asymptotic rank of $\mathbf{M}_k$ is given; no
stronger structural insights such as the existence of large permutation
submatrices in $\mathbf{M}_k$ are needed. Given appropriate rank bounds, our
technique yields lower bounds for counting Hamiltonian cycles (also modulo
fixed primes $p$) parameterized by pathwidth.
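To apply this technique, we prove that the rank of $\mathbf{M}_k$ over the
rationals is $4^k / \mathrm{poly}(k)$. We also show that the rank of
$\mathbf{M}_k$ over $\mathbb{Z}_p$ is $\Omega(1.97^k)$ for any prime $p \neq 2$
and even $\Omega(2.15^k)$ for some primes.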
As a consequence, we obtain that Hamiltonian cycles cannot be counted in time
$O^*((6-\epsilon)^{\mathsf{pw}})$ for any $\epsilon > 0$ unless SETH fails. This
bound is tight due to a $O^*(6^{\mathsf{pw}})$ time algorithm by Bodlaender et
al. (ICALP 2013). Under SETH, we also obtain that Hamiltonian cycles cannot be
counted modulo primes $p \neq 2$ in time $O^*(3.97^{\mathsf{pw}})$, indicating
that the modulus can affect the complexity in intricate ways.
Comment: improved lower bounds modulo primes, improved figures, to appear in
SODA 201
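For small even numbers of vertices, the matchings connectivity matrix described in this abstract can be built by brute force, which may help make the definition concrete. The sketch below enumerates perfect matchings, marks the pairs whose union is a single cycle, and computes the rank over the two-element field by Gaussian elimination on bitmask rows. All function names are hypothetical, and this is only an illustration of the definition, not the paper's representation-theoretic rank analysis.

```python
def perfect_matchings(vertices):
    """Yield every perfect matching of the given vertex list as a list of pairs."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def partner_map(matching):
    """Map each vertex to the vertex it is matched with."""
    partner = {}
    for a, b in matching:
        partner[a], partner[b] = b, a
    return partner


def forms_single_cycle(m1, m2, k):
    """True if the union of m1 and m2 is one cycle through all k vertices."""
    p1, p2 = partner_map(m1), partner_map(m2)
    v, edges = 0, 0
    while True:
        v = p2[p1[v]]  # walk one m1 edge, then one m2 edge
        edges += 2
        if v == 0:
            break
    return edges == k


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit is the pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def matchings_rank_gf2(k):
    """Build the k-vertex matchings connectivity matrix and return its GF(2) rank."""
    ms = list(perfect_matchings(list(range(k))))
    rows = [
        sum(1 << j for j, m2 in enumerate(ms) if forms_single_cycle(m1, m2, k))
        for m1 in ms
    ]
    return rank_gf2(rows)


rank_k4 = matchings_rank_gf2(4)
```

On four vertices there are three perfect matchings, any two distinct ones form a 4-cycle, and the resulting 3 x 3 matrix has rank 2 over the two-element field. The number of perfect matchings grows very quickly with the number of vertices, so this brute-force construction is only practical for small cases.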