Distributed Semidefinite Programming with Application to Large-scale System Analysis
Distributed algorithms for solving coupled semidefinite programs (SDPs)
commonly require many iterations to converge. They also place a high
computational demand on the agents. In this paper, we show that if the
coupled problem has an inherent tree structure, it is possible to devise an
efficient distributed algorithm for solving such problems. This algorithm can
potentially enjoy the same efficiency as centralized solvers that exploit
sparsity. The proposed algorithm relies on predictor-corrector primal-dual
interior-point methods, where we use a message-passing algorithm to compute the
search directions distributedly. Message-passing here is closely related to
dynamic programming over trees. This allows us to compute the exact search
directions in a finite number of steps. Furthermore, this number can be computed
a priori and only depends on the coupling structure of the problem. We use the
proposed algorithm for analyzing robustness of large-scale uncertain systems
distributedly. We test the performance of this algorithm using numerical
examples. Comment: 14 pages and 6 figures. Submitted to IEEE Transactions on Automatic
Control.
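The claim that message passing over a tree yields exact search directions in a finite, a priori known number of steps can be illustrated on a much simpler problem. The sketch below is not the paper's algorithm: it assumes a scalar symmetric linear system whose off-diagonal sparsity graph is a tree (a hypothetical stand-in for the search-direction equations), eliminates variables from the leaves to the root, and then back-substitutes from the root to the leaves. The function name `solve_tree_system` is hypothetical.

```python
from collections import defaultdict

def solve_tree_system(diag, edges, rhs, root=0):
    """Solve A x = rhs exactly, where A is symmetric with diagonal `diag` and
    off-diagonal entries edges[(i, j)] = a_ij whose graph is a tree.
    Hypothetical illustration of message passing / dynamic programming on trees."""
    n = len(diag)
    nbrs = defaultdict(list)
    for (i, j), a in edges.items():
        nbrs[i].append((j, a))
        nbrs[j].append((i, a))
    # Breadth-first traversal from the root: record each node's parent and depth.
    parent, level, order = {root: None}, {root: 0}, [root]
    for v in order:
        for w, a in nbrs[v]:
            if w not in parent:
                parent[w] = (v, a)
                level[w] = level[v] + 1
                order.append(w)
    d, b = list(diag), list(rhs)
    # Upward pass (leaves to root): each node sends a Schur-complement "message"
    # to its parent. Eliminating a leaf of a tree creates no fill-in.
    for v in reversed(order[1:]):
        p, a = parent[v]
        d[p] -= a * a / d[v]
        b[p] -= a * b[v] / d[v]
    # Downward pass (root to leaves): back-substitution.
    x = [0.0] * n
    x[root] = b[root] / d[root]
    for v in order[1:]:
        p, a = parent[v]
        x[v] = (b[v] - a * x[p]) / d[v]
    # If all nodes at one depth act in parallel, each pass takes `height` rounds,
    # a number fixed by the tree alone and known before any arithmetic is done.
    rounds = 2 * max(level.values())
    return x, rounds
```

For example, the chain 0-1-2 with diagonal 2 and off-diagonal -1 and right-hand side [1, 0, 1] returns x = [1, 1, 1] after 4 rounds (two up, two down), regardless of the numerical values.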
Extension and optimization of the FIND algorithm: computing Green's and less-than Green's functions (with technical appendix)
The FIND algorithm is a fast algorithm designed to calculate certain entries
of the inverse of a sparse matrix. Such calculation is critical in many
applications, e.g., quantum transport in nano-devices. We extended the
algorithm to other matrix-inverse-related calculations, which are required,
for example, to calculate the less-than Green's function and the current density
through the device. For a 2D device discretized as an N_x × N_y mesh, the best
known algorithms have a running time of O(N_x^3 N_y), whereas FIND only
requires O(N_x^2 N_y). Even though this complexity has been reduced by a factor
of N_x, the matrix inverse calculation is still the most time-consuming
part in the simulation of transport problems. We could not reduce the order of
complexity, but we were able to significantly reduce the constant factor
involved in the computation cost. By exploiting the sparsity and symmetry, the
size of the problem beyond which FIND is faster than other methods typically
decreases from a 130 × 130 2D mesh down to a 40 × 40 mesh. These improvements make
the optimized FIND algorithm even more competitive for real-life applications.
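The core idea of computing selected entries of a sparse inverse without forming the whole inverse can be seen in a one-dimensional special case. The sketch below is not FIND (which works on a 2D mesh); it swaps in the simpler, well-known recursion for a tridiagonal matrix, where a forward and a backward Schur-complement sweep give every diagonal entry of the inverse in O(n). It assumes no zero pivots occur (for example, a diagonally dominant matrix); the function name is hypothetical.

```python
def inverse_diagonal_tridiag(d, lower, upper):
    """Diagonal of inv(A) for tridiagonal A with A[i][i] = d[i],
    A[i+1][i] = lower[i], A[i][i+1] = upper[i]; O(n), no full inverse."""
    n = len(d)
    # Forward sweep: effective diagonal at i after eliminating nodes 0..i-1.
    left = [0.0] * n
    left[0] = d[0]
    for i in range(1, n):
        left[i] = d[i] - lower[i - 1] * upper[i - 1] / left[i - 1]
    # Backward sweep: effective diagonal at i after eliminating nodes i+1..n-1.
    right = [0.0] * n
    right[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        right[i] = d[i] - upper[i] * lower[i] / right[i + 1]
    # Combine both sides: G_ii = 1 / (d_i - sigma_left_i - sigma_right_i)
    #                          = 1 / (left_i + right_i - d_i).
    return [1.0 / (left[i] + right[i] - d[i]) for i in range(n)]
```

For the 3 × 3 matrix with diagonal 2 and off-diagonals -1, the inverse is (1/4)[[3, 2, 1], [2, 4, 2], [1, 2, 3]], and the function returns its diagonal [0.75, 1.0, 0.75].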
Distributed Localization of Tree-structured Scattered Sensor Networks
Many distributed localization algorithms are based on relaxed
optimization formulations of the localization problem. These algorithms
commonly rely on first-order optimization methods, and hence may require many
iterations or communications among computational agents. Furthermore, some of
these distributed algorithms put a considerable computational demand on the
agents. In this paper, we show that for tree-structured scattered sensor
networks, i.e., networks whose inter-sensor range measurement graphs
have few edges (few range measurements among sensors) and can be represented
using a tree, it is possible to devise an efficient distributed localization
algorithm that relies solely on second-order methods. In particular, we apply a
state-of-the-art primal-dual interior-point method to a semidefinite relaxation
of the maximum-likelihood formulation of the localization problem. We then show
how to exploit the tree structure of the network and use message passing,
or dynamic programming over trees, to distribute computations
among different computational agents. The resulting algorithm requires far
fewer iterations and communications among agents to converge to an accurate
estimate. Moreover, the number of required communications among agents seems
to be less sensitive to the number of sensors in the network,
the number of available measurements and the quality of the measurements. This
is in stark contrast to distributed algorithms that rely on first-order
methods. We illustrate the performance of our algorithm using experiments based
on simulated and real data. Comment: 14 pages and 11 figures.
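To make "tree-structured" and "maximum-likelihood formulation" concrete: assuming independent zero-mean Gaussian range noise with equal variance (an assumption; the paper's noise model may differ), maximizing the likelihood amounts to minimizing the squared range residuals over the measurement edges. The hypothetical helpers below check whether a range-measurement graph is a tree and evaluate that nonconvex cost; the paper instead solves a semidefinite relaxation of the problem.

```python
import math

def is_tree(num_sensors, edges):
    """A graph on num_sensors nodes is a tree iff it has n - 1 edges and no cycle."""
    if len(edges) != num_sensors - 1:
        return False
    parent = list(range(num_sensors))  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # this edge closes a cycle
        parent[ri] = rj
    return True

def ml_range_cost(positions, measurements):
    """Sum of squared range residuals over measured pairs; equal to the
    negative log-likelihood up to constants under i.i.d. Gaussian noise."""
    return sum((r - math.dist(positions[i], positions[j])) ** 2
               for (i, j), r in measurements.items())
```

For sensors at (0, 0), (3, 0) and (3, 4) with ranges 3 on edge (0, 1) and 4 on edge (1, 2), the graph is a tree and the cost is zero; adding edge (0, 2) would make it a cycle.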
A distributed-memory hierarchical solver for general sparse linear systems
We present a parallel hierarchical solver for general sparse linear systems
on distributed-memory machines. For large-scale problems, this fully algebraic
algorithm is faster and more memory-efficient than sparse direct solvers
because it exploits the low-rank structure of fill-in blocks. Depending on the
accuracy of low-rank approximations, the hierarchical solver can be used either
as a direct solver or as a preconditioner. The parallel algorithm is based on
data decomposition and requires only local communication for updating boundary
data on every processor. Moreover, the computation-to-communication ratio of
the parallel algorithm is approximately the volume-to-surface-area ratio of the
subdomain owned by every processor. We present various numerical results to
demonstrate the versatility and scalability of the parallel algorithm.
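The volume-to-surface-area statement can be made concrete with a little arithmetic. Assuming (for illustration only) an n × n × n grid split into equal cubic subdomains, with an interior subdomain exchanging boundary data across all six faces, the ratio equals side / 6 and therefore grows linearly with subdomain size. The function name is hypothetical.

```python
def comp_to_comm_ratio(n, p_per_dim):
    """Volume-to-surface ratio of one interior cubic subdomain when an
    n x n x n grid is split into p_per_dim**3 equal cubes (n divisible)."""
    side = n // p_per_dim      # grid points along one subdomain edge
    volume = side ** 3         # unknowns updated locally (computation)
    surface = 6 * side ** 2    # boundary unknowns exchanged (communication)
    return volume / surface    # simplifies to side / 6
```

With n = 256 and 4 × 4 × 4 = 64 processors, each subdomain has side 64, so the ratio is 64 / 6 ≈ 10.67; doubling n on the same processor count doubles it, which is why larger local problems amortize communication better.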
Data-driven approximations of dynamical systems operators for control
The Koopman and Perron-Frobenius transport operators are fundamentally
changing how we approach dynamical systems, providing linear representations
for even strongly nonlinear dynamics. Although such a linear representation
offers tremendous potential benefits for estimation and control, transport
operators are infinite-dimensional, making them difficult to work with
numerically. Obtaining low-dimensional matrix approximations of these operators
is paramount for applications, and dynamic mode decomposition (DMD) has quickly
become a standard numerical algorithm to approximate the Koopman operator.
Related methods have seen rapid development, due to a combination of an
increasing abundance of data and the extensibility of DMD based on its simple
framing in terms of linear algebra. In this chapter, we review key innovations
in the data-driven characterization of transport operators for control,
providing a high-level and unified perspective. We emphasize important recent
developments around sparsity and control, and discuss emerging methods in big
data and machine learning. Comment: 37 pages, 4 figures.
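As a concrete instance of approximating a transport operator from data, DMD fits a linear map A with x_{k+1} ≈ A x_k to pairs of successive snapshots. The sketch below computes the plain least-squares fit A = Y X^T (X X^T)^{-1} for two-dimensional states in pure Python. It omits the truncated SVD that exact DMD uses for high-dimensional data and assumes the snapshots span the state space; the function names are hypothetical.

```python
import cmath

def dmd_operator_2d(snapshots):
    """Least-squares DMD operator for 2-D states: A = Y X^T (X X^T)^{-1},
    where X holds snapshots 0..m-1 and Y holds snapshots 1..m as columns."""
    X, Y = snapshots[:-1], snapshots[1:]
    xx = [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]
    yx = [[sum(y[i] * x[j] for x, y in zip(X, Y)) for j in range(2)] for i in range(2)]
    det = xx[0][0] * xx[1][1] - xx[0][1] * xx[1][0]
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not span the 2-D state space")
    inv = [[xx[1][1] / det, -xx[0][1] / det],
           [-xx[1][0] / det, xx[0][0] / det]]
    return [[sum(yx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues_2x2(a):
    """DMD eigenvalues: roots of lambda^2 - tr(A) lambda + det(A)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc
```

Feeding it noise-free snapshots of x_{k+1} = [[0.9, 0.1], [0, 0.8]] x_k started at [1, 1] recovers that matrix and its eigenvalues 0.9 and 0.8 up to rounding; with noisy or nonlinear data the fit is only a best linear approximation.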
Tracking Switched Dynamic Network Topologies from Information Cascades
Contagions such as the spread of popular news stories, or infectious
diseases, propagate in cascades over dynamic networks with unobservable
topologies. However, "social signals" such as product purchase times or blog
entry timestamps are measurable, and implicitly depend on the underlying
topology, making it possible to track it over time. Interestingly, network
topologies often "jump" between discrete states that may account for sudden
changes in the observed signals. The present paper advocates a switched dynamic
structural equation model to capture the topology-dependent cascade evolution,
as well as the discrete states driving the underlying topologies. Conditions
under which the proposed switched model is identifiable are established.
Leveraging the edge sparsity inherent to social networks, a recursive
ℓ1-norm regularized least-squares estimator is put forth to jointly track
the states and network topologies. An efficient first-order proximal-gradient
algorithm is developed to solve the resulting optimization problem. Numerical
experiments on both synthetic data and real cascades measured over the span of
one year are conducted, and test results corroborate the efficacy of the
advocated approach.
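The pairing of an ℓ1-norm penalty with a first-order proximal-gradient solver can be illustrated with the textbook iterative shrinkage-thresholding algorithm (ISTA): a gradient step on the squared error followed by soft-thresholding, which is the proximal operator of the ℓ1 norm. This batch sketch is not the paper's recursive, time-varying estimator. The step size 1/‖X‖_F² is a conservative assumption that never exceeds 1/L, where L is the Lipschitz constant of the gradient; the function names are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)

def ista(X, y, lam, iters=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by proximal gradient descent."""
    m, n = len(X), len(X[0])
    step = 1.0 / sum(x * x for row in X for x in row)  # 1/||X||_F^2 <= 1/L
    b = [0.0] * n
    for _ in range(iters):
        # Gradient of the smooth term: X^T (X b - y).
        resid = [sum(X[i][k] * b[k] for k in range(n)) - y[i] for i in range(m)]
        grad = [sum(X[i][k] * resid[i] for i in range(m)) for k in range(n)]
        # Proximal step: soft-thresholding drives small coefficients to exactly zero.
        b = [soft_threshold(b[k] - step * grad[k], step * lam) for k in range(n)]
    return b
```

With X equal to the 3 × 3 identity, y = [3, 0.5, -2] and λ = 1, the minimizer is the soft-thresholded data [2, 0, -1]; the zero entry shows how the penalty encodes sparsity (here, of network edges).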
Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures
Quantifying simulation uncertainties is a critical component of rigorous
predictive simulation. A key component of this is forward propagation of
uncertainties in simulation input data to output quantities of interest.
Typical approaches involve repeated sampling of the simulation over the
uncertain input data, and can require numerous samples when accurately
propagating uncertainties from large numbers of sources. Often simulation
processes from sample to sample are similar and much of the data generated from
each sample evaluation could be reused. We explore a new method for
implementing sampling methods that simultaneously propagates groups of samples
together in an embedded fashion, which we call embedded ensemble propagation.
We show how this approach takes advantage of properties of modern computer
architectures to improve performance by enabling reuse between samples,
reducing memory bandwidth requirements, improving memory access patterns,
improving opportunities for fine-grained parallelization, and reducing
communication costs. We describe a software technique for implementing embedded
ensemble propagation based on the use of C++ templates and describe its
integration with various scientific computing libraries within Trilinos. We
demonstrate improved performance, portability and scalability for the approach
applied to the simulation of partial differential equations on a variety of
CPU, GPU, and accelerator architectures, including up to 131,072 cores on a
Cray XK7 (Titan).
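The idea of propagating a group of samples through a single evaluation can be mimicked with operator overloading. The original work uses C++ templates integrated with Trilinos; the Python class below (`Ensemble`, hypothetical) is only an analogue. Code written for ordinary scalars runs unchanged on an ensemble, and each arithmetic operation processes all samples together, which in a compiled implementation is where the gains in reuse, memory access and fine-grained parallelism come from.

```python
class Ensemble:
    """A scalar-like value carrying one entry per sample (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def _lift(self, other):
        # Broadcast ordinary numbers to every sample.
        if isinstance(other, Ensemble):
            return other.values
        return [other] * len(self.values)

    def __add__(self, other):
        return Ensemble(a + b for a, b in zip(self.values, self._lift(other)))
    __radd__ = __add__

    def __sub__(self, other):
        return Ensemble(a - b for a, b in zip(self.values, self._lift(other)))

    def __rsub__(self, other):
        return Ensemble(b - a for a, b in zip(self.values, self._lift(other)))

    def __mul__(self, other):
        return Ensemble(a * b for a, b in zip(self.values, self._lift(other)))
    __rmul__ = __mul__

def model(x):
    """Stands in for a simulation kernel written for plain scalars."""
    return 3 * x * x + 2 * x - 1
```

`model(2.0)` returns 15.0 for one sample, while `model(Ensemble([0.0, 1.0, 2.0])).values` returns [-1.0, 4.0, 15.0] from a single pass through the same code.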
A Relaxation-based Network Decomposition Algorithm for Parallel Transient Stability Simulation with Improved Convergence
Transient stability simulation of a large-scale and interconnected electric
power system involves solving a large set of differential algebraic equations
(DAEs) at every simulation time-step. With the ever-growing size and complexity
of power grids, dynamic simulation becomes more time-consuming and
computationally difficult using conventional sequential simulation techniques.
To cope with this challenge, this paper aims to develop a fully distributed
approach intended for implementation on high-performance computing (HPC)
clusters. A novel, relaxation-based domain decomposition algorithm known as
Parallel-General-Norton with Multiple-port Equivalent (PGNME) is proposed as
the core technique of a two-stage decomposition approach to divide the overall
dynamic simulation problem into a set of subproblems that can be solved
concurrently to exploit parallelism and scalability. While the convergence
property has traditionally been a concern for relaxation-based decomposition,
an estimation mechanism based on multiple-port network equivalent is adopted as
the preconditioner to enhance the convergence of the proposed algorithm. The
proposed algorithm is presented with a rigorous mathematical formulation and
validated in terms of both speed-up and capability. Moreover, a complexity
analysis is performed to support the observation that PGNME scales well when
the subproblems are sufficiently large.
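The convergence concern with relaxation-based decomposition can be seen in its simplest form, point Jacobi iteration. Each subproblem (here a single unknown, a deliberate simplification) is updated only from the previous iterate, so all updates in one sweep could run concurrently; convergence holds, for example, for strictly diagonally dominant matrices, but not in general. This sketch is not PGNME, whose multiple-port equivalent preconditioner is meant to improve on exactly this behaviour; the function name is hypothetical.

```python
def jacobi(A, b, tol=1e-12, max_iters=10_000):
    """Solve A x = b by Jacobi relaxation. Every row update uses only the
    previous iterate, so the updates within one sweep are independent."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iters + 1):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("relaxation did not converge")
```

For A = [[4, 1], [2, 5]] and b = [1, 2] the iteration converges to [1/6, 1/3], whereas for A = [[1, 2], [2, 1]] the error doubles every sweep and the function raises.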
Deformation corrected compressed sensing (DC-CS): a novel framework for accelerated dynamic MRI
We propose a novel deformation corrected compressed sensing (DC-CS) framework
to recover dynamic magnetic resonance images from undersampled measurements. We
introduce a generalized formulation that is capable of handling a wide class of
sparsity/compactness priors on the deformation corrected dynamic signal. In
this work, we consider example compactness priors such as sparsity in temporal
Fourier domain, sparsity in temporal finite difference domain, and nuclear norm
penalty to exploit low-rank structure. Using variable splitting, we decouple
the complex optimization problem into simpler, well-understood subproblems;
the resulting algorithm alternates between simple steps of shrinkage-based
denoising, deformable registration, and a quadratic optimization step.
Additionally, we employ efficient continuation strategies to minimize the risk
of convergence to local minima. The proposed formulation contrasts with
existing DC-CS schemes that are customized for free-breathing cardiac cine
applications, and other schemes that rely on fully sampled reference frames or
navigator signals to estimate the deformation parameters. The efficient
decoupling enabled by the proposed scheme makes it applicable to a wide
range of settings, including contrast-enhanced dynamic MRI. Through
experiments on numerical phantom and in vivo myocardial perfusion MRI datasets,
we demonstrate the utility of the proposed DC-CS scheme in providing robust
reconstructions with reduced motion artifacts over classical compressed sensing
schemes that apply the compactness priors to the original,
deformation-uncorrected signal.
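Variable splitting that alternates a shrinkage step with a quadratic step can be shown on a one-dimensional toy problem: denoising a signal y with a sparsity penalty on its temporal finite differences, min_x ½‖x − y‖² + λ‖Dx‖₁. The sketch introduces z = Dx and applies ADMM; this choice of splitting is an assumption, and the paper's deformable-registration step and continuation strategy are not reproduced. The quadratic x-step reduces to a tridiagonal solve. Function names are hypothetical.

```python
def soft(v, t):
    """Soft-thresholding (shrinkage) of a scalar."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)

def tv_denoise_admm(y, lam, rho=1.0, iters=2000):
    """min_x 0.5*||x - y||^2 + lam*||D x||_1, (D x)_i = x[i+1] - x[i],
    via ADMM on the split z = D x with scaled dual u. Needs len(y) >= 2."""
    n = len(y)
    z = [0.0] * (n - 1)
    u = [0.0] * (n - 1)
    # M = I + rho * D^T D is tridiagonal: off-diagonals -rho,
    # diagonal 1 + rho at both ends and 1 + 2*rho in between.
    diag = [1.0 + 2.0 * rho] * n
    diag[0] = diag[-1] = 1.0 + rho
    off = -rho
    x = list(y)
    for _ in range(iters):
        # Quadratic step: solve M x = y + rho * D^T (z - u) with the Thomas algorithm.
        w = [zi - ui for zi, ui in zip(z, u)]
        rhs = [y[i] + rho * ((w[i - 1] if i > 0 else 0.0) - (w[i] if i < n - 1 else 0.0))
               for i in range(n)]
        c, d = [0.0] * n, [0.0] * n
        c[0], d[0] = off / diag[0], rhs[0] / diag[0]
        for i in range(1, n):
            m = diag[i] - off * c[i - 1]
            c[i] = off / m
            d[i] = (rhs[i] - off * d[i - 1]) / m
        x[-1] = d[-1]
        for i in range(n - 2, -1, -1):
            x[i] = d[i] - c[i] * x[i + 1]
        # Shrinkage step on the finite differences, then the dual update.
        dx = [x[i + 1] - x[i] for i in range(n - 1)]
        z = [soft(dxi + ui, lam / rho) for dxi, ui in zip(dx, u)]
        u = [ui + dxi - zi for ui, dxi, zi in zip(u, dx, z)]
    return x
```

A constant signal is returned unchanged, and a large λ flattens [0, 0, 1, 1] to its mean [0.5, 0.5, 0.5, 0.5], illustrating how the penalty suppresses temporal variation.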
Using the VBARMS method in parallel computing
The paper describes an improved parallel MPI-based implementation of VBARMS,
a variable block variant of the pARMS preconditioner proposed by Li, Saad and
Sosonkina [NLAA, 2003] for solving general nonsymmetric linear systems. The
parallel VBARMS solver can automatically detect exact or approximate dense
structures in the linear system and exploits this information to achieve
improved reliability and increased throughput during the factorization. A novel
graph compression algorithm is discussed that finds these approximate dense
block structures and requires only a single, easy-to-use parameter. A complete
study of the numerical and parallel performance of parallel VBARMS is presented
for the analysis of large turbulent Navier-Stokes equations on a suite of
three-dimensional test cases.
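One common way to detect approximate dense blocks with a single parameter is to group rows whose sparsity patterns are nearly parallel, measuring similarity by the cosine of the angle between their 0/1 pattern vectors. The sketch below assumes that criterion with a threshold `tau`; it is a hypothetical illustration, and the compression algorithm described in the paper may differ in detail.

```python
import math

def compress_rows(patterns, tau):
    """Greedily group rows whose sparsity patterns (sets of column indices)
    have cosine similarity >= tau with the first row of the group."""
    assigned = [False] * len(patterns)
    blocks = []
    for i, pi in enumerate(patterns):
        if assigned[i]:
            continue
        assigned[i] = True
        block = [i]
        for j in range(i + 1, len(patterns)):
            pj = patterns[j]
            if assigned[j] or not pi or not pj:
                continue
            # Cosine of the angle between the 0/1 row-pattern vectors.
            cos = len(pi & pj) / math.sqrt(len(pi) * len(pj))
            if cos >= tau:
                assigned[j] = True
                block.append(j)
        blocks.append(block)
    return blocks
```

With row patterns {0, 1}, {0, 1}, {2, 3}, {2, 3, 4} and {5}, a threshold of 0.8 yields blocks [[0, 1], [2, 3], [4]] (rows 2 and 3 have similarity 2/√6 ≈ 0.816), while 0.9 keeps rows 2 and 3 separate; `tau` alone trades block density against block size.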