Slow decay of concentration variance due to no-slip walls in chaotic mixing
Chaotic mixing in a closed vessel is studied experimentally and numerically
in different 2-D flow configurations. For a purely hyperbolic phase space, it
is well-known that concentration fluctuations converge to an eigenmode of the
advection-diffusion operator and decay exponentially with time. We illustrate
how the unstable manifold of hyperbolic periodic points dominates the resulting
persistent pattern. We show for different physical viscous flows that, in the
case of a fully chaotic Poincare section, parabolic periodic points at the
walls lead to slower (algebraic) decay. A persistent pattern, the backbone of
which is the unstable manifold of parabolic points, can be observed. However,
slow stretching at the wall forbids the rapid propagation of stretched
filaments throughout the whole domain, and hence delays the formation of an
eigenmode until it is no longer experimentally observable. Inspired by the
baker's map, we introduce a 1-D model with a parabolic point that gives a good
account of the slow decay observed in experiments. We derive a universal decay
law for such systems parametrized by the rate at which a particle approaches
the no-slip wall.
Comment: 17 pages, 12 figures
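As a rough illustration of this mechanism (not the paper's actual model), the following Python sketch advects a passive scalar with tracer particles under two 1-D maps: the uniformly hyperbolic doubling map and a Pomeau-Manneville-type map whose parabolic fixed point at x = 0 stands in for the no-slip wall, with the exponent alpha playing the role of the rate at which trajectories approach the wall. Small Gaussian kicks model diffusion, and the variance of the binned concentration field is recorded at every iteration; all names and parameter values are illustrative choices, not taken from the paper.

import numpy as np

# Compare scalar-variance decay under a uniformly hyperbolic 1-D map and a map
# with a parabolic (neutral) fixed point at the "wall" x = 0.

def doubling(x):
    return (2.0 * x) % 1.0

def parabolic_map(x, alpha=1.0):
    # Pomeau-Manneville-type left branch: f(x) ~ x + const * x**(1 + alpha)
    # near x = 0, so stretching vanishes at the wall.
    return np.where(x < 0.5, x * (1.0 + (2.0 * x) ** alpha), 2.0 * x - 1.0) % 1.0

def variance_decay(step_map, n_particles=200_000, n_steps=60, kappa=1e-4,
                   n_bins=256, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(n_particles)          # tracer positions in [0, 1)
    c = np.cos(2.0 * np.pi * x)          # scalar carried by each tracer (zero mean)
    history = []
    for _ in range(n_steps):
        x = step_map(x)                  # chaotic advection
        x = (x + np.sqrt(2.0 * kappa) * rng.standard_normal(n_particles)) % 1.0  # diffusion
        idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
        sums = np.bincount(idx, weights=c, minlength=n_bins)
        counts = np.maximum(np.bincount(idx, minlength=n_bins), 1)
        history.append(np.var(sums / counts))   # variance of the coarse-grained field
    return np.array(history)

print("hyperbolic map:      ", variance_decay(doubling)[::10])
print("parabolic-point map: ", variance_decay(parabolic_map)[::10])

Under these illustrative settings the hyperbolic map should show a roughly exponential drop of the variance, while the parabolic-point map decays much more slowly, mirroring the exponential-versus-algebraic contrast described above, until the statistical noise floor of the binning is reached.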
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Asynchronous distributed algorithms are a popular way to reduce
synchronization costs in large-scale optimization, and in particular for neural
network training. However, for nonsmooth and nonconvex objectives, few
convergence guarantees exist beyond cases where closed-form proximal operator
solutions are available. As most popular contemporary deep neural networks lead
to nonsmooth and nonconvex objectives, there is now a pressing need for such
convergence guarantees. In this paper, we analyze for the first time the
convergence of stochastic asynchronous optimization for this general class of
objectives. In particular, we focus on stochastic subgradient methods allowing
for block variable partitioning, where the shared-memory-based model is
asynchronously updated by concurrent processes. To this end, we first introduce
a probabilistic model which captures key features of real asynchronous
scheduling between concurrent processes; under this model, we establish
convergence with probability one to an invariant set for stochastic subgradient
methods with momentum.
From the practical perspective, one issue with the family of methods we
consider is that it is not efficiently supported by machine learning
frameworks, as they mostly focus on distributed data-parallel strategies. To
address this, we propose a new implementation strategy for shared-memory based
training of deep neural networks, whereby concurrent parameter servers are
utilized to train a partitioned but shared model in single- and multi-GPU
settings. Based on this implementation, we achieve an average 1.2x speed-up over
state-of-the-art training methods on popular image classification tasks without
compromising accuracy.
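As a minimal single-machine sketch of the family of methods analyzed here (block-partitioned stochastic subgradient with momentum, with lock-free reads of a shared model), the toy Python example below trains a hinge-loss linear classifier on synthetic data using threads. All names, data, and hyperparameters are illustrative; this shows only the update rule, not the paper's concurrent parameter-server, single- and multi-GPU implementation.

import threading
import numpy as np

# Toy illustration: each worker owns one block of the shared parameter vector,
# takes lock-free (possibly stale) reads of the full vector, computes a
# stochastic subgradient of a nonsmooth hinge loss, and applies a momentum
# update to its own block only. Data and hyperparameters are synthetic.

rng = np.random.default_rng(0)
n, d, n_blocks, steps_per_worker = 5000, 64, 4, 2000
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))

w = np.zeros(d)                                   # shared model, updated in place
blocks = np.array_split(np.arange(d), n_blocks)   # block partition of the variables
velocity = [np.zeros(len(b)) for b in blocks]     # per-block momentum buffers

def subgradient(w_snapshot, i):
    # Subgradient of max(0, 1 - y_i <w, x_i>) plus a small l2 term.
    g = 1e-3 * w_snapshot
    if y[i] * (X[i] @ w_snapshot) < 1.0:
        g = g - y[i] * X[i]
    return g

def worker(block_id, lr=0.01, beta=0.9):
    local_rng = np.random.default_rng(100 + block_id)
    b = blocks[block_id]
    for _ in range(steps_per_worker):
        i = local_rng.integers(n)          # sample one training example
        snapshot = w.copy()                # unsynchronized read: may be stale
        g = subgradient(snapshot, i)[b]    # subgradient restricted to this block
        velocity[block_id] = beta * velocity[block_id] - lr * g
        w[b] += velocity[block_id]         # asynchronous in-place block update

threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_blocks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("mean hinge loss:", np.mean(np.maximum(0.0, 1.0 - y * (X @ w))))

After the threads join, the mean hinge loss should have dropped well below its starting value of 1.0 (the loss at w = 0), despite the stale, unsynchronized reads.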
Hardness of parameter estimation in graphical models
We consider the problem of learning the canonical parameters specifying an
undirected graphical model (Markov random field) from the mean parameters. For
graphical models representing a minimal exponential family, the canonical
parameters are uniquely determined by the mean parameters, so the problem is
feasible in principle. The goal of this paper is to investigate the
computational feasibility of this statistical task. Our main result shows that
parameter estimation is in general intractable: no algorithm can learn the
canonical parameters of a generic pair-wise binary graphical model from the
mean parameters in time bounded by a polynomial in the number of variables
(unless RP = NP). Indeed, such a result has been believed to be true (see the
monograph by Wainwright and Jordan (2008)) but no proof was known.
Our proof gives a polynomial time reduction from approximating the partition
function of the hard-core model, known to be hard, to learning approximate
parameters. Our reduction entails showing that the marginal polytope boundary
has an inherent repulsive property, which validates an optimization procedure
over the polytope that does not use any knowledge of its structure (as required
by the ellipsoid method and others).
Comment: 15 pages. To appear in NIPS 2014
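To make the two parameterizations concrete, the brute-force Python sketch below (a toy, feasible only for a handful of variables) evaluates the forward map from canonical parameters theta to mean parameters mu for a small pairwise binary model over {-1, +1}^n; the hardness result concerns inverting this map, and nothing here reflects the reduction itself.

import itertools
import numpy as np

# Forward map from canonical to mean parameters of a pairwise binary model
# p(x) ~ exp(sum_i theta_i x_i + sum_(i,j) theta_ij x_i x_j), x in {-1, +1}^n,
# by summing over all 2^n configurations (feasible only for tiny n).

def mean_parameters(theta_node, theta_edge, edges):
    n = len(theta_node)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    log_w = states @ theta_node
    for (i, j), t in zip(edges, theta_edge):
        log_w += t * states[:, i] * states[:, j]
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    mu_node = p @ states                                    # E[x_i]
    mu_edge = np.array([p @ (states[:, i] * states[:, j])   # E[x_i x_j]
                        for (i, j) in edges])
    return mu_node, mu_edge

edges = [(0, 1), (1, 2), (2, 0)]
mu_node, mu_edge = mean_parameters(np.array([0.2, -0.1, 0.0]),
                                   np.array([0.5, 0.5, 0.5]), edges)
print("node means:       ", mu_node)
print("edge correlations:", mu_edge)

For three variables on a triangle this returns the node means E[x_i] and edge correlations E[x_i x_j]; parameter estimation asks for the inverse direction, recovering theta from these moments.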
Solving the TTC 2011 Reengineering Case with GReTL
This paper discusses the GReTL reference solution of the TTC 2011
Reengineering case. Given a Java syntax graph, a simple state machine model has
to be extracted. The submitted solution covers both the core task and the two
extension tasks.
Comment: In Proceedings TTC 2011, arXiv:1111.440
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous
variants of stochastic gradient descent have become popular methods to solve a
wide range of large-scale optimization problems on multi-core architectures.
Yet, despite their practical success, support for nonsmooth objectives is still
lacking, making them unsuitable for many problems of interest in machine
learning, such as the Lasso, group Lasso or empirical risk minimization with
convex constraints.
In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse
method inspired by SAGA, a variance reduced incremental gradient algorithm. The
proposed method is easy to implement and significantly outperforms the state of
the art on several nonsmooth, large-scale problems. We prove that our method
achieves a theoretical linear speedup with respect to the sequential version
under assumptions on the sparsity of gradients and block-separability of the
proximal term. Empirical benchmarks on a multi-core architecture illustrate
practical speedups of up to 12x on a 20-core machine.
Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS
2017), 28 pages
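For context, here is a minimal sequential proximal SAGA sketch on L1-regularized logistic regression with synthetic data. It shows only the variance-reduced proximal update that ProxASAGA builds on; the sparse update rule and the asynchronous multi-core execution that give the method its speedup are deliberately omitted, so this should not be read as the proposed algorithm itself.

import numpy as np

# Sequential proximal SAGA on an L1-regularized logistic regression problem
# with synthetic data (asynchrony and sparse updates omitted).

rng = np.random.default_rng(0)
n, d, lam, gamma = 1000, 50, 0.01, 0.02
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

def grad_i(x, i):
    # Gradient of the logistic loss of example i.
    z = -y[i] * (X[i] @ x)
    return -y[i] * X[i] / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(d)
memory = np.array([grad_i(x, i) for i in range(n)])   # table of stored gradients
avg = memory.mean(axis=0)                             # running average of the table

for _ in range(20 * n):                               # roughly 20 passes over the data
    i = rng.integers(n)
    g = grad_i(x, i)
    v = g - memory[i] + avg                           # variance-reduced gradient estimate
    x = soft_threshold(x - gamma * v, gamma * lam)    # proximal (soft-thresholding) step
    avg += (g - memory[i]) / n                        # keep the table average current
    memory[i] = g

objective = np.logaddexp(0.0, -y * (X @ x)).mean() + lam * np.abs(x).sum()
print(f"objective: {objective:.4f}   nonzeros: {np.count_nonzero(x)}")

Each iteration replaces one stored gradient, keeps the table average current in O(d) time, and applies soft-thresholding as the proximal step for the L1 term.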
Downward transference of mice and universality of local core models
If M is a proper class inner model of ZFC and omega_2^M=omega_2, then every
sound mouse projecting to omega and not past 0-pistol belongs to M. In fact,
under the assumption that 0-pistol does not belong to M, K^M \| omega_2 is
universal for all countable mice in V.
Similarly, if M is a proper class inner model of ZFC, delta>omega_1 is
regular, (delta^+)^M = delta^+, and in V there is no proper class inner model
with a Woodin cardinal, then K^M \| delta is universal for all mice in V of
cardinality less than delta.
Comment: Revised version, incorporating the referee's suggestions
Efficient First Order Methods for Linear Composite Regularizers
A wide class of regularization problems in machine learning and statistics
employ a regularization term which is obtained by composing a simple convex
function \omega with a linear transformation. This setting includes Group Lasso
methods, the Fused Lasso and other total variation methods, multi-task learning
methods and many more. In this paper, we present a general approach for
computing the proximity operator of this class of regularizers, under the
assumption that the proximity operator of the function \omega is known in
advance. Our approach builds on a recent line of research on optimal first
order optimization methods and uses fixed point iterations for numerically
computing the proximity operator. It is more general than current approaches
and, as we show with numerical simulations, computationally more efficient than
available first order methods which do not achieve the optimal rate. In
particular, our method outperforms state of the art O(1/T) methods for
overlapping Group Lasso and matches optimal O(1/T^2) methods for the Fused
Lasso and tree-structured Group Lasso.
Comment: 19 pages, 8 figures
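A generic way to realize this idea (not necessarily the paper's exact fixed-point scheme) is to run proximal-gradient iterations on the dual of the prox subproblem, which requires only the proximity operator of \omega. The Python sketch below does this for \omega equal to the l1 norm composed with the 1-D difference operator, i.e. the total-variation / Fused Lasso penalty; the matrix B, the example signal, and all parameter values are arbitrary illustrative choices.

import numpy as np

# Compute prox of h(z) = lam * ||B z||_1 by proximal-gradient (fixed-point)
# iterations on the dual problem
#     min_y 0.5 * ||B^T y - x||^2 + (lam * ||.||_1)^*(y),
# using only the prox of the l1 norm; the primal solution is x - B^T y.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_composite(x, B, lam, n_iter=500):
    y = np.zeros(B.shape[0])                 # dual variable
    step = 1.0 / np.linalg.norm(B, 2) ** 2   # 1 / ||B||^2 is a valid step size
    for _ in range(n_iter):
        grad = B @ (B.T @ y - x)             # gradient of the smooth dual term
        v = y - step * grad
        # Prox of the conjugate via the Moreau identity
        #   prox_{t g*}(v) = v - t * prox_{g/t}(v / t)  with g = lam * ||.||_1;
        # for the l1 conjugate this clips y to the box [-lam, lam].
        y = v - step * soft_threshold(v / step, lam / step)
    return x - B.T @ y                       # primal prox

# Example: total-variation (Fused Lasso) denoising of a piecewise-constant signal.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(80), np.ones(70), 0.3 * np.ones(50)])
noisy = signal + 0.1 * rng.standard_normal(signal.size)
B = np.diff(np.eye(signal.size), axis=0)     # first-difference operator
denoised = prox_composite(noisy, B, lam=0.5)
print("error of noisy input:", np.linalg.norm(noisy - signal))
print("error after prox:    ", np.linalg.norm(denoised - signal))

Because the l1 conjugate is the indicator of a box, the prox-of-the-conjugate step reduces to clipping the dual variable to [-lam, lam]. In the approach described above, a prox computation of this kind serves as the inner step inside an optimal first-order method.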