Fast Asynchronous Parallel Stochastic Gradient Descent
Stochastic gradient descent~(SGD) and its variants have become more and more
popular in machine learning due to their efficiency and effectiveness. To
handle large-scale problems, researchers have recently proposed several
parallel SGD methods for multicore systems. However, existing parallel SGD
methods cannot achieve satisfactory performance in real applications. In this
paper, we propose a fast asynchronous parallel SGD method, called AsySVRG, by
designing an asynchronous strategy to parallelize the recently proposed SGD
variant called stochastic variance reduced gradient~(SVRG). Both theoretical
and empirical results show that AsySVRG can outperform existing
state-of-the-art parallel SGD methods like Hogwild! in terms of convergence
rate and computation cost.
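As background for the comparison above, the SVRG scheme that AsySVRG parallelizes alternates a full-gradient snapshot with variance-reduced stochastic steps. Below is a minimal serial sketch in Python, assuming a user-supplied per-sample gradient `grad_i(w, i)`; the lock-free multicore strategy of AsySVRG itself is not reproduced here.

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.02, epochs=30, m=None, rng=None):
    """Serial SVRG sketch: each epoch takes a full-gradient snapshot,
    then runs m variance-reduced stochastic steps from it."""
    rng = rng or np.random.default_rng(0)
    m = m or 2 * n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        # full gradient at the snapshot point
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # unbiased gradient whose variance vanishes as w approaches w*
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= lr * g
    return w
```

In the asynchronous variant, multiple threads would run the inner loop concurrently on a shared `w`; the sketch keeps it serial for clarity.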
On the Convergence of Memory-Based Distributed SGD
Distributed stochastic gradient descent~(DSGD) has been widely used for
optimizing large-scale machine learning models, including both convex and
non-convex models. With the rapid growth of model size, huge communication cost
has been the bottleneck of traditional DSGD. Recently, many communication
compression methods have been proposed. Memory-based distributed stochastic
gradient descent~(M-DSGD) is one of the efficient methods since each worker
communicates a sparse vector in each iteration so that the communication cost
is small. Recent works establish the convergence rate of M-DSGD when it adopts
vanilla SGD. However, there is still a lack of convergence theory for M-DSGD
when it adopts momentum SGD. In this paper, we propose a universal convergence
analysis for M-DSGD by introducing \emph{transformation equation}. The
transformation equation describes the relation between traditional DSGD and
M-DSGD so that we can transform M-DSGD to its corresponding DSGD. Hence we get
the convergence rate of M-DSGD with momentum for both convex and non-convex
problems. Furthermore, we combine M-DSGD with stagewise learning, in which the
learning rate of M-DSGD is constant within each stage and is decreased by
stage rather than by iteration. Using the transformation equation, we establish
the convergence rate of stagewise M-DSGD, which bridges the gap between theory
and practice.
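The memory mechanism summarized above can be illustrated with a generic top-k error-feedback compressor: each worker adds its residual memory to the fresh gradient, transmits only a sparse vector, and stores what was left unsent. This is an illustrative sketch of the idea, not the paper's exact M-DSGD update; the function names, the top-k choice of compressor, and all constants are assumptions.

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def mdsgd_step(w, grads, memories, lr=0.1, k=2):
    """One memory-based round (sketch): each worker corrects its gradient
    with the residual memory, sends a top-k sparsification, and keeps the
    remainder in memory for the next round."""
    sent = []
    for j, g in enumerate(grads):
        corrected = memories[j] + g        # add residual from last round
        s = topk(corrected, k)             # sparse vector actually sent
        memories[j] = corrected - s        # memory keeps the unsent part
        sent.append(s)
    return w - lr * np.mean(sent, axis=0)  # server averages sparse updates
```

Because the memory re-injects every coordinate eventually, the compressed iteration tracks the uncompressed one on average, which is the intuition behind transforming M-DSGD back to DSGD.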
Scalar waves from a star orbiting a BTZ black hole
In this paper we compute the decay rates of massless scalar waves excited by
a star circularly orbiting around the non-extremal (general) and extremal BTZ
black holes. These decay rates are compared with the corresponding quantities
computed in the respective dual conformal field theories. We find that a match
is achieved in both cases.
Comment: In v2, 17 pages, title changed (contents not changed), discussion of
the isometry group of the near-horizon-extremal BTZ geometry and its effects
on the solutions is added, references added. In v3, minor corrections, several
more references added.
Size-Sensitive Young's modulus of Kinked Silicon Nanowires
We perform both classical molecular dynamics simulations and beam model
calculations to investigate the Young's modulus of kinked silicon nanowires
(KSiNWs). The Young's modulus is found to be highly sensitive to the arm length
of the kink and is essentially inversely proportional to the arm length. The
mechanism underlying the size dependence is found to be the interplay between
the kink angle potential and the arm length potential, where we obtain an
analytic relationship between the Young's modulus and the arm length of the
KSiNW. Our results provide insight into the application of this novel building
block in nanomechanical devices.
Comment: Nanotechnology, accepted (2013).
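The inverse dependence on arm length reported above is consistent with a simple springs-in-series picture: arm stretching contributes a compliance linear in the arm length, while rotation at the kink, acting through a lever arm of that length, contributes a compliance quadratic in it. The toy model below is only an illustration with assumed parameter values (`k_theta`, `E_arm`, `area` are hypothetical), not the paper's fitted beam model.

```python
def youngs_modulus_kinked(l_arm, k_theta=50.0, E_arm=150.0, area=1.0):
    """Toy two-compliance model (assumed parameters): axial stretching of
    the arms plus lever-arm rotation at the kink-angle spring."""
    c_stretch = l_arm / (E_arm * area)  # arm stretch per unit force
    c_bend = l_arm**2 / k_theta         # kink rotation, lever arm ~ l_arm
    # effective modulus = stress / strain over one arm-length period
    strain_per_stress = (c_stretch + c_bend) * area / l_arm
    return 1.0 / strain_per_stress
```

In the bending-dominated regime the quadratic term wins, so doubling the arm length roughly halves the effective modulus, matching the reported inverse proportionality.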
Scalable Stochastic Alternating Direction Method of Multipliers
Stochastic alternating direction method of multipliers (ADMM), which visits
only one sample or a mini-batch of samples each time, has recently been proved
to achieve better performance than batch ADMM. However, most stochastic ADMM
methods can only achieve a convergence rate of $O(1/\sqrt{T})$ on general
convex problems, where $T$ is the number of iterations. Hence, these methods
are not scalable with respect to convergence rate (computation cost). There
exists only one stochastic method, called SA-ADMM, which can achieve a
convergence rate of $O(1/T)$ on general convex problems. However, extra memory
is needed for
SA-ADMM to store the historic gradients on all samples, and thus it is not
scalable with respect to storage cost. In this paper, we propose a novel
method, called scalable stochastic ADMM (SCAS-ADMM), for large-scale
optimization and learning problems. Without the need to store the historic
gradients, SCAS-ADMM can achieve the same convergence rate as the best
stochastic method SA-ADMM and batch ADMM on general convex problems.
Experiments on graph-guided fused lasso show that SCAS-ADMM can achieve
state-of-the-art performance in real applications.
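For concreteness, a single-sample stochastic ADMM iteration of the kind discussed above can be sketched on an l1-regularized least-squares (lasso-style) problem: the smooth term is linearized with a stochastic gradient and an $O(1/\sqrt{t})$ stepsize. This follows a common linearized stochastic-ADMM construction and is an illustrative sketch, not SCAS-ADMM's exact update rule.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_admm_lasso(A, b, lam=0.1, rho=1.0, eta=0.5, T=5000, rng=None):
    """Sketch: min_w (1/2n)||Aw - b||^2 + lam * ||z||_1  s.t.  w = z,
    with the smooth part linearized by a single-sample gradient."""
    rng = rng or np.random.default_rng(0)
    n, d = A.shape
    w, z, u = np.zeros(d), np.zeros(d), np.zeros(d)  # u is the scaled dual
    for t in range(T):
        eta_t = eta / np.sqrt(t + 1.0)          # O(1/sqrt(t)) stepsize
        i = rng.integers(n)
        g = (A[i] @ w - b[i]) * A[i]            # stochastic gradient of smooth part
        # w-update: linearized loss + augmented term + proximal term
        w = (w / eta_t + rho * (z - u) - g) / (1.0 / eta_t + rho)
        z = soft_threshold(w + u, lam / rho)    # z-update: l1 prox
        u = u + w - z                           # dual ascent
    return w, z
```

Note that, unlike SA-ADMM, nothing here stores per-sample historic gradients; only the current iterates are kept.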
Anomalies, effective action and Hawking temperatures of a Schwarzschild black hole in the isotropic coordinates
Motivated by the universality of Hawking radiation and that of the anomaly
cancellation technique as well as that of the effective action method, we
investigate the Hawking radiation of a Schwarzschild black hole in the
isotropic coordinates via the cancellation of gravitational anomaly. After
performing a dimensional reduction from the four-dimensional isotropic
Schwarzschild metric, we show that this reduction procedure will, in general,
result in two classes of two-dimensional effective metrics: the conformally
equivalent and the inequivalent ones. For the conformally equivalent class, the
two-dimensional effective metric displays the distinct feature that its
determinant is not only unequal to unity but also vanishes
at the horizon, the latter of which possibly invalidates the anomaly analysis
there. ...
This is an updated version to replace our e-print arXiv:0709.0044 [hep-th].
The abstract has been shortened here to meet arXiv's 24-line limit.
Comment: 26 pages. Published version, with references updated.
Exploring supersymmetry with machine learning
Investigation of well-motivated parameter space in the theories of Beyond the
Standard Model (BSM) plays an important role in new physics discoveries.
However, a large-scale exploration of models with many parameters, or with
equivalent solutions separated by a finite distance, such as supersymmetric
models, is typically a time-consuming and challenging task. In this paper, we propose a
self-exploration method, named Machine Learning Scan (MLS), to achieve an
efficient test of models. As a proof-of-concept, we apply MLS to investigate
subspaces of the MSSM and CMSSM, and find that such a method can reduce the
computational cost and may be helpful for accelerating the exploration of
supersymmetry.
Comment: 7 pages, 8 figures. Discussions, comments, and the CMSSM model are
added. Accepted for publication in Nuclear Physics.
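The self-exploration idea can be mimicked with a toy active-learning loop: fit a cheap surrogate classifier to the parameter points already tested, then spend the expensive model test only on the candidates the surrogate ranks highest. Everything below (the logistic surrogate, the 2-D toy "allowed region", all names and constants) is an illustrative assumption, not the MLS implementation.

```python
import numpy as np

def fit_logreg(X, y, lr=0.05, iters=2000):
    """Plain logistic regression by gradient descent (surrogate classifier)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        z = np.clip(X @ w, -30, 30)          # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def mls_scan(oracle, featurize, lo, hi, n_init=100, n_cand=500,
             n_pick=20, rounds=5, rng=None):
    """MLS-style loop (sketch): alternately fit a fast surrogate to the
    points tested so far and spend expensive oracle calls on the
    candidates the surrogate ranks most promising."""
    rng = rng or np.random.default_rng(0)
    X = rng.uniform(lo, hi, size=(n_init, 2))
    y = np.array([oracle(x) for x in X], dtype=float)
    for _ in range(rounds):
        w = fit_logreg(featurize(X), y)
        cand = rng.uniform(lo, hi, size=(n_cand, 2))
        scores = featurize(cand) @ w
        pick = cand[np.argsort(scores)[-n_pick:]]   # highest predicted score
        X = np.vstack([X, pick])
        y = np.concatenate([y, [oracle(x) for x in pick]])
    return X, y
```

The payoff is that guided sampling hits the allowed region far more often than uniform random scanning would.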
Counterpart synchronization of duplex networks with delayed nodes and noise perturbation
In the real world, many complex systems are represented not by single
networks but rather by sets of interdependent ones. In these specific networks,
nodes in one network mutually interact with nodes in other networks. This paper
focuses on a simple representative case of two-layer networks (the so-called
duplex networks) with unidirectional inter-layer couplings. That is, each node
in one network depends on a counterpart in the other network. Accordingly, the
former network is called the response layer and the latter network is the drive
layer. Specifically, synchronization between each node in the drive layer and
its counterpart in the response layer (counterpart synchronization, or CS) of
this sort of duplex networks with delayed nodes and noise perturbation is
investigated. Based on the LaSalle-type invariance principle, a control
technique is proposed and a sufficient condition is developed for realizing
counterpart synchronization of duplex networks. Furthermore, two corollaries
are derived as special cases. In addition, node dynamics within each layer can
vary, and the topologies of the two layers are not necessarily identical.
Therefore, the proposed synchronization method can be applied to a wide range
of multiplex networks. Numerical examples are provided to illustrate the
feasibility and effectiveness of the results.Comment: 11 page
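The drive-response scheme described above can be illustrated with a small Euler simulation in which each response node is steered toward its counterpart by a linear control term k*(x_i - y_i). The node dynamics, ring topology, and gains below are illustrative assumptions (and the paper's delay terms are omitted), not the paper's model.

```python
import numpy as np

def simulate_duplex(N=5, T=5000, dt=0.01, k=5.0, sigma=0.0, rng=None):
    """Euler sketch: drive layer x evolves freely with diffusive intra-layer
    coupling; each response node y_i is controlled toward its counterpart
    x_i, optionally under additive noise of strength sigma."""
    rng = rng or np.random.default_rng(0)
    # ring-coupling Laplacian (in general the two layers need not be identical)
    L = 2 * np.eye(N) - np.roll(np.eye(N), 1, 0) - np.roll(np.eye(N), -1, 0)
    f = lambda v: -v + np.tanh(2.0 * v)      # toy node dynamics
    x = rng.uniform(-1, 1, N)
    y = rng.uniform(-1, 1, N)
    for _ in range(T):
        x = x + dt * (f(x) - L @ x)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(N)
        y = y + dt * (f(y) - L @ y + k * (x - y)) + noise
    return x, y
```

With the gain k exceeding the Lipschitz constant of the node dynamics, the counterpart error decays exponentially, which is the intuition behind the sufficient condition in the paper.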
Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking
As an important and challenging problem in computer vision and graphics,
keypoint-based object tracking is typically formulated in a spatio-temporal
statistical learning framework. However, most existing keypoint trackers are
incapable of effectively modeling and balancing the following three aspects in
a simultaneous manner: temporal model coherence across frames, spatial model
consistency within frames, and discriminative feature construction. To address
this issue, we propose a robust keypoint tracker based on spatio-temporal
multi-task structured output optimization driven by discriminative metric
learning. Consequently, temporal model coherence is characterized by multi-task
structured keypoint model learning over several adjacent frames, while spatial
model consistency is modeled by solving a geometric verification based
structured learning problem. Discriminative feature construction is enabled by
metric learning to ensure the intra-class compactness and inter-class
separability. Finally, the above three modules are simultaneously optimized in
a joint learning scheme. Experimental results have demonstrated the
effectiveness of our tracker.
Comment: Accepted by AAAI-15.
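The metric-learning module's goal above (intra-class compactness, inter-class separability) can be illustrated by one push-pull update on a Mahalanobis matrix, followed by a projection back onto the PSD cone. This is a generic sketch of such an update, not the tracker's actual learning rule; all names and the learning rate are illustrative.

```python
import numpy as np

def metric_step(M, x_a, x_p, x_n, lr=0.05):
    """One push-pull update on a Mahalanobis metric M (sketch): shrink the
    distance to the positive x_p, grow it to the negative x_n, then project
    back to the PSD cone so M stays a valid metric."""
    dp = np.outer(x_a - x_p, x_a - x_p)   # pull direction (same class)
    dn = np.outer(x_a - x_n, x_a - x_n)   # push direction (different class)
    M = M - lr * (dp - dn)
    vals, vecs = np.linalg.eigh(M)        # PSD projection
    return (vecs * np.clip(vals, 0, None)) @ vecs.T
```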
Detecting Adversarial Examples via Key-based Network
Though deep neural networks have achieved state-of-the-art performance in
visual classification, recent studies have shown that they are all vulnerable
to the attack of adversarial examples. Small and often imperceptible
perturbations to the input images are sufficient to fool the most powerful deep
neural networks. Various defense methods have been proposed to address this
issue. However, they either require knowledge on the process of generating
adversarial examples, or are not robust against new attacks specifically
designed to penetrate the existing defense. In this work, we introduce the
key-based network, a new detection-based defense mechanism that distinguishes
adversarial examples from normal ones based on error-correcting output codes.
It uses the binary code vectors produced by multiple binary classifiers applied
to randomly chosen label-sets as signatures to match normal images and reject
adversarial examples. In contrast to existing defense methods, the proposed
method does not require knowledge of the process for generating adversarial
examples and can be applied to defend against different types of attacks. For
the practical black-box and gray-box scenarios, where the attacker does not
know the encoding scheme, we show empirically that key-based network can
effectively detect adversarial examples generated by several state-of-the-art
attacks.
Comment: 6 pages.
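The signature-matching step described above can be sketched with error-correcting output codes: an input's L-bit signature is compared with the class codewords, and inputs whose signature is far from every codeword are rejected. The codebook, threshold, and function names below are illustrative, not the paper's configuration.

```python
import numpy as np

def nearest_codeword_distance(bits, codebook):
    """Hamming distance from an L-bit signature to the closest class codeword."""
    return int(np.min(np.sum(bits != codebook, axis=1)))

def is_adversarial(bits, codebook, tau):
    """Reject the input when its signature matches no class codeword
    within tau bit errors."""
    return nearest_codeword_distance(bits, codebook) > tau
```

Because the attacker does not know the (secret) assignment of label-sets to bits, crafting a perturbation whose signature lands near a codeword is hard, which is the intuition behind the black-box and gray-box results.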