
    Fast Asynchronous Parallel Stochastic Gradient Descent

    Stochastic gradient descent (SGD) and its variants have become increasingly popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a fast asynchronous parallel SGD method, called AsySVRG, which parallelizes the recently proposed SGD variant stochastic variance reduced gradient (SVRG) with an asynchronous update strategy. Both theoretical and empirical results show that AsySVRG outperforms existing state-of-the-art parallel SGD methods such as Hogwild! in terms of convergence rate and computation cost.
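    For context, the serial SVRG update that AsySVRG parallelizes looks roughly like the sketch below (the toy least-squares objective and all names are illustrative assumptions, not the paper's code); AsySVRG runs the inner loop concurrently across threads.

```python
import numpy as np

# Toy least-squares problem: f_i(w) = 0.5 * (x_i . w - y_i)^2
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.01 * rng.normal(size=1000)

def grad_i(w, i):
    """Stochastic gradient of the i-th sample."""
    return (X[i] @ w - y[i]) * X[i]

def svrg(eta=0.01, outer=20, inner=1000):
    w = np.zeros(X.shape[1])
    for _ in range(outer):
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - y) / len(y)  # full gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(len(y))
            # Variance-reduced stochastic gradient step.
            w -= eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    return w

print("parameter error:", np.linalg.norm(svrg() - w_true))
```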

    On the Convergence of Memory-Based Distributed SGD

    Distributed stochastic gradient descent (DSGD) has been widely used for optimizing large-scale machine learning models, including both convex and non-convex models. With the rapid growth of model size, the huge communication cost has become the bottleneck of traditional DSGD. Recently, many communication compression methods have been proposed. Memory-based distributed stochastic gradient descent (M-DSGD) is among the most efficient, since each worker communicates only a sparse vector in each iteration, keeping the communication cost small. Recent works establish the convergence rate of M-DSGD when it adopts vanilla SGD. However, there is still a lack of convergence theory for M-DSGD when it adopts momentum SGD. In this paper, we propose a universal convergence analysis for M-DSGD by introducing a transformation equation. The transformation equation describes the relation between traditional DSGD and M-DSGD, allowing us to transform M-DSGD into its corresponding DSGD. Hence we obtain the convergence rate of M-DSGD with momentum for both convex and non-convex problems. Furthermore, we combine M-DSGD with stagewise learning, in which the learning rate is constant within each stage and is decreased stage by stage rather than iteration by iteration. Using the transformation equation, we establish the convergence rate of stagewise M-DSGD, which bridges the gap between theory and practice.
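    Schematically, the memory mechanism on each worker combines compression with a residual that carries unsent mass over to later iterations; below is a minimal sketch with top-k sparsification as the compressor (the compressor choice and all names are illustrative assumptions, not the paper's algorithm).

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zeroing the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def worker_step(w, grad, memory, eta=0.1, k=10):
    """One memory-compensated step: compress (step + residual), communicate
    only the sparse part, and keep what was dropped as residual memory."""
    corrected = eta * grad + memory       # add back mass not sent previously
    sparse_update = top_k(corrected, k)   # the only vector communicated
    memory = corrected - sparse_update    # residual for the next iteration
    return w - sparse_update, memory
```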

    Scalar waves from a star orbiting a BTZ black hole

    In this paper we compute the decay rates of massless scalar waves excited by a star circularly orbiting the non-extremal (general) and extremal BTZ black holes. These decay rates are compared with the corresponding quantities computed in the respective dual conformal field theories. We find that matches are achieved in both cases.
    Comment: 17 pages. In v2, title changed (contents unchanged); discussion of the isometry group of the near-horizon-extremal BTZ geometry and its effects on the solutions added; references added. V3: minor corrections, several more references added.
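    For reference, the non-extremal BTZ background on which such decay rates are computed is the standard metric below (conventions may differ from the paper's).

```latex
% Non-extremal BTZ black hole (AdS_3 radius \ell, horizons r_\pm):
ds^2 = -\frac{(r^2 - r_+^2)(r^2 - r_-^2)}{\ell^2 r^2}\, dt^2
     + \frac{\ell^2 r^2}{(r^2 - r_+^2)(r^2 - r_-^2)}\, dr^2
     + r^2 \left( d\phi - \frac{r_+ r_-}{\ell r^2}\, dt \right)^{2}
% with Hawking temperature  T_H = (r_+^2 - r_-^2) / (2\pi \ell^2 r_+).
```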

    Size-Sensitive Young's Modulus of Kinked Silicon Nanowires

    We perform both classical molecular dynamics simulations and beam model calculations to investigate the Young's modulus of kinked silicon nanowires (KSiNWs). The Young's modulus is found to be highly sensitive to the arm length of the kink and is essentially inversely proportional to it. The mechanism underlying this size dependence is the interplay between the kink angle potential and the arm length potential, from which we obtain an analytic relationship between the Young's modulus and the arm length of the KSiNW. Our results provide insight into the application of this novel building block in nanomechanical devices.
    Comment: Nanotechnology, accepted (2013)

    Scalable Stochastic Alternating Direction Method of Multipliers

    Stochastic alternating direction method of multipliers (ADMM), which visits only one sample or a mini-batch of samples each time, has recently been proved to achieve better performance than batch ADMM. However, most stochastic methods can only achieve a convergence rate of O(1/√T) on general convex problems, where T is the number of iterations. Hence, these methods are not scalable with respect to convergence rate (computation cost). There exists only one stochastic method, called SA-ADMM, which can achieve a convergence rate of O(1/T) on general convex problems. However, SA-ADMM needs extra memory to store the historical gradients of all samples, and thus it is not scalable with respect to storage cost. In this paper, we propose a novel method, called scalable stochastic ADMM (SCAS-ADMM), for large-scale optimization and learning problems. Without needing to store historical gradients, SCAS-ADMM achieves the same O(1/T) convergence rate on general convex problems as the best stochastic method SA-ADMM and as batch ADMM. Experiments on graph-guided fused lasso show that SCAS-ADMM achieves state-of-the-art performance in real applications.
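    For reference, the generic batch ADMM iteration that such stochastic variants approximate (by replacing the x-subproblem with per-sample or mini-batch gradient estimates) is the standard scaled-dual form:

```latex
% ADMM for  min_{x,z} f(x) + g(z)  subject to  Ax + Bz = c
% (scaled dual variable u, penalty parameter \rho):
x^{k+1} = \arg\min_x \Big( f(x) + \tfrac{\rho}{2}\,\lVert Ax + Bz^k - c + u^k \rVert_2^2 \Big)
z^{k+1} = \arg\min_z \Big( g(z) + \tfrac{\rho}{2}\,\lVert Ax^{k+1} + Bz - c + u^k \rVert_2^2 \Big)
u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c
```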

    Anomalies, effective action and Hawking temperatures of a Schwarzschild black hole in the isotropic coordinates

    Motivated by the universality of Hawking radiation, of the anomaly cancellation technique, and of the effective action method, we investigate the Hawking radiation of a Schwarzschild black hole in the isotropic coordinates via the cancellation of gravitational anomaly. After performing a dimensional reduction from the four-dimensional isotropic Schwarzschild metric, we show that this reduction procedure will, in general, result in two classes of two-dimensional effective metrics: the conformally equivalent and the inequivalent ones. For the physically equivalent class, the two-dimensional effective metric displays the distinct feature that its determinant is not only unequal to unity (√−g ≠ 1) but also vanishes at the horizon, the latter of which possibly invalidates the anomaly analysis there. ... (Abstract truncated; the full abstract exceeds arXiv's 24-line limit.) This is an updated version replacing our e-print arXiv:0709.0044 [hep-th].
    Comment: 26 pages. Published version, with references updated
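    For reference, the four-dimensional Schwarzschild metric in isotropic coordinates (G = c = 1), the starting point of the dimensional reduction, is the standard expression:

```latex
% Schwarzschild in isotropic coordinates (isotropic radius \rho, mass M):
ds^2 = -\left(\frac{1 - M/(2\rho)}{1 + M/(2\rho)}\right)^{2} dt^2
     + \left(1 + \frac{M}{2\rho}\right)^{4} \left(d\rho^2 + \rho^2\, d\Omega^2\right)
% The horizon sits at \rho = M/2, corresponding to areal radius r = 2M.
```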

    Exploring supersymmetry with machine learning

    Investigation of the well-motivated parameter space of Beyond the Standard Model (BSM) theories plays an important role in new physics discoveries. However, a large-scale exploration of models with many parameters, or with equivalent solutions separated by finite distances in parameter space, such as supersymmetric models, is typically a time-consuming and challenging task. In this paper, we propose a self-exploration method, named Machine Learning Scan (MLS), to enable efficient tests of such models. As a proof of concept, we apply MLS to subspaces of the MSSM and the CMSSM and find that the method can reduce the computational cost and may help accelerate the exploration of supersymmetry.
    Comment: 7 pages, 8 figures. Discussions, comments and CMSSM model are added. Accepted for publication in Nuclear Physics
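    A toy self-exploration loop in this spirit might look like the sketch below; the surrogate model, acquisition rule, and stand-in likelihood are illustrative assumptions, not the authors' MLS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(theta):
    """Stand-in for an expensive spectrum calculation plus likelihood."""
    return np.exp(-0.5 * np.sum((theta - 1.5) ** 2) / 0.1)

def surrogate(cand, X, y, k=5):
    """Cheap k-nearest-neighbour estimate of the likelihood surface."""
    d = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y[idx].mean(axis=1)

# Seed with random points, then repeatedly sample where the surrogate
# predicts high likelihood (plus a little exploration noise).
X = rng.uniform(-5, 5, size=(50, 2))
y = np.array([likelihood(t) for t in X])
for _ in range(20):
    cand = rng.uniform(-5, 5, size=(500, 2))
    best = cand[np.argsort(surrogate(cand, X, y))[-5:]]
    best += 0.1 * rng.normal(size=best.shape)
    y = np.concatenate([y, [likelihood(t) for t in best]])
    X = np.vstack([X, best])

print("best point found:", X[np.argmax(y)])
```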

    Counterpart synchronization of duplex networks with delayed nodes and noise perturbation

    In the real world, many complex systems are represented not by single networks but rather by sets of interdependent ones, in which nodes in one network mutually interact with nodes in other networks. This paper focuses on a simple representative case: two-layer networks (so-called duplex networks) with unidirectional inter-layer couplings, where each node in one network depends on a counterpart in the other network. Accordingly, the former network is called the response layer and the latter the drive layer. Specifically, we investigate synchronization between each node in the drive layer and its counterpart in the response layer (counterpart synchronization, or CS) for duplex networks with delayed nodes and noise perturbation. Based on the LaSalle-type invariance principle, a control technique is proposed and a sufficient condition is developed for realizing counterpart synchronization of duplex networks. Furthermore, two corollaries are derived as special cases. In addition, the node dynamics within each layer can vary, and the topologies of the two layers need not be identical, so the proposed synchronization method can be applied to a wide range of multiplex networks. Numerical examples are provided to illustrate the feasibility and effectiveness of the results.
    Comment: 11 pages
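    A minimal numerical sketch of the drive-response setup follows, with delays and noise omitted for brevity; the node dynamics, feedback gain, and shared topology are illustrative assumptions rather than the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt, steps, gain = 5, 0.01, 5000, 5.0

A = (rng.random((N, N)) < 0.4).astype(float)  # random layer topology
np.fill_diagonal(A, 0)
L = np.diag(A.sum(axis=1)) - A                # graph Laplacian

f = lambda v: -v + np.tanh(2.0 * v)           # simple node dynamics

x = rng.normal(size=N)                        # drive layer states
y = rng.normal(size=N)                        # response layer states
for _ in range(steps):
    x = x + dt * (f(x) - L @ x)
    u = -gain * (y - x)                       # counterpart feedback control
    y = y + dt * (f(y) - L @ y + u)

print("counterpart sync error:", np.linalg.norm(y - x))  # ~0 after transient
```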

    Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking

    As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework. However, most existing keypoint trackers are incapable of simultaneously modeling and balancing three aspects: temporal model coherence across frames, spatial model consistency within frames, and discriminative feature construction. To address this issue, we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by discriminative metric learning. Temporal model coherence is captured by multi-task structured keypoint model learning over several adjacent frames, while spatial model consistency is modeled by solving a geometric-verification-based structured learning problem. Discriminative feature construction is enabled by metric learning that ensures intra-class compactness and inter-class separability. Finally, these three modules are simultaneously optimized in a joint learning scheme. Experimental results demonstrate the effectiveness of our tracker.
    Comment: Accepted by AAAI-1
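    As a small illustration of the metric-learning ingredient, matching keypoint descriptors under a learned Mahalanobis metric rather than plain Euclidean distance might look like this generic sketch (the matrix M would come from the tracker's metric-learning objective, which is not reproduced here):

```python
import numpy as np

def mahalanobis(a, b, M):
    """Distance under a learned positive semi-definite matrix M: trained to be
    small for same-keypoint descriptor pairs and large for different ones."""
    d = a - b
    return float(np.sqrt(d @ M @ d))

# With M = L.T @ L this is Euclidean distance in the transformed
# feature space v -> L @ v, which is how such metrics are often learned.
```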

    Detecting Adversarial Examples via Key-based Network

    Though deep neural networks have achieved state-of-the-art performance in visual classification, recent studies have shown that they are vulnerable to adversarial examples: small and often imperceptible perturbations to the input images are sufficient to fool the most powerful deep neural networks. Various defense methods have been proposed to address this issue. However, they either require knowledge of the process for generating adversarial examples or are not robust against new attacks specifically designed to penetrate the existing defense. In this work, we introduce the key-based network, a new detection-based defense mechanism that distinguishes adversarial examples from normal ones using error-correcting output codes: the binary code vectors produced by multiple binary classifiers, each applied to a randomly chosen label-set, serve as signatures that match normal images and reject adversarial examples. In contrast to existing defense methods, the proposed method does not require knowledge of the process for generating adversarial examples and can be applied to defend against different types of attacks. For the practical black-box and gray-box scenarios, where the attacker does not know the encoding scheme, we show empirically that the key-based network can effectively detect adversarial examples generated by several state-of-the-art attacks.
    Comment: 6 pages
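    The detection idea can be sketched schematically as below, on synthetic data with scikit-learn linear classifiers; the architecture, code length, and rejection threshold are illustrative assumptions, not the paper's key-based network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, n_bits = 4, 8

# Secret random label-set splits; retry until every bit splits the classes.
while True:
    codebook = rng.integers(0, 2, size=(n_classes, n_bits))
    if np.all(codebook.min(axis=0) < codebook.max(axis=0)):
        break

# Synthetic "images": one Gaussian blob per class.
X = np.vstack([rng.normal(c, 0.3, size=(200, 2)) for c in range(n_classes)])
labels = np.repeat(np.arange(n_classes), 200)

# One binary classifier per code bit, trained on that bit's label split.
clfs = [LogisticRegression().fit(X, codebook[labels, b]) for b in range(n_bits)]

def detect(x, reject_at=2):
    """Accept only if the predicted bit string is near some class codeword."""
    bits = np.array([c.predict(x[None])[0] for c in clfs])
    dist = np.abs(codebook - bits).sum(axis=1).min()  # min Hamming distance
    return ("accept" if dist <= reject_at else "reject", int(dist))

print(detect(np.array([1.0, 1.1])))   # near the class-1 blob: small distance
print(detect(np.array([9.0, -9.0])))  # far from the data: typically rejected
```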