1,869 research outputs found

    Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates

    Full text link
    In this paper, we provide a novel construction of the linear-sized spectral sparsifiers of Batson, Spielman and Srivastava [BSS14]. While previous constructions required Ω(n4)\Omega(n^4) running time [BSS14, Zou12], our sparsification routine can be implemented in almost-quadratic running time O(n2+ε)O(n^{2+\varepsilon}). The fundamental conceptual novelty of our work is the leveraging of a strong connection between sparsification and a regret minimization problem over density matrices. This connection was known to provide an interpretation of the randomized sparsifiers of Spielman and Srivastava [SS11] via the application of matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we explain how matrix MWU naturally arises as an instance of the Follow-the-Regularized-Leader framework and generalize this approach to yield a larger class of updates. This new class allows us to accelerate the construction of linear-sized spectral sparsifiers, and give novel insights on the motivation behind Batson, Spielman and Srivastava [BSS14]

    Contextual Object Detection with a Few Relevant Neighbors

    Full text link
    A natural way to improve the detection of objects is to consider the contextual constraints imposed by the detection of additional objects in a given scene. In this work, we exploit the spatial relations between objects in order to improve detection capacity, as well as analyze various properties of the contextual object detection problem. To precisely calculate context-based probabilities of objects, we developed a model that examines the interactions between objects in an exact probabilistic setting, in contrast to previous methods that typically utilize approximations based on pairwise interactions. Such a scheme is facilitated by the realistic assumption that the existence of an object in any given location is influenced by only few informative locations in space. Based on this assumption, we suggest a method for identifying these relevant locations and integrating them into a mostly exact calculation of probability based on their raw detector responses. This scheme is shown to improve detection results and provides unique insights about the process of contextual inference for object detection. We show that it is generally difficult to learn that a particular object reduces the probability of another, and that in cases when the context and detector strongly disagree this learning becomes virtually impossible for the purposes of improving the results of an object detector. Finally, we demonstrate improved detection results through use of our approach as applied to the PASCAL VOC and COCO datasets

    Largeness and SQ-universality of cyclically presented groups

    Get PDF
    Largeness, SQ-universality, and the existence of free subgroups of rank 2 are measures of the complexity of a finitely presented group. We obtain conditions under which a cyclically presented group possesses one or more of these properties. We apply our results to a class of groups introduced by Prishchepov which contains, amongst others, the various generalizations of Fibonacci groups introduced by Campbell and Robertson

    Premise Selection for Mathematics by Corpus Analysis and Kernel Methods

    Get PDF
    Smart premise selection is essential when using automated reasoning as a tool for large-theory formal proof development. A good method for premise selection in complex mathematical libraries is the application of machine learning to large corpora of proofs. This work develops learning-based premise selection in two ways. First, a newly available minimal dependency analysis of existing high-level formal mathematical proofs is used to build a large knowledge base of proof dependencies, providing precise data for ATP-based re-verification and for training premise selection algorithms. Second, a new machine learning algorithm for premise selection based on kernel methods is proposed and implemented. To evaluate the impact of both techniques, a benchmark consisting of 2078 large-theory mathematical problems is constructed,extending the older MPTP Challenge benchmark. The combined effect of the techniques results in a 50% improvement on the benchmark over the Vampire/SInE state-of-the-art system for automated reasoning in large theories.Comment: 26 page

    Generalization Error in Deep Learning

    Get PDF
    Deep learning models have lately shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, alongside their state-of-the-art performance, it is still generally unclear what is the source of their generalization ability. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this article, we provide an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results

    Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features

    Full text link
    One-class support vector machine (OC-SVM) for a long time has been one of the most effective anomaly detection methods and extensively adopted in both research as well as industrial applications. The biggest issue for OC-SVM is yet the capability to operate with large and high-dimensional datasets due to optimization complexity. Those problems might be mitigated via dimensionality reduction techniques such as manifold learning or autoencoder. However, previous work often treats representation learning and anomaly prediction separately. In this paper, we propose autoencoder based one-class support vector machine (AE-1SVM) that brings OC-SVM, with the aid of random Fourier features to approximate the radial basis kernel, into deep learning context by combining it with a representation learning architecture and jointly exploit stochastic gradient descent to obtain end-to-end training. Interestingly, this also opens up the possible use of gradient-based attribution methods to explain the decision making for anomaly detection, which has ever been challenging as a result of the implicit mappings between the input space and the kernel space. To the best of our knowledge, this is the first work to study the interpretability of deep learning in anomaly detection. We evaluate our method on a wide range of unsupervised anomaly detection tasks in which our end-to-end training architecture achieves a performance significantly better than the previous work using separate training.Comment: Accepted at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 201

    Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent

    Get PDF
    International audience<p>We propose a new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ωˉLˉR2/(k+1)22\bar{\omega}\bar{L} R^2/(k+1)^2 , where kk is the iteration counter, ωˉ\bar{\omega} is a data-weighted \emph{average} degree of separability of the loss function, Lˉ\bar{L} is the \emph{average} of Lipschitz constants associated with the coordinates and individual functions in the sum, and RR is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent, rendering it impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel and distributed versions of randomized block coordinate descent. \new{Due of this flexibility, APPROX had been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve on the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, linear and semidefinite programming.</p

    Ultrafast Deflection of Spatial Solitons in AlGaAs Slab Waveguides

    Full text link
    We demonstrate ultrafast all-optical deflection of spatial solitons in an AlGaAs slab waveguide using 190 fs, 1550 nm pulses which are used to generate and deflect the spatial soliton. The steering beam is focused onto the top of the waveguide near the soliton pathway and the soliton is steered due to refractive index changes induced by optical Kerr, or free carrier (Drude) effects. Angular deflections up to 8 mR are observed.Comment: accept. for publ. in Optics Letter
    corecore