    Private Incremental Regression

    Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where, as is common in practice, the data is not given at once but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression, where the general goal is to always maintain a good empirical risk minimizer for the observed history under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on the simple idea of invoking the private batch ERM procedure at regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx \sqrt{d}$, where $d$ is the dimensionality of the data. While this bound is tight in the worst case by the results of [Bassily et al. 2014], we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. Comment: To appear in PODS 201
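
    Below is a minimal sketch of the kind of noisy projected-gradient update the first mechanism describes: a gradient over the observed history is perturbed with noise, and the iterate is projected back onto the constraint set. The squared loss, step size, Gaussian noise scale, and l2-ball constraint set are illustrative assumptions; calibrating the noise to the privacy budget is the substance of the paper and is not shown here.

```python
import numpy as np

def project_l2_ball(w, radius=1.0):
    """Project w onto the l2 ball of the given radius (a stand-in for the constraint set)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else (radius / norm) * w

def noisy_projected_gradient_step(w, X, y, eta=0.1, sigma=1.0, rng=None):
    """One timestep: compute the squared-loss gradient over the history (X, y),
    add Gaussian noise (sigma is a placeholder, NOT privacy-calibrated),
    take a step, and project back onto the constraint set."""
    rng = np.random.default_rng() if rng is None else rng
    grad = X.T @ (X @ w - y) / len(y)   # gradient of (1/2n) * ||X w - y||^2
    noisy_grad = grad + rng.normal(0.0, sigma, size=w.shape)
    return project_l2_ball(w - eta * noisy_grad)
```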

    Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent

    We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted \emph{average} degree of separability of the loss function, $\bar{L}$ is the \emph{average} of the Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent and renders it impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to an improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms, such as simple and accelerated proximal gradient descent, as well as serial, parallel and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve on the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, and linear and semidefinite programming.
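
    As a hedged illustration of the family APPROX belongs to, here is plain serial randomized coordinate descent on a quadratic, where each step touches one coordinate and uses that coordinate's Lipschitz constant $L_i = A_{ii}$ as the stepsize denominator. The acceleration, parallelism, and proximal terms that define APPROX are omitted; the quadratic objective and uniform sampling are assumptions made for the sketch.

```python
import numpy as np

def randomized_coordinate_descent(A, b, iters=1000, rng=None):
    """Minimize f(w) = 0.5 * w^T A w - b^T w (A symmetric positive definite)
    by updating one uniformly random coordinate per iteration with step
    1 / L_i, where L_i = A[i, i] is the coordinate-wise Lipschitz constant."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(b)
    w = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)
        grad_i = A[i] @ w - b[i]   # i-th partial derivative of f
        w[i] -= grad_i / A[i, i]   # exact minimization along coordinate i
    return w
```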

    The time to extinction for an SIS-household-epidemic model

    We analyse a stochastic SIS epidemic amongst a finite population partitioned into households. Since the population is finite, the epidemic will eventually go extinct, i.e., there will be no more infectives in the population. We study the effects of population size and within-household transmission upon the time to extinction. This is done through two approximations. The first approximation is suitable for all levels of within-household transmission and is based upon an Ornstein-Uhlenbeck process approximation for the disease's fluctuations about an endemic level, relying on a large population. The second approximation is suitable for high levels of within-household transmission and approximates the number of infectious households by a simple homogeneously mixing SIS model, with the households replaced by individuals. The analysis, supported by a simulation study, shows that the mean time to extinction is minimized by moderate levels of within-household transmission.
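
    As a companion to the simulation study mentioned above, the following is a sketch of a Gillespie-style simulation that estimates the time to extinction for a households SIS model. The rate parameterization (global infection rate lam_g scaled by total prevalence, within-household rate lam_l, recovery rate gamma) and the parameter values are assumptions for illustration, not the paper's.

```python
import numpy as np

def sis_household_extinction_time(m=100, n=3, lam_g=1.5, lam_l=2.0,
                                  gamma=1.0, rng=None):
    """Gillespie simulation of an SIS epidemic among m households of size n.
    Each susceptible is infected at rate lam_g * I_total / N (global mixing)
    plus lam_l * I_household (within its household); each infective recovers
    at rate gamma. Returns the time at which no infectives remain."""
    rng = np.random.default_rng() if rng is None else rng
    I = np.zeros(m, dtype=int)   # infectives per household
    I[0] = 1                     # one initial infective
    t, N = 0.0, m * n
    while I.sum() > 0:
        S = n - I                # susceptibles per household
        inf_rates = S * (lam_g * I.sum() / N + lam_l * I)
        rec_rates = gamma * I
        total = inf_rates.sum() + rec_rates.sum()
        t += rng.exponential(1.0 / total)
        rates = np.concatenate([inf_rates, rec_rates])
        k = rng.choice(2 * m, p=rates / total)
        if k < m:
            I[k] += 1            # infection event in household k
        else:
            I[k - m] -= 1        # recovery event in household k - m
    return t
```

    Averaging the return value over many independent runs gives a Monte Carlo estimate of the mean time to extinction.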

    Measurement of the $e^+e^- \to K^+K^-\pi^+\pi^-$ cross section with the CMD-3 detector at the VEPP-2000 collider

    The process $e^+e^- \to K^+K^-\pi^+\pi^-$ has been studied in the center-of-mass energy range from 1500 to 2000 MeV using a data sample of 23 pb$^{-1}$ collected with the CMD-3 detector at the VEPP-2000 $e^+e^-$ collider. Using about 24000 selected events, the $e^+e^- \to K^+K^-\pi^+\pi^-$ cross section has been measured with a systematic uncertainty decreasing from 11.7% at 1500-1600 MeV to 6.1% above 1800 MeV. A preliminary study of $K^+K^-\pi^+\pi^-$ production dynamics has been performed.
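
    A toy worked example of how a cross section follows from the quantities in the abstract: $\sigma = N / (\varepsilon L)$, with $N$ the selected event count and $L$ the integrated luminosity. Only $N \approx 24000$ and $L \approx 23$ pb$^{-1}$ come from the abstract; the detection efficiency and the omitted radiative correction are hypothetical placeholders.

```python
def cross_section_nb(n_events, luminosity_pb, efficiency, rad_corr=1.0):
    """sigma = N / (eps * L * (1 + delta)), returned in nb (1 nb = 1000 pb).
    rad_corr stands in for the radiative-correction factor (1 + delta)."""
    sigma_pb = n_events / (efficiency * luminosity_pb * rad_corr)
    return sigma_pb / 1000.0

# Toy numbers: 24000 events over 23 pb^-1 at an ASSUMED 30% efficiency.
print(cross_section_nb(24000, 23.0, 0.30))  # ~3.5 nb, illustrative only
```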

    Premise Selection for Mathematics by Corpus Analysis and Kernel Methods

    Smart premise selection is essential when using automated reasoning as a tool for large-theory formal proof development. A good method for premise selection in complex mathematical libraries is the application of machine learning to large corpora of proofs. This work develops learning-based premise selection in two ways. First, a newly available minimal dependency analysis of existing high-level formal mathematical proofs is used to build a large knowledge base of proof dependencies, providing precise data for ATP-based re-verification and for training premise selection algorithms. Second, a new machine learning algorithm for premise selection based on kernel methods is proposed and implemented. To evaluate the impact of both techniques, a benchmark consisting of 2078 large-theory mathematical problems is constructed, extending the older MPTP Challenge benchmark. The combined effect of the techniques results in a 50% improvement on the benchmark over the Vampire/SInE state-of-the-art system for automated reasoning in large theories. Comment: 26 pages
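
    A rough sketch of the kernel-method idea for premise selection: score each candidate premise by how often it was used in proofs of theorems similar to the new conjecture, with similarity given by a kernel over feature vectors. The Gaussian kernel, binary symbol-occurrence features, and scoring rule are assumptions standing in for the paper's actual algorithm.

```python
import numpy as np

def rank_premises(conjecture_feats, theorem_feats, dependency_matrix, top_k=10):
    """Rank candidate premises for a new conjecture.
    theorem_feats: (n_theorems, n_features) binary symbol-occurrence vectors.
    dependency_matrix: (n_theorems, n_premises), 1 if a theorem's proof used
    a premise. A premise's score is its similarity-weighted usage frequency
    across the training proofs; similarity is a Gaussian kernel (bandwidth 1)."""
    d2 = ((theorem_feats - conjecture_feats) ** 2).sum(axis=1)
    sim = np.exp(-d2)                         # kernel similarity to each theorem
    scores = sim @ dependency_matrix          # weight each premise by similarity
    return np.argsort(scores)[::-1][:top_k]   # indices of the top-k premises
```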