Private Incremental Regression
Data is continuously generated by modern data sources, and a recent challenge
in machine learning has been to develop techniques that perform well in an
incremental (streaming) setting. In this paper, we investigate the problem of
private machine learning, where, as is common in practice, the data is not given at
once, but rather arrives incrementally over time.
We introduce the problems of private incremental ERM and private incremental
regression where the general goal is to always maintain a good empirical risk
minimizer for the history observed under differential privacy. Our first
contribution is a generic transformation of private batch ERM mechanisms into
private incremental ERM mechanisms, based on a simple idea of invoking the
private batch ERM procedure at some regular time intervals. We take this
construction as a baseline for comparison. We then provide two mechanisms for
the private incremental regression problem. Our first mechanism is based on
privately constructing a noisy incremental gradient function, which is then
used in a modified projected gradient procedure at every timestep. This
mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the
dimensionality of the data. While the results of [Bassily et al. 2014] show
this bound is tight in the worst case, we show that certain geometric
properties of the input and constraint set can be used to derive significantly
better results for certain interesting regression problems.
Comment: To appear in PODS 201
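The two ingredients described above, a noisy (projected) gradient step and a baseline that re-invokes a private batch solver at regular intervals, can be sketched as follows. This is a minimal illustration, not the paper's mechanism: the noise scale `noise_std` stands in for the calibrated Gaussian noise a real (epsilon, delta)-DP analysis would prescribe, and all function and parameter names are hypothetical.

```python
import numpy as np

def noisy_projected_gradient(X, y, noise_std, radius, steps=100, lr=0.1, seed=0):
    """Sketch of projected gradient descent for least-squares regression with
    Gaussian noise added to each gradient (DP-style). `noise_std` is an
    assumed noise scale, NOT derived from a privacy budget here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n              # empirical-risk gradient
        grad += rng.normal(0.0, noise_std, d)         # privacy noise (assumed scale)
        theta -= lr * grad
        norm = np.linalg.norm(theta)                  # project onto L2 ball of given radius
        if norm > radius:
            theta *= radius / norm
    return theta

def incremental_baseline(stream, interval, **solver_kw):
    """Generic baseline from the abstract: rerun the private batch ERM
    mechanism every `interval` arrivals on the history observed so far."""
    X_hist, y_hist, outputs = [], [], []
    for t, (x, y) in enumerate(stream, 1):
        X_hist.append(x)
        y_hist.append(y)
        if t % interval == 0:
            outputs.append(noisy_projected_gradient(
                np.array(X_hist), np.array(y_hist), **solver_kw))
    return outputs
```

With zero noise the inner solver reduces to ordinary projected gradient descent, which makes the structure of the transformation easy to check before any privacy accounting is layered on.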
Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent
We propose a new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent, rendering it impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to an improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve on the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, and linear and semidefinite programming.
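One of the special cases the abstract says APPROX recovers is serial randomized proximal coordinate descent. A minimal sketch of that special case, applied to the lasso objective $\tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$ with per-coordinate Lipschitz constants $L_i = \|A_{:,i}\|^2$, looks as follows; this is an illustrative baseline, not the accelerated parallel method itself, and the names are my own.

```python
import numpy as np

def soft_threshold(z, t):
    """Prox operator of t*|.| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def rand_prox_cd(A, b, lam, iters=20000, seed=0):
    """Serial randomized proximal coordinate descent for the lasso
    0.5*||Ax - b||^2 + lam*||x||_1 -- one of the methods APPROX
    specializes to (without acceleration or parallelism)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = (A ** 2).sum(axis=0)          # coordinate Lipschitz constants L_i
    x = np.zeros(d)
    r = A @ x - b                     # maintained residual Ax - b
    for _ in range(iters):
        i = rng.integers(d)           # sample a coordinate uniformly at random
        g = A[:, i] @ r               # partial derivative along coordinate i
        xi_new = soft_threshold(x[i] - g / L[i], lam / L[i])  # prox step
        r += A[:, i] * (xi_new - x[i])  # cheap residual update, no full matvec
        x[i] = xi_new
    return x
```

Note the residual update costs only one column access per iteration; avoiding full-dimensional operations in the *accelerated* variant is exactly the implementation issue the abstract highlights.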
The time to extinction for an SIS-household-epidemic model
We analyse a stochastic SIS epidemic amongst a finite population partitioned
into households. Since the population is finite, the epidemic will eventually
go extinct, i.e., have no more infectives in the population. We study the
effects of population size and within household transmission upon the time to
extinction. This is done through two approximations. The first approximation is
suitable for all levels of within household transmission and is based upon an
Ornstein-Uhlenbeck process approximation for the disease's fluctuations about an
endemic level relying on a large population. The second approximation is
suitable for high levels of within household transmission and approximates the
number of infectious households by a simple homogeneously mixing SIS model with
the households replaced by individuals. The analysis, supported by a simulation
study, shows that the mean time to extinction is minimized by moderate levels
of within household transmission.
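The second approximation above reduces the household model to a homogeneously mixing SIS process, whose time to extinction is easy to estimate by direct stochastic (Gillespie) simulation. The sketch below simulates that reduced model; parameter names (`beta`, `gamma`) are generic SIS notation, not necessarily the paper's.

```python
import numpy as np

def sis_extinction_time(N, beta, gamma, I0, rng):
    """Gillespie simulation of a homogeneously mixing SIS model, run until
    no infectives remain. In the paper's second approximation, 'individuals'
    here stand for infectious households.
    N: population size, beta: transmission rate, gamma: recovery rate."""
    t, I = 0.0, I0
    while I > 0:
        infect = beta * I * (N - I) / N      # rate of new infections
        recover = gamma * I                  # rate of recoveries
        total = infect + recover
        t += rng.exponential(1.0 / total)    # time to next event
        if rng.random() < infect / total:
            I += 1
        else:
            I -= 1
    return t

rng = np.random.default_rng(0)
times = [sis_extinction_time(N=50, beta=0.8, gamma=1.0, I0=5, rng=rng)
         for _ in range(200)]
mean_T = float(np.mean(times))
```

Because the population is finite, every run hits the absorbing state I = 0 in finite time; averaging over runs gives a Monte Carlo estimate of the mean time to extinction that can be compared against the analytical approximations.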
Measurement of the cross section with the CMD-3 detector at the VEPP-2000 collider
The process has been studied in the
center-of-mass energy range from 1500 to 2000 MeV using a data sample of 23
pb$^{-1}$ collected with the CMD-3 detector at the VEPP-2000 collider.
Using about 24000 selected events, the cross
section has been measured with a systematic uncertainty decreasing from 11.7%
at 1500-1600 MeV to 6.1% above 1800 MeV. A preliminary study of
production dynamics has been performed.
Premise Selection for Mathematics by Corpus Analysis and Kernel Methods
Smart premise selection is essential when using automated reasoning as a tool
for large-theory formal proof development. A good method for premise selection
in complex mathematical libraries is the application of machine learning to
large corpora of proofs. This work develops learning-based premise selection in
two ways. First, a newly available minimal dependency analysis of existing
high-level formal mathematical proofs is used to build a large knowledge base
of proof dependencies, providing precise data for ATP-based re-verification and
for training premise selection algorithms. Second, a new machine learning
algorithm for premise selection based on kernel methods is proposed and
implemented. To evaluate the impact of both techniques, a benchmark consisting
of 2078 large-theory mathematical problems is constructed, extending the older
MPTP Challenge benchmark. The combined effect of the techniques results in a
50% improvement on the benchmark over the Vampire/SInE state-of-the-art system
for automated reasoning in large theories.
Comment: 26 pages
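Kernel-based premise selection of the kind described above can be illustrated with a toy scoring rule: represent each formula by its symbol occurrences and rank premises by kernel similarity to the conjecture. This is only a sketch of the idea with a Gaussian kernel over binary feature vectors, not the trained kernel ranker the paper proposes, and all names are illustrative.

```python
import numpy as np

def symbol_features(formulas, vocab):
    """Binary symbol-occurrence vectors -- a simple stand-in for the
    feature extraction a real premise-selection system would use."""
    idx = {s: j for j, s in enumerate(vocab)}
    F = np.zeros((len(formulas), len(vocab)))
    for i, syms in enumerate(formulas):
        for s in syms:
            if s in idx:
                F[i, idx[s]] = 1.0
    return F

def rank_premises(conjecture_syms, premise_syms_list, vocab, sigma=1.0):
    """Rank premises by Gaussian-kernel similarity k(p, c) to the
    conjecture; best-matching premises come first."""
    P = symbol_features(premise_syms_list, vocab)
    c = symbol_features([conjecture_syms], vocab)[0]
    d2 = ((P - c) ** 2).sum(axis=1)              # squared feature distances
    scores = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian kernel values
    return np.argsort(-scores)                   # indices, highest score first
```

In a real system the kernel scores would be learned against proof-dependency data, which is exactly what the knowledge base of minimal dependencies described above provides training labels for.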