80,549 research outputs found
Change-points Estimation in Statistical Inference and Machine Learning Problems
Statistical inference plays an increasingly important role in science, finance and industry. Despite the extensive research and wide application of statistical inference, most of the efforts focus on uniform models. This thesis considers the statistical inference in models with abrupt changes instead. The task is to estimate change-points where the underlying models change. We first study low dimensional linear regression problems for which the underlying model undergoes multiple changes. Our goal is to estimate the number and locations of change-points that segment available data into different regions, and further produce sparse and interpretable models for each region. To address challenges of the existing approaches and to produce interpretable models, we propose a sparse group Lasso (SGL) based approach for linear regression problems with change-points. Then we extend our method to high dimensional nonhomogeneous linear regression models. Under certain assumptions and using a properly chosen regularization parameter, we show several desirable properties of the method. We further extend our studies to generalized linear models (GLM) and prove similar results. In practice, change-points inference usually involves high dimensional data, hence it is prone to tackle for distributed learning with feature partitioning data, which implies each machine in the cluster stores a part of the features. One bottleneck for distributed learning is communication. For this implementation concern, we design communication efficient algorithm for feature partitioning data sets to speed up not only change-points inference but also other classes of machine learning problem including Lasso, support vector machine (SVM) and logistic regression
A Hybrid Approach to Privacy-Preserving Federated Learning
Federated learning facilitates the collaborative training of models without
the sharing of raw data. However, recent attacks demonstrate that simply
maintaining data locality during training processes does not provide sufficient
privacy guarantees. Rather, we need a federated learning system capable of
preventing inference over both the messages exchanged during training and the
final trained model while ensuring the resulting model also has acceptable
predictive accuracy. Existing federated learning approaches either use secure
multiparty computation (SMC) which is vulnerable to inference or differential
privacy which can lead to low accuracy given a large number of parties with
relatively small amounts of data each. In this paper, we present an alternative
approach that utilizes both differential privacy and SMC to balance these
trade-offs. Combining differential privacy with secure multiparty computation
enables us to reduce the growth of noise injection as the number of parties
increases without sacrificing privacy while maintaining a pre-defined rate of
trust. Our system is therefore a scalable approach that protects against
inference threats and produces models with high accuracy. Additionally, our
system can be used to train a variety of machine learning models, which we
validate with experimental results on 3 different machine learning algorithms.
Our experiments demonstrate that our approach out-performs state of the art
solutions
- …