Federated Learning Under Restricted User Availability
Federated Learning (FL) is a decentralized machine learning framework that
enables collaborative model training while respecting data privacy. In various
applications, non-uniform availability or participation of users is unavoidable
due to an adverse or stochastic environment, the latter often being
uncontrollable during learning. Here, we posit a generic user selection
mechanism implementing a possibly randomized, stationary selection policy,
suggestively termed a Random Access Model (RAM). We propose a new formulation
of the FL problem that effectively captures and mitigates the limited
participation of data originating from infrequent or restricted users in the
presence of a RAM. By employing the Conditional Value-at-Risk (CVaR) over the
(unknown) RAM distribution, we extend the expected loss FL objective to a
risk-aware objective, enabling the design of an efficient training algorithm
that is completely oblivious to the RAM, and with essentially identical
complexity as FedAvg. Our experiments on synthetic and benchmark datasets show
that the proposed approach achieves significantly improved performance as
compared with standard FL, under a variety of setups.
Comment: 5 pages, 4 figures
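For context, here is a minimal sketch of the kind of CVaR-based risk-aware
objective described above; the exact formulation in the paper may differ, and
the notation L(w; q) for the FL loss under a user-selection outcome q drawn
from the RAM distribution is introduced here only for illustration:

    \mathrm{CVaR}_{\alpha}\big[L(w; q)\big] \;=\; \min_{t \in \mathbb{R}} \Big\{\, t + \tfrac{1}{\alpha}\, \mathbb{E}_{q}\big[(L(w; q) - t)_{+}\big] \Big\}, \qquad \alpha \in (0, 1].

Setting \alpha = 1 recovers the standard expected-loss FL objective, while
smaller \alpha places more weight on the worst (highest-loss) selection
outcomes, which is how such a formulation can guard against the
under-representation of infrequently participating users.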
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
Methods for carefully selecting or generating a small set of training data to
learn from, i.e., data pruning, coreset selection, and data distillation, have
been shown to be effective in reducing the ever-increasing cost of training
neural networks. Behind this success are rigorously designed strategies for
identifying informative training examples out of large datasets. However, these
strategies come with additional computational costs associated with subset
selection or data distillation before training begins; furthermore, many have
been shown to underperform even random sampling in high data-compression
regimes. As such, many data pruning, coreset selection, or distillation methods
may not reduce 'time-to-accuracy', which has become a critical efficiency
measure of training deep neural networks over large datasets. In this work, we
revisit a powerful yet overlooked random sampling strategy to address these
challenges and introduce an approach called Repeated Sampling of Random Subsets
(RSRS or RS2), where we randomly sample the subset of training data for each
epoch of model training. We test RS2 against thirty state-of-the-art data
pruning and data distillation methods across four datasets including ImageNet.
Our results demonstrate that RS2 significantly reduces time-to-accuracy
compared to existing techniques. For example, when training on ImageNet in the
high-compression regime (using less than 10% of the dataset each epoch), RS2
yields accuracy improvements up to 29% compared to competing pruning methods
while offering a runtime reduction of 7x. Beyond the above meta-study, we
provide a convergence analysis for RS2 and discuss its generalization
capability. The primary goal of our work is to establish RS2 as a competitive
baseline for future data selection or distillation techniques aimed at
efficient training.
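As a rough illustration only, and not the authors' reference implementation:
the core of RS2 is to re-draw a fresh random subset of the training set at the
start of every epoch and run ordinary mini-batch training on it. A
PyTorch-style sketch, with the subset fraction r and all function and argument
names chosen here as placeholders, could look like:

    import random
    from torch.utils.data import DataLoader, Subset

    def train_rs2(model, dataset, optimizer, loss_fn, epochs=10, r=0.1, batch_size=128):
        # RS2 sketch: each epoch trains on a fresh random r-fraction of the data.
        n = len(dataset)
        subset_size = max(1, int(r * n))
        for _ in range(epochs):
            # Re-sample the per-epoch subset; this is the key difference from
            # one-shot pruning or coreset selection performed before training.
            idx = random.sample(range(n), subset_size)
            loader = DataLoader(Subset(dataset, idx), batch_size=batch_size, shuffle=True)
            for xs, ys in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(xs), ys)
                loss.backward()
                optimizer.step()

Whether subsets are drawn with or without replacement across epochs, and how
the subset fraction interacts with the learning-rate schedule, are design
choices this sketch does not settle; it fixes one simple variant.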
Select without Fear: Almost All Mini-Batch Schedules Generalize Optimally
We establish matching upper and lower generalization error bounds for
mini-batch Gradient Descent (GD) training with either deterministic or
stochastic, data-independent, but otherwise arbitrary batch selection rules. We
consider smooth Lipschitz-convex/nonconvex/strongly-convex loss functions, and
show that classical upper bounds for Stochastic GD (SGD) also hold verbatim for
such arbitrary nonadaptive batch schedules, including all deterministic ones.
Further, for convex and strongly-convex losses we prove matching lower bounds
directly on the generalization error, holding uniformly over the aforementioned
class of batch schedules, showing that all such batch schedules generalize optimally.
Lastly, for smooth (non-Lipschitz) nonconvex losses, we show that full-batch
(deterministic) GD is essentially optimal, among all possible batch schedules
within the considered class, including all stochastic ones.
Comment: 37 pages, 2 tables
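To make the class of schedules concrete (standard notation, not quoted from the
paper): given a sample z_1, ..., z_n and a sequence of index sets
B_1, ..., B_T \subseteq \{1, ..., n\} chosen, possibly at random, but without
looking at the data, mini-batch GD iterates

    w_{t+1} \;=\; w_t - \frac{\eta_t}{|B_t|} \sum_{i \in B_t} \nabla \ell(w_t; z_i), \qquad t = 1, \dots, T.

Fixed partitions, cyclic single-example passes, uniformly sampled mini-batches
(SGD), and full-batch GD (B_t = \{1, \dots, n\} for all t) are all instances of
such nonadaptive schedules.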
Enteric Release Essential Oil Prepared by Co-Spray Drying Methacrylate/Polysaccharides—Influence of Starch Type
Oregano essential oil (EO) enteric release powder was formulated by spray drying feed emulsions stabilized with polysaccharides (PSC) and Eudragit® L100 (PLM). Different modified starches were used in the PSC component. Spray-dried powders were evaluated for particle size and morphology, dynamic packing, flowability, chemical interactions, reconstitution, and gastric protection. Feed emulsions were stable, indicating the good emulsification ability of the PLM/PSC combination. The presence of polymer in the encapsulating wall neutralized electrostatic charges, indicating physical attraction, and FTIR spectra showed peaks of both PLM and PSC without significant shifting. Furthermore, the presence of polymer influenced spray drying, resulting in the elimination of surface cavities and the improvement of powder packing and flowability, which was best when the surface-active, low-viscosity sodium octenyl succinate starch was used (angle of repose 42°). When a PLM/PSC ratio of 80/20 was used in the encapsulating wall, the spray-dried product showed negligible re-emulsification and less than 15% release in pH 1.2 medium for 2 h, confirming gastric protection, whereas at pH 6.8, it provided complete re-emulsification and release. In conclusion, (1) polymer–PSC physical interaction promoted the formation of a smoother particle surface and a product with improved technological properties, which is important for further processing, and (2) the gastroprotective function of Eudragit® L100 was not impaired due to the absence of significant chemical interactions.
Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD
We provide sharp path-dependent generalization and excess risk guarantees for
the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly
non-Lipschitz, possibly nonconvex), under an interpolation regime. At the heart
of our analysis is a new generalization error bound for deterministic symmetric
algorithms, which implies that average output stability and a bounded expected
optimization error at termination lead to generalization. This result shows
that small generalization error occurs along the optimization path, and allows
us to bypass Lipschitz or sub-Gaussian assumptions on the loss prevalent in
previous works. For nonconvex, Polyak-Łojasiewicz (PL), convex, and strongly
convex losses, we show the explicit dependence of the generalization error on
the accumulated path-dependent optimization error, the terminal optimization
error, the number of samples, and the number of iterations. For nonconvex
smooth losses, we prove that full-batch GD efficiently generalizes close to any
stationary point at termination, under the proper choice of a decreasing step
size. Further, if the loss is nonconvex but the objective is PL, we derive
quadratically vanishing bounds on the generalization error and the
corresponding excess risk, for a choice of a large constant step size. For
(resp. strongly-) convex smooth losses, we prove that full-batch GD also
generalizes for large constant step sizes, and achieves (resp. quadratically)
small excess risk while training fast. In all cases, we close the
generalization error gap, by showing matching generalization and optimization
error rates. Our full-batch GD generalization error and excess risk bounds are
strictly tighter than existing bounds for (stochastic) GD, when the loss is
smooth (but possibly non-Lipschitz).
Comment: 33 pages
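For reference, the PL condition invoked above is the standard one, not a
definition specific to this paper: a differentiable objective F with minimum
value F^* satisfies the Polyak-Łojasiewicz inequality with parameter \mu > 0 if

    \tfrac{1}{2}\, \| \nabla F(w) \|^{2} \;\ge\; \mu \big( F(w) - F^{*} \big) \quad \text{for all } w.

The condition holds for strongly convex objectives but also for certain
nonconvex ones, and under it full-batch GD with a suitably chosen constant step
size is known to drive the optimization error to zero at a linear rate.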