
    Federated Learning Under Restricted User Availability

    Federated Learning (FL) is a decentralized machine learning framework that enables collaborative model training while respecting data privacy. In various applications, non-uniform availability or participation of users is unavoidable due to an adverse or stochastic environment, the latter often being uncontrollable during learning. Here, we posit a generic user selection mechanism implementing a possibly randomized, stationary selection policy, suggestively termed a Random Access Model (RAM). We propose a new formulation of the FL problem that effectively captures and mitigates limited participation of data originating from infrequent or restricted users in the presence of a RAM. By employing the Conditional Value-at-Risk (CVaR) over the (unknown) RAM distribution, we extend the expected-loss FL objective to a risk-aware objective, enabling the design of an efficient training algorithm that is completely oblivious to the RAM and has essentially the same complexity as FedAvg. Our experiments on synthetic and benchmark datasets show that the proposed approach achieves significantly improved performance compared with standard FL, under a variety of setups.
    Comment: 5 pages, 4 figures
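    The risk-aware objective can be made concrete with a short sketch. The snippet below is a minimal, hypothetical illustration (not the authors' implementation) of replacing the usual average of client losses with their Conditional Value-at-Risk, using the standard Rockafellar-Uryasev variational form; the exponential toy losses and the level alpha = 0.2 are assumptions made only for illustration.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.2):
    """Empirical CVaR_alpha of a batch of losses: the average of the worst
    alpha-fraction, via the Rockafellar-Uryasev variational form
    min_t { t + E[(loss - t)_+] / alpha }, whose minimizer t is the
    (1 - alpha)-quantile."""
    t = np.quantile(losses, 1.0 - alpha)
    return t + np.mean(np.maximum(losses - t, 0.0)) / alpha

# Toy comparison: per-client losses observed under some unknown participation
# pattern. Standard FL averages them; a risk-aware objective uses CVaR, which
# emphasizes the clients that are doing worst (e.g. rarely participating ones).
rng = np.random.default_rng(0)
client_losses = rng.exponential(scale=1.0, size=100)

print("expected-loss objective:", client_losses.mean())
print("CVaR_0.2 objective:     ", empirical_cvar(client_losses, alpha=0.2))
```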

    Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning

    Methods for carefully selecting or generating a small set of training data to learn from, i.e., data pruning, coreset selection, and data distillation, have been shown to be effective in reducing the ever-increasing cost of training neural networks. Behind this success are rigorously designed strategies for identifying informative training examples out of large datasets. However, these strategies come with additional computational costs associated with subset selection or data distillation before training begins, and furthermore, many are shown to even underperform random sampling in high data compression regimes. As such, many data pruning, coreset selection, or distillation methods may not reduce 'time-to-accuracy', which has become a critical efficiency measure of training deep neural networks over large datasets. In this work, we revisit a powerful yet overlooked random sampling strategy to address these challenges and introduce an approach called Repeated Sampling of Random Subsets (RSRS or RS2), where we randomly sample a subset of the training data for each epoch of model training. We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet. Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques. For example, when training on ImageNet in the high-compression regime (using less than 10% of the dataset each epoch), RS2 yields accuracy improvements of up to 29% compared to competing pruning methods while offering a runtime reduction of 7x. Beyond the above meta-study, we provide a convergence analysis for RS2 and discuss its generalization capability. The primary goal of our work is to establish RS2 as a competitive baseline for future data selection or distillation techniques aimed at efficient training.
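    The sampling loop at the heart of RS2 is simple enough to sketch directly. The snippet below is an illustrative reading of the abstract rather than the authors' code; the function names, the 10% keep_fraction, and the dummy training step are assumptions.

```python
import numpy as np

def rs2_train(data, labels, train_one_epoch, epochs=10, keep_fraction=0.1, seed=0):
    """Repeated Sampling of Random Subsets (RS2), sketched: every epoch draws a
    fresh uniformly random subset of the training data and trains only on it."""
    rng = np.random.default_rng(seed)
    n = len(data)
    subset_size = max(1, int(keep_fraction * n))
    for _ in range(epochs):
        idx = rng.choice(n, size=subset_size, replace=False)  # re-sampled each epoch
        train_one_epoch(data[idx], labels[idx])

# Usage with a dummy training step, just to show the per-epoch sampling.
X = np.random.randn(1000, 20)
y = np.random.randint(0, 2, size=1000)
rs2_train(X, y, lambda xb, yb: print("epoch trained on", len(xb), "examples"))
```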

    Select without Fear: Almost All Mini-Batch Schedules Generalize Optimally

    We establish matching upper and lower generalization error bounds for mini-batch Gradient Descent (GD) training with either deterministic or stochastic, data-independent, but otherwise arbitrary batch selection rules. We consider smooth Lipschitz-convex/nonconvex/strongly-convex loss functions, and show that classical upper bounds for Stochastic GD (SGD) also hold verbatim for such arbitrary nonadaptive batch schedules, including all deterministic ones. Further, for convex and strongly-convex losses we prove matching lower bounds directly on the generalization error, uniformly over the aforementioned class of batch schedules, showing that all such batch schedules generalize optimally. Lastly, for smooth (non-Lipschitz) nonconvex losses, we show that full-batch (deterministic) GD is essentially optimal among all possible batch schedules within the considered class, including all stochastic ones.
    Comment: 37 pages, 2 tables
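    One way to picture an "arbitrary data-independent batch schedule" is the sketch below, where mini-batch GD simply consumes whatever sequence of index sets it is handed, deterministic or random, as long as that sequence was fixed without looking at the data. The least-squares loss and the particular schedules are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def minibatch_gd(X, y, schedule, lr=0.1):
    """Mini-batch GD on least squares driven by an arbitrary, data-independent
    batch schedule: `schedule` is any sequence of index arrays fixed in advance
    (deterministic or drawn at random, but not adapted to the data)."""
    w = np.zeros(X.shape[1])
    for batch in schedule:
        Xb, yb = X[batch], y[batch]
        w -= lr * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient of 0.5 * MSE on the batch
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)

# One deterministic schedule (fixed contiguous blocks, cycled) and one
# stochastic but data-independent schedule (fresh uniform batches each step).
deterministic = [np.arange(i, i + 20) for i in range(0, 200, 20)] * 5
stochastic = [rng.choice(200, size=20, replace=False) for _ in range(50)]

print("deterministic schedule:", minibatch_gd(X, y, deterministic)[:3])
print("stochastic schedule:   ", minibatch_gd(X, y, stochastic)[:3])
```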

    Enteric Release Essential Oil Prepared by Co-Spray Drying Methacrylate/Polysaccharides—Influence of Starch Type

    Oregano essential oil (EO) enteric release powder was formulated by spray drying feed emulsions stabilized with polysaccharides (PSC) and Eudragit® L100 (PLM). Different modified starches were used in the PSC component. Spray-dried powders were evaluated for particle size and morphology, dynamic packing, flowability, chemical interactions, reconstitution, and gastric protection. Feed emulsions were stable, indicating the good emulsification ability of the PLM/PSC combination. The presence of polymer in the encapsulating wall neutralized electrostatic charges, indicating physical attraction, and FTIR spectra showed peaks of both PLM and PSC without significant shifting. Furthermore, the presence of polymer influenced spray drying, resulting in the elimination of surface cavities and the improvement of powder packing and flowability, which was best when the surface-active, low-viscosity sodium octenyl succinate starch was used (angle of repose 42°). When a PLM/PSC ratio of 80/20 was used in the encapsulating wall, the spray-dried product showed negligible re-emulsification and less than 15% release in pH 1.2 medium for 2 h, confirming gastric protection, whereas at pH 6.8 it provided complete re-emulsification and release. In conclusion, (1) polymer–PSC physical interaction promoted the formation of a smoother particle surface and a product with improved technological properties, which is important for further processing, and (2) the gastroprotective function of Eudragit® L100 was not impaired due to the absence of significant chemical interactions.

    Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

    We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex), under an interpolation regime. At the heart of our analysis is a new generalization error bound for deterministic symmetric algorithms, which implies that average output stability and a bounded expected optimization error at termination lead to generalization. This result shows that small generalization error occurs along the optimization path, and allows us to bypass Lipschitz or sub-Gaussian assumptions on the loss prevalent in previous works. For nonconvex, Polyak-Łojasiewicz (PL), convex and strongly convex losses, we show the explicit dependence of the generalization error in terms of the accumulated path-dependent optimization error, terminal optimization error, number of samples, and number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, under the proper choice of a decreasing step size. Further, if the loss is nonconvex but the objective is PL, we derive quadratically vanishing bounds on the generalization error and the corresponding excess risk, for a choice of a large constant step size. For (resp. strongly-) convex smooth losses, we prove that full-batch GD also generalizes for large constant step sizes, and achieves (resp. quadratically) small excess risk while training fast. In all cases, we close the generalization error gap by showing matching generalization and optimization error rates. Our full-batch GD generalization error and excess risk bounds are strictly tighter than existing bounds for (stochastic) GD, when the loss is smooth (but possibly non-Lipschitz).
    Comment: 33 pages
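    As a concrete reading of the full-batch setting, the sketch below runs deterministic GD on the whole dataset with a decreasing step size of the kind the abstract mentions for the nonconvex case; the 1/(t+1) schedule and the least-squares toy problem are assumptions made only for illustration.

```python
import numpy as np

def full_batch_gd(grad, w0, eta0=0.5, iters=200):
    """Deterministic full-batch GD with a decreasing step size eta_t = eta0 / (t + 1)."""
    w = np.array(w0, dtype=float)
    for t in range(iters):
        w -= (eta0 / (t + 1)) * grad(w)
    return w

# Toy smooth (but non-Lipschitz) loss: least squares over the entire training set,
# so every step uses the full-batch gradient rather than a mini-batch estimate.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

full_gradient = lambda w: X.T @ (X @ w - y) / len(y)
print(full_batch_gd(full_gradient, w0=np.zeros(3)))
```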