313 research outputs found

    The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects

    Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted considerable attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. By investigating these general optimization dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima by measuring the alignment between the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which type of noise structure is superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies the two conditions, and thus helps to escape from sharp and poor minima effectively, towards more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics). Comment: ICML 2019 camera ready
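
    The alignment idea in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that alignment between a noise covariance Σ and the loss curvature H is scored by Tr(HΣ); the abstract does not state the exact form of the indicator, and the matrices and the helper names `trace_of_product` and `scaled` are hypothetical.

```python
def trace_of_product(a, b):
    """Return Tr(A B) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def scaled(matrix, factor):
    """Return a copy of the matrix with every entry multiplied by factor."""
    return [[factor * value for value in row] for row in matrix]


def trace(matrix):
    """Return the sum of the diagonal entries."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical Hessian at a minimum: one sharp direction, two flat ones.
hessian = [[10.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 0.1]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Two noise covariances with the same total variance (trace equal to 3):
# one isotropic, one proportional to the Hessian (assumed stand-in for
# curvature-aligned, anisotropic noise).
isotropic = scaled(identity, 3.0 / trace(identity))
aligned = scaled(hessian, 3.0 / trace(hessian))

iso_score = trace_of_product(hessian, isotropic)
aligned_score = trace_of_product(hessian, aligned)
print(iso_score, aligned_score)
```

    With equal total variance, the curvature-aligned covariance scores higher than the isotropic one in this toy setting, because more of the noise falls along the sharp direction.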

    Water use efficiency of China's terrestrial ecosystems and responses to drought

    Water use efficiency (WUE) measures the trade-off between carbon gain and water loss of terrestrial ecosystems, and a better understanding of its dynamics and controlling factors is essential for predicting ecosystem responses to climate change. We assessed the magnitude, spatial patterns, and trends of WUE of China’s terrestrial ecosystems and its responses to drought using a process-based ecosystem model. During the period from 2000 to 2011, the national average annual WUE (net primary productivity (NPP)/evapotranspiration (ET)) of China was 0.79 g C kg−1 H2O. Annual WUE decreased in the southern regions because of the decrease in NPP and the increase in ET, and increased in most northern regions mainly because of the increase in NPP. Droughts usually increased annual WUE in Northeast China and central Inner Mongolia but decreased annual WUE in central China. “Turning-points” were observed for southern China, where moderate and extreme droughts reduced annual WUE and severe drought slightly increased annual WUE. The cumulative lagged effect of drought on monthly WUE varied by region. Our findings have implications for ecosystem management and climate policy making. WUE is expected to continue to change under future climate change, particularly as drought is projected to increase in both frequency and severity.

    In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

    We study the in-context learning (ICL) ability of a Linear Transformer Block (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component. For ICL of linear regression with a Gaussian prior and a non-zero mean, we show that LTB can achieve nearly Bayes optimal ICL risk. In contrast, using only linear attention must incur an irreducible additive approximation error. Furthermore, we establish a correspondence between LTB and one-step gradient descent estimators with learnable initialization ($\mathsf{GD}\text{-}\mathbf{\beta}$), in the sense that every $\mathsf{GD}\text{-}\mathbf{\beta}$ estimator can be implemented by an LTB estimator and every optimal LTB estimator that minimizes the in-class ICL risk is effectively a $\mathsf{GD}\text{-}\mathbf{\beta}$ estimator. Finally, we show that $\mathsf{GD}\text{-}\mathbf{\beta}$ estimators can be efficiently optimized with gradient flow, despite a non-convex training objective. Our results reveal that LTB achieves ICL by implementing $\mathsf{GD}\text{-}\mathbf{\beta}$, and they highlight the role of MLP layers in reducing approximation error. Comment: 39 pages
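
    As a rough illustration of a one-step gradient descent estimator with an initialization, the sketch below takes a single squared-loss gradient step from `beta0` on the in-context examples and then predicts for a query. It is a simplification under stated assumptions: the step size is a scalar rather than a learned matrix, `beta0` is supplied by hand rather than learned, and the function name `one_step_gd_predict` is hypothetical and not taken from the paper.

```python
def one_step_gd_predict(xs, ys, x_query, beta0, step):
    """Predict for x_query after one GD step on the in-context examples.

    xs: list of feature vectors, ys: list of labels,
    beta0: initialization (learnable in the paper's setting, fixed here),
    step: scalar step size (an assumed simplification).
    """
    n = len(xs)
    dim = len(beta0)
    grad = [0.0] * dim
    for x, y in zip(xs, ys):
        # Residual of the initial linear predictor on this example.
        residual = sum(xi * bi for xi, bi in zip(x, beta0)) - y
        for j in range(dim):
            grad[j] += residual * x[j] / n
    # Single gradient step from the initialization.
    beta1 = [b - step * g for b, g in zip(beta0, grad)]
    return sum(q * b for q, b in zip(x_query, beta1))


# Toy context generated by the weight vector [2, -1] without noise.
context_x = [[1.0, 0.0], [0.0, 1.0]]
context_y = [2.0, -1.0]

from_zero = one_step_gd_predict(context_x, context_y, [1.0, 1.0], [0.0, 0.0], 1.0)
from_truth = one_step_gd_predict(context_x, context_y, [1.0, 1.0], [2.0, -1.0], 1.0)
print(from_zero, from_truth)
```

    Starting from the true weights, the residuals vanish and the step leaves the predictor unchanged; starting from zero, one step only moves part of the way. This is one informal way to see why a well-chosen initialization matters when the prior mean is non-zero.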

    Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

    In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketching is nearly information-theoretically optimal for achieving the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However, we show that under the more practical multi-round FFE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size in a way that adapts to the hardness of the underlying problem? We propose a two-phase approach that allows for the use of a smaller sketch size for simpler problems (e.g., near-sparse or light-tailed distributions). We conclude our work by showing how differential privacy can be added to our algorithm and verifying its superior performance through extensive experiments conducted on large-scale datasets. Comment: NeurIPS 2023 camera ready version
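
    The reason count sketching fits a sum-only server can be sketched as follows: a count sketch is linear, so adding the clients' sketches gives the sketch of the pooled data. This is a minimal single-round illustration, not the paper's hybrid or two-phase algorithm; the SHA-256 based hashing, the table sizes, and all function names are assumptions made for the example, and plain addition stands in for SecSum.

```python
import hashlib
from statistics import median


def _hash(item, row, salt):
    """Deterministic integer hash of (salt, row, item)."""
    digest = hashlib.sha256(f"{salt}:{row}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _bucket_and_sign(item, row, width):
    bucket = _hash(item, row, "bucket") % width
    sign = 1 if _hash(item, row, "sign") % 2 == 0 else -1
    return bucket, sign


def sketch(items, rows=3, width=16):
    """Build a count sketch (rows x width table) of one client's items."""
    table = [[0] * width for _ in range(rows)]
    for item in items:
        for r in range(rows):
            bucket, sign = _bucket_and_sign(item, r, width)
            table[r][bucket] += sign
    return table


def add_sketches(a, b):
    """Entry-wise sum, standing in for what a sum-only server would see."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def estimate(table, item):
    """Median-of-rows frequency estimate for one item."""
    width = len(table[0])
    readings = []
    for r, row in enumerate(table):
        bucket, sign = _bucket_and_sign(item, r, width)
        readings.append(sign * row[bucket])
    return median(readings)


client_a = ["cat", "cat", "dog"]
client_b = ["cat", "bird"]
summed = add_sketches(sketch(client_a), sketch(client_b))
pooled = sketch(client_a + client_b)
print(summed == pooled, estimate(summed, "cat"))
```

    The estimate can be off when items collide in a bucket, which is why the sketch width governs the accuracy-communication trade-off the abstract refers to.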

    Experimental study of PLLA/INH slow release implant fabricated by three dimensional printing technique and drug release characteristics in vitro

    BACKGROUND: Local slow release implants provide long-term and stable drug release in the lesion. The objective of this study was to fabricate a biodegradable slow release INH/PLLA tablet via the three dimensional printing technique (3DP) and to compare the drug release characteristics of three differently structured tablets in vitro. METHODS: Three different drug delivery systems (columnar-shaped tablet (CST), doughnut-shaped tablet (DST) and multilayer doughnut-shaped tablet (MDST)) were manufactured by the three dimensional printing machine, and isoniazid (INH) was loaded into the implant. The dynamic soaking method was used to study the drug release characteristics of the three implants. The MTT cytotoxicity test and direct contact test were used to study the biocompatibility of the implant. The microstructures of the implants’ surfaces were observed with an electron microscope. RESULTS: The PLLA powder in the tablet could be excellently combined through 3DP without disintegration. Electron microscope observations showed that INH was distributed evenly on the surface of the tablet in a “nest-shaped” way, while the surface of the barrier layer in the multilayer doughnut-shaped tablet was compact and did not contain INH. The concentration of INH in all three tablets was still higher than the effective bacteriostasis concentration (isoniazid: 0.025 ~ 0.05 μg/ml) after 30 days of release in vitro. All of the tablets showed an initial burst release of INH in the early period. The drug concentration of MDST became stable and had little fluctuation starting from the 6th day of release. The drug concentration of DST and CST decreased gradually, and the rate of decrease in concentration was faster in DST than in CST. The MTT cytotoxicity test and direct contact test indicated that the INH-PLLA tablet had low cytotoxicity and favorable biocompatibility. CONCLUSIONS: The three dimensional printing technique was a reliable technique for fabricating complicated implants. The drug release pattern in MDST was the most stable among the three implants. It was an ideal drug delivery system for the antibiotics. Biocompatibility tests demonstrated that the INH-PLLA implant did not have cytotoxicity. The multilayer doughnut-shaped tablet provided a new constant slow release method after an initial burst for the topical application of the antibiotic.

    Shipboard flow injection analysis (FIA) of dissolved Al, Fe, and Mn from R/V Knorr cruise KN204-01 (GA03) in the Subtropical northern Atlantic Ocean in 2011 (U.S. GEOTRACES NAT project)

    Dataset: GT11 - FIA-AlFeMn. Shipboard flow injection analysis (FIA) of dissolved Al, Fe, and Mn from R/V Knorr cruise KN204-01 (GA03) in the Subtropical northern Atlantic Ocean in 2011 as part of the U.S. GEOTRACES North Atlantic project. For a complete list of measurements, refer to the full dataset description in the supplemental file 'Dataset_description.pdf'. The most current version of this dataset is available at: https://www.bco-dmo.org/dataset/3822. NSF Division of Ocean Sciences (NSF OCE) OCE-0928741, NSF Division of Ocean Sciences (NSF OCE) OCE-113781

    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

    We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems. Comment: 32 pages, 1 figure, 1 table
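
    The pretraining-finetuning pipeline described here can be sketched in a toy setting: online SGD on source samples, followed by online SGD on a few target samples whose covariates are scaled differently. All specifics are assumptions for illustration (noiseless labels, uniform covariates, the scales, sample counts, step size, and the names `sgd_pass`, `draw` and `distance`); none come from the paper.

```python
import random


def sgd_pass(weights, samples, step):
    """One pass of online SGD on squared loss; returns the updated weights."""
    w = list(weights)
    for x, y in samples:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - step * residual * xi for wi, xi in zip(w, x)]
    return w


def draw(n, scales, true_w, rng):
    """Noiseless samples; per-coordinate scales model the covariate shift."""
    samples = []
    for _ in range(n):
        x = [s * rng.uniform(-1.0, 1.0) for s in scales]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        samples.append((x, y))
    return samples


def distance(w, true_w):
    """Euclidean distance between an estimate and the true weights."""
    return sum((a - b) ** 2 for a, b in zip(w, true_w)) ** 0.5


rng = random.Random(0)
true_w = [1.0, -2.0]  # shared by both domains (same conditional distribution)
source = draw(200, [1.0, 0.1], true_w, rng)  # first coordinate dominates
target = draw(20, [0.1, 1.0], true_w, rng)   # second coordinate dominates

start = [0.0, 0.0]
pretrained = sgd_pass(start, source, step=0.5)
finetuned = sgd_pass(pretrained, target, step=0.5)
print(distance(start, true_w), distance(pretrained, true_w), distance(finetuned, true_w))
```

    In this toy setup the source data mostly pins down the first coordinate, and the small target set then corrects the coordinate the source barely excites, which loosely mirrors the abstract's point that a little finetuning can reduce how much source data pretraining needs.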