313 research outputs found
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Understanding the behavior of stochastic gradient descent (SGD) in the
context of deep neural networks has attracted considerable attention recently. Along
this line, we study a general form of gradient based optimization dynamics with
unbiased noise, which unifies SGD and standard Langevin dynamics. Through
investigating this general optimization dynamics, we analyze the behavior of
SGD on escaping from minima and its regularization effects. A novel indicator
is derived to characterize the efficiency of escaping from minima through
measuring the alignment of noise covariance and the curvature of loss function.
Based on this indicator, two conditions are established to show which type of
noise structure is superior to isotropic noise in terms of escaping efficiency.
We further show that the anisotropic noise in SGD satisfies the two conditions,
and thus helps to escape from sharp and poor minima effectively, towards more
stable and flat minima that typically generalize well. We systematically design
various experiments to verify the benefits of the anisotropic noise, compared
with full gradient descent plus isotropic diffusion (i.e. Langevin dynamics).
Comment: ICML 2019 camera ready
Water use efficiency of China's terrestrial ecosystems and responses to drought
Water use efficiency (WUE) measures the trade-off between carbon gain and water loss of terrestrial ecosystems, and better understanding its dynamics and controlling factors is essential for predicting ecosystem responses to climate change. We assessed the magnitude, spatial patterns, and trends of WUE of China’s terrestrial ecosystems and its responses to drought using a process-based ecosystem model. During the period from 2000 to 2011, the national average annual WUE (net primary productivity (NPP)/evapotranspiration (ET)) of China was 0.79 g C kg−1 H2O. Annual WUE decreased in the southern regions because of the decrease in NPP and the increase in ET, and increased in most northern regions mainly because of the increase in NPP. Droughts usually increased annual WUE in Northeast China and central Inner Mongolia but decreased annual WUE in central China. “Turning-points” were observed for southern China, where moderate and extreme droughts reduced annual WUE and severe drought slightly increased annual WUE. The cumulative lagged effect of drought on monthly WUE varied by region. Our findings have implications for ecosystem management and climate policy making. WUE is expected to continue to change under future climate change, particularly as drought is projected to increase in both frequency and severity.
In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
We study the in-context learning (ICL) ability of a Linear
Transformer Block (LTB) that combines a linear attention component and a
linear multi-layer perceptron (MLP) component. For ICL of linear regression
with a Gaussian prior and a non-zero mean, we show that LTB can achieve
nearly Bayes optimal ICL risk. In contrast, using only linear attention must
incur an irreducible additive approximation error. Furthermore, we establish a
correspondence between LTB and one-step gradient descent estimators with
learnable initialization, in the sense that every such estimator can be
implemented by an LTB estimator and every optimal LTB estimator that minimizes
the in-class ICL risk is effectively such an estimator.
Finally, we show that these estimators can be
efficiently optimized with gradient flow, despite a non-convex training
objective. Our results reveal that LTB achieves ICL by implementing
one-step gradient descent with learnable initialization, and they highlight the role of MLP layers
in reducing approximation error.
Comment: 39 pages
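As a rough illustration of what a one-step gradient descent estimator with learnable initialization could look like for linear regression, here is a minimal sketch. The abstract does not state the formula, so the update rule below (one least-squares gradient step from a start point `beta0`, with a matrix step size `gamma`) is an assumption, and the function names `one_step_gd` and `predict` are hypothetical.

```python
def one_step_gd(X, y, beta0, gamma):
    """One gradient step on the least-squares loss, started at beta0.

    Assumed form (not taken from the abstract):
        beta_hat = beta0 + gamma @ X^T (y - X beta0) / n
    X is a list of n rows of length d, y a list of n targets,
    beta0 a list of length d, gamma a d x d matrix (list of rows).
    """
    n, d = len(X), len(beta0)
    # Residuals of the initial guess on the in-context examples.
    residual = [y[i] - sum(X[i][j] * beta0[j] for j in range(d)) for i in range(n)]
    # Negative gradient of the average squared loss (up to a constant factor).
    grad_dir = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
    # Precondition the step with gamma and move away from beta0.
    return [beta0[k] + sum(gamma[k][j] * grad_dir[j] for j in range(d)) for k in range(d)]


def predict(beta_hat, x_query):
    """Linear prediction for a query input."""
    return sum(b * x for b, x in zip(beta_hat, x_query))
```

In this sketch, `beta0` and `gamma` play the role of the learnable parameters. If the initialization already fits the context examples exactly, the residuals vanish and the estimator returns `beta0` unchanged.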
Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
In federated frequency estimation (FFE), multiple clients work together to
estimate the frequencies of their collective data by communicating with a
server that respects the privacy constraints of Secure Summation (SecSum), a
cryptographic multi-party computation protocol that ensures that the server can
only access the sum of client-held vectors. For single-round FFE, it is known
that count sketching is nearly information-theoretically optimal for achieving
the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However,
we show that under the more practical multi-round FFE setting, simple
adaptations of count sketching are strictly sub-optimal, and we propose a novel
hybrid sketching algorithm that is provably more accurate. We also address the
following fundamental question: how should a practitioner set the sketch size
in a way that adapts to the hardness of the underlying problem? We propose a
two-phase approach that allows for the use of a smaller sketch size for simpler
problems (e.g., near-sparse or light-tailed distributions). We conclude our
work by showing how differential privacy can be added to our algorithm and
verifying its superior performance through extensive experiments conducted on
large-scale datasets.
Comment: NeurIPS 2023 camera ready version
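The single-round baseline mentioned above, count sketching combined with a sum-only aggregation, can be sketched as follows. This is a plain count sketch and not the hybrid or two-phase algorithm the abstract proposes. The names `encode`, `secure_sum_stand_in`, and `estimate` are hypothetical, the SHA-256 based hashing is an illustrative choice, and the aggregation is an ordinary element-wise sum with no cryptography.

```python
import hashlib
from statistics import median


def _hash(item, row, salt):
    """Deterministic 64-bit hash of (salt, row, item)."""
    digest = hashlib.sha256(f"{salt}:{row}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _sign(item, row):
    return 1 if _hash(item, row, "sign") % 2 == 0 else -1


def encode(items, depth, width):
    """Client side: sketch a local multiset into a depth x width table."""
    table = [[0] * width for _ in range(depth)]
    for item in items:
        for r in range(depth):
            table[r][_hash(item, r, "bucket") % width] += _sign(item, r)
    return table


def secure_sum_stand_in(tables):
    """Element-wise sum of client tables (a stand-in for SecSum)."""
    depth, width = len(tables[0]), len(tables[0][0])
    return [[sum(t[r][c] for t in tables) for c in range(width)] for r in range(depth)]


def estimate(table, item):
    """Server side: median over rows of the signed bucket counts."""
    width = len(table[0])
    return median(
        _sign(item, r) * table[r][_hash(item, r, "bucket") % width]
        for r in range(len(table))
    )
```

Because the sketch is linear, the server only ever needs the sum of the client tables, which is why it fits a sum-only protocol. The `width` parameter is the sketch size that the abstract's adaptivity question is about.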
Experimental study of PLLA/INH slow release implant fabricated by three dimensional printing technique and drug release characteristics in vitro
BACKGROUND: Local slow release implant provided long term and stable drug release in the lesion. The objective of this study was to fabricate biodegradable slow release INH/PLLA tablet via three dimensional printing technique (3DP) and to compare the drug release characteristics of three different structured tablets in vitro. METHODS: Three different drug delivery systems (columnar-shaped tablet (CST), doughnut-shaped tablet (DST) and multilayer doughnut-shaped tablet (MDST)) were manufactured by the three dimensional printing machine and isoniazid was loaded into the implant. Dynamic soaking method was used to study the drug release characteristics of the three implants. MTT cytotoxicity test and direct contact test were utilized to study the biocompatibility of the implant. The microstructures of the implants’ surfaces were observed with electron microscope. RESULTS: The PLLA powder in the tablet could be excellently combined through 3DP without disintegration. Electron microscope observations showed that INH distributed evenly on the surface of the tablet in a “nest-shaped” way, while the surface of the barrier layer in the multilayer doughnut-shaped tablet was compact and did not contain INH. The concentration of INH in all three tablets was still higher than the effective bacteriostasis concentration (Isoniazid: 0.025 ~ 0.05 μg/ml) after 30 days of release in vitro. All of the tablets showed initial burst release of the INH in the early period. Drug concentration of MDST became stable and had little fluctuation starting from the 6th day of the release. Drug concentration of DST and CST decreased gradually and the rate of decrease in concentration was faster in DST than CST. MTT cytotoxicity test and direct contact test indicated that the INH-PLLA tablet had low cytotoxicity and favorable biocompatibility. CONCLUSIONS: Three dimensional printing technique was a reliable technique to fabricate complicated implants. Drug release pattern in MDST was the most stable among the three implants. It was an ideal drug delivery system for the antibiotics. Biocompatibility tests demonstrated that the INH-PLLA implant did not have cytotoxicity. The multilayer doughnut-shaped tablet provided a new constant slow release method after an initial burst for the topical application of the antibiotic.
Shipboard flow injection analysis (FIA) of dissolved Al, Fe, and Mn from R/V Knorr cruise KN204-01 (GA03) in the Subtropical northern Atlantic Ocean in 2011 (U.S. GEOTRACES NAT project)
Dataset: GT11 - FIA-AlFeMn
Shipboard flow injection analysis (FIA) of dissolved Al, Fe, and Mn from R/V Knorr cruise KN204-01 (GA03) in the Subtropical northern Atlantic Ocean in 2011 as part of the U.S. GEOTRACES North Atlantic project.
For a complete list of measurements, refer to the full dataset description in the supplemental file 'Dataset_description.pdf'. The most current version of this dataset is available at: https://www.bco-dmo.org/dataset/3822
NSF Division of Ocean Sciences (NSF OCE) OCE-0928741, NSF Division of Ocean Sciences (NSF OCE) OCE-113781
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
We study linear regression under covariate shift, where the marginal
distribution over the input covariates differs in the source and the target
domains, while the conditional distribution of the output given the input
covariates is similar across the two domains. We investigate a transfer
learning approach with pretraining on the source data and finetuning based on
the target data (both conducted by online SGD) for this problem. We establish
sharp instance-dependent excess risk upper and lower bounds for this approach.
Our bounds suggest that for a large class of linear regression instances,
transfer learning with source data (and scarce or no target data) is
as effective as supervised learning with target data. In addition, we show
that finetuning, even with only a small amount of target data, could
drastically reduce the amount of source data required by pretraining. Our
theory sheds light on the effectiveness and limitation of pretraining as well
as the benefits of finetuning for tackling covariate shift problems.
Comment: 32 pages, 1 figure, 1 table