
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects

Author: Ma Jinwen
Wu Jingfeng
Wu Lei
Yu Bing
Zhu Zhanxing
Publication date: 01/06/2019

Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted considerable attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. Through investigating this general optimization dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima by measuring the alignment of the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which types of noise structure are superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies the two conditions, and thus helps to escape from sharp and poor minima effectively, towards more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics).

Comment: ICML 2019 camera ready

arXiv.org e-Print Archive

Southampton (e-Prints Soton)
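As a rough illustration of the escaping-efficiency indicator described in the abstract above, the sketch below compares an alignment measure between noise covariance Σ and Hessian H for isotropic noise and for noise concentrated along the sharpest direction, with both given the same total variance Tr(Σ). Using Tr(HΣ) as the alignment measure is an assumption of this sketch, the Hessian values are hypothetical, and this is not the authors' code.

```python
"""Toy comparison of the alignment measure Tr(H Sigma) for two noise structures.

Assumption: Tr(H Sigma) is taken as the indicator of escaping efficiency;
the Hessian below is a hypothetical sharp minimum, not data from the paper.
"""


def trace_of_product(a, b):
    """Return Tr(A B) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def isotropic_cov(total_variance, dim):
    """Covariance s * I whose trace equals total_variance."""
    s = total_variance / dim
    return [[s if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def aligned_cov(total_variance, direction):
    """Rank-one covariance t * u u^T along a unit vector u (trace = t)."""
    return [[total_variance * ui * uj for uj in direction] for ui in direction]


# One steep direction and two flat ones (hypothetical curvature values).
hessian = [[10.0, 0.0, 0.0],
           [0.0, 0.1, 0.0],
           [0.0, 0.0, 0.1]]
budget = 3.0  # identical total noise variance Tr(Sigma) for both noise types

iso = trace_of_product(hessian, isotropic_cov(budget, 3))
aniso = trace_of_product(hessian, aligned_cov(budget, [1.0, 0.0, 0.0]))
print(f"isotropic   Tr(H Sigma) = {iso:.2f}")    # 10.20
print(f"anisotropic Tr(H Sigma) = {aniso:.2f}")  # 30.00
```

With the same noise budget, putting the variance along the high-curvature direction yields a larger Tr(HΣ), which is the kind of noise-curvature alignment the abstract associates with faster escape from sharp minima.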

Water use efficiency of China's terrestrial ecosystems and responses to drought

Author: Ju Weimin
Liu Yibo
Wang Shaoqiang
Wu Xiaocui
Xiao Jingfeng
Zhou Yanlian
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 08/09/2015

Water use efficiency (WUE) measures the trade-off between carbon gain and water loss of terrestrial ecosystems, and a better understanding of its dynamics and controlling factors is essential for predicting ecosystem responses to climate change. We assessed the magnitude, spatial patterns, and trends of the WUE of China’s terrestrial ecosystems and its responses to drought using a process-based ecosystem model. During the period from 2000 to 2011, the national average annual WUE (net primary productivity (NPP)/evapotranspiration (ET)) of China was 0.79 g C kg⁻¹ H₂O. Annual WUE decreased in the southern regions because of the decrease in NPP and the increase in ET, and increased in most northern regions mainly because of the increase in NPP. Droughts usually increased annual WUE in Northeast China and central Inner Mongolia but decreased annual WUE in central China. “Turning points” were observed in southern China, where moderate and extreme droughts reduced annual WUE and severe drought slightly increased annual WUE. The cumulative lagged effect of drought on monthly WUE varied by region. Our findings have implications for ecosystem management and climate policy making. WUE is expected to continue to change under future climate change, particularly as drought is projected to increase in both frequency and severity.

PubMed Central

UNH Scholars' Repository
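The WUE definition in the abstract above (NPP/ET) can be sketched as a small helper. The units are an assumption of this sketch: NPP in g C m⁻² and ET in mm of water over the same period, where 1 mm of water over 1 m² weighs 1 kg, so the ratio comes out directly in g C kg⁻¹ H₂O. The function name and all input values are hypothetical and are not data from the study.

```python
def water_use_efficiency(npp_g_c_per_m2, et_mm):
    """Return WUE = NPP / ET in g C per kg H2O.

    Assumes NPP is in g C m^-2 and ET in mm of water over the same period;
    1 mm of water over 1 m^2 weighs 1 kg, so no further unit conversion is needed.
    """
    if et_mm <= 0:
        raise ValueError("ET must be positive to compute WUE")
    return npp_g_c_per_m2 / et_mm


# Hypothetical annual values for a single grid cell (illustrative only).
normal_year = water_use_efficiency(npp_g_c_per_m2=400.0, et_mm=500.0)
# A drought year in which ET falls proportionally more than NPP raises WUE.
drought_year = water_use_efficiency(npp_g_c_per_m2=360.0, et_mm=400.0)
print(f"normal: {normal_year:.2f} g C/kg H2O, drought: {drought_year:.2f} g C/kg H2O")
```

If NPP instead falls proportionally more than ET, the same ratio decreases. This is why drought can raise WUE in some regions and lower it in others, as the abstract reports.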

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Author: Bartlett Peter L.
Wu Jingfeng
Zhang Ruiqi
Publication date: 22/02/2024

We study the \emph{in-context learning} (ICL) ability of a \emph{Linear Transformer Block} (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component. For ICL of linear regression with a Gaussian prior and a \emph{non-zero mean}, we show that LTB can achieve nearly Bayes-optimal ICL risk. In contrast, a model using only linear attention must incur an irreducible additive approximation error. Furthermore, we establish a correspondence between LTB and one-step gradient descent estimators with learnable initialization ($\mathsf{GD}\text{-}\mathbf{\beta}$), in the sense that every $\mathsf{GD}\text{-}\mathbf{\beta}$ estimator can be implemented by an LTB estimator and every optimal LTB estimator that minimizes the in-class ICL risk is effectively a $\mathsf{GD}\text{-}\mathbf{\beta}$ estimator. Finally, we show that $\mathsf{GD}\text{-}\mathbf{\beta}$ estimators can be efficiently optimized with gradient flow, despite a non-convex training objective. Our results reveal that LTB achieves ICL by implementing $\mathsf{GD}\text{-}\mathbf{\beta}$, and they highlight the role of MLP layers in reducing approximation error.

Comment: 39 pages

arXiv.org e-Print Archive
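To make "one-step gradient descent with learnable initialization" from the abstract above concrete, here is a minimal sketch of a single preconditioned GD step on the in-context least-squares loss, started at an initialization β and followed by a prediction at a query point. The paper's exact parameterization of $\mathsf{GD}\text{-}\mathbf{\beta}$ may differ. In this sketch the preconditioner `gamma` and the initialization `beta` are fixed by hand rather than learned, and all function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gd_beta_predict(xs, ys, x_query, beta, gamma):
    """Predict y at x_query after one preconditioned GD step started at beta.

    Loss on the n context pairs: L(w) = (1 / (2n)) * sum_i (x_i . w - y_i)^2.
    Update: w1 = beta - gamma @ grad L(beta). Prediction: x_query . w1.
    """
    n, d = len(xs), len(beta)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        residual = dot(x, beta) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    w1 = [beta[j] - dot(gamma[j], grad) for j in range(d)]
    return dot(x_query, w1)


# Hypothetical context generated by w* = [2, -1].
context_x = [[1.0, 0.0], [0.0, 1.0]]
context_y = [2.0, -1.0]
query = [1.0, 1.0]  # true answer: 2 - 1 = 1

# Zero init with a well-tuned step recovers w* in one step.
pred_tuned = gd_beta_predict(context_x, context_y, query, [0.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
# Zero init with a small step undershoots.
pred_small = gd_beta_predict(context_x, context_y, query, [0.0, 0.0], [[0.5, 0.0], [0.0, 0.5]])
# An initialization matching the task mean needs no correction at all.
pred_prior = gd_beta_predict(context_x, context_y, query, [2.0, -1.0], [[0.5, 0.0], [0.0, 0.5]])
print(pred_tuned, pred_small, pred_prior)  # 1.0 0.25 1.0
```

The last two calls show why a learnable, non-zero β can matter when the task prior has a non-zero mean: with the same small step, starting near the mean removes most of the error that one GD step cannot.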

Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

Author: Braverman Vladimir
Kairouz Peter
Wu Jingfeng
Zhu Wennan
Publication date: 02/12/2023

In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketching is nearly information-theoretically optimal for achieving the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However, we show that under the more practical multi-round FFE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size in a way that adapts to the hardness of the underlying problem? We propose a two-phase approach that allows for the use of a smaller sketch size for simpler problems (e.g., near-sparse or light-tailed distributions). We conclude our work by showing how differential privacy can be added to our algorithm and verifying its superior performance through extensive experiments conducted on large-scale datasets.

Comment: NeurIPS 2023 camera ready version

arXiv.org e-Print Archive
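The abstract above takes count sketching as its single-round baseline under SecSum. This toy sketch shows why the two fit together: a count sketch is linear, so the element-wise sum of client sketches, which is all SecSum reveals to the server, equals the sketch of the pooled data. Several choices here are assumptions: the hash family, the sizes, and the helper `secure_sum` (a plain sum standing in for the cryptographic protocol). This is not the paper's hybrid sketching algorithm, and it adds no differential privacy.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # Mersenne prime for a simple universal hash family (toy choice)


class CountSketch:
    """Minimal count sketch; all clients must share the seed so hash functions match."""

    def __init__(self, rows, width, seed):
        rng = random.Random(seed)
        self.width = width
        # Per row: (a, b) for the bucket hash and (c, d) for the sign hash.
        self.params = [tuple(rng.randrange(1, PRIME) for _ in range(4)) for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def _bucket_and_sign(self, row, item):
        a, b, c, d = self.params[row]
        bucket = ((a * item + b) % PRIME) % self.width
        sign = 1 if ((c * item + d) % PRIME) % 2 == 0 else -1
        return bucket, sign

    def add(self, item, count=1):
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Median over rows of the signed counter that the item hashes to.
        values = []
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, item)
            values.append(sign * self.table[row][bucket])
        return statistics.median(values)


def secure_sum(sketches):
    """Stand-in for SecSum: the server receives only the element-wise sum of tables."""
    rows, width = len(sketches[0].table), sketches[0].width
    return [[sum(s.table[r][j] for s in sketches) for j in range(width)] for r in range(rows)]


ROWS, WIDTH, SEED = 5, 64, 7
client_data = [{1: 50, 2: 3}, {1: 30, 3: 5}, {4: 1}]  # hypothetical item counts per client

client_sketches = []
for data in client_data:
    sketch = CountSketch(ROWS, WIDTH, SEED)
    for item, count in data.items():
        sketch.add(item, count)
    client_sketches.append(sketch)

server = CountSketch(ROWS, WIDTH, SEED)
server.table = secure_sum(client_sketches)

# For comparison: one sketch built directly over the pooled data.
central = CountSketch(ROWS, WIDTH, SEED)
for data in client_data:
    for item, count in data.items():
        central.add(item, count)

print(server.table == central.table, server.estimate(1))
```

Because every row counter for item 1 holds its true count (80) plus at most the signed counts of colliding items (9 in total here), the estimate stays within that collision mass of the true value.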

Experimental study of PLLA/INH slow release implant fabricated by three dimensional printing technique and drug release characteristics in vitro

Author: Gui Wu
Jianbo Zhou
Jingfeng Li
Qixin Zheng
Weigang Wu
Zhilei Hu
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Local slow-release implants provide long-term, stable drug release in the lesion. The objective of this study was to fabricate a biodegradable slow-release INH/PLLA tablet via three-dimensional printing (3DP) and to compare the in vitro drug release characteristics of three differently structured tablets. METHODS: Three drug delivery systems (a columnar-shaped tablet (CST), a doughnut-shaped tablet (DST) and a multilayer doughnut-shaped tablet (MDST)) were manufactured with a three-dimensional printing machine, and isoniazid (INH) was loaded into the implants. A dynamic soaking method was used to study the drug release characteristics of the three implants. An MTT cytotoxicity test and a direct contact test were used to study the biocompatibility of the implants. The microstructures of the implant surfaces were observed with an electron microscope. RESULTS: The PLLA powder in the tablets was well bonded by 3DP without disintegration. Electron microscopy showed that INH was distributed evenly on the surface of the tablets in a “nest-shaped” pattern, while the surface of the barrier layer in the MDST was compact and contained no INH. The INH concentrations of all three tablets remained above the effective bacteriostatic concentration (isoniazid: 0.025 to 0.05 μg/ml) after 30 days of release in vitro. All tablets showed an initial burst release of INH in the early period. The drug concentration of the MDST became stable, with little fluctuation, from the 6th day of release onward. The drug concentrations of the DST and CST decreased gradually, and the rate of decrease was faster for the DST than for the CST. The MTT cytotoxicity test and direct contact test indicated that the INH-PLLA tablets had low cytotoxicity and favorable biocompatibility. CONCLUSIONS: Three-dimensional printing was a reliable technique for fabricating complicated implants. The drug release pattern of the MDST was the most stable among the three implants, making it an ideal drug delivery system for antibiotics. Biocompatibility tests demonstrated that the INH-PLLA implant did not have cytotoxicity. The multilayer doughnut-shaped tablet provided a new constant slow-release method, after an initial burst, for the topical application of antibiotics.

Springer - Publisher Connector

PubMed Central

Shipboard flow injection analysis (FIA) of dissolved Al, Fe, and Mn from R/V Knorr cruise KN204-01 (GA03) in the Subtropical northern Atlantic Ocean in 2011 (U.S. GEOTRACES NAT project)

Author: Measures Christopher I.
Wu Jingfeng
Publication venue: Biological and Chemical Oceanography Data Management Office (BCO-DMO)
Publication date: 19/01/2021

Dataset: GT11 - FIA-AlFeMn. Shipboard flow injection analysis (FIA) of dissolved Al, Fe, and Mn from R/V Knorr cruise KN204-01 (GA03) in the Subtropical northern Atlantic Ocean in 2011 as part of the U.S. GEOTRACES North Atlantic project. For a complete list of measurements, refer to the full dataset description in the supplemental file 'Dataset_description.pdf'. The most current version of this dataset is available at https://www.bco-dmo.org/dataset/3822

NSF Division of Ocean Sciences (NSF OCE) OCE-0928741, NSF Division of Ocean Sciences (NSF OCE) OCE-113781

Woods Hole Open Access Server

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Author: Braverman Vladimir
Gu Quanquan
Kakade Sham M.
Wu Jingfeng
Zou Difan
Publication date: 03/08/2022

We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems.

Comment: 32 pages, 1 figure, 1 table

arXiv.org e-Print Archive
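The pretraining-finetuning pipeline from the abstract above (both stages run as online SGD) can be sketched as a toy simulation. The setup is an assumption of this sketch: two-dimensional Gaussian covariates with diagonal covariances, a shared true weight vector `w_star`, a constant step size, and hypothetical helper names. The source domain rarely excites the coordinate that matters most in the target domain. The simulation only illustrates the pipeline and does not reproduce the paper's bounds or its $O(N^2)$ versus $N$ comparison.

```python
import random


def online_sgd(w, samples, lr):
    """One pass of online SGD on the squared loss 0.5 * (x . w - y)^2, one sample per step."""
    w = list(w)
    for x, y in samples:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


def draw(n, scales, w_star, noise_std, rng):
    """Covariates with independent N(0, scale_j^2) coordinates; y = x . w_star + noise."""
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, s) for s in scales]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, noise_std)
        samples.append((x, y))
    return samples


def target_excess_risk(w, w_star, target_scales):
    """Excess risk under the target covariance diag(scale_j^2)."""
    return sum((s ** 2) * (wi - wsi) ** 2 for wi, wsi, s in zip(w, w_star, target_scales))


rng = random.Random(0)
w_star = [1.0, -2.0]        # same conditional model in both domains
source_scales = [1.0, 0.1]  # source rarely excites the second coordinate
target_scales = [0.1, 1.0]  # target depends mostly on the second coordinate

# Pretraining: online SGD on plentiful source data, starting from zero.
w_pre = online_sgd([0.0, 0.0], draw(2000, source_scales, w_star, 0.1, rng), lr=0.05)
# Finetuning: continue online SGD from w_pre on a small target sample.
w_fine = online_sgd(w_pre, draw(200, target_scales, w_star, 0.1, rng), lr=0.05)

pretrained_risk = target_excess_risk(w_pre, w_star, target_scales)
finetuned_risk = target_excess_risk(w_fine, w_star, target_scales)
print(f"pretrain only: {pretrained_risk:.4f}, pretrain + finetune: {finetuned_risk:.4f}")
```

In this toy setup, pretraining alone leaves the weakly excited coordinate under-fit. A short finetuning pass on target data corrects it, which loosely mirrors the abstract's point that a small amount of target data can greatly reduce how much source data is needed.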