
44 research outputs found

Utility-Probability Duality of Neural Networks

Author: Bojun Huang
Yuan Fei
Publication date: 25/05/2023

It is typically understood that the training of modern neural networks is a process of fitting the probability distribution of the desired output. However, recent paradoxical observations in a number of language generation tasks lead one to wonder whether this canonical probability-based explanation can really account for the empirical success of deep learning. To resolve this issue, we propose an alternative utility-based explanation of the standard supervised learning procedure in deep learning. The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function that encodes the preference revealed in the training data. From this perspective, training the neural network corresponds to a utility learning process. Specifically, we show that for all neural networks with softmax outputs, the SGD learning dynamic of maximum likelihood estimation (MLE) can be seen as an iteration process that optimizes the neural network toward an optimal utility function. This utility-based interpretation can explain several otherwise-paradoxical observations about the neural networks thus trained. Moreover, our utility-based theory also entails an equation that can transform the learned utility values back into a new kind of probability estimation, with which probability-compatible decision rules enjoy dramatic (double-digit) performance improvements. This evidence collectively reveals a phenomenon of utility-probability duality in terms of what modern neural networks are (truly) modeling: we thought they were one thing (probabilities), until the unexplainable showed up; changing mindset and treating them as another thing (utility values) largely reconciles the theory, despite remaining subtleties regarding its original (probabilistic) identity.

arXiv.org e-Print Archive

Simpson's Bias in NLP Training

Author: Bojun Huang
Liang Yaobo
Yuan Fei
Zhang Longtu
Publication date: 13/03/2021

In most machine learning tasks, we evaluate a model $M$ on a given data population $S$ by measuring a population-level metric $F(S;M)$. Examples of such an evaluation metric $F$ include precision/recall for (binary) recognition, the F1 score for multi-class classification, and the BLEU metric for language generation. On the other hand, the model $M$ is trained by optimizing a sample-level loss $G(S_t;M)$ at each learning step $t$, where $S_t$ is a subset of $S$ (a.k.a. the mini-batch). Popular choices of $G$ include cross-entropy loss, the Dice loss, and sentence-level BLEU scores. A fundamental assumption behind this paradigm is that the mean value of the sample-level loss $G$, if averaged over all possible samples, should effectively represent the population-level metric $F$ of the task, such that $\mathbb{E}[G(S_t;M)] \approx F(S;M)$. In this paper, we systematically investigate the above assumption in several NLP tasks. We show, both theoretically and experimentally, that some popular designs of the sample-level loss $G$ may be inconsistent with the true population-level metric $F$ of the task, so that models trained to optimize the former can be substantially sub-optimal with respect to the latter, a phenomenon we call Simpson's bias, due to its deep connections with the classic paradox known as Simpson's reversal paradox in statistics and social sciences. Comment: AAAI 202
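The gap between a batch-averaged score and the population-level metric can be illustrated with a toy calculation. The sketch below is not taken from the paper: the data, the two-batch split, and the helper name `f1` are all assumptions made for illustration, and binary F1 stands in for both $F$ and $G$. It only shows that averaging F1 over mini-batches need not equal F1 computed on the whole population.

```python
def f1(pairs):
    """Binary F1 score from a list of (prediction, truth) pairs."""
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy population, split into two mini-batches (an assumption,
# not data from the paper).
batch_a = [(1, 1), (1, 1), (1, 1), (0, 1)]  # tp=3, fn=1
batch_b = [(1, 0), (1, 0), (1, 1), (0, 0)]  # tp=1, fp=2
population = batch_a + batch_b              # tp=4, fp=2, fn=1

population_f1 = f1(population)                  # plays the role of F(S; M)
mean_batch_f1 = (f1(batch_a) + f1(batch_b)) / 2  # plays the role of E[G(S_t; M)]

print(f"population F1: {population_f1:.4f}")
print(f"mean batch F1: {mean_batch_f1:.4f}")
```

Here the population-level score is 8/11 while the batch average is 19/28, so the two quantities differ even though they are computed from the same predictions.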

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Start Learning Chinese Words Fast: An Introduction

Author: Huang Bojun
Huang Dahui
Perret Gilles
Wu Baixin
Yan Haifeng
Zhu Meizhen
Publication venue: 'Academy Publication'
Publication date: 01/09/2017

To cater to the needs of learners of Chinese, the 28 basic strokes of Chinese words are first introduced. It is pointed out that the difficulty foreigners face in learning Chinese words lies in their grotesque shapes, whether written with a brush (soft) pen or printed in books. A special writing method using a hard pen and moving steps in 8 directions is devised and presented for the first time, and it is easy for foreigners to try. The size (length) of the strokes guides them in controlling the proportions of a word, and it can be changed according to the paper size and how large they want to write. Secondly, 48 common fragments derived from the 28 basic strokes are listed and their writing method is described. This can help foreigners take apart and rewrite unknown Chinese words and even guess their meanings. Lastly, many characteristics and regularities of Chinese words should hold great appeal for foreign language learners. Some aspects of Chinese culture and amusing stories are also revealed through the fragments and example words.

Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array Data Release I

Author: Caballero R. Nicolas
Chen Siyuan
Guan Xin
Guo Yanjun
Han Jinlin
Hao Longfei
Huang Menglin
Jiang Jinchen
Jiang Peng
Lee Kejia
Luo Jingtao
Manchester Richard
Qian Lei
Shen Zhiqiang
Sun Chun
Wang Bojun
Wang Jingbo
Wang Min
Wang Na
Wu Xiangping
Xu Heng
Xu Jiangwei
Xu Renxin
Xu Yonghua
Xue Zihan
Yuan Jianping
Zhu Yan
Publication venue: 'IOP Publishing'
Publication date: 28/06/2023

Observing and timing a group of millisecond pulsars (MSPs) with high rotational stability enables the direct detection of gravitational waves (GWs). The GW signals can be identified from the spatial correlations encoded in the times-of-arrival of widely spaced pulsar pairs. The Chinese Pulsar Timing Array (CPTA) is a collaboration aiming at direct GW detection with observations carried out using Chinese radio telescopes. This short article serves as a 'table of contents' for a forthcoming series of papers related to the CPTA Data Release 1 (CPTA DR1), which uses observations from the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Here, after summarizing the time span and accuracy of CPTA DR1, we report the key results of our statistical inference, finding a correlated signal with amplitude $\log A_{\rm c} = -14.4\,^{+1.0}_{-2.8}$ for spectral index in the range of $\alpha \in [-1.8, 1.5]$, assuming a GW background (GWB) induced quadrupolar correlation. The search for the Hellings-Downs (HD) correlation curve is also presented, where some evidence for the HD correlation has been found: a 4.6$\sigma$ statistical significance is achieved using the discrete frequency method around the frequency of 14 nHz. We expect that the future International Pulsar Timing Array data analysis and the next CPTA data release will be more sensitive to the nHz GWB, which could verify the current results. Comment: 18 pages, 6 figures, submitted to "Research in astronomy and astrophysics" 22nd March 202

arXiv.org e-Print Archive

Effects of Low-Level Autonomic Stimulation on Prevention of Atrial Fibrillation Induced by Acute Electrical Remodeling

Author: Aidong Zhang
Bojun Gong
Dandan Huang
Dongdong Zhang
Gang Du
Hairui Li
Hong Li
Huijie Xing
Jia Chen
Jianyi Feng
Li Zhou
Liwei Wang
Min Guan
Ning Bian
Ruijie Liu
Shaorong Wu
Tao Zhang
Xianwu Lan
Xiaoming Chen
Yubi Lin
Zhengyi Huang
Zicheng Li
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Background. Rapid atrial pacing (RAP) can induce electrical and autonomic remodeling and facilitate atrial fibrillation (AF). Recent reports showed that low-level vagosympathetic nerve stimulation (LLVNS) can suppress AF, as an antiarrhythmic effect. We hypothesized that LLVNS can reverse substrate heterogeneity induced by RAP. Methods and Results. Mongrel dogs were divided into (LLVNS+RAP) and RAP groups. Electrode catheters were sutured to multiple atrial sites, and LLVNS was applied to cervical vagosympathetic trunks with voltage 50% below the threshold slowing sinus rate by ⩽30 msec. RAP induced a significant decrease in effective refractory period (ERP) and increase in the window of vulnerability at all sites, characterized by descending and elevated gradient differences towards the ganglionic plexi (GP) sites, respectively. The ERP dispersion was obviously enlarged by RAP and more significant when the ERP of GP-related sites was considered. Recovery time from AF was also prolonged significantly as a result of RAP. LLVNS could reverse all these changes induced by RAP and recover the heterogeneous substrate to baseline. Conclusions. LLVNS can reverse the electrical and autonomic remodeling and abolish the GP-central gradient differences induced by RAP, and thus it can recover the homogeneous substrate, which may be the underlying mechanism of its antiarrhythmic effect

Crossref

Directory of Open Access Journals

Stomatin Inhibits Pannexin-1-Mediated Whole-Cell Currents by Interacting with Its Carboxyl Terminal

Author: A Montel-Hagen
A Ray
AG Mannsfeldt
B Chen
BA MacVicar
Bojun Chen
C Wetzel
Craig S. Moore
EC Park
F Qiu
FB Chekeni
GE Sosinsky
GW Stewart
Haiying Zhan
JE Kim
JM Garre
JS Davidson
KJ Livak
Kumiko Ijichi
L Bao
L Lapatsina
M Mairhofer
M Sridharan
MB Goodman
Michael V. L. Bennett
MM Sedensky
MP Price
P Pelegrin
PA Weber
PG Gallagher
PG Morgan
R Bruzzone
R Dando
R Iglesias
RD Veenstra
RJ Thompson
S Brenner
S Bunse
S Iwabuchi
S Locovei
S Rajaram
SO Suadicani
Stephen J. Crocker
Steven Barnes
T Woehrle
TA Starich
TB Huber
VI Shestopalov
W Ma
WR Silverman
Xin Zhou
Xin-Ming Ma
Xue-Jun Li
Y Huang
Y Panchin
Y Qu
Y Wang
Zhao-Wen Wang
Publication venue: Public Library of Science
Publication date: 01/01/2011

The pannexin-1 (Panx1) channel (often referred to as the Panx1 hemichannel) is a large-conductance channel in the plasma membrane of many mammalian cells. While opening of the channel is potentially detrimental to the cell, little is known about how it is regulated under physiological conditions. Here we show that stomatin inhibited Panx1 channel activity. In transfected HEK-293 cells, stomatin reduced Panx1-mediated whole-cell currents without altering either the total or membrane surface Panx1 protein expression. Stomatin coimmunoprecipitated with full-length Panx1 as well as a Panx1 fragment containing the fourth membrane-spanning domain and the cytosolic carboxyl terminal. The inhibitory effect of stomatin on Panx1-mediated whole-cell currents was abolished by truncating Panx1 at a site in the cytosolic carboxyl terminal. In primary culture of mouse astrocytes, inhibition of endogenous stomatin expression by small interfering RNA enhanced Panx1-mediated outward whole-cell currents. These observations suggest that stomatin may play important roles in astrocytes and other cells by interacting with Panx1 carboxyl terminal to limit channel opening

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Pruning Game Tree by Rollouts

Author: Huang Bojun
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 16/02/2015

In this paper we show that the alpha-beta algorithm and its successor MT-SSS*, as two classic minimax search algorithms, can be implemented as rollout algorithms, a generic algorithmic paradigm widely used in many domains. Specifically, we define a family of rollout algorithms, in which the rollout policy is restricted to select successor nodes only from a certain subset of the children list. We show that any rollout policy in this family (either deterministic or randomized) is guaranteed to evaluate the game tree correctly with a finite number of rollouts. Moreover, we identify simple rollout policies in this family that "implement" alpha-beta and MT-SSS*. Specifically, given any game tree, the rollout algorithms with these particular policies always visit the same set of leaf nodes in the same order as alpha-beta and MT-SSS*, respectively. Our results suggest that traditional pruning techniques and the recent Monte Carlo Tree Search algorithms, as two competing approaches for game tree evaluation, may be unified under the rollout paradigm.
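For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classic recursive alpha-beta pruning. It is not the rollout formulation proposed in the paper. The nested-list tree encoding, the function name `alphabeta`, and the `visited` list used to record leaf order are assumptions introduced here for illustration.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Evaluate a game tree given as nested lists; leaves are numbers."""
    if not isinstance(node, list):
        # Leaf: record the visit (if requested) and return its value.
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # remaining children cannot change the result
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # remaining children cannot change the result
    return value


# Hypothetical two-level tree: a max root over three min nodes.
tree = [[3, 5], [2, 9], [6, 1]]
visited = []
root_value = alphabeta(tree, visited=visited)
print(root_value, visited)
```

On this toy tree the leaf 9 is never visited, because the second min node is already known to be worth at most 2 after its first leaf, which is below the 3 secured by the first min node. The set and order of visited leaves is the quantity that the abstract says the rollout policies reproduce.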

Association for the Advancement of Artificial Intelligence: AAAI Publications

Solar Image Restoration with the CycleGAN Based on Multi-fractal Properties of Texture Features

Author: Cai Bojun
Cai Dongmei
Huang Yi
Jia Peng
Publication venue: American Astronomical Society
Publication date: 11/08/2019

Texture is one of the most obvious characteristics in solar images and it is normally described by texture features. Because textures from solar images of the same wavelength are similar, we assume that texture features of solar images are multi-fractals. Based on this assumption, we propose a pure data-based image restoration method: with several high-resolution solar images as references, we use the Cycle-Consistent Adversarial Network to restore blurred images of the same steady physical process, in the same wavelength, obtained by the same telescope. We test our method with simulated and real observation data and find that it can improve the spatial resolution of solar images without loss of any frames. Because our method does not need a paired training set or additional instruments, it can be used as a post-processing method for solar images obtained by either seeing-limited telescopes or telescopes with ground-layer adaptive optics systems.

arXiv.org e-Print Archive

Durham Research Online

PSF–NET: A Nonparametric Point-spread Function Model for Ground-based Optical Telescopes

Author: Cai Bojun
Cai Dongmei
Jia Peng
Wu Xuebo
Yi Huang
Publication venue: IOP Publishing
Publication date: 01/03/2020

Ground-based optical telescopes are seriously affected by atmospheric turbulence induced aberrations. Understanding the properties of these aberrations is important both for instrument design and for the development of image restoration methods. Because the point-spread function can reflect the performance of the whole optical system, it is appropriate to use the point-spread function to describe atmospheric turbulence induced aberrations. Assuming point-spread functions induced by atmospheric turbulence with the same profile belong to the same manifold space, we propose a nonparametric point-spread function model, PSF–NET. The PSF–NET has a cycle convolutional neural network structure and is a statistical representation of the manifold space of PSFs induced by atmospheric turbulence with the same profile. Testing the PSF–NET with simulated and real observation data, we find that a well-trained PSF–NET can restore any short-exposure images blurred by atmospheric turbulence with the same profile. In addition, we use the impulse response of the PSF–NET, which can be viewed as the statistical mean PSF, to analyze the interpretability of the PSF–NET. We find that variations of the statistical mean PSFs are caused by variations of the atmospheric turbulence profile: as the difference between atmospheric turbulence profiles increases, the difference between the statistical mean PSFs also increases. The PSF–NET proposed in this paper provides a new way to analyze atmospheric turbulence induced aberrations, which would benefit the development of new observation methods for ground-based optical telescopes.

arXiv.org e-Print Archive

Durham Research Online