44 research outputs found

    Utility-Probability Duality of Neural Networks

    It is typically understood that the training of modern neural networks is a process of fitting the probability distribution of the desired output. However, recent paradoxical observations in a number of language generation tasks raise the question of whether this canonical probability-based explanation can really account for the empirical success of deep learning. To resolve this issue, we propose an alternative utility-based explanation of the standard supervised learning procedure in deep learning. The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function that encodes the preference revealed in the training data. From this perspective, training the neural network corresponds to a utility learning process. Specifically, we show that for all neural networks with softmax outputs, the SGD learning dynamic of maximum likelihood estimation (MLE) can be seen as an iteration process that optimizes the neural network toward an optimal utility function. This utility-based interpretation can explain several otherwise-paradoxical observations about neural networks trained in this way. Moreover, our utility-based theory also entails an equation that can transform the learned utility values back into a new kind of probability estimate, with which probability-compatible decision rules enjoy dramatic (double-digit) performance improvements. Together, this evidence reveals a phenomenon of utility-probability duality in terms of what modern neural networks are (truly) modeling: we thought they were one thing (probabilities), until the unexplainable showed up; changing mindset and treating them as another thing (utility values) largely reconciles the theory, despite remaining subtleties regarding their original (probabilistic) identity.
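
As a rough illustration of the softmax-MLE setting the abstract refers to, the sketch below applies one SGD step to a negative log-likelihood loss. It is not the paper's derivation: it assumes the logits are free parameters (there is no network), and the names `softmax`, `nll_grad`, and `sgd_step` are hypothetical. It only shows the well-known fact that the gradient of the loss with respect to the logits is the softmax output minus the one-hot label, so each step raises the score of the observed label relative to the others.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll_grad(logits: List[float], label: int) -> List[float]:
    """Gradient of -log softmax(logits)[label] w.r.t. the logits: p - onehot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def sgd_step(logits: List[float], label: int, lr: float) -> List[float]:
    """One SGD step, treating the logits themselves as parameters (an assumption)."""
    grad = nll_grad(logits, label)
    return [z - lr * g for z, g in zip(logits, grad)]


# Toy setup: three classes, uniform initial scores, observed label is class 1.
logits = [0.0, 0.0, 0.0]
grad = nll_grad(logits, label=1)
updated = sgd_step(logits, label=1, lr=0.5)
```

After the step, the score of the observed class is above the other two, which is the ordinal information the abstract's utility reading focuses on.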

    Simpson's Bias in NLP Training

    In most machine learning tasks, we evaluate a model $M$ on a given data population $S$ by measuring a population-level metric $F(S;M)$. Examples of such evaluation metrics $F$ include precision/recall for (binary) recognition, the F1 score for multi-class classification, and the BLEU metric for language generation. On the other hand, the model $M$ is trained by optimizing a sample-level loss $G(S_t;M)$ at each learning step $t$, where $S_t$ is a subset of $S$ (a.k.a. the mini-batch). Popular choices of $G$ include cross-entropy loss, the Dice loss, and sentence-level BLEU scores. A fundamental assumption behind this paradigm is that the mean value of the sample-level loss $G$, if averaged over all possible samples, should effectively represent the population-level metric $F$ of the task, such that $\mathbb{E}[G(S_t;M)] \approx F(S;M)$. In this paper, we systematically investigate the above assumption in several NLP tasks. We show, both theoretically and experimentally, that some popular designs of the sample-level loss $G$ may be inconsistent with the true population-level metric $F$ of the task, so that models trained to optimize the former can be substantially sub-optimal with respect to the latter, a phenomenon we call Simpson's bias, due to its deep connections with the classic paradox known as Simpson's reversal paradox in statistics and social sciences. Comment: AAAI 202
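
The gap between a sample-level score and a population-level metric can be seen in a small numeric sketch. The counts below are made-up toy values, not data from the paper, and `f1_from_counts` is a hypothetical helper. The example compares the mean of per-mini-batch F1 scores with the F1 score computed on the pooled counts.

```python
from typing import List, Tuple

# Each mini-batch is summarized as (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy mini-batches (illustrative assumption): one easy batch, one hard batch.
batches: List[Counts] = [(9, 1, 0), (1, 0, 9)]

# Sample-level view: average the per-batch F1 scores.
mean_batch_f1 = sum(f1_from_counts(*b) for b in batches) / len(batches)

# Population-level view: pool the counts first, then compute F1 once.
tp, fp, fn = (sum(col) for col in zip(*batches))
population_f1 = f1_from_counts(tp, fp, fn)

print(f"mean of batch F1: {mean_batch_f1:.4f}")
print(f"population F1:    {population_f1:.4f}")
```

Because F1 is a ratio rather than a plain average over examples, the two numbers need not agree, which is the kind of inconsistency the abstract describes.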

    Start Learning Chinese Words Fast: An Introduction

    To cater to the needs of Chinese language lovers, 28 basic strokes of Chinese words are first introduced. It is pointed out that the difficulty foreigners face in learning Chinese words lies in their unfamiliar shapes, whether written with a brush (soft) pen or printed in books. A special writing method using a hard pen and 8 directions of movement is devised and presented for the first time, and it is easy for foreigners to try. The size (length) of the strokes guides learners in controlling the proportions of a word, and it can be changed according to the paper size and how large they want to write. Secondly, 48 common fragments derived from the 28 basic strokes are listed and their writing method is described. This can help foreigners to separate and re-write unknown Chinese words and even guess their meanings. Lastly, many characteristics or regularities of Chinese words will be highly attractive to foreign language learners. Some aspects of Chinese culture and amusing stories are also presented through the fragments and example words.

    Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array Data Release I

    Observing and timing a group of millisecond pulsars (MSPs) with high rotational stability enables the direct detection of gravitational waves (GWs). The GW signals can be identified from the spatial correlations encoded in the times-of-arrival of widely spaced pulsar pairs. The Chinese Pulsar Timing Array (CPTA) is a collaboration aiming at direct GW detection with observations carried out using Chinese radio telescopes. This short article serves as a 'table of contents' for a forthcoming series of papers related to the CPTA Data Release 1 (CPTA DR1), which uses observations from the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Here, after summarizing the time span and accuracy of CPTA DR1, we report the key results of our statistical inference, finding a correlated signal with amplitude $\log A_{\rm c} = -14.4\,^{+1.0}_{-2.8}$ for spectral index in the range $\alpha \in [-1.8, 1.5]$, assuming a GW background (GWB) induced quadrupolar correlation. The search for the Hellings-Downs (HD) correlation curve is also presented, where some evidence for the HD correlation is found: a $4.6\sigma$ statistical significance is achieved using the discrete frequency method around the frequency of 14 nHz. We expect that the future International Pulsar Timing Array data analysis and the next CPTA data release will be more sensitive to the nHz GWB, which could verify the current results. Comment: 18 pages, 6 figures, submitted to "Research in astronomy and astrophysics" 22nd March 202

    Effects of Low-Level Autonomic Stimulation on Prevention of Atrial Fibrillation Induced by Acute Electrical Remodeling

    Background. Rapid atrial pacing (RAP) can induce electrical and autonomic remodeling and facilitate atrial fibrillation (AF). Recent reports showed that low-level vagosympathetic nerve stimulation (LLVNS) can suppress AF, an antiarrhythmic effect. We hypothesized that LLVNS can reverse substrate heterogeneity induced by RAP. Methods and Results. Mongrel dogs were divided into (LLVNS+RAP) and RAP groups. Electrode catheters were sutured to multiple atrial sites, and LLVNS was applied to cervical vagosympathetic trunks with voltage 50% below the threshold slowing sinus rate by ⩽30 msec. RAP induced a significant decrease in effective refractory period (ERP) and an increase in the window of vulnerability at all sites, characterized by descending and elevated gradient differences towards the ganglionic plexi (GP) sites, respectively. The ERP dispersion was clearly enlarged by RAP, and more significantly so when the ERP of GP-related sites was considered. Recovery time from AF was also prolonged significantly as a result of RAP. LLVNS could reverse all these changes induced by RAP and recover the heterogeneous substrate to baseline. Conclusions. LLVNS can reverse the electrical and autonomic remodeling and abolish the GP-central gradient differences induced by RAP, and thus it can recover the homogeneous substrate, which may be the underlying mechanism of its antiarrhythmic effect.

    Stomatin Inhibits Pannexin-1-Mediated Whole-Cell Currents by Interacting with Its Carboxyl Terminal

    The pannexin-1 (Panx1) channel (often referred to as the Panx1 hemichannel) is a large-conductance channel in the plasma membrane of many mammalian cells. While opening of the channel is potentially detrimental to the cell, little is known about how it is regulated under physiological conditions. Here we show that stomatin inhibited Panx1 channel activity. In transfected HEK-293 cells, stomatin reduced Panx1-mediated whole-cell currents without altering either the total or membrane surface Panx1 protein expression. Stomatin coimmunoprecipitated with full-length Panx1 as well as a Panx1 fragment containing the fourth membrane-spanning domain and the cytosolic carboxyl terminal. The inhibitory effect of stomatin on Panx1-mediated whole-cell currents was abolished by truncating Panx1 at a site in the cytosolic carboxyl terminal. In primary culture of mouse astrocytes, inhibition of endogenous stomatin expression by small interfering RNA enhanced Panx1-mediated outward whole-cell currents. These observations suggest that stomatin may play important roles in astrocytes and other cells by interacting with the Panx1 carboxyl terminal to limit channel opening.

    Pruning Game Tree by Rollouts

    In this paper we show that the alpha-beta algorithm and its successor MT-SSS*, two classic minimax search algorithms, can be implemented as rollout algorithms, a generic algorithmic paradigm widely used in many domains. Specifically, we define a family of rollout algorithms in which the rollout policy is restricted to select successor nodes only from a certain subset of the children list. We show that any rollout policy in this family (either deterministic or randomized) is guaranteed to evaluate the game tree correctly with a finite number of rollouts. Moreover, we identify simple rollout policies in this family that "implement" alpha-beta and MT-SSS*. Specifically, given any game tree, the rollout algorithms with these particular policies always visit the same set of leaf nodes in the same order as alpha-beta and MT-SSS*, respectively. Our results suggest that traditional pruning techniques and the recent Monte Carlo Tree Search algorithms, as two competing approaches for game tree evaluation, may be unified under the rollout paradigm.
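
For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of the classic recursive alpha-beta algorithm next to plain minimax. It is not the paper's rollout formulation. The nested-list tree encoding, the toy tree, and the `visited` bookkeeping are assumptions made only for illustration.

```python
import math
from typing import List, Union

# A game tree is either a leaf value (int) or a list of child subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool, visited: List[int]) -> float:
    """Plain minimax; records every leaf it evaluates in `visited`."""
    if isinstance(node, int):
        visited.append(node)
        return node
    values = [minimax(child, not maximizing, visited) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree, maximizing: bool, alpha: float, beta: float,
              visited: List[int]) -> float:
    """Alpha-beta pruning; skips subtrees that cannot change the root value."""
    if isinstance(node, int):
        visited.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # remaining siblings cannot affect the result
            break
    return best


# Toy tree: a max root over three min nodes, each with two leaves.
tree: Tree = [[3, 5], [2, 9], [0, 1]]

mm_leaves: List[int] = []
ab_leaves: List[int] = []
mm_value = minimax(tree, True, mm_leaves)
ab_value = alphabeta(tree, True, -math.inf, math.inf, ab_leaves)
```

Both searches return the same root value, but alpha-beta evaluates only a subset of the leaves. The abstract's claim concerns this visited-leaf sequence: the paper identifies rollout policies that visit the same leaves in the same order.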

    Solar Image Restoration with the CycleGAN Based on Multi-fractal Properties of Texture Features

    Texture is one of the most obvious characteristics in solar images and it is normally described by texture features. Because textures from solar images of the same wavelength are similar, we assume that texture features of solar images are multi-fractals. Based on this assumption, we propose a purely data-based image restoration method: with several high-resolution solar images as references, we use the Cycle-Consistent Adversarial Network to restore blurred images of the same steady physical process, in the same wavelength, obtained by the same telescope. We test our method with simulated and real observation data and find that our method can improve the spatial resolution of solar images without loss of any frames. Because our method does not need a paired training set or additional instruments, it can be used as a post-processing method for solar images obtained by either seeing-limited telescopes or telescopes with ground-layer adaptive optics systems.

    PSF–NET: A Nonparametric Point-spread Function Model for Ground-based Optical Telescopes

    Ground-based optical telescopes are seriously affected by aberrations induced by atmospheric turbulence. Understanding the properties of these aberrations is important both for instrument design and for the development of image restoration methods. Because the point-spread function can reflect the performance of the whole optical system, it is appropriate to use the point-spread function to describe atmospheric turbulence induced aberrations. Assuming that point-spread functions induced by atmospheric turbulence with the same profile belong to the same manifold space, we propose a nonparametric point-spread function model, PSF–NET. The PSF–NET has a cycle convolutional neural network structure and is a statistical representation of the manifold space of PSFs induced by atmospheric turbulence with the same profile. Testing the PSF–NET with simulated and real observation data, we find that a well-trained PSF–NET can restore any short-exposure images blurred by atmospheric turbulence with the same profile. In addition, we use the impulse response of the PSF–NET, which can be viewed as the statistical mean PSF, to analyze the interpretability of the PSF–NET. We find that variations of statistical mean PSFs are caused by variations of the atmospheric turbulence profile: as the difference between atmospheric turbulence profiles increases, the difference between statistical mean PSFs also increases. The PSF–NET proposed in this paper provides a new way to analyze atmospheric turbulence induced aberrations, which would benefit the development of new observation methods for ground-based optical telescopes.