    Utility-Probability Duality of Neural Networks

    It is typically understood that the training of modern neural networks is a process of fitting the probability distribution of the desired output. However, recent paradoxical observations in a number of language generation tasks raise the question of whether this canonical probability-based explanation can really account for the empirical success of deep learning. To resolve this issue, we propose an alternative utility-based explanation of the standard supervised learning procedure in deep learning. The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function that encodes the preference revealed in the training data. From this perspective, training the neural network corresponds to a utility learning process. Specifically, we show that for all neural networks with softmax outputs, the SGD learning dynamics of maximum likelihood estimation (MLE) can be seen as an iterative process that optimizes the neural network toward an optimal utility function. This utility-based interpretation can explain several otherwise-paradoxical observations about the neural networks thus trained. Moreover, our utility-based theory also entails an equation that can transform the learned utility values back into a new kind of probability estimate, with which probability-compatible decision rules enjoy dramatic (double-digit) performance improvements. Together, this evidence reveals a phenomenon of utility-probability duality in terms of what modern neural networks are (truly) modeling: we thought they were one thing (probabilities), until the unexplainable showed up; changing mindset and treating them as another thing (utility values) largely reconciles the theory, despite remaining subtleties regarding their original (probabilistic) identity.

    Simpson's Bias in NLP Training

    In most machine learning tasks, we evaluate a model $M$ on a given data population $S$ by measuring a population-level metric $F(S;M)$. Examples of such an evaluation metric $F$ include precision/recall for (binary) recognition, the F1 score for multi-class classification, and the BLEU metric for language generation. On the other hand, the model $M$ is trained by optimizing a sample-level loss $G(S_t;M)$ at each learning step $t$, where $S_t$ is a subset of $S$ (a.k.a. the mini-batch). Popular choices of $G$ include cross-entropy loss, the Dice loss, and sentence-level BLEU scores. A fundamental assumption behind this paradigm is that the mean value of the sample-level loss $G$, if averaged over all possible samples, should effectively represent the population-level metric $F$ of the task, i.e., that $\mathbb{E}[G(S_t;M)] \approx F(S;M)$. In this paper, we systematically investigate the above assumption in several NLP tasks. We show, both theoretically and experimentally, that some popular designs of the sample-level loss $G$ may be inconsistent with the true population-level metric $F$ of the task, so that models trained to optimize the former can be substantially sub-optimal with respect to the latter, a phenomenon we call Simpson's bias, due to its deep connections with the classic paradox known as Simpson's reversal paradox in statistics and the social sciences. Comment: AAAI 202
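
    The gap between a batch-averaged score and the population-level metric described above can be illustrated with a small sketch. It is not taken from the paper: the toy labels and predictions, the two-batch split, and the helper name `f1` are all assumptions chosen for illustration, and F1 stands in for the generic metric $F$ and sample-level score $G$.

```python
from typing import Sequence


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score computed from raw counts (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (assumed): two mini-batches of ten examples each.
# Batch 1: ten positives, the model recovers only one of them.
batch1_true = [1] * 10
batch1_pred = [1] + [0] * 9
# Batch 2: nine positives and one negative, the model predicts all positive.
batch2_true = [1] * 9 + [0]
batch2_pred = [1] * 10

# Sample-level view: average the per-batch F1 scores.
mean_batch_f1 = (f1(batch1_true, batch1_pred) + f1(batch2_true, batch2_pred)) / 2

# Population-level view: pool all examples, then compute F1 once.
population_f1 = f1(batch1_true + batch2_true, batch1_pred + batch2_pred)

print(f"mean of per-batch F1: {mean_batch_f1:.3f}")  # about 0.565
print(f"population-level F1:  {population_f1:.3f}")  # about 0.667
```

    On this toy data the two numbers disagree because F1 is a ratio of counts and does not decompose into an average over batches; a metric that is a plain per-example average, such as accuracy on equal-sized batches, would not show this gap.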

    Start Learning Chinese Words Fast: An Introduction

    To cater to the needs of Chinese language enthusiasts, the 28 basic strokes of Chinese words are introduced first. The main difficulty foreigners face in learning Chinese words is identified as their complex shapes when written with a brush (soft) pen or printed in books. A special writing method using a hard pen and moving steps in 8 directions is devised and presented for the first time; it is easy for foreigners to try. The size (length) of the strokes guides learners in controlling the proportions of a word, and can be adjusted to the paper size and the desired writing size. Second, 48 common fragments derived from the 28 basic strokes are listed and their writing method is described. These can help foreigners to decompose and rewrite unknown Chinese words and even guess their meanings. Lastly, many characteristics and regularities of Chinese words are highly appealing to foreign language learners. Some aspects of Chinese culture and amusing stories are also presented through the fragments and example words.

    Cang-ai volatile oil alleviates nasal inflammation via Th1/Th2 cell imbalance regulation in a rat model of ovalbumin-induced allergic rhinitis

    We previously revealed that Cang-ai volatile oil (CAVO) regulates T-cell activity, enhancing the immune response in people with chronic respiratory diseases. However, the effects of CAVO on allergic rhinitis (AR) have not been investigated. Herein, we established an ovalbumin (OVA)-induced AR rat model to determine these effects. Sprague–Dawley (SD) rats were exposed to OVA for 3 weeks. CAVO or loratadine (positive control) was given orally once daily for 2 weeks to OVA-exposed rats. Behavior modeling nasal allergies was observed. Nasal mucosa, serum, and spleen samples of AR rats were analyzed. CAVO treatment significantly reduced the number of nose rubs and sneezes, and ameliorated several hallmarks of nasal mucosa tissue remodeling: inflammation, eosinophilic infiltration, goblet cell metaplasia, and mast cell hyperplasia. CAVO administration markedly upregulated expressions of interferon-γ, interleukin (IL)-2, and IL-12, and downregulated expressions of serum tumor necrosis factor-α, IL-4, IL-5, IL-6, IL-13, immunoglobulin-E, and histamine. CAVO therapy also increased production of IFN-γ and T-helper type 1 (Th1)-specific T-box transcription factor (T-bet) of the cluster of differentiation-4+ T-cells in splenic lymphocytes, and protein and mRNA expressions of T-bet in nasal mucosa. In contrast, levels of the Th2 cytokine IL-4 and Th2-specific transcription factor GATA binding protein-3 were suppressed by CAVO. These cumulative findings demonstrate that CAVO therapy can alleviate AR by regulating the balance between Th1 and Th2 cells.

    Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array Data Release I

    Observing and timing a group of millisecond pulsars (MSPs) with high rotational stability enables the direct detection of gravitational waves (GWs). The GW signals can be identified from the spatial correlations encoded in the times-of-arrival of widely spaced pulsar-pairs. The Chinese Pulsar Timing Array (CPTA) is a collaboration aiming at the direct GW detection with observations carried out using Chinese radio telescopes. This short article serves as a 'table of contents' for a forthcoming series of papers related to the CPTA Data Release 1 (CPTA DR1), which uses observations from the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Here, after summarizing the time span and accuracy of CPTA DR1, we report the key results of our statistical inference, finding a correlated signal with amplitude $\log A_{\rm c} = -14.4\,^{+1.0}_{-2.8}$ for spectral index in the range of $\alpha \in [-1.8, 1.5]$, assuming a GW background (GWB) induced quadrupolar correlation. The search for the Hellings-Downs (HD) correlation curve is also presented; some evidence for the HD correlation has been found, with a 4.6$\sigma$ statistical significance achieved using the discrete frequency method around the frequency of 14 nHz. We expect that the future International Pulsar Timing Array data analysis and the next CPTA data release will be more sensitive to the nHz GWB, which could verify the current results. Comment: 18 pages, 6 figures, submitted to "Research in astronomy and astrophysics" 22nd March 202

    Effects of Low-Level Autonomic Stimulation on Prevention of Atrial Fibrillation Induced by Acute Electrical Remodeling

    Background. Rapid atrial pacing (RAP) can induce electrical and autonomic remodeling and facilitate atrial fibrillation (AF). Recent reports showed that low-level vagosympathetic nerve stimulation (LLVNS) can suppress AF, as an antiarrhythmic effect. We hypothesized that LLVNS can reverse substrate heterogeneity induced by RAP. Methods and Results. Mongrel dogs were divided into (LLVNS+RAP) and RAP groups. Electrode catheters were sutured to multiple atrial sites, and LLVNS was applied to cervical vagosympathetic trunks with voltage 50% below the threshold slowing sinus rate by ⩽30 msec. RAP induced a significant decrease in effective refractory period (ERP) and increase in the window of vulnerability at all sites, characterized by descending and elevated gradient differences towards the ganglionic plexi (GP) sites, respectively. The ERP dispersion was obviously enlarged by RAP and more significant when the ERP of GP-related sites was considered. Recovery time from AF was also prolonged significantly as a result of RAP. LLVNS could reverse all these changes induced by RAP and recover the heterogeneous substrate to baseline. Conclusions. LLVNS can reverse the electrical and autonomic remodeling and abolish the GP-central gradient differences induced by RAP, and thus it can recover the homogeneous substrate, which may be the underlying mechanism of its antiarrhythmic effect.

    Stomatin Inhibits Pannexin-1-Mediated Whole-Cell Currents by Interacting with Its Carboxyl Terminal

    The pannexin-1 (Panx1) channel (often referred to as the Panx1 hemichannel) is a large-conductance channel in the plasma membrane of many mammalian cells. While opening of the channel is potentially detrimental to the cell, little is known about how it is regulated under physiological conditions. Here we show that stomatin inhibited Panx1 channel activity. In transfected HEK-293 cells, stomatin reduced Panx1-mediated whole-cell currents without altering either the total or membrane surface Panx1 protein expression. Stomatin coimmunoprecipitated with full-length Panx1 as well as a Panx1 fragment containing the fourth membrane-spanning domain and the cytosolic carboxyl terminal. The inhibitory effect of stomatin on Panx1-mediated whole-cell currents was abolished by truncating Panx1 at a site in the cytosolic carboxyl terminal. In primary culture of mouse astrocytes, inhibition of endogenous stomatin expression by small interfering RNA enhanced Panx1-mediated outward whole-cell currents. These observations suggest that stomatin may play important roles in astrocytes and other cells by interacting with Panx1 carboxyl terminal to limit channel opening

    Pruning Game Tree by Rollouts

    In this paper we show that the alpha-beta algorithm and its successor MT-SSS*, two classic minimax search algorithms, can be implemented as rollout algorithms, a generic algorithmic paradigm widely used in many domains. Specifically, we define a family of rollout algorithms in which the rollout policy is restricted to select successor nodes only from a certain subset of the children list. We show that any rollout policy in this family (either deterministic or randomized) is guaranteed to evaluate the game tree correctly with a finite number of rollouts. Moreover, we identify simple rollout policies in this family that "implement" alpha-beta and MT-SSS*. Specifically, given any game tree, the rollout algorithms with these particular policies always visit the same set of leaf nodes, in the same order, as alpha-beta and MT-SSS*, respectively. Our results suggest that traditional pruning techniques and the recent Monte Carlo Tree Search algorithms, as two competing approaches for game tree evaluation, may be unified under the rollout paradigm.

    Solar Image Restoration with the CycleGAN Based on Multi-fractal Properties of Texture Features

    Texture is one of the most obvious characteristics in solar images and it is normally described by texture features. Because textures from solar images of the same wavelength are similar, we assume that texture features of solar images are multi-fractals. Based on this assumption, we propose a purely data-based image restoration method: with several high-resolution solar images as references, we use the Cycle-Consistent Adversarial Network to restore blurred images of the same steady physical process, in the same wavelength, obtained by the same telescope. We test our method with simulated and real observation data and find that our method can improve the spatial resolution of solar images, without loss of any frames. Because our method does not need a paired training set or additional instruments, it can be used as a post-processing method for solar images obtained by either seeing-limited telescopes or telescopes with ground-layer adaptive optics systems.