Utility-Probability Duality of Neural Networks
It is typically understood that the training of modern neural networks is a
process of fitting the probability distribution of desired output. However,
recent paradoxical observations in a number of language generation tasks make
one wonder if this canonical probability-based explanation can really account
for the empirical success of deep learning. To resolve this issue, we propose
an alternative utility-based explanation to the standard supervised learning
procedure in deep learning. The basic idea is to interpret the learned neural
network not as a probability model but as an ordinal utility function that
encodes the preference revealed in training data. In this perspective, training
of the neural network corresponds to a utility learning process. Specifically,
we show that for all neural networks with softmax outputs, the SGD learning
dynamic of maximum likelihood estimation (MLE) can be seen as an iteration
process that optimizes the neural network toward an optimal utility function.
This utility-based interpretation can explain several otherwise-paradoxical
observations about the neural networks thus trained. Moreover, our
utility-based theory also entails an equation that can transform the learned
utility values back to a new kind of probability estimation with which
probability-compatible decision rules enjoy dramatic (double-digit)
performance improvements. Together, this evidence reveals a phenomenon of
utility-probability duality in terms of what modern neural networks are (truly)
modeling: we thought they were one thing (probabilities) until the
unexplainable showed up; changing mindset and treating them as another thing
(utility values) largely reconciles the theory, despite remaining subtleties
regarding its original (probabilistic) identity.
Simpson's Bias in NLP Training
In most machine learning tasks, we evaluate a model on a given data
population by measuring a population-level metric. Examples of
such evaluation metrics include precision/recall for (binary) recognition,
the F1 score for multi-class classification, and the BLEU metric for language
generation. On the other hand, the model is trained by optimizing a
sample-level loss at each learning step, computed on a subset
of the population (a.k.a. the mini-batch). Popular choices of sample-level loss
include cross-entropy loss, the Dice loss, and sentence-level BLEU scores. A
fundamental assumption behind this paradigm is that the mean value of the
sample-level loss, if averaged over all possible samples, should effectively
represent the population-level metric of the task.
In this paper, we systematically investigate the above assumption in several
NLP tasks. We show, both theoretically and experimentally, that some popular
designs of the sample-level loss may be inconsistent with the true
population-level metric of the task, so that models trained to optimize the
former can be substantially sub-optimal with respect to the latter, a phenomenon we call
Simpson's bias, due to its deep connections with the classic paradox known as
Simpson's reversal paradox in statistics and social sciences. Comment: AAAI 202
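The mismatch described in the abstract above, between an averaged sample-level score and the population-level metric, can be seen in a toy calculation. The following is a minimal sketch, not the paper's analysis: it assumes F1 as the metric, and the two mini-batches and their counts are made up purely for illustration.

```python
def f1(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical (tp, fp, fn) counts for two mini-batches; the numbers are made up.
batches = [(9, 1, 0), (1, 0, 9)]

# Sample-level view: score each mini-batch separately, then average the scores.
mean_batch_f1 = sum(f1(*b) for b in batches) / len(batches)

# Population-level view: pool the counts over all batches, then score once.
tp, fp, fn = (sum(col) for col in zip(*batches))
population_f1 = f1(tp, fp, fn)

print(f"mean of per-batch F1: {mean_batch_f1:.3f}")
print(f"population-level F1:  {population_f1:.3f}")
```

Because F1 is a ratio of counts, averaging per-batch ratios is not the same as taking the ratio of pooled counts, so the two printed numbers differ even though they are computed from the same predictions.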
Start Learning Chinese Words Fast: An Introduction
To cater to the needs of Chinese language lovers, 28 basic strokes of Chinese words are first introduced. It is pointed out that the difficulty foreigners face in learning Chinese words lies in their grotesque shapes, whether written with a brush (soft) pen or printed in books. A special writing method using a hard pen and moving steps in 8 directions is devised and shown for the first time, which is easy for foreigners to try. The size (length) of the strokes guides learners in controlling the proportions of a word, and it can be changed according to the paper size and how large they want to write. Secondly, 48 common fragments derived from the 28 basic strokes are listed and their writing method is described. This can help foreigners to separate and re-write unknown Chinese words and even guess their meanings. Lastly, many characteristics or regularities of Chinese words will be very attractive to foreign language learners. Some aspects of Chinese culture and amusing stories are also revealed in the fragments and example words.
Cang-ai volatile oil alleviates nasal inflammation via Th1/Th2 cell imbalance regulation in a rat model of ovalbumin-induced allergic rhinitis
We previously revealed that Cang-ai volatile oil (CAVO) regulates T-cell activity, enhancing the immune response in people with chronic respiratory diseases. However, the effects of CAVO on allergic rhinitis (AR) have not been investigated. Herein, we established an ovalbumin (OVA)-induced AR rat model to determine these effects. Sprague–Dawley (SD) rats were exposed to OVA for 3 weeks. CAVO or loratadine (positive control) was given orally once daily for 2 weeks to OVA-exposed rats. Behavior modeling nasal allergies was observed. Nasal mucosa, serum, and spleen samples of AR rats were analyzed. CAVO treatment significantly reduced the number of nose rubs and sneezes, and ameliorated several hallmarks of nasal mucosa tissue remodeling: inflammation, eosinophilic infiltration, goblet cell metaplasia, and mast cell hyperplasia. CAVO administration markedly upregulated expressions of interferon-γ, interleukin (IL)-2, and IL-12, and downregulated expressions of serum tumor necrosis factor-α, IL-4, IL-5, IL-6, IL-13, immunoglobulin-E, and histamine. CAVO therapy also increased production of IFN-γ and T-helper type 1 (Th1)-specific T-box transcription factor (T-bet) of the cluster of differentiation-4+ T-cells in splenic lymphocytes, and protein and mRNA expressions of T-bet in nasal mucosa. In contrast, levels of the Th2 cytokine IL-4 and Th2-specific transcription factor GATA binding protein-3 were suppressed by CAVO. These cumulative findings demonstrate that CAVO therapy can alleviate AR by regulating the balance between Th1 and Th2 cells.
Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array Data Release I
Observing and timing a group of millisecond pulsars (MSPs) with high
rotational stability enables the direct detection of gravitational waves (GWs).
The GW signals can be identified from the spatial correlations encoded in the
times-of-arrival of widely spaced pulsar-pairs. The Chinese Pulsar Timing Array
(CPTA) is a collaboration aiming at the direct GW detection with observations
carried out using Chinese radio telescopes. This short article serves as a
`table of contents' for a forthcoming series of papers related to the CPTA Data
Release 1 (CPTA DR1) which uses observations from the Five-hundred-meter
Aperture Spherical radio Telescope (FAST). Here, after summarizing the time
span and accuracy of CPTA DR1, we report the key results of our statistical
inference finding a correlated signal with amplitude
$\log A_{\rm c} = -14.4\,^{+1.0}_{-2.8}$ for spectral index in the range of
assuming a GW background (GWB) induced quadrupolar correlation. The search for
the Hellings-Downs (HD) correlation curve is also presented, where some
evidence for the HD correlation has been found: a 4.6-$\sigma$ statistical
significance is achieved using the discrete frequency method around the
frequency of 14 nHz. We expect that the future International Pulsar Timing
Array data analysis and the next CPTA data release will be more sensitive to
the nHz GWB, which could verify the current results. Comment: 18 pages, 6 figures, submitted to "Research in astronomy and
astrophysics" 22nd March 202
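For readers unfamiliar with the Hellings-Downs curve mentioned in the abstract above, here is a minimal sketch of its standard closed form for two distinct pulsars. The normalization (correlation 1/2 at zero angular separation) is a common textbook convention and is an assumption here; the abstract does not state which convention the CPTA analysis uses, and the function name `hellings_downs` is hypothetical.

```python
import math

def hellings_downs(theta_rad):
    """Expected correlation for two distinct pulsars separated by theta_rad.

    Uses Gamma(theta) = (3/2) x ln(x) - x/4 + 1/2 with x = (1 - cos(theta)) / 2.
    """
    x = (1.0 - math.cos(theta_rad)) / 2.0
    if x == 0.0:
        return 0.5  # x * ln(x) tends to 0 as x -> 0
    return 1.5 * x * math.log(x) - x / 4.0 + 0.5

# Tabulate the curve at a few angular separations.
for deg in (0, 30, 60, 90, 120, 150, 180):
    print(f"{deg:3d} deg  {hellings_downs(math.radians(deg)):+.3f}")
```

The curve starts at 1/2 for co-located pulsars, becomes negative at intermediate separations, and rises to 1/4 for antipodal pulsars; this angular pattern is the quadrupolar signature that a pulsar timing array search looks for.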
Effects of Low-Level Autonomic Stimulation on Prevention of Atrial Fibrillation Induced by Acute Electrical Remodeling
Background. Rapid atrial pacing (RAP) can induce electrical and autonomic remodeling and facilitate atrial fibrillation (AF). Recent reports showed that low-level vagosympathetic nerve stimulation (LLVNS) can suppress AF, exerting an antiarrhythmic effect. We hypothesized that LLVNS can reverse substrate heterogeneity induced by RAP. Methods and Results. Mongrel dogs were divided into (LLVNS+RAP) and RAP groups. Electrode catheters were sutured to multiple atrial sites, and LLVNS was applied to cervical vagosympathetic trunks with voltage 50% below the threshold slowing sinus rate by ⩽30 msec. RAP induced a significant decrease in effective refractory period (ERP) and increase in the window of vulnerability at all sites, characterized by descending and elevated gradient differences towards the ganglionic plexi (GP) sites, respectively. The ERP dispersion was markedly enlarged by RAP, and more significantly so when the ERP of GP-related sites was considered. Recovery time from AF was also prolonged significantly as a result of RAP. LLVNS could reverse all these changes induced by RAP and recover the heterogeneous substrate to baseline. Conclusions. LLVNS can reverse the electrical and autonomic remodeling and abolish the GP-central gradient differences induced by RAP, and thus it can recover the homogeneous substrate, which may be the underlying mechanism of its antiarrhythmic effect.
Stomatin Inhibits Pannexin-1-Mediated Whole-Cell Currents by Interacting with Its Carboxyl Terminal
The pannexin-1 (Panx1) channel (often referred to as the Panx1 hemichannel) is a large-conductance channel in the plasma membrane of many mammalian cells. While opening of the channel is potentially detrimental to the cell, little is known about how it is regulated under physiological conditions. Here we show that stomatin inhibited Panx1 channel activity. In transfected HEK-293 cells, stomatin reduced Panx1-mediated whole-cell currents without altering either the total or membrane surface Panx1 protein expression. Stomatin coimmunoprecipitated with full-length Panx1 as well as a Panx1 fragment containing the fourth membrane-spanning domain and the cytosolic carboxyl terminal. The inhibitory effect of stomatin on Panx1-mediated whole-cell currents was abolished by truncating Panx1 at a site in the cytosolic carboxyl terminal. In primary culture of mouse astrocytes, inhibition of endogenous stomatin expression by small interfering RNA enhanced Panx1-mediated outward whole-cell currents. These observations suggest that stomatin may play important roles in astrocytes and other cells by interacting with Panx1 carboxyl terminal to limit channel opening.
Pruning Game Tree by Rollouts
In this paper we show that the alpha-beta algorithm and its successor MT-SSS*, as two classic minimax search algorithms, can be implemented as rollout algorithms, a generic algorithmic paradigm widely used in many domains. Specifically, we define a family of rollout algorithms, in which the rollout policy is restricted to select successor nodes only from a certain subset of the children list. We show that any rollout policy in this family (either deterministic or randomized) is guaranteed to evaluate the game tree correctly with a finite number of rollouts. Moreover, we identify simple rollout policies in this family that "implement" alpha-beta and MT-SSS*. Specifically, given any game tree, the rollout algorithms with these particular policies always visit the same set of leaf nodes in the same order as alpha-beta and MT-SSS*, respectively. Our results suggest that traditional pruning techniques and the recent Monte Carlo Tree Search algorithms, as two competing approaches for game tree evaluation, may be unified under the rollout paradigm.
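As background for the abstract above, the following is a minimal sketch of the classic recursive alpha-beta algorithm that the paper takes as its reference point. It is the textbook depth-first form, not the rollout formulation the paper proposes; the nested-list tree encoding, the toy tree, and the `visited` bookkeeping are assumptions made for illustration.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Textbook alpha-beta on a nested-list game tree; plain numbers are leaves."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record the order in which leaves are evaluated
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # remaining siblings cannot change the root value
    return best

def minimax(node, maximizing=True):
    """Exhaustive minimax, used only to cross-check the pruned search."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical depth-2 tree: a max root over three min nodes.
tree = [[3, 5], [2, 9], [0, 7]]
leaves = []
root_value = alphabeta(tree, visited=leaves)
print(root_value, leaves)
```

On this toy tree the pruned search skips two of the six leaves while returning the same root value as exhaustive minimax. The sequence of visited leaves is the quantity the abstract refers to when it says a rollout policy visits the same leaves in the same order.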
Solar Image Restoration with the CycleGAN Based on Multi-fractal Properties of Texture Features
Texture is one of the most obvious characteristics in solar images and it is normally described by texture features. Because textures from solar images of the same wavelength are similar, we assume that texture features of solar images are multi-fractals. Based on this assumption, we propose a purely data-based image restoration method: with several high-resolution solar images as references, we use the Cycle-Consistent Adversarial Network to restore blurred images of the same steady physical process, in the same wavelength obtained by the same telescope. We test our method with simulated and real observation data and find that our method can improve the spatial resolution of solar images, without loss of any frames. Because our method does not need a paired training set or additional instruments, it can be used as a post-processing method for solar images obtained by either seeing-limited telescopes or telescopes with ground-layer adaptive optics systems.