Information geometry in quantum field theory: lessons from simple examples
Motivated by the increasing connections between information theory and
high-energy physics, particularly in the context of the AdS/CFT correspondence,
we explore the information geometry associated to a variety of simple systems.
By studying their Fisher metrics, we derive some general lessons that may have
important implications for the application of information geometry in
holography. We begin by demonstrating that the symmetries of the physical
theory under study play a strong role in the resulting geometry, and that the
appearance of an AdS metric is a relatively general feature. We then
investigate what information the Fisher metric retains about the physics of the
underlying theory by studying the geometry for both the classical 2d Ising
model and the corresponding 1d free fermion theory, and find that the curvature
diverges precisely at the phase transition on both sides. We discuss the
differences that result from placing a metric on the space of theories vs.
states, using the example of coherent free fermion states. We compare the
latter to the metric on the space of coherent free boson states and show that
in both cases the metric is determined by the symmetries of the corresponding
density matrix. We also clarify some misconceptions in the literature
pertaining to different notions of flatness associated to metric and non-metric
connections, with implications for how one interprets the curvature of the
geometry. Our results indicate that in general, caution is needed when
connecting the AdS geometry arising from certain models with the AdS/CFT
correspondence, and we seek to provide a useful collection of guidelines for
future progress in this exciting area.
Comment: 36 pages, 2 figures; added new section and appendix, miscellaneous improvements
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Transformer-based language models (LMs) are at the core of modern NLP, but
their internal prediction construction process is opaque and largely not
understood. In this work, we make a substantial step towards unveiling this
underlying prediction process, by reverse-engineering the operation of the
feed-forward network (FFN) layers, one of the building blocks of transformer
models. We view the token representation as a changing distribution over the
vocabulary, and the output from each FFN layer as an additive update to that
distribution. Then, we analyze the FFN updates in the vocabulary space, showing
that each update can be decomposed into sub-updates corresponding to single FFN
parameter vectors, each promoting concepts that are often human-interpretable.
We then leverage these findings for controlling LM predictions, where we reduce
the toxicity of GPT2 by almost 50%, and for improving computation efficiency
with a simple early exit rule, saving 20% of computation on average
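The decomposition described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the bias-free layer, the `tanh` activation, the random weights, and the names `K`, `V`, `E`, and `ffn_sub_updates` are all assumptions made for illustration. It only shows that an FFN output is a sum of scaled value vectors, and that each term can be projected to vocabulary space on its own.

```python
import numpy as np

def ffn_sub_updates(x, K, V, activation=np.tanh):
    """Split a bias-free FFN output into one sub-update per parameter vector.

    x: (d,) token representation; K, V: (d_ff, d) key and value matrices.
    The activation is a placeholder, not the one used by any specific model.
    """
    m = activation(K @ x)      # (d_ff,) coefficient for each value vector
    return m[:, None] * V      # (d_ff, d) row i is the sub-update m_i * v_i

# Hypothetical toy sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
d, d_ff, vocab = 8, 32, 50
x = rng.normal(size=d)
K = rng.normal(size=(d_ff, d))
V = rng.normal(size=(d_ff, d))
E = rng.normal(size=(vocab, d))  # stand-in for an output embedding matrix

subs = ffn_sub_updates(x, K, V)
ffn_out = np.tanh(K @ x) @ V     # the usual FFN output, computed directly

# The sub-updates add up to the full FFN update.
assert np.allclose(subs.sum(axis=0), ffn_out)

# Project each sub-update to vocabulary space and list the tokens it promotes most.
logit_shifts = subs @ E.T                          # (d_ff, vocab)
top_tokens = np.argsort(-logit_shifts, axis=1)[:, :3]
```

Because the projection is linear, the logit shift of the full update equals the sum of the per-vector shifts, which is what makes inspecting the sub-updates one at a time meaningful.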
The edge of chaos: quantum field theory and deep neural networks
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks encompassing both recurrent and feedforward architectures. We first consider the mean-field theory (MFT) obtained as the leading saddlepoint in the action, and derive the condition for criticality via the largest Lyapunov exponent. We then compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth T to width N, and find a precise analogy with the well-studied O(N) vector model, in which the variance of the weight initializations plays the role of the 't Hooft coupling. In particular, we compute both the O(1) corrections quantifying fluctuations from typicality in the ensemble of networks, and the subleading O(T/N) corrections due to finite-width effects. These provide corrections to the correlation length that controls the depth to which information can propagate through the network, and thereby sets the scale at which such networks are trainable by gradient descent. Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks
Impact of Hypothyroidism and Heart Failure on Hospitalization Risk.
Background: Prior studies suggest that the relationship between hypothyroidism and mortality is dependent on underlying cardiovascular risk. Little is known about the association of hypothyroidism with hospitalization risk, and how these associations are modified by cardiovascular status. Methods: This study examined the association of thyroid status, defined by serum thyrotropin (TSH), with hospitalization risk among patients who received care at a large university-based tertiary care center between 1990 and 2015. Thyroid status was categorized as hypothyroidism versus euthyroidism (TSH >4.7 vs. 0.3-4.7 mIU/L, respectively). The relationship between thyroid status and hospitalization risk stratified by cardiovascular status was examined using multivariable Cox models. Results: Among 52,856 patients who met eligibility criteria, 49,791 (94.2%) had euthyroidism and 3,065 (5.8%) had hypothyroidism. In analyses stratified by congestive heart failure (CHF) status, compared to euthyroidism, hypothyroidism was associated with higher risk of hospitalization in those with CHF but slightly lower risk in those without CHF (adjusted hazard ratios [aHRs] = 1.86 [confidence interval (CI) 1.17-2.94] and 0.95 [CI 0.92-0.99], respectively; p = 0.006). In sensitivity analyses accounting for death as a competing event, underlying coronary artery disease modified the hypothyroidism-hospitalization relationship, such that stronger associations were observed among those with versus without coronary artery disease. In competing risk analyses, hypothyroidism was associated with higher versus lower risk of hospitalization among those with versus without cerebrovascular disease, respectively. Conclusions: Hypothyroidism is associated with higher hospitalization risk among patients with underlying cardiovascular disease. Future studies are needed to determine whether correction of thyroid status with replacement therapy ameliorates hospitalization risk in this population
Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group
We investigate the analogy between the renormalization group (RG) and deep neural networks, wherein subsequent layers of neurons are analogous to successive steps along the RG. In particular, we quantify the flow of information by explicitly computing the relative entropy or Kullback-Leibler divergence in both the one- and two-dimensional Ising models under decimation RG, as well as in a feedforward neural network as a function of depth. We observe qualitatively identical behavior characterized by the monotonic increase to a parameter-dependent asymptotic value. On the quantum field theory side, the monotonic increase confirms the connection between the relative entropy and the c-theorem. For the neural networks, the asymptotic behavior may have implications for various information maximization methods in machine learning, as well as for disentangling compactness and generalizability. Furthermore, while both the two-dimensional Ising model and the random neural networks we consider exhibit non-trivial critical points, the relative entropy appears insensitive to the phase structure of either system. In this sense, more refined probes are required in order to fully elucidate the flow of information in these models
Criticality versus uniformity in deep neural networks
Deep feedforward networks initialized along the edge of chaos exhibit exponentially superior training ability as quantified by maximum trainable depth. In this work, we explore the effect of saturation of the tanh activation function along the edge of chaos. In particular, we determine the line of uniformity in phase space along which the post-activation distribution has maximum entropy. This line intersects the edge of chaos, and indicates the regime beyond which saturation of the activation function begins to impede training efficiency. Our results suggest that initialization along the edge of chaos is a necessary but not sufficient condition for optimal trainability
Optimally splitting cases for training and testing high dimensional classifiers
Background: We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate? Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of better understanding the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. Conclusions: By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in more cases assigned to the training set. The commonly used strategy of allocating 2/3 of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.
The Marker State Space (MSS) Method for Classifying Clinical Samples
The development of accurate clinical biomarkers has been challenging in part due to the diversity between patients and diseases. One approach to account for the diversity is to use multiple markers to classify patients, based on the concept that each individual marker contributes information from its respective subclass of patients. Here we present a new strategy for developing biomarker panels that accounts for completely distinct patient subclasses. Marker State Space (MSS) defines "marker states" based on all possible patterns of high and low values among a panel of markers. Each marker state is defined as either a case state or a control state, and a sample is classified as case or control based on the state it occupies. MSS was used to define multi-marker panels that were robust in cross validation and training-set/test-set analyses and that yielded similar classification accuracy to several other classification algorithms. A three-marker panel for discriminating pancreatic cancer patients from control subjects revealed subclasses of patients based on distinct marker states. MSS provides a straightforward approach for modeling highly divergent subclasses of patients, which may be adaptable for diverse applications. © 2013 Fallon et al
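The marker-state idea in this abstract can be sketched in a few lines of Python. This is an illustrative reading, not the published MSS procedure: the abstract does not say how thresholds are chosen or how a state is designated case or control, so the fixed thresholds, the majority-vote rule, the control default for tied or empty states, and all function names and data below are assumptions.

```python
from itertools import product

def marker_state(sample, thresholds):
    """Map marker values to a high/low pattern, e.g. (1, 0, 0)."""
    return tuple(int(value > cutoff) for value, cutoff in zip(sample, thresholds))

def fit_marker_states(samples, labels, thresholds):
    """Label every possible marker state as case (1) or control (0).

    Assumption: a state is a case state when more training cases than
    controls fall in it; tied or empty states default to control.
    """
    counts = {s: [0, 0] for s in product((0, 1), repeat=len(thresholds))}
    for sample, label in zip(samples, labels):
        counts[marker_state(sample, thresholds)][label] += 1
    return {state: int(n[1] > n[0]) for state, n in counts.items()}

def classify(sample, thresholds, state_labels):
    """Classify a sample by the label of the marker state it occupies."""
    return state_labels[marker_state(sample, thresholds)]

# Hypothetical three-marker panel with made-up values (1 = case, 0 = control).
thresholds = (1.0, 2.0, 0.5)
samples = [(1.5, 1.0, 0.2), (1.6, 0.5, 0.1), (0.5, 3.0, 0.9),
           (0.2, 1.0, 0.1), (0.3, 0.9, 0.2)]
labels = [1, 1, 1, 0, 0]

state_labels = fit_marker_states(samples, labels, thresholds)
print(classify((2.0, 0.1, 0.0), thresholds, state_labels))  # state (1, 0, 0)
print(classify((0.1, 0.1, 0.1), thresholds, state_labels))  # state (0, 0, 0)
```

In this toy data the cases occupy two different states, (1, 0, 0) and (0, 1, 1), which mirrors the abstract's point that distinct patient subclasses can be captured by distinct marker states.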
Synthetic Metabolism: Engineering Biology at the Protein and Pathway Scales
Biocatalysis has become a powerful tool for the synthesis of high-value compounds, particularly so in the case of highly functionalized and/or stereoactive products. Nature has supplied thousands of enzymes and assembled them into numerous metabolic pathways. Although these native pathways can be used to produce natural bioproducts, there are many valuable and useful compounds that have no known natural biochemical route. Consequently, there is a need for both unnatural metabolic pathways and novel enzymatic activities upon which these pathways can be built. Here, we review the theoretical and experimental strategies for engineering synthetic metabolic pathways at the protein and pathway scales, and highlight the challenges that this subfield of synthetic biology currently faces. Synthetic Biology Engineering Research Center; National Science Foundation (Grant no. 0540879)
Foliar water uptake: a common water acquisition strategy for plants of the redwood forest
Evaluations of plant water use in ecosystems around the world reveal a shared capacity by many different species to absorb rain, dew, or fog water directly into their leaves or plant crowns. This mode of water uptake provides an important water subsidy that relieves foliar water stress. Our study provides the first comparative evaluation of foliar uptake capacity among the dominant plant taxa from the coast redwood ecosystem of California where crown-wetting events by summertime fog frequently occur during an otherwise drought-prone season. Previous research demonstrated that the dominant overstory tree species, Sequoia sempervirens, takes up fog water by both its roots (via drip from the crown to the soil) and directly through its leaf surfaces. The present study adds to these early findings and shows that 80% of the dominant species from the redwood forest exhibit this foliar uptake water acquisition strategy. The plants studied include canopy trees, understory ferns, and shrubs. Our results also show that foliar uptake provides direct hydration to leaves, increasing leaf water content by 2–11%. In addition, 60% of redwood forest species investigated demonstrate nocturnal stomatal conductance to water vapor. Such findings indicate that even species unable to absorb water directly into their foliage may still receive indirect benefits from nocturnal leaf wetting through suppressed transpiration. For these species, leaf-wetting events enhance the efficacy of nighttime re-equilibration with available soil water and therefore also increase pre-dawn leaf water potentials