80 research outputs found

    Estimating Mixture Entropy with Pairwise Distances

    Full text link
    Mixture distributions arise in many parametric and non-parametric settings -- for example, in Gaussian mixture models and in non-parametric estimation. It is often necessary to compute the entropy of a mixture, but, in most cases, this quantity has no closed-form expression, making some form of approximation necessary. We propose a family of estimators based on a pairwise distance function between mixture components, and show that this estimator class has many attractive properties. For many distributions of interest, the proposed estimators are efficient to compute, differentiable in the mixture parameters, and become exact when the mixture components are clustered. We prove this family includes lower and upper bounds on the mixture entropy. The Chernoff α\alpha-divergence gives a lower bound when chosen as the distance function, with the Bhattacharyya distance providing the tightest lower bound for components that are symmetric and members of a location family. The Kullback-Leibler divergence gives an upper bound when used as the distance function. We provide closed-form expressions of these bounds for mixtures of Gaussians, and discuss their applications to the estimation of mutual information. We then demonstrate that our bounds are significantly tighter than well-known existing bounds using numeric simulations. This estimator class is very useful in optimization problems involving maximization/minimization of entropy and mutual information, such as MaxEnt and rate distortion problems.Comment: Corrects several errata in published version, in particular in Section V (bounds on mutual information

    Nonlinear Information Bottleneck

    Full text link
    Information bottleneck (IB) is a technique for extracting information in one random variable XX that is relevant for predicting another random variable YY. IB works by encoding XX in a compressed "bottleneck" random variable MM from which YY can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete XX and YY with small state spaces, and continuous XX and YY with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous XX and YY, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed "variational IB" method on several real-world datasets

    Caveats for information bottleneck in deterministic scenarios

    Full text link
    Information bottleneck (IB) is a method for extracting information from one random variable XX that is relevant for predicting another random variable YY. To do so, IB identifies an intermediate "bottleneck" variable TT that has low mutual information I(X;T)I(X;T) and high mutual information I(Y;T)I(Y;T). The "IB curve" characterizes the set of bottleneck variables that achieve maximal I(Y;T)I(Y;T) for a given I(X;T)I(X;T), and is typically explored by maximizing the "IB Lagrangian", I(Y;T)βI(X;T)I(Y;T) - \beta I(X;T). In some cases, YY is a deterministic function of XX, including many classification problems in supervised learning where the output class YY is a deterministic function of the input XX. We demonstrate three caveats when using IB in any situation where YY is a deterministic function of XX: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of β\beta; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when YY is a small perturbation away from being a deterministic function of XX, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset

    Survival outcome according to KRAS mutation status in newly diagnosed patients with stage IV non-small cell lung cancer treated with platinum doublet chemotherapy

    Get PDF
    INTRODUCTION: Mutations (MT) of the KRAS gene are the most common mutation in non-small cell lung cancer (NSCLC), seen in about 20-25% of all adenocarcinomas. Effect of KRAS MT on response to cytotoxic chemotherapy is unclear. METHODS: We undertook a single-institution retrospective analysis of 93 consecutive patients with stage IV NSCLC adenocarcinoma with known KRAS and EGFR MT status to determine the association of KRAS MT with survival. All patients were treated between January 1, 2008 and December 31, 2011 with standard platinum based chemotherapy at the University of Pennsylvania. Overall and progression free survival were analyzed using Kaplan-Meier and Cox proportional hazard methods. RESULTS: All patients in this series received platinum doublet chemotherapy, and 42 (45%) received bevacizumab. Overall survival and progression free survival for patients with KRAS MT was no worse than for patients with wild type KRAS. Median overall survival for patients with KRAS MT was 19 months (mo) vs. 15.6 mo for KRAS WT, p = 0.34, and progression-free survival was 6.2 mo in patients with KRAS MT vs. 7mo in patients with KRAS WT, p = 0.51. In multivariable analysis including age, race, gender, and ECOG PS, KRAS MT was not associated with overall survival (HR 1.12, 95% CI 0.58-2.16, p = 0.74) or progression free survival (HR 0.80, 95% CI 0.48-1.34, p = 41). Of note, receipt of bevacizumab was associated with improved overall survival only in KRAS WT patients (HR 0.34, p = 0.01). CONCLUSIONS: KRAS MT are not associated with inferior progression-free and overall survival in advanced NSCLC patients treated with standard first-line platinum-based chemotherapy

    Towards practical reinforcement learning for tokamak magnetic control

    Full text link
    Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method; achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on top of \cite{degrave2022magnetic}, and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65\% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach

    Age at first birth in women is genetically associated with increased risk of schizophrenia

    Get PDF
    Prof. Paunio on PGC:n jäsenPrevious studies have shown an increased risk for mental health problems in children born to both younger and older parents compared to children of average-aged parents. We previously used a novel design to reveal a latent mechanism of genetic association between schizophrenia and age at first birth in women (AFB). Here, we use independent data from the UK Biobank (N = 38,892) to replicate the finding of an association between predicted genetic risk of schizophrenia and AFB in women, and to estimate the genetic correlation between schizophrenia and AFB in women stratified into younger and older groups. We find evidence for an association between predicted genetic risk of schizophrenia and AFB in women (P-value = 1.12E-05), and we show genetic heterogeneity between younger and older AFB groups (P-value = 3.45E-03). The genetic correlation between schizophrenia and AFB in the younger AFB group is -0.16 (SE = 0.04) while that between schizophrenia and AFB in the older AFB group is 0.14 (SE = 0.08). Our results suggest that early, and perhaps also late, age at first birth in women is associated with increased genetic risk for schizophrenia in the UK Biobank sample. These findings contribute new insights into factors contributing to the complex bio-social risk architecture underpinning the association between parental age and offspring mental health.Peer reviewe
    corecore