
    The Kernel Density Integral Transformation

    Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering protection from the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering. Comment: Published in Transactions on Machine Learning Research (10/2023).
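As a minimal sketch of the idea (not the paper's exact parameterization — the function name and bandwidth convention below are illustrative), the transform maps each feature value to the CDF of a kernel density estimate fitted on the training values. As the bandwidth shrinks toward zero this approaches the empirical CDF, i.e. the quantile transform; the min-max limit in the paper involves an additional rescaling not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def kde_integral_transform(x_train, x, bandwidth=0.5):
    """Map each value in x to the CDF of a Gaussian KDE fit on x_train.

    bandwidth -> 0 approaches the empirical CDF (quantile transform);
    the paper's min-max limit requires a further rescaling omitted here.
    """
    x_train = np.asarray(x_train, dtype=float)
    x = np.asarray(x, dtype=float)
    h = bandwidth * x_train.std()  # bandwidth relative to feature scale
    # KDE CDF = average of the per-kernel Gaussian CDFs.
    return norm.cdf((x[:, None] - x_train[None, :]) / h).mean(axis=1)

feat = np.array([0.0, 1.0, 2.0, 10.0])
print(kde_integral_transform(feat, feat, bandwidth=0.1))
```

The output is monotone in the input and bounded in (0, 1), which is what makes the transform usable as a drop-in replacement for quantile or min-max scaling.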

    Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data

    Recent technologies are generating an abundance of genome sequence data and molecular and clinical phenotype data, providing an opportunity to understand the genetic architecture and molecular mechanisms underlying diseases. Previous approaches have largely focused on the co-localization of single-nucleotide polymorphisms (SNPs) associated with clinical and expression traits, each identified from genome-wide association studies and expression quantitative trait locus (eQTL) mapping, and thus have provided only limited capabilities for uncovering the molecular mechanisms behind the SNPs influencing clinical phenotypes. Here we aim to extract rich information on the functional role of trait-perturbing SNPs that goes far beyond this simple co-localization. We introduce a computational framework called PerturbNet for learning the gene network that modulates the influence of SNPs on phenotypes, using SNPs as naturally occurring perturbations of a biological system. PerturbNet uses a probabilistic graphical model to directly model both the cascade of perturbation from SNPs to the gene network to the phenotype network and the network at each layer of molecular and clinical phenotypes. PerturbNet learns the entire model by solving a single optimization problem with an extremely fast algorithm that can analyze human genome-wide data within a few hours. In our analysis of asthma data, for a locus that was previously implicated in asthma susceptibility but for which little is known about the molecular mechanism underlying the association, PerturbNet revealed the gene network modules that mediate the influence of the SNP on asthma phenotypes. Many genes in this network module were well supported in the literature as asthma-related.

    On Sparse Gaussian Chain Graph Models

    In this paper, we address the problem of learning the structure of Gaussian chain graph models in a high-dimensional space. Chain graph models are generalizations of undirected and directed graphical models that contain a mixed set of directed and undirected edges. While the problem of sparse structure learning has been studied extensively for Gaussian graphical models and more recently for conditional Gaussian graphical models (CGGMs), there has been little previous work on the structure recovery of Gaussian chain graph models. We consider both linear regression models and their reparameterization via CGGMs as building blocks of chain graph models. We argue that when the goal is to recover model structures, there are many advantages of using CGGMs as chain component models over linear regression models, including convexity of the optimization problem, computational efficiency, recovery of structured sparsity, and ability to leverage the model structure for semi-supervised learning. We demonstrate our approach on simulated and genomic datasets.
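As a hedged sketch of the reparameterization mentioned above (following the standard CGGM form from prior work on sparse CGGMs; notation here is generic, not the paper's):

```latex
% CGGM: y given x is Gaussian, parameterized by a precision matrix
% \Lambda and a cross-term matrix \Theta:
p(\mathbf{y} \mid \mathbf{x}) \;\propto\;
  \exp\!\left( -\tfrac{1}{2}\,\mathbf{y}^{\top}\Lambda\,\mathbf{y}
               \;-\; \mathbf{x}^{\top}\Theta\,\mathbf{y} \right)
% Completing the square recovers the equivalent linear regression
%   \mathbf{y} = B\mathbf{x} + \boldsymbol{\varepsilon},
% with  B = -\Lambda^{-1}\Theta^{\top}  and
%       \mathrm{Cov}(\boldsymbol{\varepsilon}) = \Lambda^{-1}.
```

The convexity advantage cited in the abstract comes from estimating (\Lambda, \Theta) directly, whereas the induced regression coefficients B couple the two parameters nonlinearly.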

    Learning gene networks underlying clinical phenotypes using SNP perturbation.

    Availability of genome sequence, molecular, and clinical phenotype data for large patient cohorts generated by recent technological advances provides an opportunity to dissect the genetic architecture of complex diseases at the system level. However, previous analyses of such data have largely focused on the co-localization of SNPs associated with clinical and expression traits, each identified from genome-wide association studies and expression quantitative trait locus mapping. Thus, their description of the molecular mechanisms behind the SNPs influencing clinical phenotypes was limited to the single gene linked to the co-localized SNP. Here we introduce PerturbNet, a statistical framework for learning gene networks that modulate the influence of genetic variants on phenotypes, using genetic variants as naturally occurring perturbations of a biological system. PerturbNet uses a probabilistic graphical model to directly model the cascade of perturbation from genetic variants to the gene network to the phenotype network along with the networks at each layer of the biological system. PerturbNet learns the entire model by solving a single optimization problem with an efficient algorithm that can analyze human genome-wide data within a few hours. PerturbNet inference procedures extract a detailed description of how the gene network modulates the genetic effects on phenotypes. Using simulated and asthma data, we demonstrate that PerturbNet improves statistical power for detecting disease-linked SNPs and identifies gene networks and network modules mediating the SNP effects on traits, providing deeper insights into the underlying molecular mechanisms.

    If Loud Aliens Explain Human Earliness, Quiet Aliens Are Also Rare

    If life on Earth had to achieve n “hard steps” to reach humanity's level, then the chance of this event rose as time to the nth power. Integrating this over habitable star formation and planet lifetime distributions predicts >99% of advanced life appears after today, unless n < 3 and max planet duration <50 Gyr. That is, we seem early. We offer this explanation: a deadline is set by loud aliens who are born according to a hard-steps power law, expand at a common rate, change their volume appearances, and prevent advanced life like us from appearing in their volumes. Quiet aliens, in contrast, are much harder to see. We fit this three-parameter model of loud aliens to data: (1) birth power from the number of hard steps seen in Earth's history, (2) birth constant by assuming a uniform distribution over our rank among loud alien birth dates, and (3) expansion speed from our not seeing alien volumes in our sky. We estimate that loud alien civilizations now control 40%–50% of universe volume, each will later control ∼10⁵–3×10⁷ galaxies, and we could meet them in ∼200 Myr–2 Gyr. If loud aliens arise from quiet ones, a depressingly low transition chance (≲10⁻⁴) is required to expect that even one other quiet alien civilization has ever been active in our galaxy, which seems to be bad news for the Search for Extraterrestrial Intelligence. But perhaps alien volume appearances are subtle, and their expansion speed lower, in which case we predict many long circular arcs to find in our sky.
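The hard-steps scaling in the first sentence can be illustrated with a toy calculation (a minimal sketch with made-up values of n and planet lifetime, not the paper's fitted parameters):

```python
# Hard-steps model: if each of n steps is individually very unlikely
# within a planet's lifetime, the chance that all n have occurred by
# time t scales as t**n, so doubling the available time multiplies
# the completion chance by 2**n.
def prob_all_steps_done(t, t_max, n):
    """P(all n hard steps done by time t), normalized to 1 at t_max."""
    return (t / t_max) ** n

n = 6          # illustrative number of hard steps in Earth's history
t_max = 10.0   # illustrative habitable lifetime, in Gyr

ratio = prob_all_steps_done(10.0, t_max, n) / prob_all_steps_done(5.0, t_max, n)
print(ratio)   # 2**6 = 64
```

This steep time dependence is why integrating the power law over star-formation and lifetime distributions pushes almost all expected advanced life to appear after today.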

    Adaptive Block Floating-Point for Analog Deep Learning Hardware

    Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an accuracy penalty because of precision loss. To mitigate this penalty, we present a novel AMS-compatible adaptive block floating-point (ABFP) number representation. We also introduce amplification (or gain) as a method for increasing the accuracy of the number representation without increasing the bit precision of the output. We evaluate the effectiveness of ABFP on the DNNs in the MLPerf datacenter inference benchmark -- realizing less than 1% loss in accuracy compared to FLOAT32. We also propose a novel method of finetuning for AMS devices, Differential Noise Finetuning (DNF), which samples device noise to speed up finetuning compared to conventional Quantization-Aware Training. Comment: 13 pages including Appendix, 7 figures, under submission at IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
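As a minimal sketch of plain (non-adaptive) block floating-point — not the paper's ABFP scheme, and with illustrative block size and bit width — each block of values shares one power-of-two scale chosen from the block's maximum magnitude, and the scaled values are rounded to low-bit signed integers:

```python
import numpy as np

def bfp_quantize(x, block_size=8, mantissa_bits=4):
    """Quantize a 1-D array with a shared per-block power-of-two scale.

    Illustrative block floating-point sketch: values in each block are
    scaled by the block's shared exponent, rounded to signed
    `mantissa_bits`-bit integers, then dequantized back to floats.
    """
    x = np.asarray(x, dtype=float)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # Shared exponent: smallest power of two covering the block max.
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(np.maximum(max_mag, 1e-30)))
    levels = 2 ** (mantissa_bits - 1) - 1   # e.g. 7 for 4-bit mantissas
    mant = np.clip(np.round(blocks / scale * levels), -levels, levels)
    return (mant / levels * scale).reshape(-1)[: len(x)]

x = np.array([0.1, -0.5, 0.9, 2.0, 100.0, -3.0, 7.0, 0.0])
print(bfp_quantize(x, block_size=4, mantissa_bits=4))
```

The sketch also shows the precision-loss failure mode the abstract alludes to: a single large value (100.0 above) inflates its block's shared scale, crushing its small neighbors toward zero, which is the kind of penalty an adaptive scheme plus gain aims to mitigate.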

    Lactate metabolism: a new paradigm for the third millennium

    For much of the 20th century, lactate was largely considered a dead-end waste product of glycolysis due to hypoxia, the primary cause of the O₂ debt following exercise, a major cause of muscle fatigue, and a key factor in acidosis-induced tissue damage. Since the 1970s, a ‘lactate revolution’ has occurred. At present, we are in the midst of a lactate shuttle era; the lactate paradigm has shifted. It now appears that increased lactate production and concentration as a result of anoxia or dysoxia are often the exception rather than the rule. Lactic acidosis is being re-evaluated as a factor in muscle fatigue. Lactate is an important intermediate in the process of wound repair and regeneration. The origin of elevated [lactate] in injury and sepsis is being re-investigated. There is essentially unanimous experimental support for a cell-to-cell lactate shuttle, along with mounting evidence for astrocyte–neuron, lactate–alanine, peroxisomal and spermatogenic lactate shuttles. The bulk of the evidence suggests that lactate is an important intermediary in numerous metabolic processes, a particularly mobile fuel for aerobic metabolism, and perhaps a mediator of redox state among various compartments both within and between cells. Lactate can no longer be considered the usual suspect for metabolic ‘crimes’, but is instead a central player in cellular, regional and whole body metabolism. Overall, the cell-to-cell lactate shuttle has expanded far beyond its initial conception as an explanation for lactate metabolism during muscle contractions and exercise to now subsume all of the other shuttles as a grand description of the role(s) of lactate in numerous metabolic processes and pathways.