29 research outputs found

    Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

    Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: the training set, the model family, the error function, regularization terms, and the optimization scheme. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies essentially just define how the learning rate decays over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach that characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during training, resulting in improved test performance. We implement TempBalance on the CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers. Comment: NeurIPS 2023 Spotlight; first two authors contributed equally.
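    The layer-wise idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it estimates a heavy-tail exponent for each layer's empirical spectral density with a simple Hill estimator, then scales each layer's learning rate by its exponent relative to the mean exponent (the exact HT-SR metric and scaling rule used by TempBalance differ).

```python
import numpy as np

def tail_alpha(W, k_frac=0.5):
    """Crude Hill-estimator proxy for the heavy-tail exponent of the
    empirical spectral density (ESD) of the correlation matrix W^T W."""
    evals = np.linalg.eigvalsh(W.T @ W / W.shape[0])  # nonnegative, ascending
    tail = np.sort(evals)[::-1][: max(2, int(len(evals) * k_frac))]
    return 1.0 + len(tail) / np.sum(np.log(tail / tail[-1] + 1e-12))

def balanced_lrs(weights, base_lr=0.1):
    """Give each layer a learning rate proportional to its tail exponent
    relative to the mean exponent (illustrative scaling rule)."""
    alphas = np.array([tail_alpha(W) for W in weights])
    return base_lr * alphas / alphas.mean()
```

    By construction the per-layer rates average back to the base rate, so the scheme rebalances the "temperature" across layers rather than changing the overall budget.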

    MLGOPerf: An ML Guided Inliner to Optimize Performance

    For the past 25 years, we have witnessed extensive application of machine learning to the compiler space, e.g., to the selection and phase-ordering problems. However, few such works have been upstreamed into state-of-the-art compilers such as LLVM so as to seamlessly integrate them into the optimization pipeline and make them readily deployable by users. MLGO was among the first such projects, and it strives only to reduce the code size of a binary with an ML-based inliner trained by reinforcement learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model to generate the rewards used for training a retargeted reinforcement learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis, and it enables a fast training framework for the primary model that would otherwise be impractical. Experimental results show that MLGOPerf gains up to 1.8% and 2.2% over LLVM's optimization at O3 when trained for performance on the SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% more opportunities to autotune code regions in our benchmarks, which can be translated into an additional 3.7% speedup. Comment: Version 2: added the missing Table 6. The short version of this work is accepted at ACM/IEEE CASES 202
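    The central trick, i.e., using a secondary model's predicted speedup as the reward signal so the primary agent can be trained without compiling and timing every candidate, can be sketched with a toy value-learning loop. All names and the interface below are invented for illustration; they are not MLGO's actual API.

```python
def train_inliner(speedup_model, callsites, episodes=50, lr=0.5):
    """Toy training loop: the reward for each inline (1) / keep (0)
    decision comes from a model predicting post-inlining speedup,
    not from compiling and benchmarking the binary."""
    q = {(cs, a): 0.0 for cs in callsites for a in (0, 1)}
    for _ in range(episodes):
        for cs in callsites:
            for a in (0, 1):                 # tiny action space: sweep both
                r = speedup_model(cs, a)     # predicted speedup as reward
                q[(cs, a)] += lr * (r - q[(cs, a)])
    # Extract the greedy inlining policy from the learned values.
    return {cs: max((0, 1), key=lambda a: q[(cs, a)]) for cs in callsites}
```

    With a stub model that predicts a speedup only for inlining certain call sites, the learned policy inlines exactly those sites; the point is that the expensive compile-and-measure step never appears in the loop.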

    Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

    The search for effective and robust metrics has been the focus of recent theoretical and empirical work on the generalization of deep neural networks (NNs). In this paper, we discuss the performance of natural language processing (NLP) models, and we evaluate various existing and novel generalization metrics. Compared to prior studies, we (i) focus on NLP instead of computer vision (CV), (ii) focus on generalization metrics that predict test error instead of the generalization gap, (iii) focus on generalization metrics that do not need access to data, and (iv) focus on the heavy-tail (HT) phenomenon, which has received comparatively less attention in the study of NNs. We extend recent HT-based work that focuses on power law (PL) distributions, and we study exponential and exponentially truncated power law (E-TPL) fits to the empirical spectral densities (ESDs) of weight matrices. Our empirical studies are carried out on (i) hundreds of Transformers trained in different settings, in which we systematically vary different hyperparameters, (ii) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including BERT, GPT2, etc., and (iii) a total of 28 existing and novel generalization metrics. From our empirical analyses, we show that shape metrics, i.e., metrics obtained from fitting the shape of the ESDs, uniformly outperform the scale metrics commonly studied in the literature at predicting generalization, as measured by rank correlations with generalization performance. We also show that, among the three HT distributions considered in our paper, E-TPL fits of the ESDs perform the most robustly when models are trained in experimental settings, while PL fits achieve the best performance on well-trained Huggingface models, and that both E-TPL and PL metrics (which are both shape metrics) outperform scale metrics.
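    The shape-versus-scale distinction can be made concrete: both kinds of metric are computed from the ESD of a weight matrix, but a shape metric summarizes the form of the eigenvalue distribution (e.g., a power-law exponent fitted to its tail), while a scale metric summarizes its magnitude (e.g., the log spectral norm). The sketch below uses a standard continuous-MLE power-law fit with the tail start set to the median; it is illustrative, not the paper's exact estimator.

```python
import numpy as np

def esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N."""
    return np.linalg.eigvalsh(W.T @ W / W.shape[0])

def pl_alpha(evals):
    """Shape metric: continuous MLE for a power-law exponent fitted
    to the upper tail of the ESD (tail start set to the median here)."""
    evals = np.asarray(evals)
    xmin = np.median(evals)
    tail = evals[evals >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def log_spectral_norm(evals):
    """Scale metric: log of the largest eigenvalue."""
    return float(np.log10(np.max(evals)))
```

    Ranking a set of trained models by `pl_alpha` versus `log_spectral_norm` and correlating each ranking with test error is the kind of comparison the paper performs at scale.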

    ACPO: AI-Enabled Compiler-Driven Program Optimization

    The key to performance optimization of a program is deciding correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine learning models to speed up the tuning process; while this realization has been around since the late 90s, only recent advancements in ML have enabled a practical, end-to-end application of ML to compilers. This paper presents ACPO: AI-Enabled Compiler-driven Program Optimization, a novel framework that provides LLVM with simple and comprehensive tools to benefit from ML models in different optimization passes. We first showcase the high-level view, class hierarchy, and functionalities of ACPO, then demonstrate a couple of use cases by ML-enabling the Loop Unroll and Function Inlining passes, and describe how ACPO can be leveraged to optimize other passes. Experimental results reveal that the ACPO model for Loop Unroll gains on average 4% over LLVM's O3 optimization when deployed on Polybench. Furthermore, by adding the Inliner model as well, ACPO provides up to 4.5% and 2.4% improvement over LLVM's O3 on Polybench and Cbench, respectively. Comment: Preprint version of ACPO (12 pages)
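    The framework's role, letting individual passes consult ML models through a common, simple interface, can be sketched as below. Class, method, and pass names here are invented for illustration and do not reflect ACPO's actual class hierarchy.

```python
class ModelRegistry:
    """Minimal stand-in for an ACPO-like layer between passes and models:
    a pass asks for advice by name and falls back to its default
    heuristic when no model is registered."""
    def __init__(self):
        self._models = {}

    def register(self, pass_name, model):
        self._models[pass_name] = model

    def advise(self, pass_name, features, default):
        model = self._models.get(pass_name)
        return model(features) if model is not None else default

def unroll_pass(loop_features, registry):
    # The pass itself stays free of ML details; it only consults the
    # registry and keeps a conservative default of "no unroll".
    return registry.advise("loop-unroll", loop_features, default=1)
```

    A toy model, e.g. `lambda f: 4 if f["trip_count"] >= 8 else 1`, can then be plugged in via `register("loop-unroll", ...)` without touching the pass, which is the decoupling the framework aims for.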

    Clonal expansion and epigenetic reprogramming following deletion or amplification of mutant IDH1

    IDH1 mutation is the earliest genetic alteration in low-grade gliomas (LGGs), but its role in tumor recurrence is unclear. Mutant IDH1 drives overproduction of the oncometabolite d-2-hydroxyglutarate (2HG) and a CpG island (CGI) hypermethylation phenotype (G-CIMP). To investigate the role of mutant IDH1 at recurrence, we performed a longitudinal analysis of 50 IDH1-mutant LGGs. We discovered six cases with copy number alterations (CNAs) at the IDH1 locus at recurrence. Deletion or amplification of IDH1 was followed by clonal expansion and recurrence at a higher grade. Successful cultures derived from IDH1-mutant, but not IDH1 wild-type, gliomas systematically deleted IDH1 in vitro and in vivo, further suggesting selection against the heterozygous mutant state as tumors progress. Tumors and cultures with IDH1 CNAs had decreased 2HG, maintenance of G-CIMP, and DNA methylation reprogramming outside CGIs. Thus, while IDH1 mutation initiates gliomagenesis, in some patients mutant IDH1 and 2HG are not required for later clonal expansion.

    Anti-HIV-1 Activity of a New Scorpion Venom Peptide Derivative Kn2-7

    For over 30 years, HIV/AIDS has wreaked havoc around the world. In the absence of an effective vaccine for HIV, development of new anti-HIV agents is urgently needed. We previously identified antiviral activities of the scorpion venom peptide derivative mucroporin-M1 against three RNA viruses (measles virus, SARS-CoV, and H5N1). In this investigation, a panel of scorpion venom peptides and their derivatives was designed and chosen for assessment of anti-HIV activity. A new scorpion venom peptide derivative, Kn2-7, was identified in screening assays as the most potent anti-HIV-1 peptide, with an EC50 value of 2.76 µg/ml (1.65 µM) and low cytotoxicity to host cells (selectivity index [SI] of 13.93). Kn2-7 inhibited all members of a standard reference panel of HIV-1 subtype B pseudotyped virus (PV), including CCR5-tropic strains and the CXCR4-tropic NL4-3 PV strain. Furthermore, it also inhibited a CXCR4-tropic replication-competent strain of HIV-1 subtype B virus. A binding assay of Kn2-7 to HIV-1 PV using the Octet Red system suggested that the anti-HIV-1 activity correlated with a direct interaction between Kn2-7 and the HIV-1 envelope. These results demonstrate that the peptide Kn2-7 inhibits HIV-1 by direct interaction with the viral particle and may become a promising candidate compound for further development of a microbicide against HIV-1.

    Antiinflammatory Therapy with Canakinumab for Atherosclerotic Disease

    Background: Experimental and clinical data suggest that reducing inflammation without affecting lipid levels may reduce the risk of cardiovascular disease. Yet the inflammatory hypothesis of atherothrombosis has remained unproved. Methods: We conducted a randomized, double-blind trial of canakinumab, a therapeutic monoclonal antibody targeting interleukin-1β, involving 10,061 patients with previous myocardial infarction and a high-sensitivity C-reactive protein level of 2 mg or more per liter. The trial compared three doses of canakinumab (50 mg, 150 mg, and 300 mg, administered subcutaneously every 3 months) with placebo. The primary efficacy end point was nonfatal myocardial infarction, nonfatal stroke, or cardiovascular death. Results: At 48 months, the median reduction from baseline in the high-sensitivity C-reactive protein level was 26 percentage points greater in the group that received the 50-mg dose of canakinumab, 37 percentage points greater in the 150-mg group, and 41 percentage points greater in the 300-mg group than in the placebo group. Canakinumab did not reduce lipid levels from baseline. At a median follow-up of 3.7 years, the incidence rate for the primary end point was 4.50 events per 100 person-years in the placebo group, 4.11 events per 100 person-years in the 50-mg group, 3.86 events per 100 person-years in the 150-mg group, and 3.90 events per 100 person-years in the 300-mg group. The hazard ratios as compared with placebo were as follows: in the 50-mg group, 0.93 (95% confidence interval [CI], 0.80 to 1.07; P = 0.30); in the 150-mg group, 0.85 (95% CI, 0.74 to 0.98; P = 0.021); and in the 300-mg group, 0.86 (95% CI, 0.75 to 0.99; P = 0.031). The 150-mg dose, but not the other doses, met the prespecified multiplicity-adjusted threshold for statistical significance for the primary end point and for the secondary end point that additionally included hospitalization for unstable angina leading to urgent revascularization (hazard ratio vs. placebo, 0.83; 95% CI, 0.73 to 0.95; P = 0.005). Canakinumab was associated with a higher incidence of fatal infection than was placebo. There was no significant difference in all-cause mortality (hazard ratio for all canakinumab doses vs. placebo, 0.94; 95% CI, 0.83 to 1.06; P = 0.31). Conclusions: Antiinflammatory therapy targeting the interleukin-1β innate immunity pathway with canakinumab at a dose of 150 mg every 3 months led to a significantly lower rate of recurrent cardiovascular events than placebo, independent of lipid-level lowering. (Funded by Novartis; CANTOS ClinicalTrials.gov number, NCT01327846.)

    Efficient FIB caching using minimal non-overlapping prefixes


    Comparative genomic and genetic analysis of glioblastoma-derived brain tumor-initiating cells and their parent tumors

    Background: Glioblastoma (GBM) is a fatal cancer that has eluded major therapeutic advances. Failure to make progress may reflect the absence of a human GBM model that could be used to test compounds for anti-GBM activity. In this respect, the development of brain tumor-initiating cell (BTIC) cultures is a step forward, because BTICs appear to capture the molecular diversity of GBM better than traditional glioma cell lines. Here, we perform a comparative genomic and genetic analysis of BTICs and their parent tumors as a preliminary evaluation of the BTIC model. Methods: We assessed single nucleotide polymorphisms (SNPs), genome-wide copy number variations (CNVs), gene expression patterns, and molecular subtypes of 11 established BTIC lines and matched parent tumors. Results: Although CNV differences were noted, BTICs retained the major genomic alterations characteristic of GBM. SNP patterns were similar between BTICs and tumors. Importantly, no recurring SNP or CNV alterations specific to BTICs were seen. Comparative gene expression analysis and molecular subtyping revealed differences between BTICs and GBMs. These differences formed the basis of a 63-gene expression signature that distinguished cells from tumors; the differentially expressed genes primarily involved metabolic processes. We also derived a set of 73 similarly expressed genes; these genes were not associated with specific biological functions. Conclusions: Although not identical, established BTIC lines preserve the core molecular alterations seen in their parent tumors, as well as the genomic hallmarks of GBM, without acquiring recurring BTIC-specific changes.
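    As an illustration of how a discriminating expression signature of this kind can be derived (a simplified sketch, not the study's actual pipeline), one can rank genes by a t-like statistic between the cultured-cell and tumor sample groups and keep the top-ranked genes:

```python
import numpy as np

def signature_genes(cells, tumors, n_genes=63):
    """Rank genes by a t-like statistic between two expression matrices
    (rows = samples, columns = genes) and return the indices of the
    n_genes most differentially expressed genes."""
    diff = cells.mean(axis=0) - tumors.mean(axis=0)
    se = np.sqrt(cells.var(axis=0, ddof=1) / cells.shape[0]
                 + tumors.var(axis=0, ddof=1) / tumors.shape[0])
    t = np.abs(diff) / (se + 1e-12)
    return np.argsort(t)[::-1][:n_genes]
```

    In practice such signatures are derived with moderated statistics and multiple-testing control; the sketch only conveys the ranking step.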