410 research outputs found

    Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

    Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission, by investigating the relationship between BN and Weight Normalization (WN). In this work, we demonstrate that the shift of the average gradient amplifies the variance of every convolutional (conv) layer. We propose Parametric Weights Standardization (PWS), a module for conv filters that is fast and robust to mini-batch size, to eliminate the shift of the average gradient. PWS provides a speed-up comparable to BN's, requires less computation, and does not change the output of a conv layer. PWS enables the network to converge fast without normalizing the outputs. This result strengthens the case that the shift of the average gradient is the underlying problem and explains why BN works from the perspective of variance transmission. The code and appendix will be made available at https://github.com/lyxzzz/PWSConv.
    Comment: This paper has been accepted by AAAI 2021.
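
    As a rough illustration of the idea, the sketch below standardizes each conv filter to zero mean and unit variance and rescales it with a learnable per-filter parameter before convolving. This is a minimal PyTorch sketch of weight-standardization-style convolution under assumed names (PWSConv2d, gamma); the paper's exact PWS formulation is in the linked repository.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PWSConv2d(nn.Conv2d):
        """Illustrative conv layer that standardizes its filters before use.

        Each filter is shifted to zero mean and scaled to unit variance over
        its (in_channels, kH, kW) elements, then rescaled by a learnable
        per-filter parameter. Names and details here are assumptions, not
        the paper's exact formulation.
        """
        def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
            super().__init__(in_channels, out_channels, kernel_size, **kwargs)
            self.gamma = nn.Parameter(torch.ones(out_channels, 1, 1, 1))
            self.eps = 1e-5

        def forward(self, x):
            w = self.weight
            mean = w.mean(dim=(1, 2, 3), keepdim=True)  # per-filter mean
            var = w.var(dim=(1, 2, 3), keepdim=True)    # per-filter variance
            w = self.gamma * (w - mean) / torch.sqrt(var + self.eps)
            return F.conv2d(x, w, self.bias, self.stride,
                            self.padding, self.dilation, self.groups)

    # Usage: a drop-in replacement for nn.Conv2d.
    y = PWSConv2d(3, 16, 3, padding=1)(torch.randn(2, 3, 8, 8))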

    Happiness Inequality in China

    As China moved from lower-middle-income to upper-middle-income status after 2009, happiness inequality in China widened. Based on the Chinese General Social Survey (CGSS) database (2003-2012), this paper investigates the determinants of happiness inequality in China and explores what factors contributed to its enlargement after 2009. We find that a rise in income inequality, as well as in the population share of middle-aged cohorts, widens China’s happiness inequality, while an increase in income or education level reduces it. Owning a house and being in employment also reduce happiness inequality. A decomposition analysis shows that the deterioration of China’s happiness inequality is mainly caused by coefficient effects, i.e., the relationships between happiness inequality and its influencing factors have changed, which reflects the dramatic change in the Chinese economy and society. Among the coefficient effects, regional heterogeneity plays an important role. Policies that enhance economic performance and education, and that reduce income inequality and regional inequality, can help to reduce happiness inequality and improve social harmony in China.
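
    To make the notion of "coefficient effects" concrete, the toy sketch below runs an Oaxaca-Blinder-style decomposition of a change in a mean outcome between two periods into an endowment part (characteristics changed) and a coefficient part (relationships changed). The paper decomposes happiness inequality, which is more involved; all data and numbers below are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def fit_ols(X, y):
        """Least-squares coefficients with an intercept column."""
        X1 = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return beta

    # Two synthetic "periods" with different characteristics and coefficients.
    b_true_a, b_true_b = np.array([0.5, 0.2]), np.array([0.8, 0.1])
    X_a = rng.normal(0.0, 1.0, size=(500, 2))
    X_b = rng.normal(0.3, 1.0, size=(500, 2))
    y_a = 1.0 + X_a @ b_true_a + rng.normal(0, 0.1, 500)
    y_b = 1.2 + X_b @ b_true_b + rng.normal(0, 0.1, 500)

    b_a, b_b = fit_ols(X_a, y_a), fit_ols(X_b, y_b)
    xbar_a = np.concatenate([[1.0], X_a.mean(axis=0)])
    xbar_b = np.concatenate([[1.0], X_b.mean(axis=0)])

    endowment = (xbar_b - xbar_a) @ b_a   # characteristics changed
    coefficient = xbar_b @ (b_b - b_a)    # relationships changed
    # The two parts sum to the observed change in the mean outcome.
    print(endowment + coefficient, y_b.mean() - y_a.mean())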

    ScalAna: Automating Scaling Loss Detection with Graph Analysis

    Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based either on profiling or on tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis; tracing collects all of this information, but at prohibitive overhead. In this work, we design ScalAna, which uses static analysis techniques to achieve the best of both worlds: it enables the analyzability of traces at a cost similar to profiling. ScalAna first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate ScalAna with real applications. Results show that our approach effectively locates the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by ScalAna on 2,048 processes.
    Comment: conference
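
    The flavor of backtracking root cause detection can be sketched on a toy graph: starting from the vertex where scaling loss shows up, walk dependency edges backwards, always toward the predecessor whose cost grows fastest with process count. The graph, vertex names, and slowdown numbers below are invented for illustration; ScalAna's real graphs come from compiler analysis and lightweight runtime sampling.

    # Edges point from a vertex to the predecessors (causes) it depends on.
    preds = {
        "MPI_Allreduce": ["compute_halo"],
        "compute_halo":  ["load_imbalance", "cache_misses"],
        "load_imbalance": [],
        "cache_misses":   [],
    }
    # Hypothetical per-vertex slowdown factors when scaling 512 -> 2048 processes.
    slowdown = {
        "MPI_Allreduce": 3.1,
        "compute_halo": 2.9,
        "load_imbalance": 3.0,
        "cache_misses": 1.1,
    }

    def backtrack_root_cause(start):
        """Follow the worst-scaling predecessor until none scales as badly."""
        node = start
        while preds[node]:
            worst = max(preds[node], key=slowdown.get)
            if slowdown[worst] < slowdown[node] * 0.9:  # predecessors scale fine
                break
            node = worst
        return node

    print(backtrack_root_cause("MPI_Allreduce"))  # -> load_imbalance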

    A Filtering Algorithm for Maneuvering Target Tracking Based on Smoothing Spline Fitting

    Maneuvering target tracking is challenging: a target's sudden change in speed or direction can cause a conventional filtering tracker to diverge. To improve the accuracy of maneuvering target tracking, we propose a tracking algorithm based on smoothing spline fitting, where a curve fitted to the historical point track captures the target's maneuvering behavior. The novelty of this paper is that no dynamic motion model is assumed; prediction is based only on the curve fitted over the measured data. Monte Carlo simulation results show that, when sea targets are maneuvering, the proposed algorithm is more accurate than the conventional Kalman filter algorithm and the interacting multiple model filtering algorithm, while maintaining a simple structure and a small storage footprint.
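
    The core idea admits a very short sketch: fit a smoothing spline to a sliding window of recent measurements (per coordinate) and extrapolate it one step ahead, with no motion model in the loop. The window size, smoothing factor, and synthetic maneuver below are illustrative choices, not the paper's tuned values.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(1)
    t = np.arange(20, dtype=float)                           # measurement times
    truth = np.where(t < 12, 2.0 * t, 24 + 5.0 * (t - 12))   # a maneuver at t=12
    z = truth + rng.normal(0.0, 0.5, size=t.size)            # noisy positions

    def predict_next(times, meas, window=8, smooth=2.0):
        """Fit a smoothing spline on the last `window` points, extrapolate one step."""
        ts, zs = times[-window:], meas[-window:]
        spline = UnivariateSpline(ts, zs, k=3, s=smooth, ext=0)  # ext=0: extrapolate
        return spline(times[-1] + 1.0)

    print(predict_next(t, z))  # predicted position at t=20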

    Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

    Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to a lack of domain-specific knowledge, they may not be optimal at completing code that requires intensive domain knowledge, for example, completing library names. Although several works have confirmed the effectiveness of fine-tuning for adapting language models to code completion in specific domains, they are limited by the need to constantly re-fine-tune the model as the project iterates. To address this limitation, we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach automatically adapts to different language models and domains. Specifically, it uses the in-domain code to build a retrieval database decoupled from the LM, and then combines the database with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable improvements over CodeGPT and UnixCoder. A deeper analysis of our tool, covering response speed, storage usage, completion of specific code types, and API invocation completion, confirms that kNM-LM performs well, making it highly suitable for domain adaptive code completion. Furthermore, our approach operates without requiring direct access to the language model's parameters, so it can seamlessly integrate with black-box code completion models, making it easy to use our approach as a plugin to further enhance their performance.
    Comment: Accepted by ASE 2023.
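
    A minimal sketch of the retrieval-plus-LM combination, in the style of kNN-LM: a datastore maps context embeddings to the token that followed them, retrieved neighbors induce a next-token distribution, and that distribution is combined with the base LM's. kNM-LM combines the two via Bayesian inference; for brevity this sketch uses the simpler linear interpolation, and the vocabulary, embeddings, and probabilities below are made up.

    import numpy as np

    VOCAB = ["requests", "get", "(", ")", "json"]

    datastore_keys = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])  # context embeddings
    datastore_vals = np.array([1, 1, 4])                             # token ids that followed

    def knn_distribution(query, k=2, temperature=1.0):
        """Distance-weighted distribution over tokens from the k nearest contexts."""
        d = np.linalg.norm(datastore_keys - query, axis=1)
        nearest = np.argsort(d)[:k]
        w = np.exp(-d[nearest] / temperature)
        p = np.zeros(len(VOCAB))
        for idx, weight in zip(datastore_vals[nearest], w):
            p[idx] += weight
        return p / p.sum()

    def combine(p_lm, p_knn, lam=0.4):
        """Linear interpolation of LM and retrieval distributions."""
        return lam * p_knn + (1.0 - lam) * p_lm

    p_lm = np.array([0.1, 0.2, 0.3, 0.2, 0.2])   # base LM's next-token distribution
    query = np.array([0.85, 0.15])                # current context embedding
    p = combine(p_lm, knn_distribution(query))
    print(VOCAB[int(np.argmax(p))])               # -> "get"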

    NEDD8 Modification of CUL1 Dissociates p120CAND1, an Inhibitor of CUL1-SKP1 Binding and SCF Ligases

    Cullin proteins assemble a large number of RING E3 ubiquitin ligases and regulate various physiological processes. Covalent modification of cullins by the ubiquitin-like protein NEDD8 activates cullin ligases through an as yet undefined mechanism. We show here that p120(CAND1) selectively binds to unneddylated CUL1 and is dissociated by CUL1 neddylation. CAND1 formed a ternary complex with CUL1 and ROC1. CAND1 dissociated SKP1 from CUL1 and inhibited SCF ligase activity in vitro. Suppression of CAND1 in vivo increased the level of the CUL1-SKP1 complex. We suggest that, by restricting the SKP1-CUL1 interaction, CAND1 regulates the assembly of productive SCF ubiquitin ligases, allowing a common CUL1-ROC1 core to be utilized by a large number of SKP1-F box-substrate subcomplexes.

    Learning the Relation between Similarity Loss and Clustering Loss in Self-Supervised Learning

    Self-supervised learning enables networks to learn discriminative features from massive amounts of data without labels. Most state-of-the-art methods maximize the similarity between two augmentations of one image based on contrastive learning; by exploiting the consistency of the two augmentations, the burden of manual annotation is removed. Contrastive learning exploits instance-level information to learn robust features, but the learned information is likely confined to different views of the same instance. In this paper, we attempt to leverage the similarity between two distinct images to boost representation learning in self-supervised learning: in contrast to instance-level information, the similarity between two distinct images may provide more useful information. We also analyze the relation between similarity loss and feature-level cross-entropy loss. Both losses are essential for most deep learning methods, yet the relation between them has been unclear. Similarity loss helps obtain instance-level representations, while feature-level cross-entropy loss helps mine the similarity between two distinct images. We provide theoretical analyses and experiments showing that a suitable combination of these two losses achieves state-of-the-art results. Code is available at https://github.com/guijiejie/ICCL.
    Comment: This paper is accepted by IEEE Transactions on Image Processing.
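
    A short PyTorch sketch of the two losses under discussion: an instance-level similarity loss between embeddings of two augmentations of the same image, and a feature-level cross-entropy between softmax-normalized feature vectors of the two views. The definitions and the 0.5 weighting below are illustrative assumptions; the paper's analysis determines how the two should actually be combined.

    import torch
    import torch.nn.functional as F

    def similarity_loss(z1, z2):
        """Negative cosine similarity between paired views (instance level)."""
        return -F.cosine_similarity(z1, z2, dim=1).mean()

    def feature_level_cross_entropy(z1, z2, tau=0.5):
        """Cross-entropy between feature distributions of the two views."""
        p = F.softmax(z1 / tau, dim=1)
        log_q = F.log_softmax(z2 / tau, dim=1)
        return -(p * log_q).sum(dim=1).mean()

    z1 = torch.randn(32, 128)  # embeddings of view 1 (batch of 32)
    z2 = torch.randn(32, 128)  # embeddings of view 2
    loss = similarity_loss(z1, z2) + 0.5 * feature_level_cross_entropy(z1, z2)
    print(loss.item())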