
    Redundancy and Concept Analysis for Code-trained Language Models

    Code-trained language models have proven to be highly effective for various code intelligence tasks. However, they can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints. Implementing effective strategies to address these issues requires a better understanding of these 'black box' models. In this paper, we perform the first neuron-level analysis for source code models to identify important neurons within latent representations. We achieve this by eliminating neurons that are highly similar or irrelevant to the given task. This approach helps us understand which neurons and layers can be eliminated (redundancy analysis) and where important code properties are located within the network (concept analysis). Using redundancy analysis, we make observations relevant to knowledge transfer and model optimization applications. We find that over 95% of the neurons are redundant with respect to our code intelligence tasks and can be eliminated without significant loss in accuracy. We also discover several subsets of neurons that can make predictions with baseline accuracy. Through concept analysis, we explore the traceability and distribution of human-recognizable concepts within latent code representations, which could be used to influence model predictions. We trace individual neurons and subsets of important neurons to specific code properties, and identify 'number' neurons, 'string' neurons, and higher-level 'text' neurons for token-level tasks, as well as higher-level concepts important for sentence-level downstream tasks. This also helps us understand how decomposable and transferable task-related features are, and can help devise better techniques for transfer learning, model compression, and the decomposition of deep neural networks into modules.
    Comment: 4 figures, 6 tables
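
    As an illustration of the redundancy-analysis idea, the sketch below greedily drops neurons whose activations are highly correlated with an already-kept neuron, then checks that a simple probe still predicts a token-level property from the survivors. This is a minimal sketch, not the authors' exact procedure; the activation matrix, threshold, and 'is-number'-style labels are hypothetical stand-ins.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def prune_redundant_neurons(acts, threshold=0.95):
            """Keep a neuron only if its activations are not highly
            correlated with any neuron kept so far (redundancy analysis)."""
            corr = np.abs(np.corrcoef(acts, rowvar=False))
            keep = []
            for i in range(acts.shape[1]):
                if all(corr[i, j] < threshold for j in keep):
                    keep.append(i)
            return keep

        # Hypothetical data: 16 'real' neurons plus 16 near-duplicates stand
        # in for hidden activations of a code model; `labels` for a
        # token-level property such as "is this token a number?".
        rng = np.random.default_rng(0)
        base = rng.standard_normal((500, 16))
        acts = np.hstack([base, base + 0.05 * rng.standard_normal((500, 16))])
        labels = base[:, 3] > 0
        kept = prune_redundant_neurons(acts)
        probe = LogisticRegression().fit(acts[:, kept], labels)
        print(f"kept {len(kept)}/32 neurons, probe accuracy "
              f"{probe.score(acts[:, kept], labels):.2f}")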

    Network compression via network memory: fundamental performance limits

    The amount of information churned out daily around the world is staggering, and future technological advancements are therefore contingent upon the development of scalable acquisition, inference, and communication mechanisms for this massive data. This Ph.D. dissertation draws upon mathematical tools from information theory and statistics to understand the fundamental performance limits of universal compression of this data at the packet level, applied just above layer 3 of the network, when the intermediate network nodes are capable of memorizing previous traffic. Universality imposes an inevitable redundancy (overhead) on the compression performance of universal codes, due to the learning of the unknown source statistics. In this work, previous asymptotic results on the redundancy of universal compression are generalized to the finite-length regime applicable to small network packets. Further, network compression via memory is proposed as a compression-based solution for relatively small network packets whenever the network nodes (i.e., the encoder and the decoder) are equipped with memory and have access to massive amounts of previous communication. In a nutshell, network compression via memory learns the patterns and statistics of the packet payloads and uses them to compress and reduce the traffic. At the cost of increased computational overhead in the network nodes, network compression via memory significantly reduces the transmission cost in the network. This trade-off yields a large performance improvement because the cost of transmitting one bit is by far greater than the cost of processing it.
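
    A minimal sketch of the memory-assisted idea, using zlib's preset-dictionary support as a stand-in for the dissertation's information-theoretic framework: encoder and decoder hold the same memory of previous traffic, so each short packet is coded relative to that memory instead of paying the universal-compression overhead from scratch. The traffic contents below are hypothetical.

        import zlib

        def compress_packet(payload: bytes, memory: bytes) -> bytes:
            """Compress a short packet using previously seen traffic
            (`memory`) as a shared zlib preset dictionary."""
            c = zlib.compressobj(level=9, zdict=memory)
            return c.compress(payload) + c.flush()

        def decompress_packet(blob: bytes, memory: bytes) -> bytes:
            d = zlib.decompressobj(zdict=memory)
            return d.decompress(blob)

        # Hypothetical traffic: packets from the same source share
        # statistics, so the shared memory removes most of the redundancy
        # a memoryless universal code would pay for on every packet.
        memory = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 20
        packet = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nCookie: id=42\r\n"
        plain = zlib.compress(packet, 9)
        assisted = compress_packet(packet, memory)
        assert decompress_packet(assisted, memory) == packet
        print(len(packet), len(plain), len(assisted))  # memory-assisted is typically smallest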

    On optimality of data clustering for packet-level memory-assisted compression of network traffic

    Recently, we proposed a framework called memory-assisted compression that learns the statistical properties of the sequence-generating server at intermediate network nodes and then leverages the learnt models to overcome the inevitable redundancy (overhead) in the universal compression of the payloads of short-length network packets. In this paper, we prove that when the content-generating server is comprised of a mixture of parametric sources, label-based clustering of the data to their original sequence-generating models from the mixture is almost surely optimal, as it achieves the mixture entropy (the lower bound on the average codeword length). Motivated by this result, we present a K-means clustering technique as a proof of concept to demonstrate the performance benefits of memory-assisted compression. Simulation results confirm the effectiveness of the proposed approach by matching the expected improvements predicted by theory on man-made mixture sources. Finally, the benefits of cluster-based memory-assisted compression are validated on real data traffic traces, demonstrating more than 50% traffic reduction on average in data gathered from wireless users.
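
    The sketch below illustrates the cluster-based idea under stated assumptions: packets from a hypothetical two-source mixture are clustered by byte statistics with K-means, one shared memory is built per cluster, and each packet is compressed against its own cluster's memory (again using zlib's preset dictionary as a stand-in for the paper's codes). For simplicity the per-cluster memory here includes the packet itself; in practice it would hold only previously observed traffic.

        import zlib
        import numpy as np
        from sklearn.cluster import KMeans

        def byte_histogram(pkt: bytes) -> np.ndarray:
            h = np.bincount(np.frombuffer(pkt, dtype=np.uint8), minlength=256)
            return h / max(len(pkt), 1)

        def cluster_compress(packets, k=2):
            """Cluster packets by byte statistics, build one shared memory
            per cluster, and compress each packet against its cluster's
            memory (cluster-based memory-assisted compression)."""
            X = np.stack([byte_histogram(p) for p in packets])
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            memories = {c: b"".join(p for p, l in zip(packets, labels) if l == c)[:32768]
                        for c in range(k)}
            out = []
            for p, l in zip(packets, labels):
                c = zlib.compressobj(level=9, zdict=memories[l])
                out.append(c.compress(p) + c.flush())
            return out, labels

        # Hypothetical mixture of two 'parametric sources': HTTP-like and
        # JSON-like packets. Clustering routes each packet to the memory
        # learned from its own source.
        http = [b"GET /page%d HTTP/1.1\r\nHost: a.example\r\n" % i for i in range(20)]
        js = [b'{"user": %d, "event": "click", "ok": true}' % i for i in range(20)]
        blobs, labels = cluster_compress(http + js, k=2)
        print(sum(map(len, blobs)), "bytes after cluster-based compression")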

    Evaluating and Interpreting Deep Convolutional Neural Networks via Non-negative Matrix Factorization

    With ever greater computational resources and more accessible software, deep neural networks have become ubiquitous across industry and academia. Their remarkable ability to generalize to new samples defies the conventional view, which holds that complex, over-parameterized networks would be prone to overfitting. This apparent discrepancy is exacerbated by our inability to inspect and interpret the high-dimensional, non-linear, latent representations they learn, which has led many to refer to neural networks as 'black boxes'. The Law of Parsimony states that 'simpler solutions are more likely to be correct than complex ones'. Since neural networks perform quite well in practice, a natural question to ask is: in what way are they simple? We propose that compression is the answer. Since good generalization requires invariance to irrelevant variations in the input, a network must discard this irrelevant information. As a result, semantically similar samples are mapped to similar representations in neural network deep feature space, where they form simple, low-dimensional structures. Conversely, a network that overfits relies on memorizing individual samples. Such a network cannot discard information as easily. In this thesis we characterize the difference between such networks using the non-negative rank of activation matrices. Relying on the non-negativity of rectified-linear units, the non-negative rank is the smallest number of components that admits an exact non-negative matrix factorization. We derive an upper bound on the amount of memorization in terms of the non-negative rank, and show it is a natural complexity measure for rectified-linear units. With a focus on deep convolutional neural networks trained to perform object recognition, we show that the two non-negative factors derived from deep network layers decompose the information held therein in an interpretable way. The first of these factors provides heatmaps which highlight similarly encoded regions within an input image or image set. We find that these networks learn to detect semantic parts and form a hierarchy, such that parts are further broken down into sub-parts. We quantitatively evaluate the semantic quality of these heatmaps by using them to perform semantic co-segmentation and co-localization. Although the convolutional network we use is trained solely with image-level labels, we achieve results comparable to or better than those of domain-specific state-of-the-art methods for these tasks. The second non-negative factor provides a bag-of-concepts representation for an image or image set. We use this representation to derive global image descriptors for images in a large collection. With these descriptors in hand, we perform two variations of content-based image retrieval, i.e., reverse image search. Using information from one of the non-negative matrix factors we obtain descriptors which are suitable for finding semantically related images, i.e., those belonging to the same semantic category as the query image. Combining information from both non-negative factors, however, yields descriptors that are suitable for finding other images of the specific instance depicted in the query image, where we again achieve state-of-the-art performance.
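
    To make the factorization step concrete, here is a minimal sketch under stated assumptions: ReLU feature maps from a late convolutional layer (non-negative by construction) are reshaped so rows are spatial positions, then factorized with scikit-learn's NMF. The first factor reshapes into per-part heatmaps over the image set; the second is a shared basis of channel patterns. The layer shape and component count are hypothetical.

        import numpy as np
        from sklearn.decomposition import NMF

        def nmf_heatmaps(acts, k=4):
            """Factorize ReLU activations (N, H, W, C) into k non-negative
            'parts': W gives per-position heatmap weights, H a basis of
            channel patterns shared across the image set."""
            n, h, w, c = acts.shape
            A = acts.reshape(n * h * w, c)          # rows: spatial positions
            model = NMF(n_components=k, init="nndsvd", max_iter=500)
            W = model.fit_transform(A)              # (n*h*w, k)
            H = model.components_                   # (k, c)
            return W.reshape(n, h, w, k), H

        # Hypothetical activations from a late conv layer (ReLU => non-negative).
        rng = np.random.default_rng(0)
        acts = np.maximum(rng.standard_normal((8, 7, 7, 32)), 0)
        heatmaps, basis = nmf_heatmaps(acts, k=4)
        print(heatmaps.shape, basis.shape)  # (8, 7, 7, 4) (4, 32)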

    Modeling, Predicting and Capturing Human Mobility

    Realistic models of human mobility are critical for modern-day applications, specifically in recommendation systems, resource planning, and process optimization. Given the rapid proliferation of mobile devices equipped with Internet connectivity and GPS functionality, aggregating large volumes of individual geolocation data is now feasible. The thesis focuses on methodologies that facilitate data-driven mobility modeling by drawing parallels between the inherent nature of mobility trajectories, statistical physics, and information theory. On the applied side, the thesis contributions lie in leveraging the formulated mobility models to construct prediction workflows built from a privacy-by-design perspective. This enables end users to derive utility from location-based services while preserving their location privacy. Finally, the thesis presents several approaches that apply machine learning to generate large-scale synthetic mobility datasets and thereby facilitate experimental reproducibility.
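
    As a minimal illustration of data-driven mobility prediction (a baseline sketch, not the thesis's models), a first-order Markov chain over discretized locations already captures much of the regularity in human trajectories. The check-in sequences below are hypothetical.

        from collections import Counter, defaultdict

        class MarkovMobilityModel:
            """First-order Markov predictor: count observed transitions
            between discretized locations, predict the most likely next one."""
            def __init__(self):
                self.transitions = defaultdict(Counter)

            def fit(self, trajectories):
                for traj in trajectories:
                    for a, b in zip(traj, traj[1:]):
                        self.transitions[a][b] += 1
                return self

            def predict(self, current):
                nxt = self.transitions.get(current)
                return nxt.most_common(1)[0][0] if nxt else None

        # Hypothetical check-in sequences over coarse location cells.
        trips = [["home", "cafe", "office", "gym", "home"],
                 ["home", "office", "gym", "home"],
                 ["home", "cafe", "office", "home"]]
        model = MarkovMobilityModel().fit(trips)
        print(model.predict("office"))  # 'gym' (2 of 3 observed transitions)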

    Deep Learning based Recommender System: A Survey and New Perspectives

    With the ever-growing volume of online information, recommender systems have been an effective strategy for overcoming information overload. The utility of recommender systems cannot be overstated, given their widespread adoption in many web applications and their potential to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields, such as computer vision and natural language processing, owing not only to its stellar performance but also to the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, with recent demonstrations of its effectiveness when applied to information retrieval and recommender systems research. Evidently, the field of deep learning in recommender systems is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep learning based recommender systems. More concretely, we provide and devise a taxonomy of deep learning based recommendation models, along with a comprehensive summary of the state of the art. Finally, we expand on current trends and provide new perspectives pertaining to this exciting development of the field.
    Comment: The paper has been accepted by ACM Computing Surveys. https://doi.acm.org/10.1145/328502
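
    To make the kind of model the survey categorizes concrete, here is a minimal sketch of one common family: neural matrix factorization, where user/item embeddings are scored by an MLP and trained with a pairwise (BPR-style) implicit-feedback loss. All sizes and data are hypothetical; the survey itself covers many more architectures (autoencoders, RNNs, attention, and so on).

        import torch
        import torch.nn as nn

        class NeuralMF(nn.Module):
            """Minimal neural recommender: user/item embeddings combined
            by an MLP that scores the interaction."""
            def __init__(self, n_users, n_items, dim=16):
                super().__init__()
                self.user = nn.Embedding(n_users, dim)
                self.item = nn.Embedding(n_items, dim)
                self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                         nn.Linear(dim, 1))

            def forward(self, u, i):
                x = torch.cat([self.user(u), self.item(i)], dim=-1)
                return self.mlp(x).squeeze(-1)

        # Hypothetical implicit-feedback step: observed (user, item) pairs
        # are positives; randomly sampled items serve as negatives.
        model = NeuralMF(n_users=100, n_items=500)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        u = torch.randint(0, 100, (32,))
        pos = torch.randint(0, 500, (32,))
        neg = torch.randint(0, 500, (32,))
        # BPR loss: -log sigmoid(score_pos - score_neg)
        loss = nn.functional.softplus(model(u, neg) - model(u, pos)).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        print(float(loss))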

    Transpose Attack: Stealing Datasets with Bidirectional Training

    Deep neural networks are normally executed in the forward direction. However, in this work, we identify a vulnerability that enables models to be trained in both directions and on different tasks. Adversaries can exploit this capability to hide rogue models within seemingly legitimate models. In addition, we show that neural networks can be taught to systematically memorize and retrieve specific samples from datasets. Together, these findings expose a novel method by which adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models. We focus on the data exfiltration attack and show that modern architectures can be used to secretly exfiltrate tens of thousands of samples with high fidelity, high enough to compromise data privacy and even to train new models. Moreover, to mitigate this threat, we propose a novel approach for detecting infected models.
    Comment: NDSS24 paper
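
    The toy sketch below conveys the bidirectional mechanism (it is not the paper's architecture): a single weight matrix W is trained so that x @ W solves a legitimate classification task, while codes @ W.T reconstructs memorized samples from secret per-sample keys. All dimensions, keys, and data are hypothetical.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        n, d, k = 8, 16, 10                    # hidden samples, input dim, classes
        X = torch.randn(n, d)                  # "dataset" to exfiltrate
        y = torch.randint(0, k, (n,))          # labels for the cover task
        codes = torch.randn(n, k)              # secret per-sample retrieval keys

        W = nn.Parameter(0.1 * torch.randn(d, k))
        opt = torch.optim.Adam([W], lr=1e-2)
        for _ in range(2000):
            task_loss = nn.functional.cross_entropy(X @ W, y)  # forward: classify
            mem_loss = ((codes @ W.T - X) ** 2).mean()         # transposed: memorize
            loss = task_loss + mem_loss
            opt.zero_grad(); loss.backward(); opt.step()

        print("cover-task accuracy:", ((X @ W).argmax(1) == y).float().mean().item())
        print("reconstruction MSE:", ((codes @ W.T - X) ** 2).mean().item())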