
    Reconstructive Neuron Pruning for Backdoor Defense

    Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. While existing defense methods have demonstrated promising results, it is still not clear how to effectively remove backdoor-associated neurons in backdoored DNNs. In this paper, we propose a novel defense called \emph{Reconstructive Neuron Pruning} (RNP) to expose and prune backdoor neurons via an unlearning and then recovering process. Specifically, RNP first unlearns the neurons by maximizing the model's error on a small subset of clean samples and then recovers the neurons by minimizing the model's error on the same data. In RNP, unlearning operates at the neuron level while recovering operates at the filter level, forming an asymmetric reconstructive learning procedure. We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks, achieving a new state-of-the-art defense performance. Moreover, the unlearned model at the intermediate step of our RNP can be directly used to improve other backdoor defense tasks including backdoor removal, trigger recovery, backdoor label detection, and backdoor sample detection. Code is available at \url{https://github.com/bboylyg/RNP}. Comment: Accepted by ICML2
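    The unlearn-then-recover idea described above can be caricatured in a few lines: learn a scaling mask per neuron on a handful of clean samples and prune the neurons whose masks fail to recover. The sketch below is a deliberate simplification (a plain sparsity-regularised mask descent on a toy quadratic loss), not RNP's actual asymmetric neuron/filter procedure; all names, constants, and the choice of which neurons are "backdoor" are illustrative.

```python
# Toy sketch of exposing backdoor-like neurons with learned masks: minimise a
# clean-data loss plus an L1 sparsity penalty over per-neuron masks, then
# prune the masks that stay low. Simplified stand-in for RNP's asymmetric
# unlearn/recover procedure; every constant here is illustrative.

N = 8
clean_neurons = set(range(N)) - {2, 5}   # neurons 2 and 5 act as "backdoor"

def loss_grad(mask, lam=0.05):
    # Gradient of sum_i [i clean] (mask[i]-1)^2 + lam * sum_i |mask[i]|.
    # Backdoor neurons get no gradient from the clean loss, only the penalty.
    return [(2.0 * (mask[i] - 1.0) if i in clean_neurons else 0.0) + lam
            for i in range(N)]

mask = [1.0] * N
lr = 0.1
for _ in range(200):                     # plain gradient descent on the masks
    g = loss_grad(mask)
    mask = [max(0.0, m - lr * gi) for m, gi in zip(mask, g)]

# Neurons whose masks did not recover are flagged for pruning.
pruned = [i for i, m in enumerate(mask) if m < 0.5]
print(pruned)                            # prints [2, 5]
```

    On the clean neurons the mask settles near 1 (the loss term dominates the penalty), while the backdoor masks decay to zero; the same descend-and-threshold pattern underlies many mask-based pruning defenses.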

    Optimization landscape of deep neural networks

    It has been empirically observed in deep learning that the training problem of deep over-parameterized neural networks does not appear to suffer from suboptimal local minima, despite the hardness results proven in the literature. In many cases, local search algorithms such as (stochastic) gradient descent frequently converge to a globally optimal solution. In an attempt to better understand this phenomenon, this thesis studies sufficient conditions on the network architecture under which the landscape of the associated loss function is guaranteed to be well-behaved, which could be favorable to local search algorithms. Our analysis touches upon fundamental aspects of the problem such as the existence of solutions with zero training error, global optimality of critical points, and the topology of level sets and sublevel sets of the loss. Gaining insight from this analysis, we propose a new class of network architectures that are practically relevant and carry a strong theoretical guarantee on the loss surface. We empirically investigate the generalization ability of these networks and other related phenomena observed in deep learning, such as the implicit bias of stochastic gradient descent. Finally, we study limitations of deep and narrow neural networks in learning connected decision regions, and draw connections to adversarial manipulation problems. The results and analysis presented in this thesis suggest that having a sufficiently wide layer in the architecture is not only helpful in making the loss surface well-behaved but also important for the expressive power of neural networks.
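    A minimal illustration of the phenomenon the thesis studies, with hypothetical numbers: the two-parameter "deep linear" loss L(a, b) = (ab - 2)^2 is non-convex (the origin is a saddle point), yet plain gradient descent from a generic initialisation still reaches a global minimum with zero training error. This toy is only a motivating picture, not one of the thesis's formal results.

```python
# Gradient descent on the non-convex loss L(a, b) = (a*b - 2)^2, a toy
# "depth-2 linear network" with one weight per layer. Despite the saddle
# at the origin, a generic start converges to a global minimum (a*b = 2).

def loss(a, b):
    return (a * b - 2.0) ** 2

a, b, lr = 1.0, 0.5, 0.05          # illustrative initialisation and step size
for _ in range(500):
    r = a * b - 2.0                # residual
    a, b = a - lr * 2.0 * r * b, b - lr * 2.0 * r * a

print(round(a * b, 6))             # prints 2.0
```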

    Data-driven deep-learning methods for the accelerated simulation of Eulerian fluid dynamics

    Deep-learning (DL) methods for the fast inference of the temporal evolution of fluid-dynamics systems, based on the previous recognition of features underlying large sets of fluid-dynamics data, have been studied. Specifically, models based on convolutional neural networks (CNNs) and graph neural networks (GNNs) were proposed and discussed. A U-Net, a popular fully-convolutional architecture, was trained to infer wave dynamics on liquid surfaces surrounded by walls, given as input the system state at previous time-points. A term penalising the error of the spatial derivatives was added to the loss function, which suppressed spurious oscillations and yielded a more accurate location and length of the predicted wavefronts. This model proved to generalise accurately to complex wall geometries not seen during training. As opposed to the image data-structures processed by CNNs, graphs offer greater freedom in how data is organised and processed. This motivated the use of graphs to represent the state of fluid-dynamics systems discretised by unstructured sets of nodes, and of GNNs to process such graphs. Graphs enabled more accurate representations of curvilinear geometries, and higher-resolution placement exclusively in areas where the physics is more challenging to resolve. Two novel GNN architectures were designed for fluid-dynamics inference: the MuS-GNN, a multi-scale GNN, and the REMuS-GNN, a rotation-equivariant multi-scale GNN. Both architectures work by repeatedly passing messages from each node to its nearest nodes in the graph. Additionally, lower-resolution graphs, with a reduced number of nodes, are defined from the original graph, and messages are also passed from finer to coarser graphs and vice versa. The low-resolution graphs allow physics encompassing a range of lengthscales to be captured efficiently.
    Advection and fluid flow, modelled by the incompressible Navier-Stokes equations, were the two types of problems used to assess the proposed GNNs. Whereas a single-scale GNN was sufficient to achieve high generalisation accuracy in advection simulations, flow simulations benefited greatly from an increasing number of low-resolution graphs. The generalisation and long-term accuracy of these simulations were further improved by the REMuS-GNN architecture, which processes the system state independently of the orientation of the coordinate system, thanks to a rotation-invariant representation and carefully designed components. To the best of the author's knowledge, the REMuS-GNN architecture was the first rotation-equivariant and multi-scale GNN. The simulations were accelerated by between one (on a CPU) and three (on a GPU) orders of magnitude with respect to a CPU-based numerical solver. Additionally, the parallelisation of multi-scale GNNs resulted in a close-to-linear speedup with the number of CPU cores or GPUs.
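    The derivative-penalty term described for the U-Net loss can be sketched in one dimension with hypothetical numbers: two predictions with identical value-wise MSE are distinguished because one also matches the finite-difference slopes of the target wavefront. The 1-D discretisation, `lam`, and the toy fields are all illustrative; the thesis applies the idea to 2-D wave fields.

```python
# Loss = MSE(values) + lam * MSE(finite-difference derivatives): a sketch of
# penalising spatial-derivative error on a 1-D "wavefront". Illustrative only.

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ddx(u, dx=1.0):
    # forward-difference approximation of du/dx
    return [(u[i + 1] - u[i]) / dx for i in range(len(u) - 1)]

def loss(pred, target, lam=0.5):
    return mse(pred, target) + lam * mse(ddx(pred), ddx(target))

target      = [0.0, 1.0, 0.0, -1.0]
pred_shift  = [0.1, 1.1, 0.1, -0.9]   # offset field, slopes exactly right
pred_jitter = [0.1, 0.9, 0.1, -0.9]   # same value-MSE, slopes wrong
```

    Both predictions have value-MSE 0.01, but the derivative term penalises `pred_jitter`, whose slopes disagree with the wavefront; this is the mechanism credited with suppressing spurious oscillations.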

    Network Specialization to Explain the Performance of Sparse Neural Networks

    Recently it has been shown that sparse neural networks perform better than dense networks with a similar number of parameters. In addition, large over-parameterized networks have been shown to contain sparse subnetworks which, when trained in isolation, reach or exceed the performance of the large model. However, methods to explain the success of sparse networks are still lacking. In this work I study the performance of sparse networks using the network's activation regions and patterns, concepts from the neural-network expressivity literature. I define network specialization, a novel concept that considers how distinctly a feed-forward neural network (FFNN) has learned to process high-level features in the data. I propose the Minimal Blanket Hypervolume (MBH) algorithm to measure the specialization of an FFNN. It finds parts of the input space that the network associates with some user-defined high-level feature, and compares their hypervolume to the hypervolume of the input space. My hypothesis is that sparse networks specialize more to high-level features than dense networks with the same number of hidden network parameters. Network specialization and MBH also contribute to the interpretability of deep neural networks (DNNs). The capability to learn representations at several levels of abstraction is at the core of deep learning, and MBH enables numerical evaluation of how specialized an FFNN is w.r.t. any abstract concept (a high-level feature) that can be embodied in an input. MBH can be applied to FFNNs in any problem domain, e.g. visual object recognition, natural language processing, or speech recognition. It also enables comparison between FFNNs with different architectures, since the metric is calculated in the common input space. I test different pruning and initialization scenarios on the MNIST Digits and Fashion datasets.
    I find that sparse networks approximate more complex functions, exploit redundancy in the data, and specialize to high-level features better than dense, fully parameterized networks with the same number of hidden network parameters.
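    The hypervolume comparison at the heart of MBH can be illustrated with a Monte-Carlo estimate on a toy "feature detector": sample the bounded input space uniformly and measure the fraction of it the network associates with the feature. The real MBH algorithm constructs minimal blankets from activation regions; only the final volume-ratio step is mirrored here, and `toy_net` is a made-up stand-in.

```python
# Monte-Carlo estimate of the fraction of a bounded input space that a (toy)
# network associates with a feature, mirroring MBH's hypervolume ratio.
import random

random.seed(1)

def toy_net(x, y):
    # Stand-in "feature detector": fires on one quarter of the unit square,
    # so the true hypervolume ratio is 0.25.
    return 1 if x > 0.5 and y > 0.5 else 0

def volume_fraction(net, n=20000):
    hits = sum(net(random.random(), random.random()) for _ in range(n))
    return hits / n

frac = volume_fraction(toy_net)
print(frac)   # close to 0.25
```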

    Strategies for controlled electronic doping of colloidal quantum dots

    Over the last several years, tremendous progress has been made in incorporating colloidal quantum dots (CQDs) as photoactive components in optoelectronic devices. A significant part of that progress is associated with advances in achieving controlled electronic doping of the CQDs and thus improving the electronic properties of CQD solids. Today, a variety of strategies exist toward that purpose, and this minireview aims at surveying the major published works on this subject. Additional attention is given to the many challenges associated with the task of doping CQDs, as well as to the optoelectronic functionalities and applications realized when light and heavy electronic doping of CQDs is successfully achieved.

    Data-driven solutions to enhance planning, operation and design tools in Industry 4.0 context

    This thesis proposes three data-driven solutions to be combined with state-of-the-art solvers and tools, primarily to enhance their computational performance. The problem of efficiently designing the open-sea floating platforms on which wind turbines can be mounted is tackled, as well as the tuning of a data-driven engine-monitoring tool for maritime transportation. Finally, the activities of SAT and ASP solvers are thoroughly studied, and a deep-learning architecture is proposed to enhance the heuristics-based solving approach adopted by such software. The covered domains differ, and the same is true of their respective targets. Nonetheless, the proposed Artificial Intelligence and Machine Learning algorithms are shared, as is the overall goal: to promote Industrial AI and meet the constraints imposed by the Industry 4.0 vision. A reduced need for a human in the loop, a data-driven approach to discovering causalities otherwise ignored, special attention to the environmental impact of industrial emissions, and a real and efficient exploitation of the Big Data available today are just a subset of these constraints. Hence, from a broader perspective, the experiments carried out within this thesis are driven towards the aforementioned targets, and the resulting outcomes are satisfactory enough to potentially convince the research community and industrialists that these are not just "visions" but can actually be put into practice. However, this work is still an introduction to the topic, and the developed models are at what can be described as a "pilot" stage. Nonetheless, the results are promising, and they pave the way towards further improvements and the consolidation of the dictates of Industry 4.0.

    Nurturing the Next Generation of Discoverers, Creators and Thinkers

    Over 154,000 babies are born in Illinois each year. By the end of my talk today, 10 children will be born. These children will graduate from high school in 2036. Imagine the kind of world these children will inherit. How do we prepare them for this future world? Consider just three exponential technologies that are affecting us today, and imagine where these technological improvements may be in the year 2036.

    Deep Anomaly Detection in Text

    Deep anomaly detection methods have become increasingly popular in recent years, with methods like Stacked Autoencoders, Variational Autoencoders, and Generative Adversarial Networks greatly improving the state-of-the-art. Other methods rely on augmenting classical models (such as the One-Class Support Vector Machine) by learning an appropriate kernel function using Neural Networks. Recent developments in representation learning by self-supervision are proving to be very beneficial in the context of anomaly detection. Inspired by the advances in anomaly detection using self-supervised learning in the field of computer vision, this thesis develops a method for detecting anomalies by exploiting pretext tasks tailored for text corpora. This approach greatly improves the state-of-the-art on two datasets, 20 Newsgroups and AG News, for both semi-supervised and unsupervised anomaly detection, demonstrating the potential of self-supervised anomaly detectors in the field of natural language processing. Comment: M.Sc. thesis, University of Bucharest, Faculty of Mathematics and Computer Sciences, 202
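    The core idea, scoring a document by how poorly a model fit on in-distribution text explains it, can be sketched without any neural network. Below, a character-bigram model trained on a tiny "normal" corpus plays the role of the self-supervised model, and its mean negative log-likelihood serves as the anomaly score; the corpus, smoothing, and alphabet size are all illustrative, and the thesis's actual pretext tasks are far richer.

```python
# Loss-based anomaly scoring for text, in miniature: fit a character-bigram
# model on "normal" documents and use its mean negative log-likelihood per
# bigram as the anomaly score. Everything here is an illustrative stand-in.
import math
from collections import Counter

normal_corpus = ["the cat sat on the mat", "the dog sat on the rug"]

bigrams, unigrams = Counter(), Counter()
for doc in normal_corpus:
    for a, b in zip(doc, doc[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

VOCAB = 27  # rough alphabet size for add-one smoothing

def anomaly_score(text):
    pairs = list(zip(text, text[1:]))
    nll = 0.0
    for a, b in pairs:
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + VOCAB)
        nll -= math.log(p)
    return nll / len(pairs)

in_dist = anomaly_score("the cat sat on the rug")
outlier = anomaly_score("zqxv jkwp qzzx vvkq")
print(in_dist < outlier)   # prints True
```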