
    Photonic integrated reconfigurable linear processors as neural network accelerators

    Reconfigurable linear optical processors can be used to perform linear transformations and are instrumental in efficiently computing the matrix–vector multiplications required in each neural network layer. In this paper, we characterize and compare two thermally tuned photonic integrated processors, realized in silicon-on-insulator and silicon nitride platforms, suited for extracting feature maps in convolutional neural networks. The reduction in bit resolution when crossing the processor is mainly due to optical losses, in the range 2.3–3.3 bits for the silicon-on-insulator chip and 1.3–2.4 bits for the silicon nitride chip. However, the lower extinction ratio of the Mach–Zehnder elements in the latter platform limits their expressivity (i.e., the capacity to implement any transformation) to 75%, compared to 97% for the former. Finally, the silicon-on-insulator processor outperforms the silicon nitride one in terms of footprint and energy efficiency.
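    The building block behind such processors is the Mach–Zehnder interferometer (MZI), which acts as a tunable 2×2 unitary on two optical modes; a mesh of MZIs then realizes an N×N linear transform, and applying it to an input field vector is the optical matrix–vector multiply. A minimal numerical sketch (not the paper's implementation; phase parameterization up to a global phase, assuming an ideal lossless device):

    ```python
    import numpy as np

    def mzi(theta, phi):
        """2x2 transfer matrix of an ideal lossless MZI, up to a global phase.

        theta: internal phase shift (sets the splitting ratio)
        phi:   external phase shift on the upper input arm
        """
        return np.array([
            [np.exp(1j * phi) * np.sin(theta / 2), np.exp(1j * phi) * np.cos(theta / 2)],
            [np.cos(theta / 2),                    -np.sin(theta / 2)],
        ])

    # Applying the transfer matrix to an input field vector is the
    # optical analogue of one matrix-vector product in a network layer.
    U = mzi(0.7, 1.2)
    x = np.array([1.0 + 0j, 0.5 + 0j])
    y = U @ x

    # An ideal (lossless) MZI is unitary, so optical power is conserved;
    # the optical losses discussed in the abstract break this in practice.
    assert np.allclose(U.conj().T @ U, np.eye(2))
    ```

    Real devices deviate from this ideal: the finite extinction ratio mentioned above means the effective splitting ratio cannot reach all values, which is exactly what limits expressivity.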

    Optimization landscape of deep neural networks

    It has been empirically observed in deep learning that training deep over-parameterized neural networks does not appear to suffer from suboptimal local minima, despite the hardness results proven in the literature. In many cases, local search algorithms such as (stochastic) gradient descent converge to a globally optimal solution. In an attempt to better understand this phenomenon, this thesis studies sufficient conditions on the network architecture under which the landscape of the associated loss function is guaranteed to be well-behaved, which is favorable to local search algorithms. Our analysis touches upon fundamental aspects of the problem such as the existence of solutions with zero training error, global optimality of critical points, and the topology of level sets and sublevel sets of the loss. Building on this analysis, we propose a new class of network architectures that are practically relevant and carry strong theoretical guarantees on the loss surface. We empirically investigate the generalization ability of these networks and other related phenomena observed in deep learning, such as the implicit bias of stochastic gradient descent. Finally, we study the limitations of deep and narrow neural networks in learning connected decision regions, and draw connections to adversarial manipulation problems. The results and analysis presented in this thesis suggest that having a sufficiently wide layer in the architecture is not only helpful in making the loss surface well-behaved but also important for the expressive power of neural networks.
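    The benefit of over-parameterization can be illustrated with a toy example (not taken from the thesis): when a model has more parameters than training points, an interpolating solution with zero training error exists, and plain gradient descent on a least-squares loss finds it.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10, 50                       # 10 training points, 50 parameters: over-parameterized
    X = rng.standard_normal((n, d))     # random features
    y = rng.standard_normal(n)          # random targets

    # Plain gradient descent on the mean-squared-error loss.
    w = np.zeros(d)
    lr = 0.01
    for _ in range(5000):
        grad = X.T @ (X @ w - y) / n    # gradient of 0.5 * mean((Xw - y)^2)
        w -= lr * grad

    train_loss = np.mean((X @ w - y) ** 2)
    print(f"final training loss: {train_loss:.2e}")  # essentially zero
    ```

    With d > n the system X w = y is under-determined, so interpolating solutions exist and the quadratic landscape has no spurious local minima; the thesis studies when analogous guarantees hold for genuinely nonlinear networks with a sufficiently wide layer.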

    Attacking Graph Neural Networks with Bit Flips: Weisfeiler and Lehman Go Indifferent

    Prior attacks on graph neural networks have mostly focused on graph poisoning and evasion, neglecting the network's weights and biases. Traditional weight-based fault injection attacks, such as the bit flip attacks used against convolutional neural networks, do not consider the unique properties of graph neural networks. We propose the Injectivity Bit Flip Attack, the first bit flip attack designed specifically for graph neural networks. Our attack targets the learnable neighborhood aggregation functions in quantized message passing neural networks, degrading their ability to distinguish graph structures and thereby destroying the expressive power of the Weisfeiler-Lehman test. Our findings suggest that exploiting mathematical properties specific to certain graph neural network architectures can significantly increase their vulnerability to bit flip attacks. Injectivity Bit Flip Attacks can degrade maximally expressive Graph Isomorphism Networks, trained on various graph property prediction datasets, to random output by flipping only a small fraction of the network's bits, demonstrating a higher destructive power than a bit flip attack transferred from convolutional neural networks. Our attack is transparent and motivated by theoretical insights, which are confirmed by extensive empirical results.
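    The generic mechanism such attacks exploit can be sketched in a few lines (this is the basic bit-flip idea, not the paper's injectivity-targeted selection strategy): in an int8-quantized network, flipping a single high-order bit of one stored weight changes its value drastically.

    ```python
    import numpy as np

    def flip_bit(w_int8, bit):
        """Flip one bit (0 = LSB .. 7 = sign bit) of an int8-quantized weight."""
        u = w_int8.astype(np.uint8) ^ np.uint8(1 << bit)  # XOR toggles the chosen bit
        return u.astype(np.int8)                          # reinterpret as signed int8

    w = np.int8(23)          # stored quantized weight, bit pattern 0b00010111
    print(flip_bit(w, 0))    # LSB flip: small perturbation -> 22
    print(flip_bit(w, 7))    # sign-bit flip: large jump -> -105
    ```

    A single low-order flip barely perturbs the model, while a sign- or high-order-bit flip moves the weight across most of its range; the attack's contribution is choosing *which* bits to flip so that the aggregation function loses injectivity with as few flips as possible.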