
    StrassenNets: Deep Learning with a Multiplication Budget

    A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data. The SPNs disentangle multiplication and addition operations and enable us to impose a budget on the number of multiplication operations. Combining our method with knowledge distillation and applying it to image classification DNNs (trained on ImageNet) and language modeling DNNs (using LSTMs), we obtain a first-of-a-kind reduction in the number of multiplications (over 99.5%) while maintaining the predictive performance of the full-precision models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen’s matrix multiplication algorithm, learning to multiply 2×2 matrices using only 7 multiplications instead of 8.
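Strassen's 7-multiplication scheme for 2×2 matrices, which the abstract says the framework rediscovers, can be sketched directly (this is the classical algorithm, not the learned SPN representation):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    (Strassen's scheme) instead of the naive 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Each output entry is recovered by additions/subtractions only.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Applied recursively to block matrices, the same trick yields the sub-cubic O(n^2.807) complexity; the paper's point is that a budget on multiplications lets such structure be learned rather than hand-derived.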

    Basic concepts in quantum computation

    Section headings: 1 Qubits, gates and networks; 2 Quantum arithmetic and function evaluations; 3 Algorithms and their complexity; 4 From interferometers to computers; 5 The first quantum algorithms; 6 Quantum search; 7 Optimal phase estimation; 8 Periodicity and quantum factoring; 9 Cryptography; 10 Conditional quantum dynamics; 11 Decoherence and recoherence; 12 Concluding remarks. Comment: 37 pages, lectures given at the les Houches Summer School on "Coherent Matter Waves", July-August 199
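The opening topic, qubits and gates, reduces computationally to linear algebra on complex amplitude vectors; a minimal sketch (plain Python, no quantum library assumed):

```python
import math

# A single qubit is a 2-vector of complex amplitudes: |0> = [1, 0].
ket0 = [1.0, 0.0]

def apply(gate, state):
    """Apply a 2x2 gate matrix to a one-qubit state vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

# The Hadamard gate maps |0> to the equal superposition (|0> + |1>)/sqrt(2).
H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

plus = apply(H, ket0)
probs = [abs(a) ** 2 for a in plus]  # Born-rule measurement probabilities
print(probs)  # ~[0.5, 0.5]
```

Networks of such gates, plus measurement, are the circuit model the later sections (search, phase estimation, factoring) build on.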

    PyDac: A Distributed Runtime System and Programming Model for a Heterogeneous Many-Core Architecture

    Heterogeneous many-core architectures that consist of big, fast cores and small, energy-efficient cores are very promising for future high-performance computing (HPC) systems. These architectures offer a good balance between single-threaded performance and multithreaded throughput. Such systems impose challenges on the design of programming models and runtime systems. Specifically, these challenges include (a) how to fully utilize the chip’s performance, (b) how to manage heterogeneous, unreliable hardware resources, and (c) how to generate and manage a large number of parallel tasks. This dissertation proposes and evaluates a Python-based programming framework called PyDac. PyDac supports a two-level programming model. At the high level, a programmer creates a very large number of tasks, using the divide-and-conquer strategy. At the low level, tasks are written in an imperative programming style. The runtime system seamlessly manages the parallel tasks, system resilience, and inter-task communication with architecture support. PyDac has been implemented on both a field-programmable gate array (FPGA) emulation of an unconventional heterogeneous architecture and a conventional multicore microprocessor. To evaluate the performance, resilience, and programmability of the proposed system, several micro-benchmarks were developed. We found that (a) PyDac abstracts away task communication and achieves programmability, (b) the micro-benchmarks are scalable on the hardware prototype, but (predictably) serial operation limits some micro-benchmarks, and (c) the degree of protection versus speed could be varied in redundant threading that is transparent to programmers.
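The two-level model described above can be illustrated with a hypothetical sketch using Python's standard `concurrent.futures` rather than PyDac's actual API (which the abstract does not show): the high level recursively splits work into many tasks, and each leaf task is plain imperative code.

```python
from concurrent.futures import ThreadPoolExecutor

def task_sum(data):
    """Leaf task: ordinary imperative code over a small chunk."""
    total = 0
    for x in data:
        total += x
    return total

def divide_and_conquer(data, pool, threshold=4):
    """High level: split until chunks are small, then submit leaf tasks."""
    if len(data) <= threshold:
        return pool.submit(task_sum, data).result()
    mid = len(data) // 2
    return (divide_and_conquer(data[:mid], pool, threshold)
            + divide_and_conquer(data[mid:], pool, threshold))

with ThreadPoolExecutor(max_workers=4) as pool:
    print(divide_and_conquer(list(range(100)), pool))  # 4950
```

In PyDac the runtime additionally handles task placement on big versus small cores and re-executes tasks lost to hardware faults, which a plain thread pool does not.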

    ENNigma: A Framework for Private Neural Networks

    The increasing concerns about data privacy and the stringent enforcement of data protection laws are placing growing pressure on organizations to secure large datasets. The challenge of ensuring data privacy becomes even more complex in the domains of Artificial Intelligence and Machine Learning due to their requirement for large amounts of data. While approaches like differential privacy and secure multi-party computation allow data to be used with some privacy guarantees, they often compromise data integrity or accessibility as a tradeoff. In contrast, when using encryption-based strategies, this is not the case. While basic encryption only protects data during transmission and storage, Homomorphic Encryption (HE) is able to preserve data privacy during its processing on a centralized server. Despite its advantages, the computational overhead HE introduces is notably challenging when integrated into Neural Networks (NNs), which are already computationally expensive. In this work, we present a framework called ENNigma, which is a Private Neural Network (PNN) that uses HE for data privacy preservation. Unlike some state-of-the-art approaches, ENNigma guarantees data security throughout every operation, maintaining this guarantee even if the server is compromised. The impact of this privacy preservation layer on the NN performance is minimal, with the only major drawback being its computational cost. Several optimizations were implemented to maximize the efficiency of ENNigma, leading to occasional computational time reductions above 50%. In the context of the Network Intrusion Detection System application domain, particularly within the sub-domain of Distributed Denial of Service attack detection, several models were developed and employed to assess ENNigma’s performance in a real-world scenario. These models demonstrated comparable performance to non-private NNs while also achieving the two-and-a-half-minute inference latency mark.
This suggests that our framework is approaching a state where it can be effectively utilized in real-time applications. The key takeaway is that ENNigma represents a significant advancement in the field of PNN as it ensures data privacy with minimal impact on NN performance. While it is not yet ready for real-world deployment due to its computational complexity, this framework serves as a milestone toward realizing fully private and efficient NNs.
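The homomorphic property such frameworks rely on can be shown with a toy Paillier cryptosystem (deliberately tiny, insecure parameters; this is a generic textbook scheme, not ENNigma's actual cipher): multiplying two ciphertexts yields an encryption of the *sum* of the plaintexts, so a server can combine numbers it cannot read.

```python
import math
import random

# Toy Paillier key generation; real deployments use ~1024-bit primes.
p, q = 17, 19
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # private key
mu = pow(lam, -1, n)           # with g = n + 1, mu = lam^-1 mod n

def encrypt(m):
    """Encrypt m < n with fresh randomness r coprime to n."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return ((u - 1) // n * mu) % n

# Homomorphic addition: multiply ciphertexts, decrypt the sum.
c = (encrypt(20) * encrypt(22)) % n2
print(decrypt(c))  # 42
```

Paillier is only additively homomorphic; NN inference also needs multiplications and nonlinearities, which is why fully homomorphic schemes (and the heavy costs the abstract describes) come into play.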

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by the FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%. Comment: To appear at the DSN 2020 conference
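The leverage of undervolting follows from the first-order CMOS dynamic power model, P ≈ C·V²·f. A minimal sketch (the voltage values below are illustrative, not the paper's measured operating points):

```python
def dynamic_power(c_eff, v, f):
    """First-order switched-capacitance power model: P = C * V^2 * f (watts)."""
    return c_eff * v * v * f

# Hypothetical operating points: nominal supply vs. an undervolted one,
# at the same clock frequency.
nominal = dynamic_power(c_eff=1.0, v=0.85, f=200e6)
undervolted = dynamic_power(c_eff=1.0, v=0.60, f=200e6)

print(f"power ratio: {nominal / undervolted:.2f}x")  # ~2.01x at equal frequency
```

Because power falls quadratically with voltage while throughput (at fixed frequency) is unchanged, GOPs/W improves by the same ratio; pushing below the guardband trades further gains against the timing faults and accuracy loss the abstract describes.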