127 research outputs found
StrassenNets: Deep Learning with a Multiplication Budget
A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data. The SPNs disentangle multiplication and addition operations and enable us to impose a budget on the number of multiplications. Combining our method with knowledge distillation and applying it to image classification DNNs (trained on ImageNet) and language modeling DNNs (using LSTMs), we obtain a first-of-its-kind reduction in the number of multiplications (over 99.5%) while maintaining the predictive performance of the full-precision models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen’s matrix multiplication algorithm, learning to multiply 2×2 matrices using only 7 multiplications instead of 8.
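The abstract's closing claim is easy to check by hand; a minimal sketch of Strassen's classic 2×2 scheme (the textbook 1969 algorithm, not the learned SPN weights from the paper) is:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    (Strassen, 1969) instead of the naive 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    # The 7 products
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine using additions only
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```

The recombination uses only additions and subtractions, which is exactly the structure the paper's SPN formulation exposes: multiplications become the scarce, budgeted resource.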
Basic concepts in quantum computation
Section headings: 1 Qubits, gates and networks; 2 Quantum arithmetic and function evaluations; 3 Algorithms and their complexity; 4 From interferometers to computers; 5 The first quantum algorithms; 6 Quantum search; 7 Optimal phase estimation; 8 Periodicity and quantum factoring; 9 Cryptography; 10 Conditional quantum dynamics; 11 Decoherence and recoherence; 12 Concluding remarks.
Comment: 37 pages; lectures given at the Les Houches Summer School on "Coherent Matter Waves", July-August 1999.
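As a concrete taste of the "Qubits, gates and networks" material, single-qubit gates are 2×2 unitaries acting on state vectors; a minimal numerical sketch (plain NumPy, not taken from the lectures) is:

```python
import numpy as np

# Computational basis states |0> and |1>
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Hadamard gate: maps |0> to the equal superposition (|0> + |1>) / sqrt(2)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

plus = H @ ket0                  # superposition state
probs = np.abs(plus) ** 2        # Born-rule measurement probabilities
assert np.allclose(probs, [0.5, 0.5])
assert np.allclose(H @ (H @ ket0), ket0)  # H is its own inverse
```

The second assertion is the interferometric fact the lectures build on: two Hadamards in sequence recombine the amplitudes and restore the input state.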
PYDAC: A DISTRIBUTED RUNTIME SYSTEM AND PROGRAMMING MODEL FOR A HETEROGENEOUS MANY-CORE ARCHITECTURE
Heterogeneous many-core architectures that consist of big, fast cores and small, energy-efficient cores are very promising for future high-performance computing (HPC) systems. These architectures offer a good balance between single-threaded performance and multithreaded throughput. Such systems impose challenges on the design of the programming model and runtime system. Specifically, these challenges include (a) how to fully utilize the chip’s performance, (b) how to manage heterogeneous, unreliable hardware resources, and (c) how to generate and manage a large number of parallel tasks.
This dissertation proposes and evaluates a Python-based programming framework called PyDac. PyDac supports a two-level programming model. At the high level, a programmer creates a very large number of tasks using the divide-and-conquer strategy. At the low level, tasks are written in an imperative programming style. The runtime system seamlessly manages the parallel tasks, system resilience, and inter-task communication with architecture support. PyDac has been implemented both on a field-programmable gate array (FPGA) emulation of an unconventional heterogeneous architecture and on a conventional multicore microprocessor. To evaluate the performance, resilience, and programmability of the proposed system, several micro-benchmarks were developed. We found that (a) PyDac abstracts away task communication and achieves programmability, (b) the micro-benchmarks are scalable on the hardware prototype, although (predictably) serial operation limits some of them, and (c) the degree of protection versus speed could be varied in redundant threading that is transparent to programmers.
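The two-level model described above can be sketched in plain Python: divide-and-conquer at the top, imperative code in the leaf tasks. This is a hypothetical illustration using the standard library, not PyDac's actual API (the `dac_sum` name and `threshold` parameter are invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor

# High level: work is expressed as divide-and-conquer tasks.
# Low level: each leaf task is plain imperative code.
# Names and the threshold are illustrative, not PyDac's real API.
def dac_sum(data, threshold=4):
    if len(data) <= threshold:      # small enough: run imperatively
        total = 0
        for x in data:
            total += x
        return total
    mid = len(data) // 2            # divide into two subtasks
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(dac_sum, data[:mid], threshold)
        right = pool.submit(dac_sum, data[mid:], threshold)
        return left.result() + right.result()  # conquer: combine results

assert dac_sum(list(range(100))) == 4950
```

In PyDac itself the runtime, rather than the programmer, decides where subtasks run and handles resilience and inter-task communication; the sketch only shows the programming-model shape.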
ENNigma: A Framework for Private Neural Networks
The increasing concerns about data privacy and the stringent enforcement of data protection
laws are placing growing pressure on organizations to secure large datasets. The challenge
of ensuring data privacy becomes even more complex in the domains of Artificial Intelligence
and Machine Learning due to their requirement for large amounts of data. While approaches
like differential privacy and secure multi-party computation allow data to be used with some
privacy guarantees, they often compromise data integrity or accessibility as a tradeoff. In
contrast, when using encryption-based strategies, this is not the case. While basic encryption
only protects data during transmission and storage, Homomorphic Encryption (HE) is able
to preserve data privacy during its processing on a centralized server. Despite its advantages,
the computational overhead HE introduces is notably challenging when integrated into Neural
Networks (NNs), which are already computationally expensive.
In this work, we present a framework called ENNigma, which is a Private Neural Network
(PNN) that uses HE for data privacy preservation. Unlike some state-of-the-art approaches,
ENNigma guarantees data security throughout every operation, maintaining this guarantee
even if the server is compromised. The impact of this privacy preservation layer on the
NN performance is minimal, with the only major drawback being its computational cost.
Several optimizations were implemented to maximize the efficiency of ENNigma, leading to
computation-time reductions of more than 50% in some operations.
In the context of the Network Intrusion Detection System application domain, particularly
within the sub-domain of Distributed Denial of Service attack detection, several models
were developed and employed to assess ENNigma’s performance in a real-world scenario.
These models demonstrated performance comparable to that of non-private NNs while also achieving an inference latency of two and a half minutes. This suggests that our framework is
approaching a state where it can be effectively utilized in real-time applications.
The key takeaway is that ENNigma represents a significant advancement in the field of PNN
as it ensures data privacy with minimal impact on NN performance. While it is not yet ready
for real-world deployment due to its computational complexity, this framework serves as a
milestone toward realizing fully private and efficient NNs.
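The homomorphic property the ENNigma abstract relies on — a server computing on ciphertexts without ever seeing the plaintexts — can be illustrated with a toy example. The sketch below uses unpadded RSA's multiplicative homomorphism with tiny fixed keys; it is deliberately insecure and is not ENNigma's cipher (real HE for NNs uses lattice-based schemes supporting both additions and multiplications):

```python
# Toy multiplicative homomorphism: Enc(a) * Enc(b) mod n = Enc(a * b).
# Tiny textbook RSA keys, for illustration only -- wholly insecure.
p, q = 61, 53
n = p * q          # 3233
e = 17             # public exponent
d = 2753           # private exponent: e * d = 1 (mod (p-1)*(q-1))

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 6
c = (enc(a) * enc(b)) % n   # the "server" multiplies ciphertexts only
assert dec(c) == a * b      # decrypting yields the product, yet a and b
                            # were never exposed during the computation
```

An HE-based NN applies the same idea at scale: every weight multiplication and activation is evaluated over ciphertexts, which is exactly where the computational overhead discussed above comes from.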
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by the FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.
Comment: To appear at the DSN 2020 conference.
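The headline gains are consistent with the usual dynamic-power model for CMOS logic, P ∝ f·V², under which undervolting at a fixed clock buys quadratic power savings. A small sanity-check sketch (the 0.85 V nominal and 0.60 V undervolted figures are illustrative, not the paper's measurements):

```python
# Dynamic power of CMOS logic scales roughly as P ~ f * V^2, so reducing
# supply voltage at a fixed frequency yields quadratic power savings.
# Voltage/frequency values below are illustrative, not measured data.
def relative_power(v, f, v_nom=0.85, f_nom=300e6):
    """Dynamic power relative to nominal voltage and frequency."""
    return (v / v_nom) ** 2 * (f / f_nom)

# Undervolt from a hypothetical 0.85 V nominal to 0.60 V at the same clock:
p = relative_power(0.60, 300e6)
# At constant throughput (GOPs), power-efficiency improves by 1/p:
gain = 1 / p
assert gain > 2.0   # quadratic voltage scaling alone roughly doubles GOPs/W
```

Frequency underscaling, as evaluated in the paper, trades part of this gain back (the f term shrinks along with V) in exchange for avoiding timing faults below the guardband.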