Search CORE

596 research outputs found

Combining high productivity and high performance in image processing using single assignment C on multi-core CPUs and many-core GPUs

Author: Bernecky R.
Grelck C.
Guo J.
Haslinger P.
Korzeniowski F
Moser B.
Scholz S.-B.
Wieser V.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2012
Field of study

International Migration, Integration and Social Cohesion online publications

Static Analysis of Shape in TensorFlow Programs

Author: Antoniadis Anastasios
Dolby Julian
Grech Neville
Lagouvardos Sifis
Smaragdakis Yannis
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th European Conference on Object-Oriented Programming (ECOOP 2020)
Publication date: 01/01/2020
Field of study

Machine learning has been widely adopted in diverse science and engineering domains, aided by reusable libraries and quick development patterns. The TensorFlow library is probably the best-known representative of this trend and most users employ the Python API to its powerful back-end. TensorFlow programs are susceptible to several systematic errors, especially in the dynamic typing setting of Python. We present Pythia, a static analysis that tracks the shapes of tensors across Python library calls and warns of several possible mismatches. The key technical aspects are a close modeling of library semantics with respect to tensor shape, and an identification of violations and error-prone patterns. Pythia is powerful enough to statically detect (with 84.62% precision) 11 of the 14 shape-related TensorFlow bugs in the recent Zhang et al. empirical study - an independent slice of real-world bugs

Dagstuhl Research Online Publication Server

Combining high productivity and high performance in image processing using Single Assignment C on multi-core CPUs and many-core GPUs

Author: Bradski
Schölkopf
Volkmar Wieser
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date
Field of study

Crossref

Real-time implementation of 3D LiDAR point cloud semantic segmentation in an FPGA

Author: Delgado Pedro Paulo Fontes
Publication venue
Publication date: 05/01/2023
Field of study

Dissertação de mestrado em Informatics EngineeringIn the last few years, the automotive industry has relied heavily on deep learning applications for perception solutions. With data-heavy sensors, such as LiDAR, becoming a standard, the task of developing low-power and real-time applications has become increasingly more challenging. To obtain the maximum computational efficiency, no longer can one focus solely on the software aspect of such applications, while disregarding the underlying hardware. In this thesis, a hardware-software co-design approach is used to implement an inference application leveraging the SqueezeSegV3, a LiDAR-based convolutional neural network, on the Versal ACAP VCK190 FPGA. Automotive requirements carefully drive the development of the proposed solution, with real-time performance and low power consumption being the target metrics. A first experiment validates the suitability of Xilinx’s Vitis-AI tool for the deployment of deep convolutional neural networks on FPGAs. Both the ResNet-18 and SqueezeNet neural networks are deployed to the Zynq UltraScale+ MPSoC ZCU104 and Versal ACAP VCK190 FPGAs. The results show that both networks achieve far more than the real-time requirements while consuming low power. Compared to an NVIDIA RTX 3090 GPU, the performance per watt during both network’s inference is 12x and 47.8x higher and 15.1x and 26.6x higher respectively for the Zynq UltraScale+ MPSoC ZCU104 and the Versal ACAP VCK190 FPGA. These results are obtained with no drop in accuracy in the quantization step. A second experiment builds upon the results of the first by deploying a real-time application containing the SqueezeSegV3 model using the Semantic-KITTI dataset. A framerate of 11 Hz is achieved with a peak power consumption of 78 Watts. The quantization step results in a minimal accuracy and IoU degradation of 0.7 and 1.5 points respectively. A smaller version of the same model is also deployed achieving a framerate of 19 Hz and a peak power consumption of 76 Watts. The application performs semantic segmentation over all the point cloud with a field of view of 360°.Nos últimos anos a indústria automóvel tem cada vez mais aplicado deep learning para solucionar problemas de perceção. Dado que os sensores que produzem grandes quantidades de dados, como o LiDAR, se têm tornado standard, a tarefa de desenvolver aplicações de baixo consumo energético e com capacidades de reagir em tempo real tem-se tornado cada vez mais desafiante. Para obter a máxima eficiência computacional, deixou de ser possível focar-se apenas no software aquando do desenvolvimento de uma aplicação deixando de lado o hardware subjacente. Nesta tese, uma abordagem de desenvolvimento simultâneo de hardware e software é usada para implementar uma aplicação de inferência usando o SqueezeSegV3, uma rede neuronal convolucional profunda, na FPGA Versal ACAP VCK190. São os requisitos automotive que guiam o desenvolvimento da solução proposta, sendo a performance em tempo real e o baixo consumo energético, as métricas alvo principais. Uma primeira experiência valida a aptidão da ferramenta Vitis-AI para a implantação de redes neuronais convolucionais profundas em FPGAs. As redes ResNet-18 e SqueezeNet são ambas implantadas nas FPGAs Zynq UltraScale+ MPSoC ZCU104 e Versal ACAP VCK190. Os resultados mostram que ambas as redes ultrapassam os requisitos de tempo real consumindo pouca energia. Comparado com a GPU NVIDIA RTX 3090, a performance por Watt durante a inferência de ambas as redes é superior em 12x e 47.8x e 15.1x e 26.6x respetivamente na Zynq UltraScale+ MPSoC ZCU104 e na Versal ACAP VCK190. Estes resultados foram obtidos sem qualquer perda de accuracy na etapa de quantização. Uma segunda experiência é feita no seguimento dos resultados da primeira, implantando uma aplicação de inferência em tempo real contendo o modelo SqueezeSegV3 e usando o conjunto de dados Semantic-KITTI. Um framerate de 11 Hz é atingido com um pico de consumo energético de 78 Watts. O processo de quantização resulta numa perda mínima de accuracy e IoU com valores de 0.7 e 1.5 pontos respetivamente. Uma versão mais pequena do mesmo modelo é também implantada, atingindo uma framerate de 19 Hz e um pico de consumo energético de 76 Watts. A aplicação desenvolvida executa segmentação semântica sobre a totalidade das nuvens de pontos LiDAR, com um campo de visão de 360°

Universidade do Minho: RepositoriUM

Data layout types : a type-based approach to automatic data layout transformations for improved SIMD vectorisation

Author: Šinkarovs Artjoms
Publication venue: Mathematical and Computer Sciences
Publication date: 01/08/2015
Field of study

The increasing complexity of modern hardware requires sophisticated programming techniques for programs to run efficiently. At the same time, increased power of modern hardware enables more advanced analyses to be included in compilers. This thesis focuses on one particular optimisation technique that improves utilisation of vector units. The foundation of this technique is the ability to chose memory mappings for data structures of a given program. Usually programming languages use a fixed layout for logical data structures in physical memory. Such a static mapping often has a negative effect on usability of vector units. In this thesis we consider a compiler for a programming language that allows every data structure in a program to have its own data layout. We make sure that data layouts across the program are sound, and most importantly we solve a problem of automatic data layout reconstruction. To consistently do this, we formulate this as a type inference problem, where type encodes a data layout for a given structure as well as implied program transformations. We prove that type-implied transformations preserve semantics of the original programs and we demonstrate significant performance improvements when targeting SIMD-capable architectures

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Cellular automata for dynamic S-boxes in cryptography.

Author: Luckett William Matthew
Publication venue: ThinkIR: The University of Louisville\u27s Institutional Repository
Publication date: 01/09/2007
Field of study

In today\u27s world of private information and mass communication, there is an ever increasing need for new methods of maintaining and protecting privacy and integrity of information. This thesis attempts to combine the chaotic world of cellular automata and the paranoid world of cryptography to enhance the S-box of many Substitution Permutation Network (SPN) ciphers, specifically Rijndael/AES. The success of this enhancement is measured in terms of security and performance. The results show that it is possible to use Cellular Automata (CA) to enhance the security of an 8-bit S-box by further randomizing the structure. This secure use of CA to scramble the S-box, removes the 9-term algebraic expression [20] [21] that typical Galois generated S-boxes share. This cryptosystem securely uses a Margolis class, partitioned block, uniform gas, cellular automata to create unique S-boxes for each block of data to be processed. The system improves the base Rijndael algorithm in the following ways. First, it utilizes a new S-box for each block of data. This effectively limits the amount of data that can be gathered for statistical analysis to the blocksize being used. Secondly, the S-boxes are not stored in the compiled binary, which protects against an S-box Blanking [22] attack. Thirdly, the algebraic expression hidden within each galois generated S-box is destroyed after one CA generation, which also modifies key expansion results. Finally, the thesis succeeds in combining Cellular Automata and Cryptography securely, though it is not the most efficient solution to dynamic S-boxes

University of Louisville

A Critical Analysis of Payload Anomaly-Based Intrusion Detection Systems

Author: Mercurio Anthony F.
Publication venue: ePublications at Regis University
Publication date: 30/10/2010
Field of study

Examining payload content is an important aspect of network security, particularly in today\u27s volatile computing environment. An Intrusion Detection System (IDS) that simply analyzes packet header information cannot adequately secure a network from malicious attacks. The alternative is to perform deep-packet analysis using n-gram language parsing and neural network technology. Self Organizing Map (SOM), PAYL over Self-Organizing Maps for Intrusion Detection (POSEIDON), Anomalous Payload-based Network Intrusion Detection (PAYL), and Anagram are next-generation unsupervised payload anomaly-based IDSs. This study examines the efficacy of each system using the design-science research methodology. A collection of quantitative data and qualitative features exposes their strengths and weaknesses

ePublications at Regis University

Programmiersprachen und Rechenkonzepte

Author
Publication venue
Publication date: 01/01/2004
Field of study

Die GI-Fachgruppe 2.1.4 "Programmiersprachen und Rechenkonzepte" veranstaltete vom 3. bis 5. Mai 2004 im Physikzentrum Bad Honnef ihren jährlichen Workshop. Dieser Bericht enthält eine Zusammenstellung der Beiträge. Das Treffen diente wie in jedem Jahr gegenseitigem Kennenlernen, der Vertiefung gegenseitiger Kontakte, der Vorstellung neuer Arbeiten und Ergebnisse und vor allem der intensiven Diskussion. Ein breites Spektrum von Beiträgen, von theoretischen Grundlagen über Programmentwicklung, Sprachdesign, Softwaretechnik und Objektorientierung bis hin zur überraschend langen Geschichte der Rechenautomaten seit der Antike bildete ein interessantes und abwechlungsreiches Programm. Unter anderem waren imperative, funktionale und funktional-logische Sprachen, Software/Hardware-Codesign, Semantik, Web-Programmierung und Softwaretechnik, generative Programmierung, Aspekte und formale Testunterstützung Thema. Interessante Beiträge zu diesen und weiteren Themen gaben Anlaß zu Erfahrungsaustausch und Fachgesprächen auch mit den Teilnehmern des zeitgleich im Physikzentrum Bad Honnef stattfindenden Workshops "Reengineering". Allen Teilnehmern möchte ich dafür danken, daß sie mit ihren Vorträgen und konstruktiven Diskussionsbeiträgen zum Gelingen des Workshops beigetragen haben. Dank für die Vielfalt und Qualität der Beiträge gebührt den Autoren. Ein Wort des Dankes gebührt ebenso den Mitarbeitern und der Leitung des Physikzentrums Bad Honnef für die gewohnte angenehme und anregende Atmosphäre und umfassende Betreuung

MACAU: Open Access Repository of Kiel University

Hardware Architectures for Post-Quantum Cryptography

Author: Wang Wen
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/04/2021
Field of study

The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

Yale University

Efficient Storage of Genomic Sequences in High Performance Computing Systems

Author: Guerra Soler Aníbal José
Publication venue: Medellín, Colombia
Publication date: 01/01/2019
Field of study

ABSTRACT: In this dissertation, we address the challenges of genomic data storage in high performance computing systems. In particular, we focus on developing a referential compression approach for Next Generation Sequence data stored in FASTQ format files. The amount of genomic data available for researchers to process has increased exponentially, bringing enormous challenges for its efficient storage and transmission. General-purpose compressors can only offer limited performance for genomic data, thus the need for specialized compression solutions. Two trends have emerged as alternatives to harness the particular properties of genomic data: non-referential and referential compression. Non-referential compressors offer higher compression rations than general purpose compressors, but still below of what a referential compressor could theoretically achieve. However, the effectiveness of referential compression depends on selecting a good reference and on having enough computing resources available. This thesis presents one of the first referential compressors for FASTQ files. We first present a comprehensive analytical and experimental evaluation of the most relevant tools for genomic raw data compression, which led us to identify the main needs and opportunities in this field. As a consequence, we propose a novel compression workflow that aims at improving the usability of referential compressors. Subsequently, we discuss the implementation and performance evaluation for the core of the proposed workflow: a referential compressor for reads in FASTQ format that combines local read-to-reference alignments with a specialized binary-encoding strategy. The compression algorithm, named UdeACompress, achieved very competitive compression ratios when compared to the best compressors in the current state of the art, while showing reasonable execution times and memory use. In particular, UdeACompress outperformed all competitors when compressing long reads, typical of the newest sequencing technologies. Finally, we study the main aspects of the data-level parallelism in the Intel AVX-512 architecture, in order to develop a parallel version of the UdeACompress algorithms to reduce the runtime. Through the use of SIMD programming, we managed to significantly accelerate the main bottleneck found in UdeACompress, the Suffix Array Construction

Biblioteca Digital del Sistema de Bibliotecas de la Universidad de Antioquia