1,269 research outputs found

    Multi-Tenant Cloud FPGA: A Survey on Security

    Full text link
    With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures evolved to integrate heterogeneous computing fabrics that leverage CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing increased and customized performance, flexibility, and acceleration. FPGAs can perform large-scale search optimization, acceleration, and signal processing tasks compared with power, latency, and processing speed. Many public cloud provider giants, including Amazon, Huawei, Microsoft, Alibaba, etc., have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, it also incurs new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open the backdoors for malicious attackers, potentially putting the cloud platform at risk. Considering security risks, public cloud providers still don't offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses upcoming future challenges in this field of study

    Hardware Acceleration of Network Intrusion Detection System Using FPGA

    Get PDF
    This thesis presents new algorithms and hardware designs for Signature-based Network Intrusion Detection System (SB-NIDS) optimisation exploiting a hybrid hardwaresoftware co-designed embedded processing platform. The work describe concentrates on optimisation of a complete SB-NIDS Snort application software on a FPGA based hardware-software target rather than on the implementation of a single functional unit for hardware acceleration. Pattern Matching Hardware Accelerator (PMHA) based on Bloom filter was designed to optimise SB-NIDS performance for execution on a Xilinx MicroBlaze soft-core processor. The Bloom filter approach enables the potentially large number of network intrusion attack patterns to be efficiently represented and searched primarily using accesses to FPGA on-chip memory. The thesis demonstrates, the viability of hybrid hardware-software co-designed approach for SB-NIDS. Future work is required to investigate the effects of later generation FPGA technology and multi-core processors in order to clearly prove the benefits over conventional processor platforms for SB-NIDS. The strengths and weaknesses of the hardware accelerators and algorithms are analysed, and experimental results are examined to determine the effectiveness of the implementation. Experimental results confirm that the PMHA is capable of performing network packet analysis for gigabit rate network traffic. Experimental test results indicate that our SB-NIDS prototype implementation on relatively low clock rate embedded processing platform performance is approximately 1.7 times better than Snort executing on a general purpose processor on PC when comparing processor cycles rather than wall clock time

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Digital Design of New Chaotic Ciphers for Ethernet Traffic

    Get PDF
    Durante los últimos años, ha habido un gran desarrollo en el campo de la criptografía, y muchos algoritmos de encriptado así como otras funciones criptográficas han sido propuestos.Sin embargo, a pesar de este desarrollo, hoy en día todavía existe un gran interés en crear nuevas primitivas criptográficas o mejorar las ya existentes. Algunas de las razones son las siguientes:• Primero, debido el desarrollo de las tecnologías de la comunicación, la cantidad de información que se transmite está constantemente incrementándose. En este contexto, existen numerosas aplicaciones que requieren encriptar una gran cantidad de datos en tiempo real o en un intervalo de tiempo muy reducido. Un ejemplo de ello puede ser el encriptado de videos de alta resolución en tiempo real. Desafortunadamente, la mayoría de los algoritmos de encriptado usados hoy en día no son capaces de encriptar una gran cantidad de datos a alta velocidad mientras mantienen altos estándares de seguridad.• Debido al gran aumento de la potencia de cálculo de los ordenadores, muchos algoritmos que tradicionalmente se consideraban seguros, actualmente pueden ser atacados por métodos de “fuerza bruta” en una cantidad de tiempo razonable. Por ejemplo, cuando el algoritmo de encriptado DES (Data Encryption Standard) fue lanzado por primera vez, el tamaño de la clave era sólo de 56 bits mientras que, hoy en día, el NIST (National Institute of Standards and Technology) recomienda que los algoritmos de encriptado simétricos tengan una clave de, al menos, 112 bits. Por otro lado, actualmente se está investigando y logrando avances significativos en el campo de la computación cuántica y se espera que, en el futuro, se desarrollen ordenadores cuánticos a gran escala. De ser así, se ha demostrado que algunos algoritmos que se usan actualmente como el RSA (Rivest Shamir Adleman) podrían ser atacados con éxito.• Junto al desarrollo en el campo de la criptografía, también ha habido un gran desarrollo en el campo del criptoanálisis. Por tanto, se están encontrando nuevas vulnerabilidades y proponiendo nuevos ataques constantemente. Por consiguiente, es necesario buscar nuevos algoritmos que sean robustos frente a todos los ataques conocidos para sustituir a los algoritmos en los que se han encontrado vulnerabilidades. En este aspecto, cabe destacar que algunos algoritmos como el RSA y ElGamal están basados en la suposición de que algunos problemas como la factorización del producto de dos números primos o el cálculo de logaritmos discretos son difíciles de resolver. Sin embargo, no se ha descartado que, en el futuro, se puedan desarrollar algoritmos que resuelvan estos problemas de manera rápida (en tiempo polinomial).• Idealmente, las claves usadas para encriptar los datos deberían ser generadas de manera aleatoria para ser completamente impredecibles. Dado que las secuencias generadas por generadores pseudoaleatorios, PRNGs (Pseudo Random Number Generators) son predecibles, son potencialmente vulnerables al criptoanálisis. Por tanto, las claves suelen ser generadas usando generadores de números aleatorios verdaderos, TRNGs (True Random Number Generators). Desafortunadamente, los TRNGs normalmente generan los bits a menor velocidad que los PRNGs y, además, las secuencias generadas suelen tener peores propiedades estadísticas, lo que hace necesario que pasen por una etapa de post-procesado. El usar un TRNG de baja calidad para generar claves, puede comprometer la seguridad de todo el sistema de encriptado, como ya ha ocurrido en algunas ocasiones. Por tanto, el diseño de nuevos TRNGs con buenas propiedades estadísticas es un tema de gran interés.En resumen, es claro que existen numerosas líneas de investigación en el ámbito de la criptografía de gran importancia. Dado que el campo de la criptografía es muy amplio, esta tesis se ha centra en tres líneas de investigación: el diseño de nuevos TRNGs, el diseño de nuevos cifradores de flujo caóticos rápidos y seguros y, finalmente, la implementación de nuevos criptosistemas para comunicaciones ópticas Gigabit Ethernet a velocidades de 1 Gbps y 10 Gbps. Dichos criptosistemas han estado basados en los algoritmos caóticos propuestos, pero se han adaptado para poder realizar el encriptado en la capa física, manteniendo el formato de la codificación. De esta forma, se ha logrado que estos sistemas sean capaces no sólo de encriptar los datos sino que, además, un atacante no pueda saber si se está produciendo una comunicación o no. Los principales aspectos cubiertos en esta tesis son los siguientes:• Estudio del estado del arte, incluyendo los algoritmos de encriptado que se usan actualmente. En esta parte se analizan los principales problemas que presentan los algoritmos de encriptado standard actuales y qué soluciones han sido propuestas. Este estudio es necesario para poder diseñar nuevos algoritmos que resuelvan estos problemas.• Propuesta de nuevos TRNGs adecuados para la generación de claves. Se exploran dos diferentes posibilidades: el uso del ruido generado por un acelerómetro MEMS (Microelectromechanical Systems) y el ruido generado por DNOs (Digital Nonlinear Oscillators). Ambos casos se analizan en detalle realizando varios análisis estadísticos a secuencias obtenidas a distintas frecuencias de muestreo. También se propone y se implementa un algoritmo de post-procesado simple para mejorar la aleatoriedad de las secuencias generadas. Finalmente, se discute la posibilidad de usar estos TRNGs como generadores de claves. • Se proponen nuevos algoritmos de encriptado que son rápidos, seguros y que pueden implementarse usando una cantidad reducida de recursos. De entre todas las posibilidades, esta tesis se centra en los sistemas caóticos ya que, gracias a sus propiedades intrínsecas como la ergodicidad o su comportamiento similar al comportamiento aleatorio, pueden ser una buena alternativa a los sistemas de encriptado clásicos. Para superar los problemas que surgen cuando estos sistemas son digitalizados, se proponen y estudian diversas estrategias: usar un sistema de multi-encriptado, cambiar los parámetros de control de los sistemas caóticos y perturbar las órbitas caóticas.• Se implementan los algoritmos propuestos. Para ello, se usa una FPGA Virtex 7. Las distintas implementaciones son analizadas y comparadas, teniendo en cuenta diversos aspectos tales como el consumo de potencia, uso de área, velocidad de encriptado y nivel de seguridad obtenido. Uno de estos diseños, se elige para ser implementado en un ASIC (Application Specific Integrate Circuit) usando una tecnología de 0,18 um. En cualquier caso, las soluciones propuestas pueden ser también implementadas en otras plataformas y otras tecnologías.• Finalmente, los algoritmos propuestos se adaptan y aplican a comunicaciones ópticas Gigabit Ethernet. En particular, se implementan criptosistemas que realizan el encriptado al nivel de la capa física para velocidades de 1 Gbps y 10 Gbps. Para realizar el encriptado en la capa física, los algoritmos propuestos en las secciones anteriores se adaptan para que preserven el formato de la codificación, 8b/10b en el caso de 1 Gb Ethernet y 64b/10b en el caso de 10 Gb Ethernet. En ambos casos, los criptosistemas se implementan en una FPGA Virtex 7 y se diseña un set experimental, que incluye dos módulos SFP (Small Form-factor Pluggable) capaces de transmitir a una velocidad de hasta 10.3125 Gbps sobre una fibra multimodo de 850 nm. Con este set experimental, se comprueba que los sistemas de encriptado funcionan correctamente y de manera síncrona. Además, se comprueba que el encriptado es bueno (pasa todos los test de seguridad) y que el patrón del tráfico de datos está oculto.<br /

    Embracing Low-Power Systems with Improvement in Security and Energy-Efficiency

    Get PDF
    As the economies around the world are aligning more towards usage of computing systems, the global energy demand for computing is increasing rapidly. Additionally, the boom in AI based applications and services has already invited the pervasion of specialized computing hardware architectures for AI (accelerators). A big chunk of research in the industry and academia is being focused on providing energy efficiency to all kinds of power hungry computing architectures. This dissertation adds to these efforts. Aggressive voltage underscaling of chips is one the effective low power paradigms of providing energy efficiency. This dissertation identifies and deals with the reliability and performance problems associated with this paradigm and innovates novel energy efficient approaches. Specifically, the properties of a low power security primitive have been improved and, higher performance has been unlocked in an AI accelerator (Google TPU) in an aggressively voltage underscaled environment. And, novel power saving opportunities have been unlocked by characterizing the usage pattern of a baseline TPU with rigorous mathematical analysis

    Beyond buttons: Explorations in creative storytelling

    Get PDF
    None provided

    Hardware-Aware Algorithm Designs for Efficient Parallel and Distributed Processing

    Get PDF
    The introduction and widespread adoption of the Internet of Things, together with emerging new industrial applications, bring new requirements in data processing. Specifically, the need for timely processing of data that arrives at high rates creates a challenge for the traditional cloud computing paradigm, where data collected at various sources is sent to the cloud for processing. As an approach to this challenge, processing algorithms and infrastructure are distributed from the cloud to multiple tiers of computing, closer to the sources of data. This creates a wide range of devices for algorithms to be deployed on and software designs to adapt to.In this thesis, we investigate how hardware-aware algorithm designs on a variety of platforms lead to algorithm implementations that efficiently utilize the underlying resources. We design, implement and evaluate new techniques for representative applications that involve the whole spectrum of devices, from resource-constrained sensors in the field, to highly parallel servers. At each tier of processing capability, we identify key architectural features that are relevant for applications and propose designs that make use of these features to achieve high-rate, timely and energy-efficient processing.In the first part of the thesis, we focus on high-end servers and utilize two main approaches to achieve high throughput processing: vectorization and thread parallelism. We employ vectorization for the case of pattern matching algorithms used in security applications. We show that re-thinking the design of algorithms to better utilize the resources available in the platforms they are deployed on, such as vector processing units, can bring significant speedups in processing throughout. We then show how thread-aware data distribution and proper inter-thread synchronization allow scalability, especially for the problem of high-rate network traffic monitoring. We design a parallelization scheme for sketch-based algorithms that summarize traffic information, which allows them to handle incoming data at high rates and be able to answer queries on that data efficiently, without overheads.In the second part of the thesis, we target the intermediate tier of computing devices and focus on the typical examples of hardware that is found there. We show how single-board computers with embedded accelerators can be used to handle the computationally heavy part of applications and showcase it specifically for pattern matching for security-related processing. We further identify key hardware features that affect the performance of pattern matching algorithms on such devices, present a co-evaluation framework to compare algorithms, and design a new algorithm that efficiently utilizes the hardware features.In the last part of the thesis, we shift the focus to the low-power, resource-constrained tier of processing devices. We target wireless sensor networks and study distributed data processing algorithms where the processing happens on the same devices that generate the data. Specifically, we focus on a continuous monitoring algorithm (geometric monitoring) that aims to minimize communication between nodes. By deploying that algorithm in action, under realistic environments, we demonstrate that the interplay between the network protocol and the application plays an important role in this layer of devices. Based on that observation, we co-design a continuous monitoring application with a modern network stack and augment it further with an in-network aggregation technique. In this way, we show that awareness of the underlying network stack is important to realize the full potential of the continuous monitoring algorithm.The techniques and solutions presented in this thesis contribute to better utilization of hardware characteristics, across a wide spectrum of platforms. We employ these techniques on problems that are representative examples of current and upcoming applications and contribute with an outlook of emerging possibilities that can build on the results of the thesis

    Data layout types : a type-based approach to automatic data layout transformations for improved SIMD vectorisation

    Get PDF
    The increasing complexity of modern hardware requires sophisticated programming techniques for programs to run efficiently. At the same time, increased power of modern hardware enables more advanced analyses to be included in compilers. This thesis focuses on one particular optimisation technique that improves utilisation of vector units. The foundation of this technique is the ability to chose memory mappings for data structures of a given program. Usually programming languages use a fixed layout for logical data structures in physical memory. Such a static mapping often has a negative effect on usability of vector units. In this thesis we consider a compiler for a programming language that allows every data structure in a program to have its own data layout. We make sure that data layouts across the program are sound, and most importantly we solve a problem of automatic data layout reconstruction. To consistently do this, we formulate this as a type inference problem, where type encodes a data layout for a given structure as well as implied program transformations. We prove that type-implied transformations preserve semantics of the original programs and we demonstrate significant performance improvements when targeting SIMD-capable architectures
    corecore