14 research outputs found

    Architectures for soft-decision decoding of non-binary codes

    Full text link
    En esta tesis se estudia el dise¿no de decodificadores no-binarios para la correcci'on de errores en sistemas de comunicaci'on modernos de alta velocidad. El objetivo es proponer soluciones de baja complejidad para los algoritmos de decodificaci'on basados en los c'odigos de comprobaci'on de paridad de baja densidad no-binarios (NB-LDPC) y en los c'odigos Reed-Solomon, con la finalidad de implementar arquitecturas hardware eficientes. En la primera parte de la tesis se analizan los cuellos de botella existentes en los algoritmos y en las arquitecturas de decodificadores NB-LDPC y se proponen soluciones de baja complejidad y de alta velocidad basadas en el volteo de s'¿mbolos. En primer lugar, se estudian las soluciones basadas en actualizaci'on por inundaci 'on con el objetivo de obtener la mayor velocidad posible sin tener en cuenta la ganancia de codificaci'on. Se proponen dos decodificadores diferentes basados en clipping y t'ecnicas de bloqueo, sin embargo, la frecuencia m'axima est'a limitada debido a un exceso de cableado. Por este motivo, se exploran algunos m'etodos para reducir los problemas de rutado en c'odigos NB-LDPC. Como soluci'on se propone una arquitectura basada en difusi'on parcial para algoritmos de volteo de s'¿mbolos que mitiga la congesti'on por rutado. Como las soluciones de actualizaci 'on por inundaci'on de mayor velocidad son sub-'optimas desde el punto de vista de capacidad de correci'on, decidimos dise¿nar soluciones para la actualizaci'on serie, con el objetivo de alcanzar una mayor velocidad manteniendo la ganancia de codificaci'on de los algoritmos originales de volteo de s'¿mbolo. Se presentan dos algoritmos y arquitecturas de actualizaci'on serie, reduciendo el 'area y aumentando de la velocidad m'axima alcanzable. Por 'ultimo, se generalizan los algoritmos de volteo de s'¿mbolo y se muestra como algunos casos particulares puede lograr una ganancia de codificaci'on cercana a los algoritmos Min-sum y Min-max con una menor complejidad. Tambi'en se propone una arquitectura eficiente, que muestra que el 'area se reduce a la mitad en comparaci'on con una soluci'on de mapeo directo. En la segunda parte de la tesis, se comparan algoritmos de decodificaci'on Reed- Solomon basados en decisi'on blanda, concluyendo que el algoritmo de baja complejidad Chase (LCC) es la soluci'on m'as eficiente si la alta velocidad es el objetivo principal. Sin embargo, los esquemas LCC se basan en la interpolaci'on, que introduce algunas limitaciones hardware debido a su complejidad. Con el fin de reducir la complejidad sin modificar la capacidad de correcci'on, se propone un esquema de decisi'on blanda para LCC basado en algoritmos de decisi'on dura. Por 'ultimo se dise¿na una arquitectura eficiente para este nuevo esquemaGarcía Herrero, FM. (2013). Architectures for soft-decision decoding of non-binary codes [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33753TESISPremiad

    Error-correction on non-standard communication channels

    Get PDF
    Many communication systems are poorly modelled by the standard channels assumed in the information theory literature, such as the binary symmetric channel or the additive white Gaussian noise channel. Real systems suffer from additional problems including time-varying noise, cross-talk, synchronization errors and latency constraints. In this thesis, low-density parity-check codes and codes related to them are applied to non-standard channels. First, we look at time-varying noise modelled by a Markov channel. A low-density parity-check code decoder is modified to give an improvement of over 1dB. Secondly, novel codes based on low-density parity-check codes are introduced which produce transmissions with Pr(bit = 1) ≠ Pr(bit = 0). These non-linear codes are shown to be good candidates for multi-user channels with crosstalk, such as optical channels. Thirdly, a channel with synchronization errors is modelled by random uncorrelated insertion or deletion events at unknown positions. Marker codes formed from low-density parity-check codewords with regular markers inserted within them are studied. It is shown that a marker code with iterative decoding has performance close to the bounds on the channel capacity, significantly outperforming other known codes. Finally, coding for a system with latency constraints is studied. For example, if a telemetry system involves a slow channel some error correction is often needed quickly whilst the code should be able to correct remaining errors later. A new code is formed from the intersection of a convolutional code with a high rate low-density parity-check code. The convolutional code has good early decoding performance and the high rate low-density parity-check code efficiently cleans up remaining errors after receiving the entire block. Simulations of the block code show a gain of 1.5dB over a standard NASA code

    Decryption Failure Attacks on Post-Quantum Cryptography

    Get PDF
    This dissertation discusses mainly new cryptanalytical results related to issues of securely implementing the next generation of asymmetric cryptography, or Public-Key Cryptography (PKC).PKC, as it has been deployed until today, depends heavily on the integer factorization and the discrete logarithm problems.Unfortunately, it has been well-known since the mid-90s, that these mathematical problems can be solved due to Peter Shor's algorithm for quantum computers, which achieves the answers in polynomial time.The recently accelerated pace of R&D towards quantum computers, eventually of sufficient size and power to threaten cryptography, has led the crypto research community towards a major shift of focus.A project towards standardization of Post-quantum Cryptography (PQC) was launched by the US-based standardization organization, NIST. PQC is the name given to algorithms designed for running on classical hardware/software whilst being resistant to attacks from quantum computers.PQC is well suited for replacing the current asymmetric schemes.A primary motivation for the project is to guide publicly available research toward the singular goal of finding weaknesses in the proposed next generation of PKC.For public key encryption (PKE) or digital signature (DS) schemes to be considered secure they must be shown to rely heavily on well-known mathematical problems with theoretical proofs of security under established models, such as indistinguishability under chosen ciphertext attack (IND-CCA).Also, they must withstand serious attack attempts by well-renowned cryptographers both concerning theoretical security and the actual software/hardware instantiations.It is well-known that security models, such as IND-CCA, are not designed to capture the intricacies of inner-state leakages.Such leakages are named side-channels, which is currently a major topic of interest in the NIST PQC project.This dissertation focuses on two things, in general:1) how does the low but non-zero probability of decryption failures affect the cryptanalysis of these new PQC candidates?And 2) how might side-channel vulnerabilities inadvertently be introduced when going from theory to the practice of software/hardware implementations?Of main concern are PQC algorithms based on lattice theory and coding theory.The primary contributions are the discovery of novel decryption failure side-channel attacks, improvements on existing attacks, an alternative implementation to a part of a PQC scheme, and some more theoretical cryptanalytical results

    A new Architecture for High Speed, Low Latency NB-LDPC Check Node Processing

    No full text
    International audience—Non-binary low-density parity-check codes have superior communications performance compared to their binary counterparts. However, to be an option for future standards, efficient hardware architectures must be developed. State-of-the-art decoding algorithms lead to architectures suffering from low throughput and high latency. The check node function accounts for the largest part of the decoders overall complexity. In this paper a new hardware aware check node algorithm and its architecture is proposed. It has state-of-the-art communications performance while reducing the decoding complexity. The presented architecture has a 14 times higher area efficiency, increases the energy efficiency by factor 2.5 and reduces the latency by factor of 3.5 compared to a state-of-the-art architecture

    Design and implementation of decoders for error correction in high-speed communication systems

    Full text link
    This thesis is focused on the design and implementation of binary low-density parity-check (LDPC) code decoders for high-speed modern communication systems. The basic of LDPC codes and the performance and bottlenecks, in terms of complexity and hardware efficiency, of the main soft-decision and hard-decision decoding algorithms (such as Min-Sum, Optimized 2-bit Min-Sum and Reliability-based iterative Majority-Logic) are analyzed. The complexity and performance of those algorithms are improved to allow efficient hardware architectures. A new decoding algorithm called One-Minimum Min-Sum is proposed. It reduces considerably the complexity of the check node update equations of the Min-Sum algorithm. The second minimum is estimated from the first minimum value by a means of a linear approximation that allows a dynamic adjustment. The Optimized 2-bit Min-Sum algorithm is modified to initialize it with the complete LLR values and to introduce the extrinsic information in the messages sent from the variable nodes. Its variable node equation is reformulated to reduce its complexity. Both algorithms were tested for the (2048,1723) RS-based LDPC code and (16129,15372) LDPC code using an FPGA-based hardware emulator. They exhibit BER performance very close to Min-Sum algorithm and do not introduce early error-floor. In order to show the hardware advantages of the proposed algorithms, hardware decoders were implemented in a 90 nm CMOS process and FPGA devices based on two types of architectures: full-parallel and partial-parallel one with horizontal layered schedule. The results show that the decoders are more area-time efficient than other published decoders and that the low-complexity of the Modified Optimized 2-bit Min-Sum allows the implementation of 10 Gbps decoders in current FPGA devices. Finally, a new hard-decision decoding algorithm, the Historical-Extrinsic Reliability-Based Iterative Decoder, is presented. This algorithm introduces the new idea of considering hard-decision votes as soft-decision to compute the extrinsic information of previous iterations. It is suitable for high-rate codes and improves the BER performance of the previous RBI-MLGD algorithms, with similar complexity.Esta tesis se ha centrado en el diseño e implementación de decodificadores binarios basados en códigos de comprobación de paridad de baja densidad (LDPC) válidos para los sistemas de comunicación modernos de alta velocidad. Los conceptos básicos de códigos LDPC, sus prestaciones y cuellos de botella, en términos de complejidad y eficiencia hardware, fueron analizados para los principales algoritmos de decisión soft y decisión hard (como Min-Sum, Optimized 2-bit Min-Sum y Reliability-based iterative Majority-Logic). La complejidad y prestaciones de estos algoritmos se han mejorado para conseguir arquitecturas hardware eficientes. Se ha propuesto un nuevo algoritmo de decodificación llamado One-Minimum Min-Sum. Éste reduce considerablemente la complejidad de las ecuaciones de actualización del nodo de comprobación del algoritmo Min-Sum. El segundo mínimo se ha estimado a partir del valor del primer mínimo por medio de una aproximación lineal, la cuál permite un ajuste dinámico. El algoritmo Optimized 2-bit Min-Sum se ha modificado para ser inicializado con los valores LLR e introducir la información extrínseca en los mensajes enviados desde los nodos variables. La ecuación del nodo variable de este algoritmo ha sido reformulada para reducir su complejidad. Ambos algoritmos fueron probados para el código (2048,1723) RS-based LDPC y para el código (16129,15372) LDPC utilizando un emulador hardware implementado en un dispositivo FPGA. Éstos han alcanzado unas prestaciones de BER muy cercanas a las del algoritmo Min-Sum evitando, además, la aparición temprana del fenómeno denominado suelo del error. Con el objetivo de mostrar las ventajas hardware de los algoritmos propuestos, los decodificadores se implementaron en hardware utilizando tecnología CMOS de 90 nm y en dispositivos FPGA basados en dos tipos de arquitecturas: completamente paralela y parcialmente paralela utilizando el método de actualización por capas horizontales. Los resultados muestran que los decodificadores propuestos e implementados son más eficientes en área-tiempo que otros decodificadores publicados y que la baja complejidad del algoritmo Modified Optimized 2-bit Min-Sum permite la implementación de decodificadores en los dispositivos FPGA actuales consiguiendo una tasa de 10 Gbps. Finalmente, se ha presentado un nuevo algoritmo de decodificación de decisión hard, el Historical-Extrinsic Reliability-Based Iterative Decoder. Este algoritmo introduce la nueva idea de considerar los votos de decisión hard como decisión soft para calcular la información extrínseca de iteracions anteriores. Este algoritmo es adecuado para códigos de alta velocidad y mejora el rendimiento BER de los algoritmos RBI-MLGD anteriores, con una complejidad similar.Aquesta tesi s'ha centrat en el disseny i implementació de descodificadors binaris basats en codis de comprovació de paritat de baixa densitat (LDPC) vàlids per als sistemes de comunicació moderns d'alta velocitat. Els conceptes bàsics de codis LDPC, les seues prestacions i colls de botella, en termes de complexitat i eficiència hardware, van ser analitzats pels principals algoritmes de decisió soft i decisió hard (com el Min-Sum, Optimized 2-bit Min-Sum y Reliability-based iterative Majority-Logic). La complexitat i prestacions d'aquests algoritmes s'han millorat per aconseguir arquitectures hardware eficients. S'ha proposat un nou algoritme de descodificació anomenat One-Minimum Min-Sum. Aquest redueix considerablement la complexitat de les equacions d'actualització del node de comprovació del algoritme Min-Sum. El segon mínim s'ha estimat a partir del valor del primer mínim per mitjà d'una aproximació lineal, la qual permet un ajust dinàmic. L'algoritme Optimized 2-bit Min-Sum s'ha modificat per ser inicialitzat amb els valors LLR i introduir la informació extrínseca en els missatges enviats des dels nodes variables. L'equació del node variable d'aquest algoritme ha sigut reformulada per reduir la seva complexitat. Tots dos algoritmes van ser provats per al codi (2048,1723) RS-based LDPC i per al codi (16129,15372) LDPC utilitzant un emulador hardware implementat en un dispositiu FPGA. Aquests han aconseguit unes prestacions BER molt properes a les del algoritme Min-Sum evitant, a més, l'aparició primerenca del fenomen denominat sòl de l'error. Per tal de mostrar els avantatges hardware dels algoritmes proposats, els descodificadors es varen implementar en hardware utilitzan una tecnologia CMOS d'uns 90 nm i en dispositius FPGA basats en dos tipus d'arquitectures: completament paral·lela i parcialment paral·lela utilitzant el mètode d'actualització per capes horitzontals. Els resultats mostren que els descodificadors proposats i implementats són més eficients en àrea-temps que altres descodificadors publicats i que la baixa complexitat del algoritme Modified Optimized 2-bit Min-Sum permet la implementació de decodificadors en els dispositius FPGA actuals obtenint una taxa de 10 Gbps. Finalment, s'ha presentat un nou algoritme de descodificació de decisió hard, el Historical-Extrinsic Reliability-Based Iterative Decoder. Aquest algoritme presenta la nova idea de considerar els vots de decisió hard com decisió soft per calcular la informació extrínseca d'iteracions anteriors. Aquest algoritme és adequat per als codis d'alta taxa i millora el rendiment BER dels algoritmes RBI-MLGD anteriors, amb una complexitat similar.Català Pérez, JM. (2017). Design and implementation of decoders for error correction in high-speed communication systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86152TESI

    VLSI algorithms and architectures for non-binary-LDPC decoding

    Full text link
    Tesis por compendio[EN] This thesis studies the design of low-complexity soft-decision Non-Binary Low-Density Parity-Check (NB-LDPC) decoding algorithms and their corresponding hardware architectures suitable for decoding high-rate codes at high throughput (hundreds of Mbps and Gbps). In the first part of the thesis the main aspects concerning to the NB-LDPC codes are analyzed, including a study of the main bottlenecks of conventional softdecision decoding algorithms (Q-ary Sum of Products (QSPA), Extended Min-Sum (EMS), Min-Max and Trellis-Extended Min-Sum (T-EMS)) and their corresponding hardware architectures. Despite the limitations of T-EMS algorithm (high complexity in the Check Node (CN) processor, wiring congestion due to the high number of exchanged messages between processors and the inability to implement decoders over high-order Galois fields due to the high decoder complexity), it was selected as starting point for this thesis due to its capability to reach high-throughput. Taking into account the identified limitations of the T-EMS algorithm, the second part of the thesis includes six papers with the results of the research made in order to mitigate the T-EMS disadvantages, offering solutions that reduce the area, the latency and increase the throughput compared to previous proposals from literature without sacrificing coding gain. Specifically, five low-complexity decoding algorithms are proposed, which introduce simplifications in different parts of the decoding process. Besides, five complete decoder architectures are designed and implemented on a 90nm Complementary Metal-Oxide-Semiconductor (CMOS) technology. The results show an achievement in throughput higher than 1Gbps and an area less than 10 mm2. The increase in throughput is 120% and the reduction in area is 53% compared to previous implementations of T-EMS, for the (837,726) NB-LDPC code over GF(32). The proposed decoders reduce the CN area, latency, wiring between CN and Variable Node (VN) processor and the number of storage elements required in the decoder. Considering that these proposals improve both area and speed, the efficiency parameter (Mbps / Million NAND gates) is increased in almost five times compared to other proposals from literature. The improvements in terms of area allow us to implement NB-LDPC decoders over high-order fields which had not been possible until now due to the highcomplexity of decoders previously proposed in literature. Therefore, we present the first post-place and route report for high-rate codes over high-order fields higher than Galois Field (GF)(32). For example, for the (1536,1344) NB-LDPC code over GF(64) the throughput is 1259Mbps occupying an area of 28.90 mm2. On the other hand, a decoder architecture is implemented on a Field Programmable Gate Array (FPGA) device achieving 630 Mbps for the high-rate (2304,2048) NB-LDPC code over GF(16). To the best knowledge of the author, these results constitute the highest ones presented in literature for similar codes and implemented on the same technologies.[ES] En esta tesis se aborda el estudio del diseño de algoritmos de baja complejidad para la decodificación de códigos de comprobación de paridad de baja densidad no binarios (NB-LDPC) y sus correspondientes arquitecturas apropiadas para decodificar códigos de alta tasa a altas velocidades (cientos de Mbps y Gbps). En la primera parte de la tesis los principales aspectos concernientes a los códigos NB-LDPC son analizados, incluyendo un estudio de los principales cuellos de botella presentes en los algoritmos de decodificación convencionales basados en decisión blanda (QSPA, EMS, Min-Max y T-EMS) y sus correspondientes arquitecturas hardware. A pesar de las limitaciones del algoritmo T-EMS (alta complejidad en el procesador del nodo de chequeo de paridad (CN), congestión en el rutado debido al intercambio de mensajes entre procesadores y la incapacidad de implementar decodificadores para campos de Galois de orden elevado debido a la elevada complejidad), éste fue seleccionado como punto de partida para esta tesis debido a su capacidad para alcanzar altas velocidades. Tomando en cuenta las limitaciones identificadas en el algoritmo T-EMS, la segunda parte de la tesis incluye seis artículos con los resultados de la investigación realizada con la finalidad de mitigar las desventajas del algoritmo T-EMS, ofreciendo soluciones que reducen el área, la latencia e incrementando la velocidad comparado con propuestas previas de la literatura sin sacrificar la ganancia de codificación. Especificamente, cinco algoritmos de decodificación de baja complejidad han sido propuestos, introduciendo simplificaciones en diferentes partes del proceso de decodificación. Además, arquitecturas completas de decodificadores han sido diseñadas e implementadas en una tecnologia CMOS de 90nm consiguiéndose una velocidad mayor a 1Gbps con un área menor a 10 mm2, aumentando la velocidad en 120% y reduciendo el área en 53% comparado con previas implementaciones del algoritmo T-EMS para el código (837,726) implementado sobre campo de Galois GF(32). Las arquitecturas propuestas reducen el área del CN, latencia, número de mensajes intercambiados entre el nodo de comprobación de paridad (CN) y el nodo variable (VN) y el número de elementos de almacenamiento en el decodificador. Considerando que estas propuestas mejoran tanto el área comola velocidad, el parámetro de eficiencia (Mbps / Millones de puertas NAND) se ha incrementado en casi cinco veces comparado con otras propuestas de la literatura. Las mejoras en términos de área nos ha permitido implementar decodificadores NBLDPC sobre campos de Galois de orden elevado, lo cual no habia sido posible hasta ahora debido a la alta complejidad de los decodificadores anteriormente propuestos en la literatura. Por lo tanto, en esta tesis se presentan los primeros resultados incluyendo el emplazamiento y rutado para códigos de alta tasa sobre campos finitos de orden mayor a GF(32). Por ejemplo, para el código (1536,1344) sobre GF(64) la velocidad es 1259 Mbps ocupando un área de 28.90 mm2. Por otro lado, una arquitectura de decodificador ha sido implementada en un dispositivo FPGA consiguiendo 660 Mbps de velocidad para el código de alta tasa (2304,2048) sobre GF(16). Estos resultados constituyen, según el mejor conocimiento del autor, los mayores presentados en la literatura para códigos similares implementados para las mismas tecnologías.[CA] En esta tesi s'aborda l'estudi del disseny d'algoritmes de baixa complexitat per a la descodificació de codis de comprovació de paritat de baixa densitat no binaris (NB-LDPC), i les seues corresponents arquitectures per a descodificar codis d'alta taxa a altes velocitats (centenars de Mbps i Gbps). En la primera part de la tesi els principals aspectes concernent als codis NBLDPC són analitzats, incloent un estudi dels principals colls de botella presents en els algoritmes de descodificació convencionals basats en decisió blana (QSPA, EMS, Min-Max i T-EMS) i les seues corresponents arquitectures. A pesar de les limitacions de l'algoritme T-EMS (alta complexitat en el processador del node de revisió de paritat (CN), congestió en el rutat a causa de l'intercanvi de missatges entre processadors i la incapacitat d'implementar descodificadors per a camps de Galois d'orde elevat a causa de l'elevada complexitat), este va ser seleccionat com a punt de partida per a esta tesi degut a la seua capacitat per a aconseguir altes velocitats. Tenint en compte les limitacions identificades en l'algoritme T-EMS, la segona part de la tesi inclou sis articles amb els resultats de la investigació realitzada amb la finalitat de mitigar els desavantatges de l'algoritme T-EMS, oferint solucions que redueixen l'àrea, la latència i incrementant la velocitat comparat amb propostes prèvies de la literatura sense sacrificar el guany de codificació. Específicament, s'han proposat cinc algoritmes de descodificació de baixa complexitat, introduint simplificacions en diferents parts del procés de descodificació. A més, s'han dissenyat arquitectures completes de descodificadors i s'han implementat en una tecnologia CMOS de 90nm aconseguint-se una velocitat major a 1Gbps amb una àrea menor a 10 mm2, augmentant la velocitat en 120% i reduint l'àrea en 53% comparat amb prèvies implementacions de l'algoritme T-EMS per al codi (837,726) implementat sobre camp de Galois GF(32). Les arquitectures proposades redueixen l'àrea del CN, la latència, el nombre de missatges intercanviats entre el node de comprovació de paritat (CN) i el node variable (VN) i el nombre d'elements d'emmagatzemament en el descodificador. Considerant que estes propostes milloren tant l'àrea com la velocitat, el paràmetre d'eficiència (Mbps / Milions deportes NAND) s'ha incrementat en quasi cinc vegades comparat amb altres propostes de la literatura. Les millores en termes d'àrea ens ha permès implementar descodificadors NBLDPC sobre camps de Galois d'orde elevat, la qual cosa no havia sigut possible fins ara a causa de l'alta complexitat dels descodificadors anteriorment proposats en la literatura. Per tant, nosaltres presentem els primers reports després de l'emplaçament i rutat per a codis d'alta taxa sobre camps finits d'orde major a GF(32). Per exemple, per al codi (1536,1344) sobre GF(64) la velocitat és 1259 Mbps ocupant una àrea de 28.90 mm2. D'altra banda, una arquitectura de descodificador ha sigut implementada en un dispositiu FPGA aconseguint 660 Mbps de velocitat per al codi d'alta taxa (2304,2048) sobre GF(16). Estos resultats constitueixen, per al millor coneixement de l'autor, els millors presentats en la literatura per a codis semblants implementats per a les mateixes tecnologies.Lacruz Jucht, JO. (2016). VLSI algorithms and architectures for non-binary-LDPC decoding [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73266TESISCompendi

    SCA-LDPC: A Code-Based Framework for Key-Recovery Side-Channel Attacks on Post-Quantum Encryption Schemes

    Get PDF
    Whereas theoretical attacks on standardized crypto primitives rarely lead to actual practical attacks, the situation is different for side-channel attacks. Improvements in the performance of side-channel attacks are of utmost importance. In this paper, we propose a framework to be used in key-recovery side-channel attacks on CCA-secure post-quantum encryption schemes. The basic idea is to construct chosen ciphertext queries to a plaintext checking oracle that collects information on a set of secret variables in a single query. Then a large number of such queries is considered, each related to a different set of secret variables, and they are modeled as a low-density parity-check code (LDPC code). Secret variables are finally determined through efficient iterative decoding methods, such as belief propagation, using soft information. The utilization of LDPC codes offers efficient decoding, source compression, and error correction benefits. It has been demonstrated that this approach provides significant improvements compared to previous work by reducing the required number of queries, such as the number of traces in a power attack. The framework is demonstrated and implemented in two different cases. On one hand, we attack implementations of HQC in a timing attack, lowering the number of required traces considerably compared to attacks in previous work. On the other hand, we describe and implement a full attack on a masked implementation of Kyber using power analysis. Using the ChipWhisperer evaluation platform, our real-world attacks recover the long-term secret key of a first-order masked implementation of Kyber-768 with an average of only 12 power traces

    Flash Memory Devices

    Get PDF
    Flash memory devices have represented a breakthrough in storage since their inception in the mid-1980s, and innovation is still ongoing. The peculiarity of such technology is an inherent flexibility in terms of performance and integration density according to the architecture devised for integration. The NOR Flash technology is still the workhorse of many code storage applications in the embedded world, ranging from microcontrollers for automotive environment to IoT smart devices. Their usage is also forecasted to be fundamental in emerging AI edge scenario. On the contrary, when massive data storage is required, NAND Flash memories are necessary to have in a system. You can find NAND Flash in USB sticks, cards, but most of all in Solid-State Drives (SSDs). Since SSDs are extremely demanding in terms of storage capacity, they fueled a new wave of innovation, namely the 3D architecture. Today “3D” means that multiple layers of memory cells are manufactured within the same piece of silicon, easily reaching a terabit capacity. So far, Flash architectures have always been based on "floating gate," where the information is stored by injecting electrons in a piece of polysilicon surrounded by oxide. On the contrary, emerging concepts are based on "charge trap" cells. In summary, flash memory devices represent the largest landscape of storage devices, and we expect more advancements in the coming years. This will require a lot of innovation in process technology, materials, circuit design, flash management algorithms, Error Correction Code and, finally, system co-design for new applications such as AI and security enforcement

    Design Techniques for Energy-Quality Scalable Digital Systems

    Get PDF
    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, in the orders of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many of such tasks are error resilient, meaning that they can toler- ate some relaxation in the accuracy, precision or reliability of internal operations, without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research, naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations with energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the impelling need for low-energy computing. Despite these high expectations, the current state-of-the-art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only to a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become diffused in commercial devices, EQ scalable design needs to achieve industrial level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing “dynamic” systems in which the precision or reliability of operations (and consequently their energy consumption) can be dynamically tuned at runtime, rather than “static” solutions, in which the output quality is fixed at design-time. This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of benefits and overheads deriving from EQ scalability, using industrial-level models, and on the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption. More specifically, the contribution of this thesis is divided in three parts. In a first body of work, the design of EQ scalable modules for processing hardware data paths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, that exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy- hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement, by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display. For each of these three topics, results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature for being integrated in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy versus quality tradeoff
    corecore