266 research outputs found

    Hardware Obfuscation for Finite Field Algorithms

    Get PDF
    With the rise of computing devices, the security robustness of the devices has become of utmost importance. Companies invest huge sums of money, time and effort in security analysis and vulnerability testing of their software products. Bug bounty programs are held which incentivize security researchers for finding security holes in software. Once holes are found, software firms release security patches for their products. The semiconductor industry has flourished with accelerated innovation. Fabless manufacturing has reduced the time-to-market and lowered the cost of production of devices. Fabless paradigm has introduced trust issues among the hardware designers and manufacturers. Increasing dependence on computing devices in personal applications as well as in critical infrastructure has given a rise to hardware attacks on the devices in the last decade. Reverse engineering and IP theft are major challenges that have emerged for the electronics industry. Integrated circuit design companies experience a loss of billions of dollars because of malicious acts by untrustworthy parties involved in the design and fabrication process, and because of attacks by adversaries on the electronic devices in which the chips are embedded. To counter these attacks, researchers have been working extensively towards finding strong countermeasures. Hardware obfuscation techniques make the reverse engineering of device design and functionality difficult for the adversary. The goal is to conceal or lock the underlying intellectual property of the integrated circuit. Obfuscation in hardware circuits can be implemented to hide the gate-level design, layout and the IP cores. Our work presents a novel hardware obfuscation design through reconfigurable finite field arithmetic units, which can be employed in various error correction and cryptographic algorithms. The effectiveness and efficiency of the proposed methods are verified by an obfuscated Reformulated Inversion-less Berlekamp-Massey (RiBM) architecture based Reed-Solomon decoder. Our experimental results show the hardware implementation of RiBM based Reed-Solomon decoder built using reconfigurable field multiplier designs. The proposed design provides only very low overhead with improved security by obfuscating the functionality and the outputs. The design proposed in our work can also be implemented in hardware designs of other algorithms that are based on finite field arithmetic. However, our main motivation was to target encryption and decryption circuits which store and process sensitive data and are used in critical applications

    Optimization of multi-channel BCH error decoding for common cases

    Full text link

    Architectures for soft-decision decoding of non-binary codes

    Full text link
    En esta tesis se estudia el dise¿no de decodificadores no-binarios para la correcci'on de errores en sistemas de comunicaci'on modernos de alta velocidad. El objetivo es proponer soluciones de baja complejidad para los algoritmos de decodificaci'on basados en los c'odigos de comprobaci'on de paridad de baja densidad no-binarios (NB-LDPC) y en los c'odigos Reed-Solomon, con la finalidad de implementar arquitecturas hardware eficientes. En la primera parte de la tesis se analizan los cuellos de botella existentes en los algoritmos y en las arquitecturas de decodificadores NB-LDPC y se proponen soluciones de baja complejidad y de alta velocidad basadas en el volteo de s'¿mbolos. En primer lugar, se estudian las soluciones basadas en actualizaci'on por inundaci 'on con el objetivo de obtener la mayor velocidad posible sin tener en cuenta la ganancia de codificaci'on. Se proponen dos decodificadores diferentes basados en clipping y t'ecnicas de bloqueo, sin embargo, la frecuencia m'axima est'a limitada debido a un exceso de cableado. Por este motivo, se exploran algunos m'etodos para reducir los problemas de rutado en c'odigos NB-LDPC. Como soluci'on se propone una arquitectura basada en difusi'on parcial para algoritmos de volteo de s'¿mbolos que mitiga la congesti'on por rutado. Como las soluciones de actualizaci 'on por inundaci'on de mayor velocidad son sub-'optimas desde el punto de vista de capacidad de correci'on, decidimos dise¿nar soluciones para la actualizaci'on serie, con el objetivo de alcanzar una mayor velocidad manteniendo la ganancia de codificaci'on de los algoritmos originales de volteo de s'¿mbolo. Se presentan dos algoritmos y arquitecturas de actualizaci'on serie, reduciendo el 'area y aumentando de la velocidad m'axima alcanzable. Por 'ultimo, se generalizan los algoritmos de volteo de s'¿mbolo y se muestra como algunos casos particulares puede lograr una ganancia de codificaci'on cercana a los algoritmos Min-sum y Min-max con una menor complejidad. Tambi'en se propone una arquitectura eficiente, que muestra que el 'area se reduce a la mitad en comparaci'on con una soluci'on de mapeo directo. En la segunda parte de la tesis, se comparan algoritmos de decodificaci'on Reed- Solomon basados en decisi'on blanda, concluyendo que el algoritmo de baja complejidad Chase (LCC) es la soluci'on m'as eficiente si la alta velocidad es el objetivo principal. Sin embargo, los esquemas LCC se basan en la interpolaci'on, que introduce algunas limitaciones hardware debido a su complejidad. Con el fin de reducir la complejidad sin modificar la capacidad de correcci'on, se propone un esquema de decisi'on blanda para LCC basado en algoritmos de decisi'on dura. Por 'ultimo se dise¿na una arquitectura eficiente para este nuevo esquemaGarcía Herrero, FM. (2013). Architectures for soft-decision decoding of non-binary codes [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33753TESISPremiad

    What broke where for distributed and parallel applications — a whodunit story

    Get PDF
    Detection, diagnosis and mitigation of performance problems in today\u27s large-scale distributed and parallel systems is a difficult task. These large distributed and parallel systems are composed of various complex software and hardware components. When the system experiences some performance or correctness problem, developers struggle to understand the root cause of the problem and fix in a timely manner. In my thesis, I address these three components of the performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers. We developed techniques to localize the performance problem for root-cause analysis. Parallel applications, most of which are complex scientific simulations running in supercomputers, can create up to millions of parallel tasks that run on different machines and communicate using the message passing paradigm. We developed a highly scalable and accurate automated debugging tool called PRODOMETER, which uses sophisticated algorithms to first, create a logical progress dependency graph of the tasks to highlight how the problem spread through the system manifesting as a system-wide performance issue. Second, uses this logical progress dependence graph to identify the task where the problem originated. Finally, PRODOMETER pinpoints the code region corresponding to the origin of the bug. Second, we developed a tool-chain that can detect performance anomaly using machine-learning techniques and can achieve very low false positive rate. Our input-aware performance anomaly detection system consists of a scalable data collection framework to collect performance related metrics from different granularity of code regions, an offline model creation and prediction-error characterization technique, and a threshold based anomaly-detection-engine for production runs. Our system requires few training runs and can handle unknown inputs and parameter combinations by dynamically calibrating the anomaly detection threshold according to the characteristics of the input data and the characteristics of the prediction-error of the models. Third, we developed performance problem mitigation scheme for erasure-coded distributed storage systems. Repair operations of the failed blocks in erasure-coded distributed storage system take really long time in networked constrained data-centers. The reason being, during the repair operation for erasure-coded distributed storage, a lot of data from multiple nodes are gathered into a single node and then a mathematical operation is performed to reconstruct the missing part. This process severely congests the links toward the destination where newly recreated data is to be hosted. We proposed a novel distributed repair technique, called Partial-Parallel-Repair (PPR) that performs this reconstruction in parallel on multiple nodes and eliminates network bottlenecks, and as a result, greatly speeds up the repair process. Fourth, we study how for a class of applications, performance can be improved (or performance problems can be mitigated) by selectively approximating some of the computations. For many applications, the main computation happens inside a loop that can be logically divided into a few temporal segments, we call phases. We found that while approximating the initial phases might severely degrade the quality of the results, approximating the computation for the later phases have very small impact on the final quality of the result. Based on this observation, we developed an optimization framework that for a given budget of quality-loss, would find the best approximation settings for each phase in the execution

    ClusterRAID: Architecture and Prototype of a Distributed Fault-Tolerant Mass Storage System for Clusters

    Get PDF
    During the past few years clusters built from commodity off-the-shelf (COTS) components have emerged as the predominant supercomputer architecture. Typically comprising a collection of standard PCs or workstations and an interconnection network, they have replaced the traditionally used integrated systems due to their better price/performance ratio. As paradigms shift from mere computing intensive to I/O intensive applications, mass storage solutions for cluster installations become a more and more crucial aspect of these systems. The inherent unreliability of the underlying components is one of the reasons why no system has been established as a standard storage solution for clusters yet. This thesis sets out the architecture and prototype implementation of a novel distributed mass storage system for commodity off-the-shelf clusters and addresses the issue of the unreliable constituent components. The key concept of the presented system is the conversion of the local hard disk drive of a cluster node into a reliable device while preserving the block device interface. By the deployment of sophisticated erasure-correcting codes, the system allows the adjustment of the number of tolerable failures and thus the overall reliability. In addition, the applied data layout considers the access behaviour of a broad range of applications and minimizes the number of required network transactions. Extensive measurements and functionality tests of the prototype, both stand-alone and in conjunction with local or distributed file systems, show the validity of the concept

    Application of Multi-core and GPU Architectures on Signal Processing: Case Studies

    Get PDF
    In this article part of the techniques and developments we are carrying out within the INCO2 group are reported. Results follow the interdisciplinary approach with which we tackle signal processing applications. Chosen case studies show different stages of development: We present algorithms already completed which are being used in practical applications as well as new ideas that may represent a starting point, and which are expected to deliver good results in a short and medium term
    corecore