Search CORE

257 research outputs found

VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

Author: David Stevens (4254274)
Vassilios Chouliaras (1251600)
Vincent Dwyer (1251447)
Publication venue
Publication date: 01/01/2016
Field of study

We discuss VThreads, a novel VLIW CMP with hardware-assisted shared-memory Thread support. VThreads supports Instruction Level Parallelism via static multiple-issue and Thread Level Parallelism via hardware-assisted POSIX Threads along with extensive customization. It allows the instantiation of tightlycoupled streaming accelerators and supports up to 7-address Multiple-Input, Multiple-Output instruction extensions. VThreads is designed in technology-independent Register-Transfer-Level VHDL and prototyped on 40 nm and 28 nm Field-Programmable gate arrays. It was evaluated against a PThreads-based multiprocessor based on the Sparc-V8 ISA. On a 65 nm ASIC implementation VThreads achieves up to x7.2 performance increase on synthetic benchmarks, x5 on a parallel Mandelbrot implementation, 66% better on a threaded JPEG implementation, 79% better on an edge-detection benchmark and ~13% improvement on DES compared to the Leon3MP CMP. In the range of 2 to 8 cores VThreads demonstrates a post-route (statistical) power reduction between 65% to 57% at an area increase of 1.2%-10% for 1-8 cores, compared to a similarly-configured Leon3MP CMP. This combination of micro-architectural features, scalability, extensibility, hardware support for low-latency PThreads, power efficiency and area make the processor an attractive proposition for low-power, deeply-embedded applications requiring minimum OS support

Loughborough University Institutional Repository

Securing Arm Platform: From Software-Based To Hardware-Based Approaches

Author: Ning Zhenyu
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2020
Field of study

With the rapid proliferation of the ARM architecture on smart mobile phones and Internet of Things (IoT) devices, the security of ARM platform becomes an emerging problem. In recent years, the number of malware identified on ARM platforms, especially on Android, shows explosive growth. Evasion techniques are also used in these malware to escape from being detected by existing analysis systems. In our research, we first present a software-based mechanism to increase the accuracy of existing static analysis tools by reassembleable bytecode extraction. Our solution collects bytecode and data at runtime, and then reassemble them offline to help static analysis tools to reveal the hidden behavior in an application. Further, we implement a hardware-based transparent malware analysis framework for general ARM platforms to defend against the traditional evasion techniques. Our framework leverages hardware debugging features and Trusted Execution Environment (TEE) to achieve transparent tracing and debugging with reasonable overhead. To learn the security of the involved hardware debugging features, we perform a comprehensive study on the ARM debugging features and summarize the security implications. Based on the implications, we design a novel attack scenario that achieves privilege escalation via misusing the debugging features in inter-processor debugging model. The attack has raised our concern on the security of TEEs and Cyber-physical System (CPS). For a better understanding of the security of TEEs, we investigate the security of various TEEs on different architectures and platforms, and state the security challenges. A study of the deploying the TEEs on edge platform is also presented. For the security of the CPS, we conduct an analysis on the real-world traffic signal infrastructure and summarize the security problems

Digital Commons@Wayne State University

Multilevel Runtime Verification for Safety and Security Critical Cyber Physical Systems from a Model Based Engineering Perspective

Author: Gautham Smitha Muralidhar
Publication venue: VCU Scholars Compass
Publication date: 01/01/2020
Field of study

Advanced embedded system technology is one of the key driving forces behind the rapid growth of Cyber-Physical System (CPS) applications. CPS consists of multiple coordinating and cooperating components, which are often software-intensive and interact with each other to achieve unprecedented tasks. Such highly integrated CPSs have complex interaction failures, attack surfaces, and attack vectors that we have to protect and secure against. This dissertation advances the state-of-the-art by developing a multilevel runtime monitoring approach for safety and security critical CPSs where there are monitors at each level of processing and integration. Given that computation and data processing vulnerabilities may exist at multiple levels in an embedded CPS, it follows that solutions present at the levels where the faults or vulnerabilities originate are beneficial in timely detection of anomalies. Further, increasing functional and architectural complexity of critical CPSs have significant safety and security operational implications. These challenges are leading to a need for new methods where there is a continuum between design time assurance and runtime or operational assurance. Towards this end, this dissertation explores Model Based Engineering methods by which design assurance can be carried forward to the runtime domain, creating a shared responsibility for reducing the overall risk associated with the system at operation. Therefore, a synergistic combination of Verification & Validation at design time and runtime monitoring at multiple levels is beneficial in assuring safety and security of critical CPS. Furthermore, we realize our multilevel runtime monitor framework on hardware using a stream-based runtime verification language

VCU Scholars Compass

VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

Author: Agron
Andrews
Arvind
Brodersen
Chouliaras
Chouliaras
Chouliaras
Chouliaras
Colwell
Cong
D. Stevens
de Dinechin
De Micheli
Faraboschi
Gupta
Hubener
Kathail
Lin
Lin
Lübbers
Mandelbrot
Milward
Muck
Oliveira
Owaida
Papakonstantinou
Robert Thomson
Rooholamin
Schlansker
Stevens
Stevens
Thomson
Tullsen
V.A. Chouliaras
V.M. Dwyer
Villarreal
Watson
Windh
Ziavras
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Flexible Hardware-based Security-aware Mechanisms and Architectures

Author: Dessouky Ghada
Publication venue
Publication date: 01/01/2023
Field of study

For decades, software security has been the primary focus in securing our computing platforms. Hardware was always assumed trusted, and inherently served as the foundation, and thus the root of trust, of our systems. This has been further leveraged in developing hardware-based dedicated security extensions and architectures to protect software from attacks exploiting software vulnerabilities such as memory corruption. However, the recent outbreak of microarchitectural attacks has shaken these long-established trust assumptions in hardware entirely, thereby threatening the security of all of our computing platforms and bringing hardware and microarchitectural security under scrutiny. These attacks have undeniably revealed the grave consequences of hardware/microarchitecture security flaws to the entire platform security, and how they can even subvert the security guarantees promised by dedicated security architectures. Furthermore, they shed light on the sophisticated challenges particular to hardware/microarchitectural security; it is more critical (and more challenging) to extensively analyze the hardware for security flaws prior to production, since hardware, unlike software, cannot be patched/updated once fabricated. Hardware cannot reliably serve as the root of trust anymore, unless we develop and adopt new design paradigms where security is proactively addressed and scrutinized across the full stack of our computing platforms, at all hardware design and implementation layers. Furthermore, novel flexible security-aware design mechanisms are required to be incorporated in processor microarchitecture and hardware-assisted security architectures, that can practically address the inherent conflict between performance and security by allowing that the trade-off is configured to adapt to the desired requirements. In this thesis, we investigate the prospects and implications at the intersection of hardware and security that emerge across the full stack of our computing platforms and System-on-Chips (SoCs). On one front, we investigate how we can leverage hardware and its advantages, in contrast to software, to build more efficient and effective security extensions that serve security architectures, e.g., by providing execution attestation and enforcement, to protect the software from attacks exploiting software vulnerabilities. We further propose that they are microarchitecturally configured at runtime to provide different types of security services, thus adapting flexibly to different deployment requirements. On another front, we investigate how we can protect these hardware-assisted security architectures and extensions themselves from microarchitectural and software attacks that exploit design flaws that originate in the hardware, e.g., insecure resource sharing in SoCs. More particularly, we focus in this thesis on cache-based side-channel attacks, where we propose sophisticated cache designs, that fundamentally mitigate these attacks, while still preserving performance by enabling that the performance security trade-off is configured by design. We also investigate how these can be incorporated into flexible and customizable security architectures, thus complementing them to further support a wide spectrum of emerging applications with different performance/security requirements. Lastly, we inspect our computing platforms further beneath the design layer, by scrutinizing how the actual implementation of these mechanisms is yet another potential attack surface. We explore how the security of hardware designs and implementations is currently analyzed prior to fabrication, while shedding light on how state-of-the-art hardware security analysis techniques are fundamentally limited, and the potential for improved and scalable approaches

TUbiblio

tuprints

Recommended from our members

Repurposing Software Defenses with Specialized Hardware

Author: Sinha Kanad
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

Computer security has largely been the domain of software for the last few decades. Although this approach has been moderately successful during this period, its problems have started becoming more apparent recently because of one primary reason — performance. Software solutions typically exact a significant toll in terms of program slowdown, especially when applied to large, complex software. In the past, when chips became exponentially faster, this growing burden could be accommodated almost for free. But as Moore’s law winds down, security-related slowdowns become more apparent, increasingly intolerable, and subsequently abandoned. As a result, the community has started looking elsewhere for continued protection, as attacks continue to become progressively more sophisticated. One way to mitigate this problem is to complement these defenses in hardware. Despite lacking the semantic perspective of high-level software, specialized hardware typically is not only faster, but also more energy-efficient. However, hardware vendors also have to factor in the cost of integrating security solutions from the perspective of effectiveness, longevity, and cost of development, while allaying the customer’s concerns of performance. As a result, although numerous hardware solutions have been proposed in the past, the fact that so few of them have actually transitioned into practice implies that they were unable to strike an optimal balance of the above qualities. This dissertation proposes the thesis that it is possible to add hardware features that complement and improve program security, traditionally provided by software, without requiring extensive modifications to existing hardware microarchitecture. As such, it marries the collective concerns of not only users and software developers, who demand performant but secure products, but also that of hardware vendors, since implementation simplicity directly relates to reduction in time and cost of development and deployment. To support this thesis, this dissertation discusses two hardware security features aimed at securing program code and data separately and details their full system implementations, and a study of a negative result where the design was deemed practically infeasible, given its high implementation complexity. Firstly, the dissertation discusses code protection by reviving instruction set randomization (ISR), an idea originally proposed for countering code injection and considered impractical in the face of modern attack vectors that employ reuse of existing program code (also known as code reuse attacks). With Polyglot, we introduce ISR with strong AES encryption along with basic code randomization that disallows code decryption at runtime, thus countering most forms of state-of-the-art dynamic code reuse attacks, that read the code at runtime prior to building the code reuse payload. Through various optimizations and corner case workarounds, we show how Polyglot enables code execution with minimal hardware changes while maintaining a small attack surface and incurring nominal overheads even when the code is strongly encrypted in the binary and memory. Next, the dissertation presents REST, a hardware primitive that allows programs to mark memory regions invalid for regular memory accesses. This is achieved simply by storing a large, pre-determined random value at those locations with a special store instruction and then, detecting incoming values at the data cache for matches to the predetermined value. Subsequently, we show how this primitive can be used to protect data from common forms of spatial and temporal memory safety attacks. Notably, because of the simplicity of the primitive, REST requires trivial microarchitectural modifications and hence, is easy to implement, and exhibits negligible performance overheads. Additionally, we demonstrate how it is able to provide practical heap safety even for legacy binaries. For the above proposals, we also detail their hardware implementations on FPGAs, and discuss how each fits within a complete multiprocess system. This serves to give the reader an idea of usage and deployment challenges on a broader scale that goes beyond just the technique’s effectiveness within the context of a single program. Lastly, the dissertation discusses an alternative to the virtual address space, that randomizes the sequence of addresses in a manner invisible to even the program, thus achieving transparent randomization of the entire address space at a very fine granularity. The biggest challenge is to achieve this with minimal microarchitectural changes while accommodating linear data structures in the program (e.g., arrays, structs), both of which are fundamentally based on a linear address space. As a result, this modified address space subsumes the benefits of most other spatial randomization schemes, with the additional benefit of ideally making traversal from one data structure to another impossible. Our study of this idea concludes that although valuable, current memory safety techniques are cheaper to implement and secure enough, so that there are no perceivable use cases for this model of address space safety

Columbia University Academic Commons

Low-Impact System Performance Analysis Using Hardware Assisted Tracing Techniques

Author: Sharma Suchakrapani Datt
Publication venue
Publication date: 01/02/2017
Field of study

RÉSUMÉ Les applications modernes sont difficiles à diagnostiquer avec les outils de débogage et de profilage traditionnels. Dans les systèmes de production, la première priorité est de minimiser la perturbation sur l'application cible. Les outils de traçage sont très appropriés pour l'étude des performances de tels systèmes car les événements sont enregistrés et l'analyse se fait a posteriori. Une des principales exigences des systèmes de traçage est le faible surcoût. L'activation d'un nombre réduit d'événements aide à respecter cette exigence, mais au prix de la diminution de la granularité de la trace. Dans cette thèse, nous présentons notre travail de recherche qui traite du problème de la granularité limitée des traces en maintenant un faible surcoût sur les applications cibles. Nous présentons de nouvelles techniques et algorithmes qui abordent le problème en se basant d'abord sur une approche de filtrage logiciel et de traçage coopératif, puis en explorant des mécanismes plus avancés de traçage matériel. Nous avons proposé une approche efficace de traçage conditionnel dans l'espace noyau et utilisateur qui se base sur des mécanismes de filtrages compilés en code natif. Afin d'atteindre l'objectif d'avoir une trace détaillée du système, nous expliquons que les processeurs modernes contiennent des blocs de traçage matériel qui n'ont pas encore été entièrement exploités dans le domaine du traçage. Nous caractérisons leur performance et nous analysons les paquets de traces, leur relation avec l'exécution du logiciel, et les possibilités de les utiliser pour une trace détaillée. Nous proposons des techniques à faible surcoût, assistées par le matériel, rendant possible une analyse détaillée permettant la détection des latences d'interruption et des appels systèmes. Nous présentons aussi une nouvelle technique qui se base sur les paquets de trace à bas niveau du processeur pour analyser efficacement les processus et les ressources utilisées dans une machine virtuelle. De plus, nous avons identifié et solutionné des problèmes reliés au traçage matériel en utilisant l'assistance logicielle du système d'exploitation, ouvrant ainsi la voie à des recherches plus approfondies sur les approches coopératives de traçage matériel-logiciel. Comme nos techniques sont axées sur les exigences du traçage à haute vitesse dans les systèmes embarqués et de production traitant des transactions à haute fréquence, nous avons constaté que nos progrès dans le domaine du traçage matériel-logiciel se sont avérés très utiles pour détecter la contention des ressources et la latence dans les systèmes.----------ABSTRACT Modern applications are becoming hard to diagnose using traditional debugging and profiling tools. For production systems the first priority is to have minimal disturbance on the target application. To analyze performance of such systems, tracing tools are imperative where events can be logged and analyzed post-execution. One of the key requirements for tracing solutions however, is low overhead. A generic solution can be to select only a few events to trace, but at the cost of trace granularity. In this thesis, we present our research work that deals with the problem of lack of high granularity in traces while maintaining a low-overhead on target applications. We present our new techniques and algorithms that approach the problem initially from a software filtering and co-operative tracing approach, and then explore more advanced hardware tracing mechanisms that can be used. We have proposed an efficient kernel and userspace conditional tracing approach, with an enhanced native compiled filtering mechanism. Continuing towards our goal to have a detailed trace of a system, we further discuss how modern processors contain new hardware tracing blocks that have not yet been fully explored and exploited in the tracing domain. We characterize their performance and analyze the trace packets, their relation with software executions and opportunities to utilize them for a detailed trace. We therefore propose low-overhead hardware assisted techniques that allow a fine grained instruction based interrupt and system call latency detection mechanism. We also present a new algorithm that shows how such low-level trace packets coming directly from the processor, can be effectively utilized to analyze even the processes or resources consumed inside a VM. We have also identified and improved upon issues related to hardware tracing itself using software assistance from operating systems thus laying out ground for further research in hardware-software co-operative tracing approaches. As our techniques are focused towards requirements of high speed tracing in embedded or production systems, catering high frequency transactions, we have found that our advancements in the hardware-software domain have proved to be invaluable in detecting resource contention and latency in systems

PolyPublie

Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

Author: Lee Doowon
Publication venue
Publication date
Field of study

Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

Deep Blue Documents at the University of Michigan

Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems

Author: Wankler Hoppe Augusto
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2021
Field of study

Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

KITopen

Lume 5.8