82 research outputs found

    Implicit Cache Lockdown on ARM: An Accidental Countermeasure to Cache-Timing Attacks

    Get PDF
    As Moore`s law continues to reduce the cost of computation at an exponential rate, embedded computing capabilities spread to ever-expanding application scenarios, such as smartphones, the Internet of Things, and automation, among many others. This trend has naturally caused the underlying technology to evolve and has introduced increasingly complex microarchitectures into embedded processors in attempts to optimize for performance. While other microarchitectures, like those used in personal computers, have been extensively studied, there has been relatively less research done on embedded microarchitectures. This is especially true in terms of their security, which is growing more important as widespread adoption increases. This thesis explores an undocumented cache behavior found in ARM Cortex processors that we call implicit cache lockdown. While it was presumably implemented for performance reasons, it has a large impact on the recently popular class of cybersecurity attacks that utilize cache-timing side-channels. These attacks leverage the underlying hardware, specifically, the small timing differences between algorithm executions due to CPU caches, to glean sensitive information from a victim process. Since the affected processors are found in an overwhelming majority of smart phones, this sensitive information can include cryptographic secrets, credit card information, and passwords. As the name implies, implicit cache lockdown limits the ability for an attacker to evict certain data from a CPU`s cache. Since this is precisely what known cache-timing attacks rely on, they are rendered ineffective in their current form. This thesis analyzes implicit cache lockdown in great detail, including the methodology we used to discover it, its implications on all existing cache-timing attacks, and how it can be circumvented by an attacker

    Power efficient resilient microarchitectures for PVT variability mitigation

    Get PDF
    Nowadays, the high power density and the process, voltage, and temperature variations became the most critical issues that limit the performance of the digital integrated circuits because of the continuous scaling of the fabrication technology. Dynamic voltage and frequency scaling technique is used to reduce the power consumption while different time relaxation techniques and error recovery microarchitectures are used to tolerate the process, voltage, and temperature variations. These techniques reduce the throughput by scaling down the frequency or flushing and restarting the errant pipeline. This thesis presents a novel resilient microarchitecture which is called ERSUT-based resilient microarchitecture to tolerate the induced delays generated by the voltage scaling or the process, voltage, and temperature variations. The resilient microarchitecture detects and recovers the induced errors without flushing the pipeline and without scaling down the operating frequency. An ERSUT-based resilient 16 × 16 bit MAC unit, implemented using Global Foundries 65 nm technology and ARM standard cells library, is introduced as a case study with 18.26% area overhead and up to 1.5x speedup. At the typical conditions, the maximum frequency of the conventional MAC unit is about 375 MHz while the resilient MAC unit operates correctly at a frequency up to 565 MHz. In case of variations, the resilient MAC unit tolerates induced delays up to 50% of the clock period while keeping its throughput equal to the conventional MAC unit’s maximum throughput. At 375 MHz, the resilient MAC unit is able to scale down the supply voltage from 1.2 V to 1.0 V saving about 29% of the power consumed by the conventional MAC unit. A double-edge-triggered microarchitecture is also introduced to reduce the power consumption extremely by reducing the frequency of the clock tree to the half while preserving the same maximum throughput. This microarchitecture is applied to different ISCAS’89 benchmark circuits in addition to the 16x16 bit MAC unit and the average power reduction of all these circuits is 63.58% while the average area overhead is 31.02%. All these circuits are designed using Global Foundries 65nm technology and ARM standard cells library. Towards the full automation of the ERSUT-based resilient microarchitecture, an ERSUT-based algorithm is introduced in C++ to accelerate the design process of the ERSUT-based microarchitecture. The developed algorithm reduces the design-time efforts dramatically and allows the ERSUT-based microarchitecture to be adopted by larger industrial designs. Depending on the ERSUT-based algorithm, a validation study about applying the ERSUT-based microarchitecture on the MAC unit and different ISCAS’89 benchmark circuits with different complexity weights is introduced. This study shows that 72% of these circuits tolerates more than 14% of their clock periods and 54.5% of these circuits tolerates more than 20% while 27% of these circuits tolerates more than 30%. Consequently, the validation study proves that the ERSUT-based resilient microarchitecture is a valid applicable solution for different circuits with different complexity weights

    An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor

    Get PDF
    Modern system-on-chips augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and thus have evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture, known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses in detail both the software stack and target hardware architecture and evaluates the scalability of the proposed framework on a highly precise cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results utilising an incomplete development branch of the compiler. It is shown that the problems generally scale well with the LE1 architecture, up to eight cores, when the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (x9 for the bitonic sort benchmark with 8 dual-issue cores) with further improvements from compiler optimisations (x14 for bitonic with the same configuration

    Design for Reliability and Low Power in Emerging Technologies

    Get PDF
    Die fortlaufende Verkleinerung von Transistor-Strukturgrößen ist einer der wichtigsten Antreiber für das Wachstum in der Halbleitertechnologiebranche. Seit Jahrzehnten erhöhen sich sowohl Integrationsdichte als auch Komplexität von Schaltkreisen und zeigen damit einen fortlaufenden Trend, der sich über alle modernen Fertigungsgrößen erstreckt. Bislang ging das Verkleinern von Transistoren mit einer Verringerung der Versorgungsspannung einher, was zu einer Reduktion der Leistungsaufnahme führte und damit eine gleichbleibenden Leistungsdichte sicherstellte. Doch mit dem Beginn von Strukturgrößen im Nanometerbreich verlangsamte sich die fortlaufende Skalierung. Viele Schwierigkeiten, sowie das Erreichen von physikalischen Grenzen in der Fertigung und Nicht-Idealitäten beim Skalieren der Versorgungsspannung, führten zu einer Zunahme der Leistungsdichte und, damit einhergehend, zu erschwerten Problemen bei der Sicherstellung der Zuverlässigkeit. Dazu zählen, unter anderem, Alterungseffekte in Transistoren sowie übermäßige Hitzeentwicklung, nicht zuletzt durch stärkeres Auftreten von Selbsterhitzungseffekten innerhalb der Transistoren. Damit solche Probleme die Zuverlässigkeit eines Schaltkreises nicht gefährden, werden die internen Signallaufzeiten üblicherweise sehr pessimistisch kalkuliert. Durch den so entstandenen zeitlichen Sicherheitsabstand wird die korrekte Funktionalität des Schaltkreises sichergestellt, allerdings auf Kosten der Performance. Alternativ kann die Zuverlässigkeit des Schaltkreises auch durch andere Techniken erhöht werden, wie zum Beispiel durch Null-Temperatur-Koeffizienten oder Approximate Computing. Wenngleich diese Techniken einen Großteil des üblichen zeitlichen Sicherheitsabstandes einsparen können, bergen sie dennoch weitere Konsequenzen und Kompromisse. Bleibende Herausforderungen bei der Skalierung von CMOS Technologien führen außerdem zu einem verstärkten Fokus auf vielversprechende Zukunftstechnologien. Ein Beispiel dafür ist der Negative Capacitance Field-Effect Transistor (NCFET), der eine beachtenswerte Leistungssteigerung gegenüber herkömmlichen FinFET Transistoren aufweist und diese in Zukunft ersetzen könnte. Des Weiteren setzen Entwickler von Schaltkreisen vermehrt auf komplexe, parallele Strukturen statt auf höhere Taktfrequenzen. Diese komplexen Modelle benötigen moderne Power-Management Techniken in allen Aspekten des Designs. Mit dem Auftreten von neuartigen Transistortechnologien (wie zum Beispiel NCFET) müssen diese Power-Management Techniken neu bewertet werden, da sich Abhängigkeiten und Verhältnismäßigkeiten ändern. Diese Arbeit präsentiert neue Herangehensweisen, sowohl zur Analyse als auch zur Modellierung der Zuverlässigkeit von Schaltkreisen, um zuvor genannte Herausforderungen auf mehreren Designebenen anzugehen. Diese Herangehensweisen unterteilen sich in konventionelle Techniken ((a), (b), (c) und (d)) und unkonventionelle Techniken ((e) und (f)), wie folgt: (a)\textbf{(a)} Analyse von Leistungszunahmen in Zusammenhang mit der Maximierung von Leistungseffizienz beim Betrieb nahe der Transistor Schwellspannung, insbesondere am optimalen Leistungspunkt. Das genaue Ermitteln eines solchen optimalen Leistungspunkts ist eine besondere Herausforderung bei Multicore Designs, da dieser sich mit den jeweiligen Optimierungszielsetzungen und der Arbeitsbelastung verschiebt. (b)\textbf{(b)} Aufzeigen versteckter Interdependenzen zwischen Alterungseffekten bei Transistoren und Schwankungen in der Versorgungsspannung durch „IR-drops“. Eine neuartige Technik wird vorgestellt, die sowohl Über- als auch Unterschätzungen bei der Ermittlung des zeitlichen Sicherheitsabstands vermeidet und folglich den kleinsten, dennoch ausreichenden Sicherheitsabstand ermittelt. (c)\textbf{(c)} Eindämmung von Alterungseffekten bei Transistoren durch „Graceful Approximation“, eine Technik zur Erhöhung der Taktfrequenz bei Bedarf. Der durch Alterungseffekte bedingte zeitlich Sicherheitsabstand wird durch Approximate Computing Techniken ersetzt. Des Weiteren wird Quantisierung verwendet um ausreichend Genauigkeit bei den Berechnungen zu gewährleisten. (d)\textbf{(d)} Eindämmung von temperaturabhängigen Verschlechterungen der Signallaufzeit durch den Betrieb nahe des Null-Temperatur Koeffizienten (N-ZTC). Der Betrieb bei N-ZTC minimiert temperaturbedingte Abweichungen der Performance und der Leistungsaufnahme. Qualitative und quantitative Vergleiche gegenüber dem traditionellen zeitlichen Sicherheitsabstand werden präsentiert. (e)\textbf{(e)} Modellierung von Power-Management Techniken für NCFET-basierte Prozessoren. Die NCFET Technologie hat einzigartige Eigenschaften, durch die herkömmliche Verfahren zur Spannungs- und Frequenzskalierungen zur Laufzeit (DVS/DVFS) suboptimale Ergebnisse erzielen. Dies erfordert NCFET-spezifische Power-Management Techniken, die in dieser Arbeit vorgestellt werden. (f)\textbf{(f)} Vorstellung eines neuartigen heterogenen Multicore Designs in NCFET Technologie. Das Design beinhaltet identische Kerne; Heterogenität entsteht durch die Anwendung der individuellen, optimalen Konfiguration der Kerne. Amdahls Gesetz wird erweitert, um neue system- und anwendungsspezifische Parameter abzudecken und die Vorzüge des neuen Designs aufzuzeigen. Die Auswertungen der vorgestellten Techniken werden mithilfe von Implementierungen und Simulationen auf Schaltkreisebene (gate-level) durchgeführt. Des Weiteren werden Simulatoren auf Systemebene (system-level) verwendet, um Multicore Designs zu implementieren und zu simulieren. Zur Validierung und Bewertung der Effektivität gegenüber dem Stand der Technik werden analytische, gate-level und system-level Simulationen herangezogen, die sowohl synthetische als auch reale Anwendungen betrachten

    Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication

    Get PDF
    We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time-triggered protocol. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that transient faults are tolerated and the timing constraints of the application are satisfied. We present several synthesis algorithms which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example. 1

    Phantom redundancy: a register transfer level technique for gracefully degradable data path synthesis

    Full text link

    Protein and Polysaccharide-Based Fiber Materials Generated from Ionic Liquids: A Review.

    Get PDF
    Natural biomacromolecules such as structural proteins and polysaccharides are composed of the basic building blocks of life: amino acids and carbohydrates. Understanding their molecular structure, self-assembly and interaction in solvents such as ionic liquids (ILs) is critical for unleashing a flora of new materials, revolutionizing the way we fabricate multi-structural and multi-functional systems with tunable physicochemical properties. Ionic liquids are superior to organic solvents because they do not produce unwanted by-products and are considered green substitutes because of their reusability. In addition, they will significantly improve the miscibility of biopolymers with other materials while maintaining the mechanical properties of the biopolymer in the final product. Understanding and controlling the physicochemical properties of biopolymers in ionic liquids matrices will be crucial for progress leading to the ability to fabricate robust multi-level structural 1D fiber materials. It will also help to predict the relationship between fiber conformation and protein secondary structures or carbohydrate crystallinity, thus creating potential applications for cell growth signaling, ionic conductivity, liquid diffusion and thermal conductivity, and several applications in biomedicine and environmental science. This will also enable the regeneration of biopolymer composite fiber materials with useful functionalities and customizable options critical for additive manufacturing. The specific capabilities of these fiber materials have been shown to vary based on their fabrication methods including electrospinning and post-treatments. This review serves to provide basic knowledge of these commonly utilized protein and polysaccharide biopolymers and their fiber fabrication methods from various ionic liquids, as well as the effect of post-treatments on these fiber materials and their applications in biomedical and pharmaceutical research, wound healing, environmental filters and sustainable and green chemistry research

    DESIGN, MANUFACTURE, AND IMPLEMENTATION OF 3D-PRINTED TISSUE ENGINEERING SCAFFOLDS USED WITH STROMAL VASCULAR CELLS FOR CRANIOFACIAL BONE REGENERATION

    Get PDF
    Craniofacial bone defects have poor outcomes under current treatments: implants become infected, loosen, and displace; bone grafts resorb and weaken over time; and there are no satisfactory pediatric options. To improve outcomes, this thesis develops a novel osteoinductive biomaterial, the methods to 3D-print the biomaterial, and the tools to design of that 3D-printed scaffold for the mechanical loads of the craniofacial skeleton. Further, it combines the scaffold with key regenerative agents – autologous stem cells – to facilitate boney regeneration in the implanted scaffold. The feasibility of the scaffold and cells approach is tested by implementation in preclinical models and assessment of bone and vascular outcomes. Aim 1: To create an osteoinductive biomaterial, trabecular bone was decellularized, cryo-milled, and mixed with polycaprolactone. This thermoplastic material mixture was then 3D-printed and demonstrated osteoinductive effects on cells. Aim 2: As regenerative autologous cells, the stromal vascular fraction of adipose tissue was isolated in a point-of-care manner and timeframe and the stem cell yield, surface markers, in vitro and in vivo regenerative potential for vascular and bone tissue was demonstrated. Aim 3: Then the means to design 3D-print the biomaterial with controlled tissue engineering properties – pore size and porosity – at human craniofacial scales and for human physiologic loads was developed and tested. Aim 4: Finally, the biomaterial, cells, and design and manufacturing were implemented in a patient-specific, large-animal, preclinical model of zygomatic arch regeneration in swine. Implant design and manufacture was successfully validated, and the implanted scaffolds and cells showed a substantial bone regenerating response compared to untreated controls
    corecore