71 research outputs found

    Code density concerns for new architectures

    Full text link
    Reducing a program\u27s instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference in code sizes from ISA alone. We find that the architectural features that contribute most heavily to code density are instruction length, number of registers, availability of a zero register, bit-width, hardware divide units, number of instruction operands, and the availability of unaligned loads and stores. We extend our results to investigate operating system, compiler, and system library effects on code density. We find that the executable starting address, executable format, and system call interface all affect program size. While ISA effects are important, the efficiency of the entire system stack must be taken into account when developing a new dense instruction set architecture

    Offloading cryptographic services to the SIM card in smartphones

    Get PDF
    Smartphones have achieved ubiquitous presence in people’s everyday life as communication, entertainment and work tools. Touch screens and a variety of sensors offer a rich experience and make applications increasingly diverse, complex and resource demanding. Despite their continuous evolution and enhancements, mobile devices are still limited in terms of battery life, processing power, storage capacity and network bandwidth. Computation offloading stands out among the efforts to extend device capabilities and face the growing gap between demand and availability of resources. As most popular technologies, mobile devices are attractive targets for malicious at- tackers. They usually store sensitive private data of their owners and are increasingly used for security sensitive activities such as online banking or mobile payments. While computation offloading introduces new challenges to the protection of those assets, it is very uncommon to take security and privacy into account as the main optimization objectives of this technique. Mobile OS security relies heavily on cryptography. Available hardware and software cryptographic providers are usually designed to resist software attacks. This kind of protection is not enough when physical control over the device is lost. Secure elements, on the other hand, include a set of protections that make them physically tamper-resistant devices. This work proposes a computation offloading technique that prioritizes enhancing security capabilities in mobile phones by offloading cryptographic operations to the SIM card, the only universally present secure element in those devices. Our contributions include an architecture for this technique, a proof-of-concept prototype developed under Android OS and the results of a performance evaluation that was conducted to study its execution times and battery consumption. Despite some limitations, our approach proves to be a valid alternative to enhance security on any smartphone.Los smartphones están omnipresentes en la vida cotidiana de las personas como herramientas de comunicación, entretenimiento y trabajo. Las pantallas táctiles y una variedad de sensores ofrecen una experiencia superior y hacen que las aplicaciones sean cada vez más diversas, complejas y demanden más recursos. A pesar de su continua evolución y mejoras, los dispositivos móviles aún están limitados en duración de batería, poder de procesamiento, capacidad de almacenamiento y ancho de banda de red. Computation offloading se destaca entre los esfuerzos para ampliar las capacidades del dispositivo y combatir la creciente brecha entre demanda y disponibilidad de recursos. Como toda tecnología popular, los smartphones son blancos atractivos para atacantes maliciosos. Generalmente almacenan datos privados y se utilizan cada vez más para actividades sensibles como banca en línea o pagos móviles. Si bien computation offloading presenta nuevos desafíos al proteger esos activos, es muy poco común tomar seguridad y privacidad como los principales objetivos de optimización de dicha técnica. La seguridad del SO móvil depende fuertemente de la criptografía. Los servicios criptográficos por hardware y software disponibles suelen estar diseñados para resistir ataques de software, protección insuficiente cuando se pierde el control físico sobre el dispositivo. Los elementos seguros, en cambio, incluyen un conjunto de protecciones que los hacen físicamente resistentes a la manipulación. Este trabajo propone una técnica de computation offloading que prioriza mejorar las capacidades de seguridad de los teléfonos móviles descargando operaciones criptográficas a la SIM, único elemento seguro universalmente presente en los mismos. Nuestras contribuciones incluyen una arquitectura para esta técnica, un prototipo de prueba de concepto desarrollado bajo Android y los resultados de una evaluación de desempeño que estudia tiempos de ejecución y consumo de batería. A pesar de algunas limitaciones, nuestro enfoque demuestra ser una alternativa válida para mejorar la seguridad en cualquier smartphone

    Huffman-based Code Compression Techniques for Embedded Systems

    Get PDF

    Android Application Development for the Intel Platform

    Get PDF
    Computer scienc

    A configurable vector processor for accelerating speech coding algorithms

    Get PDF
    The growing demand for voice-over-packer (VoIP) services and multimedia-rich applications has made increasingly important the efficient, real-time implementation of low-bit rates speech coders on embedded VLSI platforms. Such speech coders are designed to substantially reduce the bandwidth requirements thus enabling dense multichannel gateways in small form factor. This however comes at a high computational cost which mandates the use of very high performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced which resulted in up to 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor which is tightly-coupled to a Sparc-V8 compliant CPU, the optimization and simulation methodologies employed and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths

    Energy reconstruction on the LHC ATLAS TileCal upgraded front end: feasibility study for a sROD co-processing unit

    Get PDF
    Dissertation presented in ful lment of the requirements for the degree of: Master of Science in Physics 2016The Phase-II upgrade of the Large Hadron Collider at CERN in the early 2020s will enable an order of magnitude increase in the data produced, unlocking the potential for new physics discoveries. In the ATLAS detector, the upgraded Hadronic Tile Calorimeter (TileCal) Phase-II front end read out system is currently being prototyped to handle a total data throughput of 5.1 TB/s, from the current 20.4 GB/s. The FPGA based Super Read Out Driver (sROD) prototype must perform an energy reconstruction algorithm on 2.88 GB/s raw data, or 275 million events per second. Due to the very high level of pro ciency required and time consuming nature of FPGA rmware development, it may be more e ective to implement certain complex energy reconstruction and monitoring algorithms on a general purpose, CPU based sROD co-processor. Hence, the feasibility of a general purpose ARM System on Chip based co-processing unit (PU) for the sROD is determined in this work. A PCI-Express test platform was designed and constructed to link two ARM Cortex-A9 SoCs via their PCI-Express Gen-2 x1 interfaces. Test results indicate that the latency of the PCI-Express interface is su ciently low and the data throughput is superior to that of alternative interfaces such as Ethernet, for use as an interconnect for the SoCs to the sROD. CPU performance benchmarks were performed on ve ARM development platforms to determine the CPU integer, oating point and memory system performance as well as energy e ciency. To complement the benchmarks, Fast Fourier Transform and Optimal Filtering (OF) applications were also tested. Based on the test results, in order for the PU to process 275 million events per second with OF, within the 6 s timing budget of the ATLAS triggering system, a cluster of three Tegra-K1, Cortex-A15 SoCs connected to the sROD via a Gen-2 x8 PCI-Express interface would be suitable. A high level design for the PU is proposed which surpasses the requirements for the sROD co-processor and can also be used in a general purpose, high data throughput system, with 80 Gb/s Ethernet and 15 GB/s PCI-Express throughput, using four X-Gene SoCs

    Affordable techniques for dependable microprocessor design

    Get PDF
    As high computing power is available at an affordable cost, we rely on microprocessor-based systems for much greater variety of applications. This dependence indicates that a processor failure could have more diverse impacts on our daily lives. Therefore, dependability is becoming an increasingly important quality measure of microprocessors.;Temporary hardware malfunctions caused by unstable environmental conditions can lead the processor to an incorrect state. This is referred to as a transient error or soft error. Studies have shown that soft errors are the major source of system failures. This dissertation characterizes the soft error behavior on microprocessors and presents new microarchitectural approaches that can realize high dependability with low overhead.;Our fault injection studies using RISC processors have demonstrated that different functional blocks of the processor have distinct susceptibilities to soft errors. The error susceptibility information must be reflected in devising fault tolerance schemes for cost-sensitive applications. Considering the common use of on-chip caches in modern processors, we investigated area-efficient protection schemes for memory arrays. The idea of caching redundant information was exploited to optimize resource utilization for increased dependability. We also developed a mechanism to verify the integrity of data transfer from lower level memories to the primary caches. The results of this study show that by exploiting bus idle cycles and the information redundancy, an almost complete check for the initial memory data transfer is possible without incurring a performance penalty.;For protecting the processor\u27s control logic, which usually remains unprotected, we propose a low-cost reliability enhancement strategy. We classified control logic signals into static and dynamic control depending on their changeability, and applied various techniques including commit-time checking, signature caching, component-level duplication, and control flow monitoring. Our schemes can achieve more than 99% coverage with a very small hardware addition. Finally, a virtual duplex architecture for superscalar processors is presented. In this system-level approach, the processor pipeline is backed up by a partially replicated pipeline. The replication-based checker minimizes the design and verification overheads. For a large-scale superscalar processor, the proposed architecture can bring 61.4% reduction in die area while sustaining the maximum performance

    Cross-core Microarchitectural Attacks and Countermeasures

    Get PDF
    In the last decade, multi-threaded systems and resource sharing have brought a number of technologies that facilitate our daily tasks in a way we never imagined. Among others, cloud computing has emerged to offer us powerful computational resources without having to physically acquire and install them, while smartphones have almost acquired the same importance desktop computers had a decade ago. This has only been possible thanks to the ever evolving performance optimization improvements made to modern microarchitectures that efficiently manage concurrent usage of hardware resources. One of the aforementioned optimizations is the usage of shared Last Level Caches (LLCs) to balance different CPU core loads and to maintain coherency between shared memory blocks utilized by different cores. The latter for instance has enabled concurrent execution of several processes in low RAM devices such as smartphones. Although efficient hardware resource sharing has become the de-facto model for several modern technologies, it also poses a major concern with respect to security. Some of the concurrently executed co-resident processes might in fact be malicious and try to take advantage of hardware proximity. New technologies usually claim to be secure by implementing sandboxing techniques and executing processes in isolated software environments, called Virtual Machines (VMs). However, the design of these isolated environments aims at preventing pure software- based attacks and usually does not consider hardware leakages. In fact, the malicious utilization of hardware resources as covert channels might have severe consequences to the privacy of the customers. Our work demonstrates that malicious customers of such technologies can utilize the LLC as the covert channel to obtain sensitive information from a co-resident victim. We show that the LLC is an attractive resource to be targeted by attackers, as it offers high resolution and, unlike previous microarchitectural attacks, does not require core-colocation. Particularly concerning are the cases in which cryptography is compromised, as it is the main component of every security solution. In this sense, the presented work does not only introduce three attack variants that can be applicable in different scenarios, but also demonstrates the ability to recover cryptographic keys (e.g. AES and RSA) and TLS session messages across VMs, bypassing sandboxing techniques. Finally, two countermeasures to prevent microarchitectural attacks in general and LLC attacks in particular from retrieving fine- grain information are presented. Unlike previously proposed countermeasures, ours do not add permanent overheads in the system but can be utilized as preemptive defenses. The first identifies leakages in cryptographic software that can potentially lead to key extraction, and thus, can be utilized by cryptographic code designers to ensure the sanity of their libraries before deployment. The second detects microarchitectural attacks embedded into innocent-looking binaries, preventing them from being posted in official application repositories that usually have the full trust of the customer
    corecore