491 research outputs found

    Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

    High-performance systems have been widely adopted in many fields, and the demand for better performance is constantly increasing. The need for powerful yet flexible systems is also growing, to meet varying application requirements from diverse domains. Power efficiency in high-performance computing is another major issue to be resolved: the power density of core components has become significantly higher, and power supply accounts for a dominant fraction of total management cost. Providing dependability is also a main concern in large-scale systems, since more hardware resources can be abused by attackers. Designing high-performance, power-efficient and secure systems is therefore crucial to provide adequate performance as well as reliability to users. Traditional design methodologies for large-scale computing systems are limited in how far they can meet this demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. A chip multiprocessor (CMP) integrates multiple processing cores and caches on a chip and is regarded as a good alternative to previous design trends. In this dissertation, we address various design issues of CMP-based high-performance multiprocessor systems to achieve both performance and power efficiency while maintaining security. First, we propose fast and secure off-chip interconnects that minimize network overheads and provide an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics of CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technique. We introduce hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency.
Simulation results show that the proposed schemes improve the performance of off-chip networks by reducing the message size by 54% on average. The schemes also reduce the overhead of bounds-checking operations, enhancing overall performance by 11% on average. Adopting hybrid buffers in NoC routers increases network throughput by up to 21%.

    Proximity coherence for chip-multiprocessors

    Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power-limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence. The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating-system-assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform. Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By redesigning coherence protocols to exploit new patterns such as the physical locality of shared data, it is possible to improve the efficiency of communication, specifically in chip-multiprocessors. This thesis explores such a design, Proximity Coherence, a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. EPSRC DTA research scholarshi
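
    The forwarding idea in the abstract above can be illustrated with a toy model. This is only a sketch under strong assumptions: the names `Tile`, `load` and the plain-dictionary `directory` are hypothetical, coherence states and invalidations are ignored, and nothing here is taken from the thesis's actual protocol. It shows only the lookup order: local L1 first, then nearby caches, then the directory as a fallback.

```python
from typing import Dict, List, Tuple


class Tile:
    """One core with a private L1 cache and links to nearby tiles (hypothetical model)."""

    def __init__(self, tile_id: int) -> None:
        self.tile_id = tile_id
        self.l1: Dict[int, int] = {}          # address -> cached value
        self.neighbours: List["Tile"] = []    # tiles reachable over the dedicated links


def load(tile: Tile, addr: int, directory: Dict[int, int]) -> Tuple[int, str]:
    """Serve a load and report where the data came from."""
    # 1. Local L1 hit: no communication needed.
    if addr in tile.l1:
        return tile.l1[addr], "l1-hit"
    # 2. L1 miss: optimistically probe nearby caches before using the directory.
    for neighbour in tile.neighbours:
        if addr in neighbour.l1:
            tile.l1[addr] = neighbour.l1[addr]
            return tile.l1[addr], "proximity"
    # 3. Fallback: the conventional directory-indirected path.
    value = directory[addr]
    tile.l1[addr] = value
    return value, "directory"
```

    In this model a miss that a neighbour can satisfy never reaches the directory, which is the case the abstract's observed locality between adjacent cores would make common.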

    The Design of a System Architecture for Mobile Multimedia Computers

    This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies

    Advanced Techniques for Improving the Efficacy of Digital Forensics Investigations

    Digital forensics is the science concerned with discovering, preserving, and analyzing evidence on digital devices. The intent is to be able to determine what events have taken place, when they occurred, who performed them, and how they were performed. In order for an investigation to be effective, it must exhibit several characteristics. The results produced must be reliable, or else the theory of events based on the results will be flawed. The investigation must be comprehensive, meaning that it must analyze all targets which may contain evidence of forensic interest. Since any investigation must be performed within the constraints of available time, storage, manpower, and computation, investigative techniques must be efficient. Finally, an investigation must provide a coherent view of the events under question using the evidence gathered. Unfortunately the set of currently available tools and techniques used in digital forensic investigations does a poor job of supporting these characteristics. Many tools used contain bugs which generate inaccurate results; there are many types of devices and data for which no analysis techniques exist; most existing tools are woefully inefficient, failing to take advantage of modern hardware; and the task of aggregating data into a coherent picture of events is largely left to the investigator to perform manually. To remedy this situation, we developed a set of techniques to facilitate more effective investigations. To improve reliability, we developed the Forensic Discovery Auditing Module, a mechanism for auditing and enforcing controls on accesses to evidence. To improve comprehensiveness, we developed ramparser, a tool for deep parsing of Linux RAM images, which provides previously inaccessible data on the live state of a machine. 
To improve efficiency, we developed a set of performance optimizations, and applied them to the Scalpel file carver, creating order of magnitude improvements to processing speed and storage requirements. Last, to facilitate more coherent investigations, we developed the Forensic Automated Coherence Engine, which generates a high-level view of a system from the data generated by low-level forensics tools. Together, these techniques significantly improve the effectiveness of digital forensic investigations conducted using them
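
    For readers unfamiliar with file carving, the general technique can be sketched as a header/footer scan over a raw image. This is a minimal illustration and not Scalpel's implementation or the optimizations described above; the function name `carve` and the `max_size` limit are assumptions, and the JPEG start and end markers are used only as a familiar example.

```python
from typing import List

JPEG_HEADER = b"\xff\xd8\xff"   # JPEG start-of-image marker followed by a marker byte
JPEG_FOOTER = b"\xff\xd9"       # JPEG end-of-image marker


def carve(image: bytes,
          header: bytes = JPEG_HEADER,
          footer: bytes = JPEG_FOOTER,
          max_size: int = 10 * 1024 * 1024) -> List[bytes]:
    """Return every byte range that starts with `header` and ends with `footer`."""
    found: List[bytes] = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break
        # Look for the footer only within max_size bytes of the header.
        end = image.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1      # no footer in range: resume just past this header
            continue
        found.append(image[start:end + len(footer)])
        pos = end + len(footer)
    return found
```

    A scan like this is dominated by sequential reads and pattern matching, which is why a carver is a natural target for the kind of performance work the abstract mentions.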

    Design and Performance of Scalable High-Performance Programmable Routers - Doctoral Dissertation, August 2002

    The flexibility to adapt to new services and protocols without changes in the underlying hardware is and will increasingly be a key requirement for advanced networks. Introducing a processing component into the data path of routers and implementing packet processing in software provides this ability. In such a programmable router, a powerful processing infrastructure is necessary to achieve a level of performance that is comparable to custom silicon-based routers and to demonstrate the feasibility of this approach. This work aims at the general design of such programmable routers and, specifically, at the design and performance analysis of the processing subsystem. The necessity of programmable routers is motivated, and a router design is proposed. Based on the design, a general performance model is developed and quantitatively evaluated using a new network processor benchmark. Operational challenges, like scheduling of packets to processing engines, are addressed, and novel algorithms are presented. The results of this work give qualitative and quantitative insights into this new domain that combines issues from networking, computer architecture, and system design.

    Effective power saving method by on-chip traffic compression in noc-based embedded systems

    With advances in technology, multiprocessor systems-on-chip (MPSoC) grow in number of components, relying on an efficient on-chip network (network-on-chip; NoC). As the size of the system increases, NoC performance and power consumption become a central issue. In this project, we design compression strategies at the NoC level that reduce the number of transmitted flits and consequently the energy consumed. The mechanism relies on the abundance of memory data blocks filled with zeros in the analysed applications, which are easily compressible using a zero-elimination strategy. We provide a hardware implementation of both the compression and decompression end points at a generic network interface (NI). The mechanisms have been designed in isolation in order to make them modular and easily adapted to any NI protocol. Results show the effectiveness of the compression and decompression mechanisms and the low overhead they introduce. The traffic reduction achieved by the compression strategy (traffic is reduced by a factor of 3) justifies the added resources. This work reflects part of the main research directions we tackle in the wider PhD framework. In particular, we propose a method for power-efficient memory traffic management. The work presented here represents the initial research directions in simulation development, traffic pattern characterization and initial solutions development.
    Soler Heredia, M. (2013). Effective power saving method by on-chip traffic compression in noc-based embedded systems. http://hdl.handle.net/10251/43774
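
    The zero-elimination strategy named in the abstract above can be sketched in software as follows. This is an illustrative model only, not the hardware design from the project: the 4-byte word size, the bitmask encoding, and the names `compress_block` and `decompress_block` are all assumptions. Each word of a memory block gets one mask bit, and only non-zero words are placed in the payload that would be split into flits.

```python
from typing import Tuple

WORD_BYTES = 4  # assumed word granularity; the source does not specify one


def compress_block(block: bytes) -> Tuple[int, bytes]:
    """Return (mask, payload); bit i of mask is set when word i is non-zero."""
    if len(block) % WORD_BYTES:
        raise ValueError("block length must be a multiple of the word size")
    mask = 0
    payload = bytearray()
    for offset in range(0, len(block), WORD_BYTES):
        word = block[offset:offset + WORD_BYTES]
        if any(word):                          # keep only words with a non-zero byte
            mask |= 1 << (offset // WORD_BYTES)
            payload += word
    return mask, bytes(payload)


def decompress_block(mask: int, payload: bytes, n_words: int) -> bytes:
    """Rebuild the original block by reinserting the eliminated zero words."""
    out = bytearray()
    pos = 0
    for i in range(n_words):
        if (mask >> i) & 1:
            out += payload[pos:pos + WORD_BYTES]
            pos += WORD_BYTES
        else:
            out += bytes(WORD_BYTES)           # eliminated word: all zeros
    return bytes(out)
```

    Under these assumptions, a 64-byte block holding a single non-zero word shrinks to a 16-bit mask plus 4 payload bytes, which is the kind of saving that lets fewer flits cross the network.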

    Doctor of Philosophy in Computer Science

    Ray tracing is becoming more widely adopted in offline rendering systems due to its natural support for high quality lighting. Since quality is also a concern in most real time systems, we believe ray tracing would be a welcome change in the real time world, but is avoided due to insufficient performance. Since power consumption is one of the primary factors limiting the increase of processor performance, it must be addressed as a foremost concern in any future ray tracing system designs. This will require cooperating advances in both algorithms and architecture. In this dissertation I study ray tracing system designs from a data movement perspective, targeting the various memory resources that are the primary consumer of power on a modern processor. The result is high performance, low energy ray tracing architectures.

    Low power processor architecture and multicore approach for embedded systems

    Doctoral dissertation (full text), Kanazawa University, Doctor of Engineering, degree no. 13301甲第4319号. Published in: 1. IEICE Transactions Vol. E98-C(7), pp. 544-549, 2015, IEICE. Co-authors: S. Otani, H. Kondo. / 2. Reuse permission evidence