Search CORE

491 research outputs found

Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

Author: An Baik Song
Publication venue
Publication date
Field of study

High performance systems have been widely adopted in many fields and the demand for better performance is constantly increasing. And the need of powerful yet flexible systems is also increasing to meet varying application requirements from diverse domains. Also, power efficiency in high performance computing has been one of the major issues to be resolved. The power density of core components becomes significantly higher, and the fraction of power supply in total management cost is dominant. Providing dependability is also a main concern in large-scale systems since more hardware resources can be abused by attackers. Therefore, designing high-performance, power-efficient and secure systems is crucial to provide adequate performance as well as reliability to users. Adhering to using traditional design methodologies for large-scale computing systems has a limit to meet the demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. Chip multiprocessor (CMP) integrates multiple processing cores and caches on a chip and is thought of as a good alternative to previous design trends. In this dissertation, we deal with various design issues of high performance multiprocessor systems based on CMP to achieve both performance and power efficiency while maintaining security. First, we propose a fast and secure off-chip interconnects through minimizing network overheads and providing an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics in CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technique. We introduce hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency. Simulation results show that the proposed schemes improve the performance of off-chip networks through reducing the message size by 54% on average. Also, the schemes diminish the overheads of bounds checking operations, thus enhancing the overall performance by 11% on average. Adopting hybrid buffers in NoC routers contributes to increasing the network throughput up to 21%

Texas A&M Repository

Resource management for media processing in networked embedded systems : proceedings of a one-day workshop, Eindhoven, March 31, 2005

Author
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2005
Field of study

Pure OAI Repository

Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

Author: An Baik Song
Publication venue
Publication date
Field of study

Texas A&M Repository

Proximity coherence for chip-multiprocessors

Author: Barrow-Williams N.
Fensch C.
Moore S.
Publication venue: University of Cambridge
Publication date: 01/01/2010
Field of study

Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence. The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating system assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform. Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By redesigning coherence protocols to exploit new patterns such as the physical locality of shared data, improving the efficiency of communication, specifically in chip-multiprocessors, is possible. This thesis explores such a design – Proximity Coherence – a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure.EPSRC DTA research scholarshi

CiteSeerX

Heriot Watt Pure

Crossref

Edinburgh Research Archive

Apollo (Cambridge)

The Design of a System Architecture for Mobile Multimedia Computers

Author: Havinga Paul Johannes Mattheus
Publication venue: University of Twente
Publication date: 01/01/2000
Field of study

This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile Digital Companion, energy management plays a crucial role in the architecture. As the Companion must remain usable in a variety of environments, it has to be flexible and adaptable to various operating conditions. The Mobile Digital Companion has an unconventional architecture that saves energy by using system decomposition at different levels of the architecture and exploits locality of reference with dedicated, optimised modules. The approach is based on dedicated functionality and the extensive use of energy reduction techniques at all levels of system design. The system has an architecture with a general-purpose processor accompanied by a set of heterogeneous autonomous programmable modules, each providing an energy efficient implementation of dedicated tasks. A reconfigurable internal communication network switch exploits locality of reference and eliminates wasteful data copies

CiteSeerX

University of Twente Research Information

Advanced Techniques for Improving the Efficacy of Digital Forensics Investigations

Author: Marziale Lodovico
Publication venue: ScholarWorks@UNO
Publication date: 20/12/2009
Field of study

Digital forensics is the science concerned with discovering, preserving, and analyzing evidence on digital devices. The intent is to be able to determine what events have taken place, when they occurred, who performed them, and how they were performed. In order for an investigation to be effective, it must exhibit several characteristics. The results produced must be reliable, or else the theory of events based on the results will be flawed. The investigation must be comprehensive, meaning that it must analyze all targets which may contain evidence of forensic interest. Since any investigation must be performed within the constraints of available time, storage, manpower, and computation, investigative techniques must be efficient. Finally, an investigation must provide a coherent view of the events under question using the evidence gathered. Unfortunately the set of currently available tools and techniques used in digital forensic investigations does a poor job of supporting these characteristics. Many tools used contain bugs which generate inaccurate results; there are many types of devices and data for which no analysis techniques exist; most existing tools are woefully inefficient, failing to take advantage of modern hardware; and the task of aggregating data into a coherent picture of events is largely left to the investigator to perform manually. To remedy this situation, we developed a set of techniques to facilitate more effective investigations. To improve reliability, we developed the Forensic Discovery Auditing Module, a mechanism for auditing and enforcing controls on accesses to evidence. To improve comprehensiveness, we developed ramparser, a tool for deep parsing of Linux RAM images, which provides previously inaccessible data on the live state of a machine. To improve efficiency, we developed a set of performance optimizations, and applied them to the Scalpel file carver, creating order of magnitude improvements to processing speed and storage requirements. Last, to facilitate more coherent investigations, we developed the Forensic Automated Coherence Engine, which generates a high-level view of a system from the data generated by low-level forensics tools. Together, these techniques significantly improve the effectiveness of digital forensic investigations conducted using them

University of New Orleans

Design and Performance of Scalable High-Performance Programmable Routers - Doctoral Dissertation, August 2002

Author: Wolf Tilman
Publication venue: Washington University Open Scholarship
Publication date: 03/05/2002
Field of study

The flexibility to adapt to new services and protocols without changes in the underlying hardware is and will increasingly be a key requirement for advanced networks. Introducing a processing component into the data path of routers and implementing packet processing in software provides this ability. In such a programmable router, a powerful processing infrastructure is necessary to achieve to level of performance that is comparable to custom silicon-based routers and to demonstrate the feasibility of this approach. This work aims at the general design of such programmable routers and, specifically, at the design and performance analysis of the processing subsystem. The necessity of programmable routers is motivated, and a router design is proposed. Based on the design, a general performance model is developed and quantitatively evaluated using a new network processor benchmark. Operational challenges, like scheduling of packets to processing engines, are addressed, and novel algorithms are presented. The results of this work give qualitative and quantitative insights into this new domain that combines issues from networking, computer architecture, and system design

Washington University St. Louis: Open Scholarship

Effective power saving method by on-chip traffic compression in noc-based embedded systems

Author: Soler Heredia María
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 03/11/2014
Field of study

[EN] of components, relying on an efficient on-chip network (network-on-chip; NoC). As the size of the system increases, NoC performance and power consumption become a central issue. In this project, we design compression strategies at the NoC level reducing the number of transmitted flits and consequently the energy consumed. The provided mechanism relies on the abundance of memory data blocks filled with zeros in the analysed applications, thus easily compressible by using a zero-elimination strategy. We provide a hardware implementation for both compression and decompression end points at a generic network interface (NI). The mechanisms have been designed in isolated mode in order to make them modular and easily adapted to any NI protocol. Results show the effectiveness of the compression and decompression mechanisms and the low overhead they introduce. The percentage of traffic reduced by the compression strategy (it is reduced by a factor of 3) justifies the added resources. This work reflects some parts of the main research directions we tackle in the wider PhD framework. In particular, we propose a method for power efficient memory traffic management. The work presented here represents the initial research directions in simulation development, traffic pattern characterization and initial solutions development[ES] Con los avances de la tecnología, los sistemas en chip multiprocesador (MPSoC) aumentan en número de componentes, apoyándose en una red en el chip (NoC) eficiente. Según crece el tamaño de estos sistemas, la eficiencia de la red tanto temporal como energética se convierte en una parte primordial. En este proyecto diseñamos estrategias de compresión a nivel de red (en la NoC) reduciendo el número de flits transmitidos y por tanto la energía consumida. El método propuesto se basa en la abundancia de bloques de memoria con largas cadenas de ceros que se detectaron en las aplicaciones analizadas. Esta abundancia de ceros facilita la compresión mediante estrategias de eliminación de ceros. Ofrecemos una implementación hardware tanto de la parte de compresión como de la de descompresión sobre un interfaz de red (NI) genérico. Los mecanismos propuestos han sido diseñados de forma aislada para hacerlos modulares y fácilmente adaptables a cualquier protocolo de NI. Los resultados muestran la efectividad de los mecanismos de compresión y descompresión y la escasa penalización que introducen. El porcentaje de tráfico reducido mediante la estrategia de compresión (se reduce con un factor de 3) justifica los recursos extra requeridos. Este trabajo refleja parte de la línea de investigación global que se pretende abordar en el marco más amplio de un doctorado. En particular proponemos un método de gestión del tráfico de memoria energéticamente eficiente. El trabajo presentado aquí representa pues una primera aproximación a la investigación realizando un desarrollo parcial del simulador, caracterización de patrones de tráfico y el desarrollo de una solución parcialSoler Heredia, M. (2013). Effective power saving method by on-chip traffic compression in noc-based embedded systems. http://hdl.handle.net/10251/43774Archivo delegad

RiuNet

Doctor of Philosophy in Computer Science

Author: Kopta Daniel
Publication venue: University of Utah
Publication date: 01/01/2016
Field of study

dissertationRay tracing is becoming more widely adopted in offline rendering systems due to its natural support for high quality lighting. Since quality is also a concern in most real time systems, we believe ray tracing would be a welcome change in the real time world, but is avoided due to insufficient performance. Since power consumption is one of the primary factors limiting the increase of processor performance, it must be addressed as a foremost concern in any future ray tracing system designs. This will require cooperating advances in both algorithms and architecture. In this dissertation I study ray tracing system designs from a data movement perspective, targeting the various memory resources that are the primary consumer of power on a modern processor. The result is high performance, low energy ray tracing architectures

The University of Utah: J. Willard Marriott Digital Library

Low power processor architecture and multicore approach for embedded systems

Author: Otani Sugako
大谷寿賀子
Publication venue
Publication date: 28/09/2015
Field of study

13301甲第4319号博士（工学）金沢大学博士論文本文Full 以下に掲載：1.IEICE Transactions Vol. E98-C(7) pp.544-549 2015. IEICE. 共著者： S. Otani, H. Kondo. /2.Reuse 許可エビデンス送

Kanazawa University Repository for Academic Resources