Search CORE

12,706 research outputs found

Improving latency tolerance of multithreading through decoupling

Author: González Colás Antonio María
Parcerisa Bundó Joan Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

The increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. The article presents and evaluates a novel processor microarchitecture which combines two paradigms: simultaneous multithreading and access/execute decoupling. Since its decoupled units issue instructions in order, this architecture is significantly less complex, in terms of critical path delays, than a centralized out-of-order design, and it is more effective for future growth in issue-width and clock speed. We investigate how both techniques complement each other. Since decoupling features an excellent memory latency hiding efficiency, the large amount of parallelism exploited by multithreading may be used to hide the latency of functional units and keep them fully utilized. The study shows that, by adding decoupling to a multithreaded architecture, fewer threads are needed to achieve maximum throughput. Therefore, in addition to the obvious hardware complexity reduction, it places lower demands on the memory system. The study also reveals that multithreading by itself exhibits little memory latency tolerance. Results suggest that most of the latency hiding effectiveness of SMT architectures comes from the dynamic scheduling. On the other hand, decoupling is very effective at hiding memory latency. An increase in the cache miss penalty from 1 to 32 cycles reduces the performance of a 4-context multithreaded decoupled processor by less than 2 percent. For the nondecoupled multithreaded processor, the loss of performance is about 23 percent.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Service-Oriented Architecture for Space Exploration Robotic Rover Systems

Author: Bassil Youssef
Publication venue
Publication date: 05/03/2012
Field of study

Currently, industrial sectors are transforming their business processes into e-services and component-based architectures to build flexible, robust, and scalable systems, and reduce integration-related maintenance and development costs. Robotics is yet another promising and fast-growing industry that deals with the creation of machines that operate in an autonomous fashion and serve for various applications including space exploration, weaponry, laboratory research, and manufacturing. It is in space exploration that the most common type of robots is the planetary rover which moves across the surface of a planet and conducts a thorough geological study of the celestial surface. This type of rover system is still ad-hoc in that it incorporates its software into its core hardware making the whole system cohesive, tightly-coupled, more susceptible to shortcomings, less flexible, hard to be scaled and maintained, and impossible to be adapted to other purposes. This paper proposes a service-oriented architecture for space exploration robotic rover systems made out of loosely-coupled and distributed web services. The proposed architecture consists of three elementary tiers: the client tier that corresponds to the actual rover; the server tier that corresponds to the web services; and the middleware tier that corresponds to an Enterprise Service Bus which promotes interoperability between the interconnected entities. The niche of this architecture is that rover's software components are decoupled and isolated from the rover's body and possibly deployed at a distant location. A service-oriented architecture promotes integrate-ability, scalability, reusability, maintainability, and interoperability for client-to-server communication.Comment: LACSC - Lebanese Association for Computational Sciences, http://www.lacsc.org/; International Journal of Science & Emerging Technologies (IJSET), Vol. 3, No. 2, February 201

arXiv.org e-Print Archive

ExcelingTech Publishing Company (E-Journals)

An ultra low-power hardware accelerator for automatic speech recognition

Author: Arnau Montañés José María
González Colás Antonio María
Segura Salvador Albert
Yazdani Aminabadi Reza
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost which is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce power consumption of ASR systems, while delivering high-performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, that represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves 1.7x speedup over a highly optimized CUDA implementation running on a high-end Geforce GTX 980 GPU, while reducing by two orders of magnitude (287x) the energy required to convert the speech into text.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

The 3DMA Middleware for Mobile Applications

Author: Dumas Menjivar Marlon
Elms Kim
Fjellheim Tore
Milliner Stephen
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2004
Field of study

Mobile devices have received much research interest in re- cent years. Mobility raises new issues such as more dynamic context, limited computing resources, and frequent disconnections. To handle these issues, we propose a middleware, called 3DMA, which introduces three requirements, 1) distribution, 2) decoupling and 3) decomposition. 3DMA uses a space based middleware approach combined with a set of workers which are able to act on the users behalf either to reduce load on the mobile device, or to support disconnected behavior. In order to demonstrate aspects of the middleware architecture we consider the development of a commonly used mobile application

Queensland University of Technology ePrints Archive