11 research outputs found

    Deliverable D3.4: WP3 overall public deliverable

    Get PDF

    Architectural explorations for streaming accelerators with customized memory layouts

    Get PDF
    The basic concept behind the architecture of a general-purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip has allowed CPUs to run parts of a program in parallel. However, the enormous parallelism available in many high-performance applications, and in the corresponding data, is hard to exploit with general-purpose multicores alone. Streaming accelerators and the corresponding programming models improve on this situation by providing throughput-oriented architectures. The basic idea behind the design of these architectures responds to the ever-increasing need to process huge data sets. These throughput-oriented devices enable high-performance data processing through efficient parallel computation and streaming-based communication. Like other processors, streaming accelerators consist of numerous microarchitectural components, including memory structures, compute units, control units, and I/O channels and controllers. However, the throughput requirements add special features and impose restrictions for performance reasons. These devices normally offer a large number of compute resources but require applications to arrange their data into parallel, maximally independent sets that feed the compute resources in the form of streams. Arranging data into independent sets of parallel streams is not a simple task: it may require restructuring an algorithm as a whole, or even rewriting the algorithm from scratch for the target application. Even so, all this effort to rearrange an application's data access patterns may still not be enough to achieve optimal performance. 
This is because of possible microarchitectural constraints of the target platform: the hardware prefetching mechanisms, the size and granularity of the local storage, and the flexibility of data marshaling inside the local storage. These constraints of a general-purpose streaming platform on prefetching, storing, and maneuvering data to arrange and maintain it as parallel, independent streams could be removed by employing microarchitecture-level design approaches. This includes the use of application-specific customized memories in the front-end of a streaming architecture. The focus of this thesis is to present architectural explorations of streaming accelerators with customized memory layouts. In general, the thesis covers three main aspects of such streaming accelerators: i) design of application-specific accelerators with customized memory layouts, ii) template-based design support for customized memory accelerators, and iii) design space explorations for throughput-oriented devices with standard and customized memories. The thesis concludes with a conceptual proposal for a Blacksmith Streaming Architecture (BSArc). Blacksmith Computing allows the hardware-level adoption of an application-specific front-end with a GPU-like streaming back-end. This gives an opportunity to exploit the maximum possible data locality and data-level parallelism of an application while providing a powerful, throughput-oriented back-end. We consider that the specialized memory layouts for the front-end of the device are provided by application domain experts in the form of templates, adjustable to the device and the problem size at the device's configuration time. The physical availability of such an architecture may still take time. 
However, a simulation framework supports the architectural explorations, gives insight into the proposal, and predicts potential performance benefits for such an architecture.
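The abstract above stresses that streaming devices force applications to rearrange their data into parallel, maximally independent streams before the compute resources can be fed. As a minimal sketch of that idea (not taken from the thesis; the function name is hypothetical), the following shows the simplest form of such marshaling, deinterleaving one data set into independent round-robin streams:

```python
# Illustrative sketch only: deinterleave a data set into independent
# parallel streams, the kind of data marshaling a streaming front-end
# performs before feeding its compute lanes.

def marshal_streams(data, num_streams):
    """Split `data` round-robin into `num_streams` independent streams,
    so each compute lane can consume its stream with no cross-stream
    dependencies."""
    streams = [[] for _ in range(num_streams)]
    for i, x in enumerate(data):
        streams[i % num_streams].append(x)
    return streams

# Example: 8 samples split across 4 parallel streams.
streams = marshal_streams(list(range(8)), 4)
# streams == [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Real layouts are far more application-specific (which is the thesis's point about customized memories), but the round-robin split illustrates the independence requirement.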

    MIMO Systems

    Get PDF
    In recent years it has become clear that MIMO communication systems are all but inevitable in the accelerated evolution of high-data-rate applications, owing to their potential to dramatically increase spectral efficiency while simultaneously sending individual information to the corresponding users in wireless systems. This book intends to highlight current research topics in the field of MIMO systems and to offer a snapshot of the recent advances and major issues faced today by researchers in MIMO-related areas. The book is written by specialists working in universities and research centers all over the world, and covers the fundamental principles and main advanced topics of high-data-rate wireless communication systems over MIMO channels. Moreover, the book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Design of large polyphase filters in the Quadratic Residue Number System

    Full text link

    Temperature aware power optimization for multicore floating-point units

    Full text link

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.

    Improving the Reliability of Microprocessors under BTI and TDDB Degradations

    Get PDF
    Reliability is a fundamental challenge for current and future microprocessors built with advanced nanoscale technologies. With smaller gates, thinner dielectrics, and higher temperatures, microprocessors are vulnerable to aging mechanisms such as Bias Temperature Instability (BTI) and Time-Dependent Dielectric Breakdown (TDDB). Under continuous stress, both parametric and functional errors occur, compromising microprocessor lifetime. In this thesis, based on a thorough study of the BTI and TDDB mechanisms, solutions are proposed to mitigate the aging processes in memory-based and random-logic structures of modern out-of-order microprocessors. A large area of the processor core is occupied by memory-based structures that are vulnerable to BTI-induced errors. The problem is exacerbated in high-k metal gate technology, where PBTI degradation in NMOS is as severe as NBTI in PMOS. Hence a novel design is proposed to recover 4 internal gates within an SRAM cell simultaneously, mitigating both NBTI and PBTI effects. This technique is applied in two different ways: to the L2 cache banks and to the busy functional units with storage cells in the out-of-order pipeline. For the L2 cache banks, a redundant cache bank is added exclusively for proactive recovery rotation. For the critical and busy functional units of out-of-order pipelines, idle cycles are exploited at the per-buffer-entry level. Unlike memory-based structures, combinational logic structures such as the functional units in the execution stage cannot use low-overhead redundancy to tolerate errors, owing to their irregular structure. A design framework that aims to improve the reliability of the vulnerable functional units of a processor core is designed and implemented. The approach is to design a generic functional unit (GFU) that can be reconfigured to replace a particular functional unit (FU) while that FU is being recovered for improved lifetime. Although flexible, the GFU is slower than the original target FUs. 
The GFU is therefore carefully designed to minimize the performance loss when it is in use, and further schemes avoid using the GFU on performance-critical paths of a program's execution.
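The "proactive recovery rotation" mentioned above (one redundant cache bank so that each bank can periodically be taken offline to recover) can be sketched as a simple round-robin schedule. This is an illustration of the scheduling idea only, not the thesis's implementation; the function name is hypothetical:

```python
# Illustrative sketch: proactive recovery rotation with one redundant
# cache bank. Each epoch, a different bank is taken offline to recover
# from BTI stress while the remaining banks serve all requests.

def rotation_schedule(num_banks, num_epochs):
    """Return, for each epoch, (recovering_bank, active_banks) for a
    system of num_banks logical banks plus 1 redundant physical bank."""
    total = num_banks + 1  # redundant bank makes rotation possible
    schedule = []
    for epoch in range(num_epochs):
        recovering = epoch % total
        active = [b for b in range(total) if b != recovering]
        schedule.append((recovering, active))
    return schedule

# With 3 banks plus 1 redundant: bank 0 recovers in epoch 0,
# bank 1 in epoch 1, and so on, so every bank gets regular recovery
# time while full capacity is always available.
```

The same rotation principle applies wherever a spare unit exists; the per-buffer-entry scheme for functional units exploits idle cycles instead, since no spare entry is added there.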

    Translational pipelines for closed-loop neuromodulation

    Get PDF
    Closed-loop neuromodulation systems have shown significant potential for addressing unmet needs in the treatment of disorders of the central nervous system, yet progress towards clinical adoption has been slow. Advanced technological developments often stall in the preclinical stage by failing to account for the constraints of implantable medical devices, and due to the lack of research platforms with a translational focus. This thesis presents the development of three clinically relevant research systems focusing on refinements of deep brain stimulation therapies. First, we introduce a system for synchronising implanted and external stimulation devices, allowing for research into multi-site stimulation paradigms, cross-region neural plasticity, and questions of phase coupling. The proposed design aims to sidestep the limited communication capabilities of existing commercial implant systems by providing a stimulation state readout without reliance on telemetry, creating a cross-platform research tool. Next, we present work on the Picostim-DyNeuMo adaptive neuromodulation platform, focusing on expanding device capabilities from activity and circadian adaptation to responsive stimulation based on bioelectric markers. Here, we introduce a computationally optimised implementation of a popular band-power estimation algorithm suitable for deployment in the DyNeuMo system. The new algorithmic capability was externally validated to establish neural state classification performance in two widely researched use cases: Parkinsonian beta bursts and seizures. For in vivo validation, a pilot experiment is presented demonstrating responsive neurostimulation to cortical alpha-band activity in a non-human primate model for the modulation of attention state. Finally, we turn our focus to the validation of a recently developed method that provides computationally efficient real-time phase estimation. 
Following theoretical analysis, the method is integrated into the commonly used Intan electrophysiological recording platform, creating a novel closed-loop optogenetics research platform. The performance of the research system is characterised through a pilot experiment targeting the modulation of cortical theta-band activity in a transgenic mouse model.
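The band-power estimation step described above (classifying neural state from power in a frequency band such as beta, 13-30 Hz) can be illustrated with a plain DFT-based estimator. This is a generic stand-in for exposition, not the optimised algorithm deployed on the DyNeuMo implant; the function name and band edges are illustrative:

```python
# Illustrative sketch: estimate signal power inside a frequency band
# via a direct DFT. Implant firmware would use a far cheaper
# formulation; this shows only what quantity is being computed.
import math

def band_power(samples, fs, f_lo, f_hi):
    """Sum normalised DFT power over bins whose frequency (Hz)
    falls in [f_lo, f_hi], for a real signal sampled at fs Hz."""
    n = len(samples)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(-samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / (n * n)
    return power

# A 20 Hz sine sampled at 256 Hz concentrates its power in the
# beta band (13-30 Hz) and contributes almost nothing to alpha (8-12 Hz).
fs = 256
sig = [math.sin(2 * math.pi * 20 * t / fs) for t in range(fs)]
beta = band_power(sig, fs, 13, 30)
alpha = band_power(sig, fs, 8, 12)
```

Thresholding such a band-power estimate against a baseline is the simplest form of the neural state classification the abstract refers to, e.g. flagging a beta burst when the estimate exceeds a calibrated threshold.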