
Castell: a heterogeneous cmp architecture scalable to hundreds of processors

Author: Cabarcas Jaramillo Felipe
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011

Technology improvements and power constraints have led multicore architectures to displace uniprocessors in microprocessor design. At the same time, accelerator-based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but at the cost of a high programming effort. We propose Castell, a scalable chip multiprocessor architecture that can be programmed like a uniprocessor while providing the high throughput of accelerator-based architectures. Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds and schedules parallel tasks and adds hardware-specific features to them. One of these features is the use of DMA transfers to overlap computation and data movement, a technique known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system with a focus on memory bandwidth. In addition to providing programmability and designing the memory system, we use a hierarchical network-on-chip (NoC) and add a synchronization module. The NoC design distributes memory traffic efficiently so that the architecture can scale. The synchronization module was added because applications suffer large performance degradation when synchronization latencies are large. Castell is mainly an architecture framework that enables the definition of domain-specific implementations fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and ClustalW). It has also been used to explore a number of architecture optimizations, such as enhanced DMA controllers and architecture support for task-based programming models.
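
The abstract names double buffering but does not describe Castell's DMA interface. The Python sketch below shows only the general pattern, under stated assumptions: `dma_get` is a hypothetical stand-in for an asynchronous DMA transfer into a core's local store, modelled here by a single worker thread, and the chunk size is arbitrary. While the core computes on one buffer, the transfer of the next chunk is already in flight.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(memory, start, size):
    # Hypothetical stand-in for an asynchronous DMA transfer from main memory
    # into a core's local store; here it simply copies a slice.
    return list(memory[start:start + size])


def process_double_buffered(memory, chunk, compute):
    """Apply `compute` to each `chunk`-sized block of `memory`, fetching
    block i+1 while block i is being computed (two buffers in use)."""
    starts = range(0, len(memory), chunk)
    results = []
    if len(starts) == 0:
        return results
    # A single worker thread plays the role of one DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(dma_get, memory, starts[0], chunk)
        for i in range(len(starts)):
            current = pending.result()  # wait until buffer i has arrived
            if i + 1 < len(starts):
                # Start filling the other buffer before computing on this one.
                pending = dma.submit(dma_get, memory, starts[i + 1], chunk)
            results.append(compute(current))  # overlaps with the next transfer
    return results
```

For example, `process_double_buffered(list(range(10)), 4, sum)` returns `[6, 22, 17]`. With CPython threads the overlap is only illustrative; on hardware with a real DMA engine the same issue-then-wait ordering is what hides memory latency.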

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Trace-based Performance Analysis for Hardware Accelerators

Author: Juckeland Guido
Publication date: 05/02/2013

This thesis presents how performance data from hardware accelerators can be included in event logs. It extends the capabilities of trace-based performance analysis to also monitor and record data from this novel parallelization layer. Growing awareness of the power consumption of computing devices has also led to interest in hybrid computing architectures. High-end computers, workstations, and mobile devices have started to employ hardware accelerators to offload computationally intensive, parallel tasks, while retaining a highly efficient scalar compute unit for non-parallel tasks. This execution pattern is typically asynchronous, so that the scalar unit can resume other work while the hardware accelerator is busy. The performance analysis tools provided by hardware accelerator vendors cover the case of one host using one device very well, yet they do not address the needs of the high-performance computing community. This thesis investigates ways to extend existing methods for recording events from highly parallel applications to also cover scenarios in which hardware accelerators aid these applications. After introducing a generic approach that is suitable for any API-based acceleration paradigm, the thesis derives a proposal for a generic performance API for hardware accelerators and its implementation with NVIDIA CUPTI. Next, it discusses the visualization of event logs containing data from execution streams on different levels of parallelism. To overcome the limitations of classic performance profiles and timeline displays, it introduces a graph-based visualization using Parallel Performance Flow Graphs (PPFGs). This novel approach uses program states to display similarities and differences between the potentially very large number of event streams and thus makes load imbalances quick to spot.
The thesis concludes with an in-depth case study of PIConGPU, a highly parallel, multi-hybrid plasma physics simulation that benefited greatly from the developed performance analysis methods.

This dissertation shows how the execution of application parts that have been offloaded to hardware accelerators can be recorded as part of a program trace. This extends the well-known technique of trace-based application performance analysis so that this new level of parallelism is captured as well. Constraints on the electrical power consumption of computer systems have led to a growing number of hybrid computer architectures. High-performance computers as well as workstations and mobile devices now use hardware accelerators to offload compute-intensive, parallel program parts, thereby relieving the scalar main processor, which is then used only for non-parallel program parts. This execution scheme is typically asynchronous: the scalar processor can continue working while the hardware accelerator computes. The performance analysis tools of hardware accelerator vendors cover the standard case (one host system with one hardware accelerator) very well, but fail to support highly parallel computer systems. This dissertation investigates to what extent the activity of hardware accelerators can also be recorded for multi-hybrid applications. To this end, the existing method for generating program traces of highly parallel applications is extended accordingly. The investigation first develops a general methodology with which a program trace can be created for any API-based hardware acceleration. Building on this, a dedicated programming interface is developed that makes it possible to record further performance-relevant data. The implementation of this interface is demonstrated using NVIDIA CUPTI as an example.
Another part of the work deals with the visualization of program traces that contain recordings from the different levels of parallelism. To overcome the limitations of classic performance profiles and timeline displays, parallel program flow graphs (PPFGs) are introduced as a new graph-based form of visualization. This novel approach presents a program trace as a sequence of program states with shared and differing execution paths, which makes divergent program behavior and load imbalances much easier to locate. The work concludes with a detailed analysis of PIConGPU, a multi-hybrid plasma physics simulation that benefited greatly from the analysis capabilities developed in this work.
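
The abstract describes PPFGs only at a high level. As a rough, hypothetical illustration of the underlying idea (not the thesis's algorithm), the Python sketch below assumes each event stream has already been reduced to a list of `(state, duration)` pairs. It merges the streams into one graph whose nodes are program states and whose edges record which streams took each transition, then flags states where time differs strongly across streams. The `START`/`END` sentinels and the max/mean imbalance metric are assumptions chosen for the example.

```python
from collections import defaultdict


def build_flow_graph(streams):
    """Merge per-stream state sequences into one graph of program states.

    streams: dict mapping a stream id to a list of (state, duration) tuples.
    Returns (edges, durations):
      edges[(a, b)] -> set of stream ids that moved from state a to state b
      durations[s]  -> {stream id: total time that stream spent in state s}
    """
    edges = defaultdict(set)
    durations = defaultdict(dict)
    for sid, events in streams.items():
        prev = "START"
        for state, dur in events:
            edges[(prev, state)].add(sid)
            durations[state][sid] = durations[state].get(sid, 0.0) + dur
            prev = state
        edges[(prev, "END")].add(sid)
    return edges, durations


def imbalance(durations):
    """Max/mean time per state across the streams that visited it (1.0 = balanced)."""
    result = {}
    for state, per_stream in durations.items():
        times = list(per_stream.values())
        mean = sum(times) / len(times)
        result[state] = max(times) / mean if mean > 0 else 1.0
    return result
```

Edges shared by all streams correspond to common behavior; an edge carrying only a subset of stream ids marks divergence, and a state with an imbalance ratio well above 1.0 is a candidate load imbalance.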

Technische Universität Dresden: Qucosa

Scalability of parallel video decoding on heterogeneous manycore architectures

Author: Cabarcas Jaramillo Felipe
Juurlink Ben
Meenderinck Cor
Ramírez Bellido Alejandro
Valero Cortés Mateo
Álvarez Mesa Mauricio
Publication date: 01/01/2011

This paper analyzes the scalability of parallel video decoding on heterogeneous manycore architectures. As a benchmark, we use a highly parallel H.264/AVC video decoder that generates a large number of independent tasks. To translate task-level parallelism into performance gains, both the video decoder and the architecture have been optimized. First, the video decoder was modified to exploit coarse-grain frame-level parallelism in the entropy decoding kernel, which has been considered the main bottleneck. Second, a heterogeneous combination of cores is evaluated for executing different types of tasks. Finally, the memory requirements of the whole system are evaluated. Experiments conducted using a trace-driven simulation methodology show that the evaluated system exhibits good parallel scalability up to 68 cores. At this point, the parallel video decoder is able to decode more than 200 HD frames per second using simple low-power processors.
Postprint (published version)
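
The abstract does not say how the decoder's independent tasks are formed. One commonly described scheme for H.264 macroblock reconstruction is the 2D wave: a macroblock depends on its left, upper-left, upper and upper-right neighbours, so macroblock (x, y) can start at step x + 2y. The Python sketch below assumes that dependency pattern (an assumption, not confirmed by the paper) and shows how many macroblock tasks become ready together.

```python
def wave_schedule(width, height):
    """Group macroblocks (x, y) by the earliest step at which they can be
    decoded, assuming each depends on its left, upper-left, upper and
    upper-right neighbours (2D-wave model)."""
    waves = {}
    for y in range(height):
        for x in range(width):
            # Left neighbour sits at step t-1; upper-right at (x+1) + 2(y-1) = t-1.
            step = x + 2 * y
            waves.setdefault(step, []).append((x, y))
    return waves


def max_parallelism(width, height):
    """Largest number of macroblocks that are ready in the same step."""
    return max(len(mbs) for mbs in wave_schedule(width, height).values())
```

Under this model, a 1920×1088 frame (120 × 68 macroblocks of 16 × 16 pixels) needs 254 steps and `max_parallelism(120, 68)` returns 60, which illustrates why a single frame's wavefront alone offers limited task parallelism.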

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Ravel-XL: a hardware accelerator for assigned-delay compiled-code logic gate simulation

Author: Brown R. B.
Marques-Silva J. P.
Riepe M. A.
Sakallah K. A.
Publication date: 01/03/1996

Southampton (e-Prints Soton)

WAVOS: a MATLAB toolkit for wavelet analysis and visualization of oscillatory systems

Author: B Cazelles
C Torrence
D Morlet
DB Percival
DK Welsh
ED Herzog
Guillaume Bonnet
I Daubechies
J Buckheit
J Etchegaray
J Kong
J Levine
JE Baggs
K Meeker
Linda R Petzold
M Brai
NA Krivova
PS Addison
R Carmona
Richard Harang
S Mallat
TS Price
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Wavelets have proven to be a powerful technique for the analysis of periodic data, such as those that arise in the analysis of circadian oscillators. While many implementations of both continuous and discrete wavelet transforms are available, we are aware of no software that has been designed with the nontechnical end user in mind. By developing a toolkit that makes these analyses accessible to end users without significant programming experience, we hope to promote more widespread use of wavelet analysis.

Findings: We have developed the WAVOS toolkit for wavelet analysis and visualization of oscillatory systems. WAVOS features both the continuous (Morlet) and discrete (Daubechies) wavelet transforms, with a simple, user-friendly graphical user interface within MATLAB. The interface allows data to be imported from a number of standard file formats, visualized, processed, analyzed, and exported without use of the command line. Our work has been motivated by the challenges of circadian data, so default settings appropriate to the analysis of such data have been pre-selected to minimize the need for fine-tuning. The toolkit is, however, flexible enough to deal with a wide range of oscillatory signals and may be used in more general contexts.

Conclusions: We have presented WAVOS, a comprehensive wavelet-based MATLAB toolkit that allows for easy visualization, exploration, and analysis of oscillatory data. WAVOS includes both the Morlet continuous wavelet transform and the Daubechies discrete wavelet transform. We have illustrated the use of WAVOS and demonstrated its utility for the analysis of circadian data on both bioluminescence and wheel-running data. WAVOS is freely available at http://sourceforge.net/projects/wavos/files/
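
WAVOS itself is a MATLAB GUI, and the abstract does not describe its internals. As a rough illustration of how a Morlet continuous wavelet transform can pick out a circadian period, the Python sketch below (not WAVOS code) evaluates the transform by direct summation at the midpoint of a series for a range of candidate periods. It converts each period to a wavelet scale with the standard Morlet Fourier factor 4π / (ω₀ + √(2 + ω₀²)). The choices ω₀ = 6, a single evaluation point, and integer candidate periods are assumptions made for the example.

```python
import cmath
import math


def morlet(eta, w0=6.0):
    """Morlet mother wavelet psi(eta) with nondimensional frequency w0."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * eta) * math.exp(-0.5 * eta * eta)


def cwt_coefficient(x, dt, scale, n, w0=6.0):
    """Continuous wavelet coefficient W(scale, n) by direct summation."""
    total = 0j
    for k, xk in enumerate(x):
        total += xk * morlet((k - n) * dt / scale, w0).conjugate()
    return total * math.sqrt(dt / scale)


def dominant_period(x, dt, candidate_periods, w0=6.0):
    """Candidate period with the largest wavelet power at the series midpoint."""
    # For the Morlet wavelet, Fourier period = factor * scale.
    factor = 4 * math.pi / (w0 + math.sqrt(2 + w0 * w0))
    n = len(x) // 2  # evaluate at the centre, away from edge effects
    return max(candidate_periods,
               key=lambda p: abs(cwt_coefficient(x, dt, p / factor, n, w0)) ** 2)
```

For ten days of hourly samples of a 24 h sine, `dominant_period([math.sin(2 * math.pi * t / 24) for t in range(240)], 1.0, range(12, 37))` should return a value at or very near 24. A full analysis would compute the transform at every time point (typically via FFT) and account for the cone of influence.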

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Homologs of ancestral CNNM proteins affect magnesium homeostasis and circadian rhythmicity in a model eukaryotic cell

Author: Feord Helen K.
Gil Sergio
van Ooijen Gerben
Publication venue: MDPI AG
Publication date: 01/01/2023

Biological rhythms are ubiquitous across organisms and coordinate key cellular processes. Oscillations of Mg²⁺ levels in cells are now well established and, owing to the critical roles of Mg²⁺ in cell metabolism, are potentially fundamental for the circadian control of cellular activity. The identity of the transport proteins responsible for sustaining Mg²⁺ levels in eukaryotic cells remains hotly debated, and several of the candidates are restricted to specific groups of higher eukaryotes. Here, using the minimal eukaryotic model cell Ostreococcus tauri, we report two homologs of common descent from the Cyclin M (CNNM)/CorC protein family. Overexpression of these proteins leads to a reduction in the overall magnesium content of cells and a lengthening of the period of circadian gene expression rhythms. However, we observed a paradoxical increase in the magnesium content of the organelle fraction. Chemical inhibition of Mg²⁺ transport has a synergistic effect on circadian period lengthening upon overexpression of one CNNM homolog, but not the other. Finally, both homologs rescue the deleterious effect of low extracellular magnesium on cell proliferation rates. Overall, we identified two CNNM proteins that directly affect Mg²⁺ homeostasis and cellular rhythms.

Directory of Open Access Journals

Edinburgh Research Explorer

A Genome-Scale Resource for the Functional Characterization of Arabidopsis Transcription Factors

Author: Bonaldi Katia
Breton Ghislain
Doherty Colleen J.
Ecker Joseph R.
Galli Mary
Kang S. Earl
Kay Steve A.
Nagel Dawn H.
Pruneda-Paz Jose L.
Ravelo Stephanie
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 01/07/2014

Summary: Extensive transcriptional networks play major roles in cellular and organismal functions. Transcript levels are in part determined by the combinatorial and overlapping functions of multiple transcription factors (TFs) bound to gene promoters. Thus, TF-promoter interactions provide the basic molecular wiring of transcriptional regulatory networks. In plants, discovery of the functional roles of TFs is limited by the increased complexity of network circuitry that results from a significant expansion of TF families. Here, we present the construction of a comprehensive collection of Arabidopsis TF clones, created to provide a versatile resource for uncovering TF biological functions. We leveraged this collection by implementing a high-throughput DNA-binding assay and identified direct regulators of a key clock gene (CCA1) that provide molecular links between different signaling modules and the circadian clock. The resources introduced in this work will significantly contribute to a better understanding of the transcriptional regulatory landscape of plant genomes.

Elsevier - Publisher Connector

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California