12 research outputs found

    Learning about knowledge: A complex network approach

    This article describes an approach to modeling knowledge acquisition in terms of walks along complex networks. Each subset of knowledge is represented as a node, and relations between such knowledge are expressed as edges. Two types of edges are considered, corresponding to free and conditional transitions. The latter case implies that a node can only be reached after previously visiting a set of nodes (the required conditions). The process of knowledge acquisition can then be simulated by counting the number of nodes visited as a single agent moves along the network, starting from its lowest layer. It is shown that hierarchical networks, i.e. networks composed of successive interconnected layers, arise naturally as a consequence of compositions of the prerequisite relationships between the nodes. In order to avoid deadlocks, i.e. unreachable nodes, the subnetwork in each layer is assumed to be a connected component. Several configurations of such hierarchical knowledge networks are simulated, and the performance of the moving agent is quantified in terms of the percentage of visited nodes after each movement. The Barabási-Albert and random models are considered for the layer and interconnecting subnetworks. Although all subnetworks in each realization have the same number of nodes, several interconnectivities, defined by the average node degree of the interconnection networks, are considered. Two visiting strategies are investigated: random choice among the existing edges and preferential choice of so-far-untracked edges. A series of interesting results is obtained, including the identification of a series of plateaux of knowledge stagnation in the case of the preferential movement strategy in the presence of conditional edges. Comment: 18 pages, 19 figures
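    The walk described above is easy to sketch. Below is a minimal, hypothetical Python model (the paper gives no code; the graph, the prerequisite encoding, and all parameters are invented for illustration): a node with prerequisites can only be entered once those nodes have been visited, and the `prefer_untracked` flag switches between the two visiting strategies.

```python
import random

def knowledge_walk(edges, prereqs, start, steps, prefer_untracked=False, seed=0):
    """Walk the knowledge network; return (coverage history, visited set)."""
    rng = random.Random(seed)
    visited = {start}
    used = set()            # edges already traversed (for the preferential strategy)
    node = start
    history = [len(visited)]
    for _ in range(steps):
        # conditional transition: a target is reachable only if all of its
        # prerequisite nodes (if any) have already been visited
        options = [v for v in edges[node] if prereqs.get(v, set()) <= visited]
        if not options:
            break               # deadlock: no traversable edge remains
        if prefer_untracked:
            fresh = [v for v in options if (node, v) not in used]
            if fresh:
                options = fresh  # preferential choice of so-far-untracked edges
        nxt = rng.choice(options)
        used.add((node, nxt))
        visited.add(nxt)
        node = nxt
        history.append(len(visited))
    return history, visited

# free transition only: a two-node network is fully covered in one step
hist, seen = knowledge_walk({0: [1], 1: [0]}, {}, start=0, steps=1)

# conditional deadlock: node 1 requires an unreachable prerequisite node 2
blocked, _ = knowledge_walk({0: [1], 1: [0]}, {1: {2}}, start=0, steps=5)
```

    The second call illustrates the deadlock the paper avoids by requiring each layer to be a connected component: the agent stops immediately because no edge is traversable.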

    Genetic sequence alignment on multicore processors

    This work analyzes the performance of the sequence alignment algorithm known as Needleman-Wunsch on three different multiprocessor computing systems. The serial algorithm is analyzed and coded in the C programming language, and a series of optimizations is proposed with the goal of minimizing both memory requirements and computation time. The program's performance is then analyzed on the different computing systems. In the second part of the work, the serial algorithm is parallelized using OpenMP. The result is two variants of the program that differ in their ratio of computation to communication. In the first variant, communication between processors is infrequent and occurs after long periods of execution (coarse granularity). In the second variant, the individual tasks are relatively small in terms of execution time and communication between processors is frequent (fine granularity). Both variants are executed and analyzed on multicore architectures that exploit thread-level parallelism. The results show the importance of understanding and knowing how to analyze the effect of multicore and multithreading on performance.
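    The dynamic-programming recurrence at the heart of Needleman-Wunsch is worth seeing concretely. The work implements it in C with OpenMP; the sketch below is only an illustrative serial Python version with invented scoring parameters. Note that cells on the same anti-diagonal of the matrix are mutually independent, which is precisely the property the coarse- and fine-grained parallel variants exploit.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b (scores are illustrative)."""
    n, m = len(a), len(b)
    # score[i][j] = best alignment score of a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap       # align a[:i] against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap       # align b[:j] against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                    # match / mismatch
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[n][m]
```

    In a parallel version, each anti-diagonal (or a block of rows) becomes a unit of work: large blocks give the coarse-grained variant, per-diagonal chunks the fine-grained one.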

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each optimization technique. This complicates the evaluation of optimizations for new accelerator designs and slows down research progress. This work surveys the neural network accelerator optimization approaches used in recent works and reports their individual effects on edge processing performance. It presents the optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately. Reported optimizations range from up to 10,000x memory savings to 33x energy reductions, giving chip designers an overview of the design choices for implementing efficient low-power neural network accelerators.
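    As a concrete instance of one entry in such a construction kit, the sketch below applies symmetric linear weight quantization, one of the common memory-saving optimizations. It is purely illustrative (the scheme and all values are invented here, not taken from the survey): float32 weights are mapped to int8, cutting weight storage 4x at the cost of a bounded rounding error.

```python
def quantize_int8(weights):
    # symmetric linear quantization: w ≈ q * scale, with q an integer in [-127, 127]
    peak = max(abs(w) for w in weights)
    scale = peak / 127.0 if peak else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

weights = [0.31, -1.27, 0.05, 0.9]   # stored as float32: 4 bytes per weight
q, scale = quantize_int8(weights)    # stored as int8: 1 byte per weight
memory_ratio = (len(weights) * 4) // (len(weights) * 1)   # 4x smaller
```

    The reconstruction error per weight is at most half the quantization step, which is the kind of individually quantified trade-off the construction-kit view makes comparable across designs.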

    VHDL Design of Advanced CPU

    The goal of this project was to study pipelined processor architectures along with instruction and data cache architectures. The chosen pipelined architecture was to be designed, including instruction and data caches, and implemented in the VHDL language. I first implemented a subscalar architecture, followed by three versions of a scalar architecture. These architectures were synthesized into an FPGA and their performance was compared on a chosen algorithm. In the next part of the thesis I designed and implemented instruction and data caches for both architectures; however, I was not able to synthesize these caches. The final chapter of the thesis deals with the superscalar architecture, the architecture in common use today.
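    A back-of-the-envelope model makes the subscalar-versus-scalar comparison in such a design concrete. The sketch below is hypothetical (it is not the thesis's VHDL; the stage timing and hazard rule are simplified assumptions): a subscalar machine spends all five stage delays on every instruction, while a classic five-stage scalar pipeline overlaps instructions but, without forwarding, must stall a consumer until its producer's write-back.

```python
STAGES = 5  # IF, ID, EX, MEM, WB — one cycle each (assumed)

def subscalar_cycles(program):
    # non-pipelined: each instruction occupies the whole datapath in turn
    return len(program) * STAGES

def pipelined_cycles(program):
    # program: list of (dest_reg, set_of_source_regs)
    # An instruction fetched at cycle c reads registers in ID at c + 1.
    # With no forwarding, that read must wait for the producer's WB at p + 4
    # (assuming write-before-read within a cycle), i.e. c >= p + 3.
    issue = -1
    produced_at = {}
    for dest, srcs in program:
        earliest = issue + 1
        for s in srcs:
            if s in produced_at:
                earliest = max(earliest, produced_at[s] + 3)  # RAW stall
        issue = earliest
        produced_at[dest] = issue
    return issue + STAGES  # last instruction still drains the pipeline

independent = [("r1", set()), ("r2", set()), ("r3", set())]
dependent = [("r1", set()), ("r2", {"r1"})]
```

    Three independent instructions take 7 cycles pipelined versus 15 subscalar; a read-after-write dependency inserts the stalls that cache misses and hazards would add in the real design.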

    Formal Verification of Instruction Dependencies in Microprocessors

    In microprocessors, achieving efficient utilization of the execution units is a key factor in improving performance. However, maintaining an uninterrupted flow of instructions is a challenge due to the data and control dependencies between the instructions of a program. Modern microprocessors employ aggressive optimizations to keep their execution units busy without violating inter-instruction dependencies. Such complex optimizations may cause subtle implementation flaws that are hard to detect using conventional simulation-based verification techniques. Formal verification is known for its ability to discover design flaws that go undetected by conventional verification techniques. However, formal verification brings two major challenges. First, the correctness of the implementation needs to be defined formally. Second, formal verification is often hard to apply at the scale of realistic implementations. In this thesis, we present a formal verification strategy to guarantee that a microprocessor implementation preserves both data and control dependencies among instructions. Throughout our strategy, we address the two major challenges associated with formal verification: correctness and scalability. We address the correctness challenge by specifying correctness in the context of generic pipelines. Unlike conventional pipeline hazard rules, we make no distinction between the data and control aspects. Instead, we describe the relationship between a producer instruction and a consumer instruction such that both instructions can speculatively read their source operands, speculatively write their results, and execute out of program order. In addition to supporting branch and value prediction, our correctness criteria allow the implementation to discard (squash) or replay instructions during execution. We address the scalability challenge in three ways: abstraction, decomposition, and induction. First, we state our inter-instruction dependency correctness criteria in terms of read and write operations, without reference to data values. Consequently, our correctness criteria can be verified for implementations with abstract datapaths. Second, we decompose our correctness criteria into a set of smaller obligations that are easier to verify. All of these obligations can be expressed as properties within the Syntactically Safe fragment of Linear Temporal Logic (SSLTL). Third, we introduce a technique to verify SSLTL properties by induction, and prove its soundness and completeness. To demonstrate the overall strategy, we verified a term-level model of an out-of-order speculative processor. The processor model implements register renaming using a P6-style reorder buffer and branch prediction with a hybrid (discard-replay) recovery mechanism. The verification obligations (expressed in SSLTL) are checked using a tool implementing our inductive technique. Our tool, named Tahrir, is built on top of a generic interface to SMT solvers and can be used more generally for verifying SSLTL properties of infinite-state systems.
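    The basic inductive proof obligation behind such a strategy can be stated in a few lines. The sketch below is a toy illustration only, not the thesis's SSLTL technique or its tool Tahrir: a safety property is proved by showing it holds in the initial states (base case) and is preserved by every transition (inductive step). Here a tiny finite-state model is enumerated directly, where the real approach would hand both obligations to an SMT solver over an infinite-state system.

```python
def prove_by_induction(initial, transitions, prop, states):
    # base case: the property holds in every initial state
    base = all(prop(s) for s in initial)
    # inductive step: from any state satisfying the property,
    # every successor also satisfies it
    step = all(prop(t) for s in states if prop(s) for t in transitions(s))
    return base and step

# tiny model: a counter over 0..7 that steps and wraps modulo 4
states = range(8)
init = [0]
step_fn = lambda s: [(s + 1) % 4]

inductive = prove_by_induction(init, step_fn, lambda s: s < 4, states)    # proved
too_strong = prove_by_induction(init, step_fn, lambda s: s < 3, states)   # rejected: 2 -> 3
```

    Plain one-step induction like this is sound but not complete for all true properties, which is why the soundness and completeness proof for the thesis's specific inductive technique matters.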

    Semantic exploration of socio-environmental models: a generic approach to the initialization and observation of complex simulation models

    Researchers seek to address the full complexity of socio-ecosystems (SES), including biophysical and social dynamics and their interactions. To cope with this complexity, they rely on increasingly complex simulation models whose initialization and observation have become very difficult to implement, yet no generic framework has been developed to address this issue. The objective of this thesis is a generic framework for specifying and implementing both initialization, from numerous heterogeneous data sources, and observation, producing the indicators desired by domain experts. The result is a set of tools and know-how that allow domain experts to specify and automate the whole process of exploiting a simulation model, from initialization to the production of indicators. To this end, we propose to formulate initialization and observation as transformations between data and data structures. This formulation makes it possible to apply Model-Driven Engineering (MDE) concepts to implement the generic framework, and the corresponding domain-specific languages (DSLs) allow domain experts to more easily specify the initialization and observation of SES models.
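    The idea of treating both ends of a simulation as data-structure transformations can be sketched in a few lines. Everything below is invented for illustration (the field names, the schema encoding, the indicator); the thesis realizes this with MDE tooling and DSLs rather than plain functions, but the shape is the same: initialization maps heterogeneous input records into typed model state, and observation maps model state into an indicator.

```python
def initialize(raw_records, schema):
    # transformation: heterogeneous input rows -> typed model state;
    # schema maps each model field to (source column, conversion function)
    return [{field: cast(row[src]) for field, (src, cast) in schema.items()}
            for row in raw_records]

def observe(state, indicator):
    # transformation: model state -> indicator value
    return indicator(state)

# hypothetical input: census rows with string-typed values
raw = [{"pop": "120", "area": "3.5"}, {"pop": "80", "area": "2.0"}]
schema = {"population": ("pop", int), "area_km2": ("area", float)}

state = initialize(raw, schema)
density = observe(state, lambda s: sum(c["population"] for c in s) /
                                   sum(c["area_km2"] for c in s))
```

    Because both directions are expressed as transformations over declared structures, the same machinery (and the same DSL) can drive initialization and observation alike.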

    Processor architecture: from Dataflow to Superscalar and Beyond


    Numerical Study For Acoustic Micro-Imaging Of Three Dimensional Microelectronic Packages

    Complex structures and multiple interfaces in modern microelectronic packages complicate the interpretation of acoustic data. This study makes four novel contributions: 1) contributions to the finite element method; 2) novel approaches to reduce computational cost; 3) new post-processing techniques to interpret simulation data; and 4) theoretical guidance for acoustic image interpretation. The first part of the thesis's contributions comprises the impact of simulation resolution on numerical dispersion error and an exploration of quadrilateral infinite boundaries. The former establishes convergence scores for varying resolution densities in the time and spatial domains against a very high-fidelity numerical solution. The latter evaluates the configuration of quadrilateral infinite boundaries against traditional circular infinite boundaries and quadrilateral Perfectly Matched Layers. The second part of the study models a flip chip with a 140 µm solder bump assembly, scanned by a virtual 230 MHz raster-scanning transducer with a spot size of 17 µm. The Virtual Transducer was designed to reduce the number of numerical elements from hundreds of millions to hundreds of thousands. Third, two techniques are introduced to analyze and evaluate simulated acoustic data: 1) the C-Line plot, a 2D max plot of specific gated interfaces that allows quantitative characterization of acoustic phenomena; and 2) the Acoustic Propagation Map, which contour-maps an overall summary of intra-sample wave propagation across the time domain in a single image. Lastly, combining all these developments, the physical mechanics of edge effects were studied and verified against experimental data. A direct relationship between transducer spot size and edge-effect severity was established. In regions with edge effects, the acoustic pulse interacting with the solder bump edge is scattered mainly along the horizontal axis. The edge effect did not manifest in solder bump models without Under Bump Metallization (UBM); measurements found acoustic penetration improvements of up to 44% with the UBM removed. Other acoustic mechanisms were also discovered and explored. The defect-detection mechanism was investigated by modelling crack propagation in the solder bump assembly. Gradual progression of the crack was found to have a predictable influence on the edge-effect profile. By exploiting this feature, the progress of crack propagation in experimental data can be interpreted by evaluating the C-Scan image.
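    The C-Line plot described above reduces to a gated maximum over each simulated A-scan. The snippet below is a minimal illustration with fabricated waveform numbers, not the thesis's post-processing code: each trace is windowed to the time gate bracketing one interface's echo, and its peak absolute amplitude is kept, yielding the max profile along the scan line.

```python
def c_line(ascans, gate):
    # ascans: one amplitude-vs-time trace per scan position along the line
    # gate:   (start, stop) sample indices bracketing one interface's echo
    lo, hi = gate
    return [max(abs(v) for v in trace[lo:hi]) for trace in ascans]

# two scan positions, four time samples each; the gate excludes
# the large t=0 transmit pulse and keeps only the interface echo
traces = [[1.0, 0.2, -0.9, 0.1],
          [1.0, 0.5, 0.4, 0.0]]
profile = c_line(traces, gate=(1, 4))
```

    Plotting such profiles against scan position is what allows the quantitative characterization of edge-effect severity described above.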

    Methods for application-specific efficiency improvement of adaptive processor platforms

    General-purpose processors are optimized for the average use case, so available resources are not used efficiently. This thesis investigates to what extent it is possible to adapt a general-purpose processor to individual applications and thereby increase efficiency. The adaptation can take place at runtime, through the processor or the runtime system, based on the respective system parameters, in order to achieve a gain in efficiency.