MPSoCBench: a framework for evaluating tools and methodologies for multiprocessor systems-on-chip
Advisor: Rodolfo Jardim de Azevedo. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Abstract: Recent design methodologies and tools aim at enhancing design productivity by providing a software development platform before the final Multiprocessor System-on-Chip (MPSoC) architecture details are defined. However, simulation is only efficient when it uses a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototypes that integrate both scalable hardware and software for creating and evaluating new methodologies and tools motivated us to develop MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel versions of software from well-known benchmarks, and power consumption estimation for the main components (processors, routers, memory, and caches). An important demand in MPSoC designs is to address energy consumption constraints as early as possible. Because processor performance comes with a high power cost, there is increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling (DVFS) techniques adaptively scale the voltage and frequency levels of the CPU, allowing it to reach just enough performance to process the system workload while meeting throughput constraints, thereby reducing energy consumption.
To explore this wide design space for energy efficiency and performance, for both hardware and software components, we added features to MPSoCBench to explore DVFS and evaluated three mechanisms based on dynamic energy estimation and CPU usage rate. Degree: Doctorate in Computer Science.
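The abstract does not say how the three DVFS mechanisms work, so the following is only a minimal sketch of the general idea of a governor driven by CPU usage rate. The operating points, the thresholds, and the names `OperatingPoint`, `dynamic_power` and `next_point` are illustrative assumptions, not part of MPSoCBench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int
    volt: float


# Hypothetical operating points, ordered from slowest to fastest.
POINTS = [
    OperatingPoint(250, 0.8),
    OperatingPoint(500, 0.9),
    OperatingPoint(1000, 1.1),
]


def dynamic_power(point, capacitance=1.0):
    """Textbook CMOS approximation P_dyn ~ C * V^2 * f (arbitrary units)."""
    return capacitance * point.volt ** 2 * point.freq_mhz


def next_point(current_idx, cpu_usage, low=0.3, high=0.8):
    """One governor step: speed up when the core is busy, slow down when idle.

    cpu_usage is the fraction of the last sampling window spent busy (0..1).
    The low/high thresholds are assumed values chosen for illustration.
    """
    if cpu_usage > high and current_idx < len(POINTS) - 1:
        return current_idx + 1
    if cpu_usage < low and current_idx > 0:
        return current_idx - 1
    return current_idx
```

Stepping one level at a time keeps the sketch simple. A governor based on energy estimation would instead compare `dynamic_power` across candidate points before switching.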
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units used for general-purpose computing), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.
Parallel architectures and runtime systems co-design for task-based programming models
The increasing levels of parallelism in modern computing systems have heightened the need for a holistic vision when designing multiprocessor architectures, one that takes into account the needs of programming models and applications. Nowadays, system design consists of several layers stacked on top of each other, from the architecture up to the application software. Although this design provides a separation of concerns, in which layers can be changed independently thanks to well-known interfaces between them, it is hampering future system design as Moore's Law comes to an end. Current performance improvements in computer architecture are driven by the shrinking of the transistor channel width, which allows faster and more power-efficient chips to be made. However, technology is reaching physical limits where transistor size can no longer be reduced, which requires a paradigm change in system design.
This thesis proposes to break this layered design and advocates a system in which the architecture and the programming model runtime system exchange information towards a common goal: improving performance and reducing power consumption. By making the architecture aware of runtime information such as the Task Dependency Graph (TDG) in dataflow task-based programming models, it is possible to reduce power consumption by exploiting the critical path of the graph. Moreover, the architecture can provide hardware support for creating the graph, reducing runtime overheads and making it possible to execute fine-grained tasks that increase the available parallelism. Finally, the current status of inter-node communication primitives can be exposed to the runtime system so that it can schedule communication more efficiently, which also creates new opportunities for overlapping computation and communication that were not possible before. An evaluation of the proposals introduced in this thesis is provided, and a methodology to simulate and characterize application behavior is also presented. Postprint (published version).
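To illustrate what "exploiting the critical path" of a task dependency graph involves, here is a minimal sketch that finds the longest chain of dependent tasks. The function name `critical_path`, the input layout, and the example task costs are assumptions made for illustration. The thesis's own runtime and hardware mechanisms are not described in the abstract.

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Return (length, path) of the longest dependency chain in a task graph.

    durations: task -> execution cost
    deps:      task -> set of tasks that must finish first
               (every task named in deps must also appear in durations)
    """
    graph = {task: deps.get(task, set()) for task in durations}
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor that determines each task's start time
    for task in TopologicalSorter(graph).static_order():
        pred = max(graph[task], key=lambda p: finish[p], default=None)
        start = finish[pred] if pred is not None else 0
        finish[task] = start + durations[task]
        best_pred[task] = pred
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]


# Hypothetical four-task graph: c waits for a and b, d waits for c.
durations = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"c": {"a", "b"}, "d": {"c"}}
length, path = critical_path(durations, deps)
```

In this example task `a` is off the critical path and has one time unit of slack, so a power-aware system could run it more slowly without lengthening the total execution.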
Cross-Layer Rapid Prototyping and Synthesis of Application-Specific and Reconfigurable Many-accelerator Platforms
Technological advances of recent years have laid the foundation for the consolidation of the informatisation of society, with an impact on its economic, political, cultural and social dimensions. At the peak of this development, more and more everyday devices are connected to the web, giving rise to the term "Internet of Things". The future holds the full connection and interaction of IT and communication systems with the natural world, marking the transition to cyber-physical systems and offering meta-services in the physical world, such as personalized medical care, autonomous transportation and smart energy cities. To meet the needs of this dynamically evolving market, computer engineers are required to implement computing platforms that incorporate increased systemic complexity and also cover a wide range of meta-characteristics, such as cost, design time, reliability and reuse, which are dictated by a conflicting set of functional, technical and construction constraints. This thesis addresses these design challenges by developing methodologies and hardware/software co-design tools that enable the rapid implementation and efficient synthesis of architectural solutions that meet the operating meta-features required by the modern market. Specifically, this thesis presents a) methodologies to accelerate the design flow for both reconfigurable and application-specific architectures, b) coarse-grain heterogeneous architectural templates for processing and communication acceleration and c) efficient multi-objective synthesis techniques both at a high abstraction level of programming and at the physical silicon level.
Regarding the acceleration of the design flow, the proposed methodology employs virtual platforms in order to hide architectural details and drastically reduce simulation time. An extension of this framework introduces system-level co-simulation using reconfigurable acceleration platforms as intermediate co-emulation platforms. Thus, the development cycle of a hardware/software product is accelerated by moving from a vertical serial flow to a circular interactive loop. Moreover, the simulation capabilities are enriched with efficient techniques for detecting and correcting design errors, as well as methods for checking the system's performance metrics against the desired specifications, during all phases of system development. Orthogonally to this methodological framework, a new architectural template is proposed, aiming at bridging the gap between design complexity and technological productivity using specialized hardware accelerators in heterogeneous systems-on-chip and network-on-chip platforms. A novel co-design methodology is presented for the hardware accelerators and their respective programming software, including the allocation of tasks to the available resources of the system/network. The framework provides implementation techniques for the accelerators using either conventional programming flows with a hardware description language or abstract programming model flows based on high-level synthesis. In either case, the designer has the option of optimizing system-level metrics such as processing speed, throughput, reliability, power consumption and silicon area. Finally, to address the increased complexity of design tools for reconfigurable systems, novel multi-objective evolutionary optimization algorithms are proposed, which exploit modern multicore processors and the coarse-grain nature of multithreaded programming environments (e.g. OpenMP) in order to reduce placement time while, by grouping the applications based on their intrinsic characteristics, exploring the design space more effectively.
The efficiency of the proposed architectural templates, design tools and methodology flows is evaluated against existing state-of-the-art solutions with applications from typical computing domains, such as digital signal processing, multimedia and arithmetic complexity, as well as from heterogeneous system environments, such as a computer vision system for autonomous robotic space navigation and many-accelerator systems for HPC and workstations/datacenters. The results strengthen the author's belief that this thesis provides competitive expertise to address complex modern, and projected future, design challenges.
Embedded electronic systems driven by run-time reconfigurable hardware
Abstract
This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology (available through SRAM-based FPGA/SoC devices), with the aim of improving people's quality of life. The work investigates the system architecture and the reconfiguration engine that gives the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned into processing tasks that are multiplexed in time and space. This optimizes its physical implementation (silicon area, processing time, complexity, flexibility, functional density, cost and power consumption) in comparison with alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of this technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a level of maturity high enough for its exploitation in industry.
Many-core architectures with time predictable execution support for hard real-time applications
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 183-193). Hybrid control systems are a growing domain of application. They are pervasive and their complexity is increasing rapidly. Distributed control systems for the future "Intelligent Grid" and renewable energy generation systems demand high-performance, hard real-time computation and more programmability. General-purpose computer systems are primarily designed to process data and not to interact with physical processes as required by these systems. Generic general-purpose architectures, even with the use of real-time operating systems, fail to meet the hard real-time constraints of hybrid system dynamics. ASIC, FPGA, or traditional embedded design approaches to these systems often result in expensive, complicated systems that are hard to program, reuse, or maintain. In this thesis, we propose a domain-specific architecture template targeting hybrid control system applications. Using power electronics control applications, we present new modeling techniques, synthesis methodologies, and a parameterizable computer architecture for these large distributed control systems. We propose a new system modeling approach, called Adaptive Hybrid Automaton, based on previous work in control system theory, that uses mixed-model abstractions and lends itself well to digital processing. We develop a domain-specific architecture based on this modeling that uses heterogeneous processing units and predictable execution, called MARTHA. We develop a hard real-time aware router architecture to enable deterministic on-chip interconnect network communication. We present several algorithms for scheduling task-based applications onto these types of heterogeneous architectures.
We create Heracles, an open-source, functional, parameterized, synthesizable many-core system design toolkit that can be used to explore future multi/many-core processors with different topologies, routing schemes, processing elements or cores, and memory system organizations. Using the Heracles design tool, we build a prototype of the proposed architecture on a state-of-the-art FPGA-based platform, and deploy and test it in actual physical power electronics systems. We develop and release a small, representative, open-source set of power electronics system applications that can be used for hard real-time application benchmarking. By Michel A. Kinsy. Ph.D.
Achieving Performance in Networks-on-Chip for Real-Time Systems
In many new applications, such as automated driving, high performance requirements have reached safety-critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads, e.g. high-resolution sensor fusion and data processing, or autonomous decision making combined with machine learning.
The static platform management used in current safety-critical systems is no longer sufficient to provide the needed level of service. Dynamic platform management could meet the challenge, but it usually suffers from a lack of predictability and of the simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs with real-time QoS requirements. The mechanism decouples admission control from arbitration in routers, thereby simplifying dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of sophisticated contract-based QoS provisioning without introducing the complicated and hard-to-maintain schemes known from the frequently applied static arbiters.
The presented work introduces an overlay network that synchronizes transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by a protocol design and a verification methodology that bring adaptive control to NoC communication in setups with different QoS requirements and traffic classes. To this end, a formal worst-case timing analysis of the mechanism is proposed, which demonstrates that this solution not only achieves higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system utilization.
The approach is not limited to a specific network architecture or topology, as the mechanism does not require modifications to routers and can therefore be used with the majority of existing manycore systems. The evaluation was carried out using generic performance-optimized router designs as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach exhibits significantly higher average performance in simulation and execution.
A framework for the dynamic management of Peer-to-Peer overlays
Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation.
At the core of the management framework lies the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, the AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an "insider's" view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively.
To account for the multi-faceted nature of P2P applications and allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented. Towards the latter, a number of suitable P2P content caching strategies are proposed.
Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that the introduction of AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load-balancing and offer improved peer transfer performance.
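The abstract mentions content caching for transit traffic minimisation but does not name the strategies proposed. As a rough illustration only, the sketch below assumes a plain least-recently-used (LRU) policy. The class `LRUContentCache` and its byte counters are hypothetical and are not taken from the thesis.

```python
from collections import OrderedDict


class LRUContentCache:
    """Byte-bounded LRU cache, as a provider-side peer might keep (assumed policy)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # content_id -> size in bytes, oldest first
        self.hit_bytes = 0          # bytes served locally
        self.miss_bytes = 0         # bytes that had to be fetched from outside

    def request(self, content_id, size):
        """Record a request; return True if it was served from the cache."""
        if content_id in self.items:
            self.items.move_to_end(content_id)  # mark as most recently used
            self.hit_bytes += size
            return True
        self.miss_bytes += size
        if size <= self.capacity:
            # Evict least recently used items until the new one fits.
            while self.used + size > self.capacity:
                _, evicted_size = self.items.popitem(last=False)
                self.used -= evicted_size
            self.items[content_id] = size
            self.used += size
        return False
```

In this sketch, `miss_bytes` stands in for the traffic that would still cross the provider's boundary, so the ratio of `hit_bytes` to total requested bytes gives a simple way to compare caching policies.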