Design and Optimization of Networks-on-Chip for Future Heterogeneous Systems-on-Chip
Due to tight power budgets and shrinking time-to-market windows, Systems-on-Chip (SoCs) have emerged as a power-efficient solution that provides the functionality required by target applications in embedded systems. To support a diverse set of applications, such as real-time video/audio processing and sensor signal processing, SoCs consist of multiple heterogeneous components, such as software processors, digital signal processors, and application-specific hardware accelerators. These components offer different trade-offs among flexibility, power, and performance, so SoCs can be designed by mixing and matching them.
With the increasing number of heterogeneous cores, however, traditional SoC interconnects exhibit excessive power dissipation and poor performance scalability. As an alternative, Networks-on-Chip (NoCs) have been proposed. NoCs provide modularity at design time because communication among the cores is isolated from their computation via standard interfaces. NoCs also exploit communication parallelism at run time because multiple data transfers can take place simultaneously.
To construct an efficient NoC, designers must consider the communication behavior of the various heterogeneous components in an SoC together with a large number of NoC design parameters. Therefore, an efficient NoC design and optimization framework is critical to shorten the design cycle and address the complexity of future heterogeneous SoCs. This is the thesis of my dissertation.
Some existing design automation tools for NoCs offer only limited degrees of automation, which cannot satisfy the requirements of future heterogeneous SoCs. First, these tools support only a limited number of NoC design parameters. Second, they do not provide an integrated environment for software-hardware co-development.
Thus, I propose FINDNOC, an integrated framework for the generation, optimization, and validation of NoCs for future heterogeneous SoCs. The proposed framework supports software-hardware co-development, an incremental NoC design-decision model, SystemC-based NoC customization and generation, and fast system prototyping with FPGA emulation.
Virtual channels (VCs) and multiple physical networks (MPs) are the two main alternatives for improving performance, supporting quality of service, and avoiding protocol deadlocks in packet-switched NoC design. To examine the effect of using VCs and MPs together with other NoC architectural parameters, I completed a comprehensive comparative analysis that combines an analytical model, synthesis-based designs for both FPGAs and standard-cell libraries, and system-level simulations.
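As a back-of-the-envelope illustration of one side of the VC-versus-MP trade-off (this is not the thesis's analytical model), assume both designs use the same total link width: a single network of width W whose VCs share each link, versus p physical networks of width W/p each. At zero load, a packet's latency is roughly the number of hops times the per-hop router delay, plus the number of flits needed to serialize it, so the narrower MP channels pay more serialization. The function names and parameter values below are hypothetical.

```python
import math


def zero_load_latency(hops: int, router_delay: int, packet_bits: int, link_bits: int) -> int:
    """Zero-load latency in cycles: head-flit pipeline delay plus serialization."""
    flits = math.ceil(packet_bits / link_bits)  # packet is split into link-width flits
    return hops * router_delay + flits          # remaining flits stream one per cycle


def vc_vs_mp(hops: int, router_delay: int, packet_bits: int,
             total_link_bits: int, planes: int) -> tuple[int, int]:
    """Compare one wide network (VCs share its links) with `planes` narrower
    physical networks that together use the same total wire width (an assumption)."""
    vc = zero_load_latency(hops, router_delay, packet_bits, total_link_bits)
    mp = zero_load_latency(hops, router_delay, packet_bits, total_link_bits // planes)
    return vc, mp


if __name__ == "__main__":
    print(vc_vs_mp(hops=4, router_delay=3, packet_bits=512, total_link_bits=128, planes=2))
```

The model ignores contention, buffering, and router area, so it captures only the serialization cost of splitting wires across planes, not the throughput or isolation benefits that motivate MPs.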
Based on the results of this analysis, I developed VENTTI, a design and simulation environment that combines a virtual platform (VP), a NoC synthesis tool, and four NoC models characterized at different abstraction levels. VENTTI facilitates an incremental decision-making process in which each of the four NoC abstraction models is associated with different NoC parameters. The selected NoC parameters can be validated by running simulations with the corresponding model instantiated in the VP.
I augmented this framework to complete FINDNOC by implementing ICON, a NoC generation and customization tool that dynamically combines and customizes synthesizable SystemC components from a predesigned library. Thanks to its flexibility and its automatic network-interface generation capabilities, ICON can generate a rich variety of NoCs that can then be integrated into any Embedded Scalable Platform (ESP) architecture for fast prototyping with FPGA emulation.
I designed FINDNOC in a modular way that makes it easy to augment with new capabilities. This, combined with the continuous progress of the ESP design methodology, will provide a seamless SoC integration framework in which hardware accelerators, software applications, and NoCs can be designed, validated, and integrated simultaneously, reducing the design cycle of future SoC platforms.
High-Speed Message Routing Mechanisms for Massively Parallel Computers
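Massively parallel processing systems (MPPs) are now moving into many fields that were once the stronghold of traditional vector processors and SIMD machines. These systems exploit the rapid progress of readily available high-performance CPUs, connecting hundreds to thousands of them into homogeneous multiprocessor systems. However, these systems do not necessarily perform well on real problems; at present they always fall far short of their nominal peak performance. Because all interprocessor communication in these systems passes through the interconnection network, the decisive factors that determine the achievable peak performance are the interconnection network and the communication mechanisms it uses.
This thesis proposes two methods for realizing efficient communication mechanisms in MPP interconnection networks. The first is the "express router", whose suitability for use in the interconnection network is examined. The express router uses multiple unidirectional register-insertion paths to realize a hybrid time- and space-division network. We developed accurate models for predicting the performance of the express router's switch and buffer circuits for different radices and numbers of dimensions. The results confirm that the express router satisfies all of the requirements for efficient communication. More importantly, the express router can route messages without sacrificing low latency and high throughput even when the network contains faults or traffic is congested. In simulation, the express router outperforms previously published routers that use deterministic (fixed-path) routing, and it matches or exceeds other adaptive routers.
The second is path-length-limited multicast. Multicast communication speeds up many parallel processing problems, so we studied the deadlock problem that multicast raises under wormhole switching. As a solution, we propose path-length-limited multicast, evaluate its performance by simulation, and compare it with unicast-based and multipath multicast. The results show that the proposed path-length-limited multicast is a particularly good solution for multicast to clusters for barrier synchronization, multicast to nearest-neighbor nodes, and broadcast to all nodes.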
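The abstract does not describe how path-length-limited multicast works, so it is not sketched here. For background on the path-based family that, we assume, the multipath baseline belongs to, below is a minimal sketch of the classic dual-path destination ordering of Lin and Ni on a 2D mesh: nodes get a Hamiltonian "snake" label, destinations above the source's label are visited in ascending order by one worm, and those below in descending order by another. Because each worm only moves in one direction of the label order, the channels each worm can use form an acyclic subnetwork, which is how this family avoids wormhole multicast deadlock. The function names are hypothetical.

```python
def snake_label(x: int, y: int, width: int) -> int:
    """Hamiltonian 'boustrophedon' label: even rows run left to right, odd rows right to left."""
    return y * width + (x if y % 2 == 0 else width - 1 - x)


def dual_path_split(src: tuple[int, int], dests: list[tuple[int, int]], width: int):
    """Split multicast destinations into a high-label path and a low-label path."""
    s = snake_label(*src, width)

    def label(node: tuple[int, int]) -> int:
        return snake_label(*node, width)

    # High worm visits labels above the source in increasing order.
    high = sorted((d for d in dests if label(d) > s), key=label)
    # Low worm visits labels below the source in decreasing order.
    low = sorted((d for d in dests if label(d) < s), key=label, reverse=True)
    return high, low


if __name__ == "__main__":
    print(dual_path_split((1, 1), [(3, 0), (0, 2), (2, 3), (0, 1), (2, 0)], width=4))
```

Multipath variants split each of the two sets further so that more worms run in parallel, trading extra startup cost for shorter paths.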
Projective networks: topologies for large parallel computer systems
The interconnection network comprises a significant portion of the cost of large parallel computers, both in economic terms and in power consumption. Several previous proposals exploit large-radix routers to build scalable low-distance topologies with the aim of minimizing these costs. However, they fail to consider potential imbalance in network utilization, which in some cases results in suboptimal designs. Based on an appropriate cost model, this paper advocates the use of networks based on incidence graphs of projective planes, broadly denoted as Projective Networks. Projective Networks rely on generalized Moore graphs with uniform link utilization and encompass several proposed direct (PN and demi-PN) and indirect (OFT) topologies under a common mathematical framework. Compared with other proposals whose average distance is between 2 and 3 hops, these networks provide very high scalability while preserving balanced network utilization, resulting in low network costs.
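The abstract builds on incidence graphs of projective planes. As a minimal sketch, assuming q is prime so that integers modulo q form the field GF(q), the code below constructs the point-line incidence graph of the projective plane PG(2, q) and checks two of its properties: every vertex has degree q + 1, and the diameter is 3 (any two points share exactly one line, so they are 2 hops apart). How PN, demi-PN, or OFT map routers and terminals onto these vertices is not modeled; the function names are hypothetical.

```python
from collections import deque
from itertools import product


def projective_points(q: int) -> list[tuple[int, ...]]:
    """Canonical representatives of PG(2, q): nonzero vectors whose first nonzero entry is 1."""
    return [v for v in product(range(q), repeat=3)
            if next((c for c in v if c != 0), 0) == 1]


def incidence_graph(q: int) -> dict[int, list[int]]:
    """Bipartite point-line incidence graph; q must be prime so arithmetic mod q is GF(q)."""
    pts = projective_points(q)
    n = len(pts)                           # q*q + q + 1 points, and as many lines (duality)
    adj: dict[int, list[int]] = {i: [] for i in range(2 * n)}
    for i, p in enumerate(pts):
        for j, line in enumerate(pts):     # line j contains the points x with x . line = 0
            if sum(a * b for a, b in zip(p, line)) % q == 0:
                adj[i].append(n + j)       # vertices 0..n-1 are points, n..2n-1 are lines
                adj[n + j].append(i)
    return adj


def eccentricity(adj: dict[int, list[int]], src: int) -> int:
    """Largest hop distance from src, computed by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


if __name__ == "__main__":
    for q in (2, 3, 5):
        g = incidence_graph(q)
        degrees = {len(nbrs) for nbrs in g.values()}
        diameter = max(eccentricity(g, s) for s in g)
        print(f"q={q}: {len(g)} vertices, degrees {degrees}, diameter {diameter}")
```

The vertex count grows as 2(q² + q + 1) while the degree is only q + 1 and the diameter stays at 3, which is the scalability property the abstract relies on.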
Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems
Network-on-Chip (NoC) design is at a crossroads today. On the one hand, the design principles for efficiently implementing interconnection networks in the resource-constrained on-chip setting have stabilized. On the other hand, the requirements of embedded system design are far from stable. Embedded systems are built by assembling heterogeneous components with different operating speeds, and ad hoc countermeasures must be adopted to bridge their frequency domains. Moreover, a clear trend toward greater reconfigurability is underway because of the increasing complexity of applications. At the same time, the effect of technology is manifold: it provides unprecedented levels of system integration, but it also brings severe new constraints to the forefront, namely power budget restrictions, overheating concerns, circuit delay and power variability, permanent faults, and an increased probability of transient faults.
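The abstract does not say which mechanism is used to bridge frequency domains. One widely used, generic countermeasure is the dual-clock FIFO, whose read and write pointers cross clock domains in Gray code: consecutive Gray values differ in exactly one bit, so a synchronizer sampling a pointer mid-transition sees either the old or the new value, never a mixture. The sketch below, with hypothetical function names, shows the encoding and the usual full/empty tests for a FIFO of depth 2^addr_bits that uses pointers one bit wider than the address.

```python
def bin_to_gray(b: int) -> int:
    """Binary-reflected Gray code."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse of bin_to_gray: XOR of all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def adjacent_codes_differ_by_one_bit(width: int) -> bool:
    """Check the single-bit-change property, including the wrap-around step."""
    n = 1 << width
    return all(bin(bin_to_gray(i) ^ bin_to_gray((i + 1) % n)).count("1") == 1
               for i in range(n))


def fifo_empty(wgray: int, rgray: int) -> bool:
    """Empty when both (addr_bits + 1)-bit Gray pointers are equal."""
    return wgray == rgray


def fifo_full(wgray: int, rgray: int, addr_bits: int) -> bool:
    """Full when the write pointer is one lap ahead: in Gray code this inverts
    the two most significant bits of the (addr_bits + 1)-bit read pointer."""
    return wgray == rgray ^ (0b11 << (addr_bits - 1))


if __name__ == "__main__":
    print([bin_to_gray(i) for i in range(8)])
```

In hardware, each Gray pointer would be passed through a multi-flop synchronizer in the other clock domain before these comparisons; that part is not modeled here.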
However, supporting different degrees of reconfigurability and flexibility in the parallel hardware platform cannot be achieved through the incremental evolution of current design techniques; it requires a disruptive approach and a major increase in complexity. In addition, new reliability challenges cannot be solved with traditional fault-tolerance techniques alone; the reliability approach must also be part of the overall reconfiguration methodology.
In this thesis we take on the challenge of engineering NoC architectures for next-generation systems, and we provide design methods that go beyond the conventional way of implementing multi-synchronous, reliable, and reconfigurable NoCs. Our analysis is not limited to researching novel approaches to the specific challenges of the NoC architecture; we also co-design the solutions in a single integrated framework. Interdependencies between different NoC features are detected ahead of time, so we ultimately avoid engineering highly optimized solutions to specific problems that nevertheless coexist inefficiently in the final NoC architecture. To conclude, a silicon implementation by means of a test-chip tape-out and a prototype on an FPGA board validate the feasibility and effectiveness
Many-core architectures with time predictable execution Support for hard real-time applications
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 183-193).
Hybrid control systems are a growing application domain. They are pervasive, and their complexity is increasing rapidly. Distributed control systems for the future "Intelligent Grid" and for renewable energy generation systems demand high-performance, hard real-time computation and greater programmability. General-purpose computer systems are primarily designed to process data, not to interact with physical processes as these systems require. Generic general-purpose architectures, even with real-time operating systems, fail to meet the hard real-time constraints of hybrid system dynamics. ASIC, FPGA, or traditional embedded design approaches to these systems often result in expensive, complicated systems that are hard to program, reuse, or maintain. In this thesis, we propose a domain-specific architecture template targeting hybrid control system applications. Using power electronics control applications, we present new modeling techniques, synthesis methodologies, and a parameterizable computer architecture for these large distributed control systems. We propose a new system modeling approach, called Adaptive Hybrid Automaton, based on previous work in control system theory, that uses mixed-model abstractions and lends itself well to digital processing. Based on this modeling, we develop MARTHA, a domain-specific architecture that uses heterogeneous processing units and predictable execution. We develop a hard real-time aware router architecture to enable deterministic on-chip interconnect communication. We present several algorithms for scheduling task-based applications onto these types of heterogeneous architectures.
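The abstract mentions scheduling algorithms for task-based applications on heterogeneous processing units but does not describe them. Purely as a point of reference, and not as one of the thesis's algorithms, below is a minimal greedy list scheduler in the spirit of earliest-finish-time heuristics such as HEFT. It assumes known per-processor execution times, tasks supplied in a topological order, and no communication delays; all names and numbers are hypothetical.

```python
def list_schedule(tasks, preds, cost, num_procs):
    """Greedy list scheduling: place each task on the processor where it finishes earliest.

    tasks: task names in a topological order; preds[t]: tasks that must finish before t;
    cost[t][p]: execution time of t on processor p (communication delays are ignored).
    """
    proc_free = [0] * num_procs                  # time at which each processor becomes idle
    finish, placement = {}, {}
    for t in tasks:
        ready = max((finish[d] for d in preds.get(t, [])), default=0)

        def eft(p):
            # Earliest finish time of t on processor p.
            return max(ready, proc_free[p]) + cost[t][p]

        best = min(range(num_procs), key=eft)    # ties go to the lower processor index
        finish[t] = eft(best)
        proc_free[best] = finish[t]
        placement[t] = best
    return placement, finish


if __name__ == "__main__":
    # Processor 0 is a general-purpose core, processor 1 an accelerator (hypothetical costs).
    costs = {"A": [2, 4], "B": [6, 2], "C": [3, 5], "D": [2, 2]}
    deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
    placement, finish = list_schedule(["A", "B", "C", "D"], deps, costs, num_procs=2)
    print(placement, max(finish.values()))
```

Because the schedule is computed offline and deterministically, its makespan can be checked against a deadline before deployment, which is the property a hard real-time flow would need; a real scheduler would also account for interconnect latency between processing units.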
We create Heracles, an open-source, functional, parameterized, synthesizable many-core system design toolkit that can be used to explore future multi/many-core processors with different topologies, routing schemes, processing elements or cores, and memory system organizations. Using the Heracles design tool, we build a prototype of the proposed architecture on a state-of-the-art FPGA-based platform, and deploy and test it in actual physical power electronics systems. We develop and release an open-source, small, representative set of power electronics system applications that can be used for hard real-time application benchmarking.
by Michel A. Kinsy.
Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks
Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units.
The main objectives of this thesis are to investigate and propose new or improved solutions for topology agnostic routing and reconfiguration of interconnection networks. In addition, topology agnostic routing and reconfiguration algorithms are used to develop new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of many different computing systems.
No particular routing algorithm had been specified for an interconnection network technology that is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommends one of them, which fulfils all of the criteria. Further investigations demonstrate how this routing algorithm inherently supports fault tolerance and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology.
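The abstract does not name the recommended algorithm. Up*/Down* routing is one well-known topology agnostic algorithm and is sketched here only to illustrate the idea, not as the one the thesis recommends. It needs only a breadth-first spanning tree from a chosen root: every link is oriented "up" toward the endpoint closer to the root (ties broken by lower node id), and a legal route takes zero or more up hops followed by zero or more down hops. Forbidding down-to-up turns leaves no cyclic channel dependencies, so the result is deadlock-free on any connected topology. The function names are hypothetical.

```python
from collections import deque


def bfs_levels(adj, root):
    """Hop distance of every node from the root of the spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """Hop u->v goes 'up' if v is closer to the root (ties broken by lower node id)."""
    return (level[v], v) < (level[u], u)


def updown_route(adj, root, src, dst):
    """Shortest legal Up*/Down* path, found by BFS over (node, already_went_down) states."""
    level = bfs_levels(adj, root)
    start = (src, False)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node, down = queue.popleft()
        if node == dst:
            path, state = [], (node, down)
            while state is not None:
                path.append(state[0])
                state = prev[state]
            return path[::-1]
        for nxt in adj[node]:
            up = is_up(node, nxt, level)
            if down and up:
                continue                 # a down->up turn is forbidden
            state = (nxt, down or not up)
            if state not in prev:
                prev[state] = (node, down)
                queue.append(state)
    return None


if __name__ == "__main__":
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(updown_route(ring, root=0, src=2, dst=4))
```

On the 6-node ring, the minimal path 2-3-4 would turn from down to up at node 3, so the route detours through the root; this loss of minimality is the usual price of Up*/Down*.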
Reconfiguration of interconnection networks (i.e., changing the routing function) is a deadlock-prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application of one of the most versatile and efficient reconfiguration algorithms available in the literature and proposes an optimization of this algorithm that improves the network service offered to running applications. Moreover, a new reconfiguration algorithm is presented that supports replacement of the routing function without causing performance penalties.
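Why reconfiguration is deadlock prone can be seen from the channel dependency graph (CDG): a route that holds channel a while requesting channel b adds the dependency a -> b, and an acyclic CDG is a sufficient condition for deadlock freedom (Dally and Seitz). During reconfiguration, packets routed by the old and new routing functions coexist, so even if each function alone has an acyclic CDG, their union may contain a cycle. The sketch below, with hypothetical function names and a hypothetical 4-channel unidirectional ring, checks exactly that.

```python
def dependency_graph(routes):
    """Edges a -> b for every pair of consecutive channels used by some route."""
    deps = {}
    for route in routes:
        for a, b in zip(route, route[1:]):
            deps.setdefault(a, set()).add(b)
            deps.setdefault(b, set())
    return deps


def has_cycle(deps):
    """Depth-first search with three colors; a back edge to a GREY node is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}

    def visit(c):
        color[c] = GREY
        for nxt in deps[c]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in deps)


if __name__ == "__main__":
    # Ring channels c0: 0->1, c1: 1->2, c2: 2->3, c3: 3->0.
    old = [["c0", "c1"], ["c1", "c2"], ["c2", "c3"]]   # old function never turns c3 -> c0
    new = [["c2", "c3"], ["c3", "c0"], ["c0", "c1"]]   # new function never turns c1 -> c2
    print(has_cycle(dependency_graph(old)),
          has_cycle(dependency_graph(new)),
          has_cycle(dependency_graph(old + new)))
```

Each routing function breaks the ring's cycle at a different place, but packets of both functions in flight at once close it again, which is why reconfiguration schemes either drain the network or carefully order the transition.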
Processor allocation strategies that guarantee traffic containment commonly impose strict requirements on the shape of partitions and thus achieve only limited utilization of a system's computing resources. The thesis introduces two new, more flexible approaches. Both use the properties of a topology agnostic routing algorithm to enforce traffic containment within arbitrarily shaped partitions. Consequently, both high resource utilization and isolation of traffic between different partitions are achieved.
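Traffic containment means that every route between two nodes of a partition stays inside that partition. The sketch below illustrates why conventional strategies restrict partition shapes: with dimension-order (XY) routing on a 2D mesh, chosen here only for illustration, a rectangular partition is contained, but an L-shaped one is not, because some XY routes leave it. The thesis's approaches instead rely on a topology agnostic routing algorithm; the function names here are hypothetical.

```python
from itertools import product


def xy_route(src, dst):
    """Dimension-order route on a 2D mesh: correct X first, then Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def is_contained(partition):
    """True if every XY route between two partition nodes stays inside the partition."""
    nodes = set(partition)
    return all(set(xy_route(s, d)) <= nodes for s, d in product(nodes, repeat=2))


if __name__ == "__main__":
    rectangle = [(x, y) for x in range(2) for y in range(3)]
    l_shape = [(0, 0), (1, 0), (0, 1)]
    print(is_contained(rectangle), is_contained(l_shape))
```

In the L-shaped partition, the XY route from (0, 1) to (1, 0) passes through (1, 1), a node that may belong to another job, so its traffic would interfere with that partition.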