Vector support for multicore processors with major emphasis on configurable multiprocessors

Author: Yang Hongyan
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2008

It has recently become increasingly difficult to build higher-speed uniprocessor chips because of performance degradation and high power consumption. The quadratically increasing circuit complexity forbade the exploration of more instruction-level parallelism (ILP). To continue raising performance, processor designers then focused on thread-level parallelism (TLP) to realize a new architecture design paradigm. Multicore processor design is the result of this trend. It has proven quite capable of increasing performance and provides new opportunities in power management and system scalability. But current multicore processors do not provide powerful vector architecture support, which could yield significant speedups for array operations while maintaining area/power efficiency. This dissertation proposes and presents the realization of an FPGA-based prototype of a multicore architecture with a shared vector unit (MCwSV). FPGA stands for Field-Programmable Gate Array. The idea is that rather than improving only scalar or TLP performance, some hardware budget could be used to realize a vector unit to greatly speed up applications abundant in data-level parallelism (DLP). Realistically, limited by the parallelism in the application itself and by the compiler's vectorizing abilities, most general-purpose programs can only be partially vectorized. Thus, for efficient resource usage, one vector unit should be shared by several scalar processors. This approach could also keep the overall budget within acceptable limits. We suggest that this type of vector-unit sharing be established in future multicore chips. The design, implementation and evaluation of an MCwSV system with two scalar processors and a shared vector unit are presented for FPGA prototyping.
The MicroBlaze processor, which is a commercial IP (Intellectual Property) core from Xilinx, is used as the scalar processor; in the experiments the vector unit is connected to a pair of MicroBlaze processors through standard bus interfaces. The overall system is organized in a decoupled and multi-banked structure. This organization provides substantial system scalability and better vector performance. For a given area budget, benchmarks from several areas show that the MCwSV system can provide a significant performance increase compared to a multicore system without a vector unit. However, an MCwSV system with two MicroBlazes and a shared vector unit is not always an optimized system configuration for various applications with different percentages of vectorization. On the other hand, the MCwSV framework was designed for easy scalability to potentially incorporate various numbers of scalar/vector units and various function units. Also, the flexibility inherent to FPGAs can aid the task of matching target applications. These benefits can be taken into account to create optimized MCwSV systems for various applications. So the work eventually focused on building an architecture design framework incorporating performance and resource management for application-specific MCwSV (AS-MCwSV) systems. For embedded system design, resource usage, power consumption and execution latency are three metrics to be used in design tradeoffs. The product of these metrics is used here to choose the MCwSV system with the smallest value.
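
The selection rule described at the end of this abstract (multiply resource usage, power and latency, then pick the configuration with the smallest product) can be sketched as follows. This is only an illustration: the `Config` class, the units, and every number in `candidates` are hypothetical placeholders, not figures from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate system configuration (all values are illustrative)."""
    name: str
    resources: float   # resource usage, e.g. FPGA slices (assumed unit)
    power_mw: float    # power consumption in mW (assumed unit)
    latency_ms: float  # execution latency in ms (assumed unit)

    def cost(self) -> float:
        # Product of the three metrics named in the abstract.
        return self.resources * self.power_mw * self.latency_ms


def pick_best(configs):
    """Return the configuration with the smallest metric product."""
    return min(configs, key=lambda c: c.cost())


# Made-up candidates, purely to exercise the rule.
candidates = [
    Config("2 scalar + 1 vector", 4000, 500, 2.0),
    Config("2 scalar, no vector", 3000, 400, 5.0),
    Config("4 scalar + 1 vector", 6000, 700, 1.5),
]

best = pick_best(candidates)
print(best.name, best.cost())
```

A plain product weights the three metrics equally; a design flow that cares more about one of them would need a different objective.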

Digital Commons @ New Jersey Institute of Technology (NJIT)

Coarse-grained reconfigurable array architectures

Author: A Lambrechts, B Bougard, B Mei, B Sutter De, G Venkataramani, H Park, J Lee, JMP Cardoso, JW Waerdt van de, K Berkel van, K Bondalapati, K Sankaralingam, KE Coons, LH Lee, M Ahn, M Gebhart, M Schlansker, M Taylor, M Woh, MD Galanis, MH Lee, S Friedman, SA Mahlke, T Oh, Y Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.

Crossref

Ghent University Academic Bibliography

Instruction-set architecture synthesis for VLIW processors

Author: Jordans R.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2015

Repository TU/e

Pure OAI Repository

Decentralized Patent System

Author: Helman Lital
Publication venue: Scholarly Commons @ UNLV Boyd Law
Publication date: 01/01/2019

bepress Legal Repository

Scholarly Commons @ UNLV Law

Crossref

Configurable data center switch architectures

Author: Gebara Nadeen
Publication venue: Computing, Imperial College London
Publication date: 01/03/2023

In this thesis, we explore alternative architectures for implementing configurable Data Center Switches along with the advantages that can be provided by such switches. Our first contribution centers around determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small-to-medium-range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation when compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol to both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x.
As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.

Spiral - Imperial College Digital Repository

Desenvolvimento de um sistema de gestão técnica centralizado (Development of a centralized technical management system)

Author: Ferreira João Miguel Soares
Publication venue
Publication date: 19/02/2021

A building management system has user comfort and convenience, as well as reduced energy consumption, as its main goals. To accomplish this, it is necessary to integrate sensors and actuators to control and retrieve information about the physical processes of a building. These processes include control over the illumination and temperature of a room, and even access control. Once processed, this information allows more intelligent and efficient control of the electronic and mechanical systems of a building, such as HVAC and illumination, while also trying to reduce energy expenditure. The emergence of IoT has made it possible to increase the number of low-level devices in these systems, thanks to their reduced cost, increased performance and improved connectivity. To make better use of the new paradigm, a modern system with multi-protocol capabilities is required, as well as tools for data processing and presentation. Therefore, the most relevant industrial and building automation technologies were studied in order to define a modern, IoT-compatible architecture and choose its constituent software platforms. InfluxDB, EdgeX Foundry and Node-Red were the technologies selected for the database, gateway and dashboard, respectively, as they most closely align with the requirements set. A demonstrator was then developed to assess the operation of a system using these technologies, as well as to evaluate EdgeX's performance in terms of jitter and latency. The results showed that, although versatile and complete, this platform underperforms for real-time applications and for workloads with high sensor reading rates.
Master's degree in Electronics and Telecommunications Engineering.

Repositório Institucional da Universidade de Aveiro

Banked microarchitectures for complexity-effective superscalar microprocessors

Author: Tseng Jessica Hui-Chun, 1977-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 95-99). High performance superscalar microarchitectures exploit instruction-level parallelism (ILP) to improve processor performance by executing instructions out of program order and by speculating on branch instructions. Monolithic centralized structures with global communications, including issue windows and register files, are used to buffer in-flight instructions and to maintain machine state. These structures scale poorly to greater issue widths and deeper pipelines, as they must support simultaneous global accesses from all active instructions. The lack of scalability is exacerbated in future technologies, which have increasing global interconnect delay and a much greater emphasis on reducing both switching and leakage power. However, these fully orthogonal structures are over-engineered for typical use. Banked microarchitectures that consist of multiple interleaved banks of fewer ported cells can significantly reduce power, area, and latency of these structures. Although banked structures exhibit a minor performance penalty, significant reductions in delay and power can potentially be used to increase clock rate and lead to more complexity-effective designs. There are two main contributions in this thesis. First, a speculative control scheme is proposed to simplify the complicated control logic that is involved in managing a less-ported banked register file for high-frequency superscalar processors. Second, the RingScalar architecture, a complexity-effective out-of-order superscalar microarchitecture, based on a ring topology of banked structures, is introduced and evaluated. By Jessica Hui-Chun Tseng. Ph.D.
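
As a rough illustration of why interleaved banks with fewer ports carry a small performance penalty, the sketch below maps register numbers to banks and reports which banks are asked for more reads than they have ports in a single cycle. The modulo interleaving, the function names and the example register numbers are assumptions made for this sketch; the abstract does not specify the thesis's actual mapping or control scheme.

```python
from collections import Counter


def bank_of(reg: int, num_banks: int) -> int:
    # Assumed low-order interleaving: consecutive registers land in
    # consecutive banks.
    return reg % num_banks


def conflicts(reads, num_banks: int, ports_per_bank: int) -> dict:
    """Map each over-subscribed bank to the number of reads it cannot serve
    in the same cycle."""
    load = Counter(bank_of(r, num_banks) for r in reads)
    return {bank: n - ports_per_bank
            for bank, n in load.items() if n > ports_per_bank}


# Registers 0, 4 and 8 all fall in bank 0 of a 4-bank file with one read
# port per bank, so two of those reads would have to wait.
print(conflicts([0, 4, 8, 1], num_banks=4, ports_per_bank=1))
```

Reads spread across distinct banks, such as registers 0 to 3 here, produce no conflict, which is the common case that a banked design relies on.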

DSpace@MIT

An analysis of domestic ferry safety and the pre-departure inspection enforcement in the Philippines

Author: Ong Jose Ronnie T, Jr.
Publication venue: The Maritime Commons: Digital Repository of the World Maritime University
Publication date: 31/10/2021

World Maritime University

Rhode Island Report on the Judiciary 1979

Author
Publication venue: HELIN Digital Commons
Publication date: 01/01/1979

This report describes progress and programs in all the state courts. Its articles briefly mention some of the efforts and achievements of our judges and court employees to better serve the people of Rhode Island and the interests of Justice.

HELIN Digital Commons

Doctor of Philosophy

Author: Ramani Karthik
Publication venue: University of Utah
Publication date: 01/12/2012

The embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low energy, high performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICs) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a Compiler to generate the execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs.
The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches.
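
The compile, simulate, explore cycle that this abstract describes, repeated until the user's constraints are met, can be sketched as a simple loop. Everything below is a toy stand-in: `Arch`, `compile_for`, `simulate`, `explore`, the single `lanes` knob and the cost model are hypothetical and are not CoGenE's interfaces. The greedy doubling step is also a deliberate simplification of the design-space search the abstract mentions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Arch:
    lanes: int  # hypothetical single architectural knob


@dataclass(frozen=True)
class Stats:
    cycles: float
    energy: float


def compile_for(arch: Arch, work_items: int) -> int:
    # Stand-in for the compiler: the "binary" is just the step count
    # (ceiling division of the work across the lanes).
    return -(-work_items // arch.lanes)


def simulate(arch: Arch, steps: int) -> Stats:
    # Stand-in for the generated simulator, using a toy cost model.
    return Stats(cycles=float(steps),
                 energy=float(steps * arch.lanes + 2 * arch.lanes))


def explore(arch: Arch) -> Arch:
    # Stand-in for the explorer: greedily double the lane count.
    return replace(arch, lanes=arch.lanes * 2)


def design_loop(arch: Arch, work_items: int, max_cycles: float,
                max_iters: int = 16):
    """Repeat compile -> simulate -> explore until the cycle budget is met."""
    for _ in range(max_iters):
        stats = simulate(arch, compile_for(arch, work_items))
        if stats.cycles <= max_cycles:
            return arch, stats
        arch = explore(arch)
    raise RuntimeError("constraints not met within max_iters")


final_arch, final_stats = design_loop(Arch(lanes=1), work_items=1000,
                                      max_cycles=100)
print(final_arch, final_stats)
```

A more faithful sketch would track energy and area as constraints too and keep a set of Pareto-optimal candidates instead of stopping at the first architecture that meets the budget.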

The University of Utah: J. Willard Marriott Digital Library