22 research outputs found

    Heterogeneous Secure Multi-level Remote Acceleration Service for Low-Power Integrated Systems and Devices

    This position paper presents a novel heterogeneous CPU-GPU multi-level cloud acceleration service focused on applications running on embedded systems found in low-power devices. A runtime system performs energy and performance estimations in order to automatically select local CPU-based and GPU-based tasks that should be seamlessly executed on more powerful remote devices or cloud infrastructures. Moreover, the paper proposes, for the first time, a unified model in which almost any device or infrastructure can securely operate as an accelerated entity and/or as an accelerator serving other, less powerful devices.
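    To make the runtime decision concrete, the following is a minimal Python sketch of the kind of energy/latency comparison such a runtime could perform before offloading a task. The Estimate fields, link parameters, and the should_offload helper are illustrative assumptions, not the paper's actual interfaces.

    # Hypothetical sketch of an offload decision: the runtime estimates energy and
    # latency for running a task locally (CPU/GPU) versus remotely, and offloads
    # only when the remote option, including transfer cost, wins on both counts.
    # All names and numbers are illustrative, not the paper's actual runtime API.
    from dataclasses import dataclass

    @dataclass
    class Estimate:
        energy_mj: float   # estimated energy cost on this device, in millijoules
        latency_ms: float  # estimated completion time, in milliseconds

    def should_offload(local: Estimate, remote: Estimate,
                       transfer_bytes: int, link_mbps: float,
                       link_mw: float) -> bool:
        """Return True if remote execution (including data transfer) is cheaper."""
        transfer_ms = transfer_bytes * 8 / (link_mbps * 1000)   # time to ship the data
        transfer_mj = link_mw * transfer_ms / 1000              # energy spent on the link
        remote_total = Estimate(remote.energy_mj + transfer_mj,
                                remote.latency_ms + transfer_ms)
        return (remote_total.energy_mj < local.energy_mj and
                remote_total.latency_ms < local.latency_ms)

    # Example: a 2 MB kernel input shipped over a 50 Mb/s link drawing 300 mW.
    local = Estimate(energy_mj=120.0, latency_ms=80.0)
    remote = Estimate(energy_mj=10.0, latency_ms=15.0)   # as seen by the client
    print(should_offload(local, remote, transfer_bytes=2_000_000,
                         link_mbps=50.0, link_mw=300.0))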

    for the Vector-IRAM chip

    Vector IRAM integrates vector processing with embedded DRAM on a single chip to provide high multimedia performance at low energy cost. This report presents the design and implementation of the VIRAM Vector Register File. Our design addresses several challenges, including the need for high speed, low power consumption, a compact layout, and multiported access. Using a 0.18 μm technology and a 1.3 V supply voltage, it operates at 200 MHz, consumes an average power of 330 mW, occupies 8 mm² of area, and provides eight read and three write ports. A number of CAD tools were used, including layout tools from Cadence, extraction tools from Avant!, HSPICE, TimeMill, and PowerMill. The report emphasizes implementation issues and evaluates the performance and power consumption of our design.
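    As a rough illustration of the port constraints described above (eight element reads and three element writes serviced per cycle), the toy behavioral model below can be used to reason about stalls when requests exceed the available ports. The class and method names are invented for illustration and do not correspond to the VIRAM RTL.

    # Toy behavioral model of a multiported vector register file. Purely
    # illustrative; it is not the VIRAM design itself.
    class VectorRegisterFile:
        def __init__(self, num_regs=32, vlen=64, read_ports=8, write_ports=3):
            self.regs = [[0] * vlen for _ in range(num_regs)]
            self.read_ports = read_ports
            self.write_ports = write_ports

        def cycle(self, reads, writes):
            """Service up to read_ports reads and write_ports writes this cycle.

            reads:  list of (reg, elem) pairs; writes: list of (reg, elem, value).
            Returns (read_values, leftover_reads, leftover_writes) so a caller
            can model the stalls caused when requests exceed the ports.
            """
            served_r, spill_r = reads[:self.read_ports], reads[self.read_ports:]
            served_w, spill_w = writes[:self.write_ports], writes[self.write_ports:]
            for reg, elem, value in served_w:
                self.regs[reg][elem] = value
            values = [self.regs[reg][elem] for reg, elem in served_r]
            return values, spill_r, spill_w

    vrf = VectorRegisterFile()
    _, pending_r, _ = vrf.cycle(reads=[(0, i) for i in range(10)], writes=[(1, 0, 42)])
    print(len(pending_r))  # 2 reads spill to the next cycle: 10 requests, 8 read ports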

    Accelerating Emulation and Providing Full Chip Observability and Controllability


    Novel techniques for hardware / software partitioning and emulation

    Over the last several years, uniprocessor systems, in an effort to overcome the limits of deeper pipelining, instruction-level parallelism, and power dissipation, have evolved from one processing core to tens or hundreds of cores. At the same time, multi-chip systems and Systems on Board (SoB) have started giving way to Systems on Chip (SoC) that exploit the latest nanometer technologies. This has also caused a tremendous shift in the system development process towards embedded systems, hardware/software co-design, SoC designs, multi-core designs, and hardware accelerators. Nowadays, one of the key issues for continued performance scaling is the development of advanced CAD tools that can efficiently support the design and verification of these new platforms and the requirements of today's complex applications.

    This thesis focuses on three important aspects of the system development process: hardware/software partitioning, simulation, and verification. Since the time consumed by these tasks is usually a large fraction of the overall development time, speeding them up can significantly reduce the ever-important time to market. Hardware emulation on FPGAs has been widely used as a significantly faster and more accurate approach to the verification of complex designs than software simulation. In this approach, Hardware Simulation Accelerator and Emulator co-processor units are used to offload calculation-intensive tasks from software simulators. One of the biggest problems, however, is that the communication overhead between the software simulator, where the behavioral testbench usually runs, and the hardware emulator, where the Design Under Test (DUT) is emulated, is becoming a new critical bottleneck. Another problem is that in a hardware emulation environment it is impossible to bring a large number of internal signals outside the chip for verification purposes, so on-chip observability has become a significant issue. Finally, one more crucial issue is deciding how to partition the system components into two distinct sets: those that will be implemented in hardware and those that will run in software.

    In this thesis we analyze all of the aforementioned problems and propose novel techniques to attack them. First, we introduce a novel emulation framework that automatically transforms certain HDL parts of the testbench into synthesizable code in order to offload them from the software simulator and, more importantly, minimize the aforementioned communication overhead. In particular, we partition the testbench running on the software simulator into two sections: the testbench HDL code that communicates directly with the DUT, and the remaining C-like testbench code. The former is transformed into synthesizable code while the latter runs on a general-purpose CPU. Next, we extend this architecture by adding multiple fast scan-chain paths to the design in order to provide full circuit observability and controllability on the fly. Finally, we develop a fully automated hardware/software partitioning tool that incorporates a novel flow with new cost metrics and functions to provide fast and efficient solutions. The tool employs two separate partitioning algorithms: Simulated Annealing (SA) and a novel greedy algorithm, Grouping-Mapping Partitioning (GMP). Our experiments demonstrate that our methodologies provide cost-effective solutions for the hardware/software partitioning and emulation of large and complex systems.
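    As a rough illustration of the partitioning step, the sketch below implements a generic simulated-annealing partitioner with assumed cost terms (hardware area, software execution time, and a cut penalty for HW/SW communication). It is a textbook SA formulation under those assumptions, not the thesis's actual tool, cost functions, or the GMP algorithm.

    # Minimal simulated-annealing sketch for hardware/software partitioning with
    # illustrative cost metrics; task names and weights are invented.
    import math, random

    def cost(assign, hw_area, sw_time, edges, comm_penalty=1.0):
        area = sum(hw_area[t] for t, side in assign.items() if side == "HW")
        time = sum(sw_time[t] for t, side in assign.items() if side == "SW")
        cut = sum(1 for a, b in edges if assign[a] != assign[b])   # HW/SW boundary crossings
        return time + 0.1 * area + comm_penalty * cut

    def partition(tasks, hw_area, sw_time, edges, temp=10.0, cooling=0.995, steps=2000):
        assign = {t: random.choice(["HW", "SW"]) for t in tasks}
        cur = cost(assign, hw_area, sw_time, edges)
        best, best_cost = dict(assign), cur
        for _ in range(steps):
            t = random.choice(tasks)
            assign[t] = "SW" if assign[t] == "HW" else "HW"        # propose: flip one task
            new = cost(assign, hw_area, sw_time, edges)
            if new <= cur or random.random() < math.exp((cur - new) / temp):
                cur = new                                          # accept the move
                if cur < best_cost:
                    best, best_cost = dict(assign), cur
            else:
                assign[t] = "SW" if assign[t] == "HW" else "HW"    # reject: undo flip
            temp *= cooling
        return best, best_cost

    tasks = ["fft", "filter", "ctrl", "io"]
    hw_area = {"fft": 5, "filter": 3, "ctrl": 1, "io": 2}
    sw_time = {"fft": 9, "filter": 4, "ctrl": 1, "io": 6}
    edges = [("fft", "filter"), ("filter", "ctrl"), ("ctrl", "io")]
    print(partition(tasks, hw_area, sw_time, edges))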

    Wormhole IP over (Connectionless) ATM

    In the eighties, high throughput and low latency requirements in multiprocessor interconnection networks led to wormhole routing. Today, the same techniques are applicable to routing Internet packets over ATM hardware at high speed. Just as virtual channels in wormhole routing carry packets segmented into flits, a number of hardware-managed VCs in ATM can carry IP packets segmented into cells according to AAL-5; each VC is dedicated to one packet for the duration of that packet, and is afterwards reassigned to another packet, in hardware. This idea was introduced by Barnett [Barn97] and was named connectionless ATM. We modify the Barnett proposal to make it applicable to existing ATM equipment: we propose a single-input, single-output Wormhole IP Router that functions as a VP/VC translation filter between ATM subnetworks; fast IP routing lookups can be performed as in [GuLK98]. Based on actual Internet traces, we show by simulation that a few tens of hardware-managed VCs per outgoing VP suffice for all but 10⁻⁴ or fewer of the packets. We analyze the hardware cost of a wormhole IP routing filter and show that it can be built at low cost: 10 off-the-shelf chips suffice for 622 Mb/s operation; using pipelining, operation is feasible even at 10 Gb/s today.
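    The following Python sketch models the hardware-managed VC pool described above: a VC is claimed by a packet at its first cell and released at its AAL5 end-of-packet cell, then becomes available for reassignment. Class and field names are invented for illustration; the real router performs this management in hardware.

    # Illustrative model of connectionless-ATM style VC management per outgoing VP.
    class OutgoingVP:
        def __init__(self, num_vcs=32):
            self.free_vcs = list(range(num_vcs))  # VC identifiers not carrying a packet
            self.active = {}                      # packet_id -> VC it currently occupies
            self.blocked = 0                      # packets that found no free VC

        def first_cell(self, packet_id):
            """Claim a VC when the first cell of a new packet arrives."""
            if not self.free_vcs:
                self.blocked += 1
                return None
            vc = self.free_vcs.pop()
            self.active[packet_id] = vc
            return vc

        def cell(self, packet_id):
            """Subsequent cells of the packet reuse the same VC."""
            return self.active.get(packet_id)

        def last_cell(self, packet_id):
            """AAL5 end-of-packet: the VC becomes free for another packet."""
            vc = self.active.pop(packet_id)
            self.free_vcs.append(vc)
            return vc

    vp = OutgoingVP(num_vcs=2)
    vp.first_cell("pkt-A"); vp.first_cell("pkt-B")
    print(vp.first_cell("pkt-C"))   # None: both VCs busy, the third packet is blocked
    vp.last_cell("pkt-A")
    print(vp.first_cell("pkt-C"))   # the VC freed by pkt-A is reassigned to pkt-C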

    Multilingual Extensions to DIENST

    Digital libraries enable on-line access to information and provide advanced methods for material search, retrieval, and presentation. In order to support collections of documents written in several languages, and to increase the applicability of digital libraries in non-English-speaking countries, a multilingual digital library design is necessary that supports the native languages of the users. Issues that must be taken into account in a multilingual design include limitations on the concurrent use of more than one character set and the availability (or lack thereof) of metadata in languages other than English. Furthermore, the desired display language of each piece of information depends on the languages that each individual user can understand, the languages in which the documents and their metadata are available, and the locally available resources (fonts). DIENST is a digital library search tool developed at Cornell University. This report describes our work on extending DIENST to...
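    A minimal sketch of the display-language decision discussed above: for each piece of metadata, pick the first user-preferred language that is both available and renderable with the locally installed fonts, falling back to English and then to whatever is available. The data layout and function name are hypothetical, not DIENST's actual metadata schema.

    # Hedged sketch of per-field display-language selection.
    def pick_language(user_langs, available, installed_fonts):
        """user_langs: preference-ordered list, e.g. ["el", "en"].
        available: dict language -> field value; installed_fonts: set of
        languages the client can render."""
        for lang in user_langs:
            if lang in available and lang in installed_fonts:
                return lang, available[lang]
        if "en" in available:                     # fall back to English metadata
            return "en", available["en"]
        lang = next(iter(available))              # last resort: anything available
        return lang, available[lang]

    title = {"el": "Ψηφιακές Βιβλιοθήκες", "en": "Digital Libraries"}
    print(pick_language(["el", "en"], title, installed_fonts={"en"}))
    # -> ('en', 'Digital Libraries'): Greek metadata exists but no Greek font is installed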

    Wormhole IP over (Connectionless) ATM

    High-speed switches and routers internally operate using fixed-size cells or segments; variable-size packets are segmented and later reassembled. Connectionless ATM was proposed to quickly carry IP packets segmented into cells (AAL5) using a number of hardware-managed ATM VCs. We show that this is analogous to wormhole routing. We modify this architecture to make it applicable to existing ATM equipment: we propose a low-cost, single-input, single-output Wormhole IP Router that functions as a VP/VC translation filter between ATM subnetworks. When compared to IP routers, the proposed architecture features simpler hardware and lower latency. When compared to software-based IP-over-ATM techniques, the new architecture avoids the overheads of a large number of labels, as well as the delays of establishing new flows in software after the first few packets have suffered considerable latencies. We simulated a wormhole IP routing filter, showing that a few tens of hardware-managed VCs per outgoing VP usually suffice. We built and successfully tested a prototype, operating at 2 × 155 Mbps, using one FPGA and DRAM. Simple analysis shows that operation at 10 Gbps and beyond is feasible today. Index Terms: IP over ATM, connectionless ATM, wormhole routing, gigabit router, wormhole IP, routing filter.
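    Complementing the VC-pool sketch earlier, the following outlines the per-cell behaviour of such a VP/VC translation filter: route lookup and VC binding on a packet's first cell, translation of subsequent cells through that binding, and VC recycling at the AAL5 end-of-packet flag. The cell fields, the toy exact-match route table, and the helper are assumptions for illustration, not the prototype's hardware design.

    # Sketch of per-cell processing in a VP/VC translation filter.
    def route_cell(cell, bindings, free_vcs, route_table):
        """cell: dict with 'vp', 'vc', 'eop' (AAL5 end-of-packet) and, on the first
        cell of a packet, 'dst_ip'. Returns the (out_vp, out_vc) the cell leaves on."""
        key = (cell["vp"], cell["vc"])
        if key not in bindings:                       # first cell of a new packet
            out_vp = route_table[cell["dst_ip"]]      # stands in for a fast IP route lookup
            out_vc = free_vcs[out_vp].pop()           # claim a hardware-managed VC
            bindings[key] = (out_vp, out_vc)
        out = bindings[key]
        if cell["eop"]:                               # last cell: recycle the VC
            out_vp, out_vc = bindings.pop(key)
            free_vcs[out_vp].append(out_vc)
        return out

    route_table = {"10.1.2.3": 1}                     # toy exact-match table
    free_vcs = {1: [7, 8]}
    bindings = {}
    cells = [{"vp": 0, "vc": 3, "dst_ip": "10.1.2.3", "eop": False},
             {"vp": 0, "vc": 3, "eop": True}]
    print([route_cell(c, bindings, free_vcs, route_table) for c in cells])
    # -> [(1, 8), (1, 8)]: both cells of the packet leave on the same outgoing VP/VC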

    ECOSCALE: Reconfigurable Computing and Runtime System for Future Exascale Systems

    In order to reach exascale performance, current HPC systems need to be improved. Simple hardware scaling is not a feasible solution due to increasing utility costs and power consumption limitations. Apart from improvements in implementation technology, what is needed is to refine the HPC application development flow as well as the system architecture of future HPC systems. ECOSCALE tackles these challenges by proposing a scalable programming environment and architecture, aiming to substantially reduce energy consumption as well as data traffic and latency. ECOSCALE introduces a novel heterogeneous, energy-efficient, hierarchical architecture, as well as a hybrid many-core + OpenCL programming environment and runtime system. The ECOSCALE approach is hierarchical and is expected to scale well by partitioning the physical system into multiple independent Workers (i.e., compute nodes). Workers are interconnected in a tree-like fashion and define a contiguous global address space that can be viewed either as a set of partitions in a Partitioned Global Address Space (PGAS), or as a set of nodes hierarchically interconnected via an MPI protocol. To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped into the virtual address space, utilizing a dual-stage System Memory Management Unit with coherent memory access. The architecture supports shared partitioned reconfigurable resources accessed by any Worker in a PGAS partition, as well as automated hardware synthesis of these resources from an OpenCL-based programming model.
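    As a small illustration of the PGAS view described above, the sketch below splits a global address into a Worker identifier and a local offset, so that any Worker in the hierarchy can name memory held by any other. The bit widths and the flat Worker numbering are assumptions for illustration, not the ECOSCALE address map.

    # Hedged sketch of partitioned-global-address-space addressing.
    LOCAL_BITS = 32                      # assumed size of each Worker's partition

    def to_global(worker_id, local_offset):
        """Pack an owning Worker and an offset within its partition into one address."""
        return (worker_id << LOCAL_BITS) | local_offset

    def from_global(global_addr):
        """Recover the owning Worker and the local offset from a global address."""
        return global_addr >> LOCAL_BITS, global_addr & ((1 << LOCAL_BITS) - 1)

    addr = to_global(worker_id=5, local_offset=0x1000)
    print(from_global(addr))             # -> (5, 4096): owning Worker and local offset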