Proposal for an architecture for TUMULT based on a serial data link

Author: Hofstede J.
Scholten J.
Smit G.J.M.
Publication venue: North-Holland
Publication date: 01/01/1987

University of Twente Research Information

How Much Can D2D Communication Reduce Content Delivery Latency in Fog Networks with Edge Caching?

Author: Karasik Roy
Shamai Shlomo
Simeone Osvaldo
Publication date: 02/04/2019

A Fog-Radio Access Network (F-RAN) is studied in which cache-enabled Edge Nodes (ENs) with dedicated fronthaul connections to the cloud aim at delivering content to mobile users. Using an information-theoretic approach, this work tackles the problem of quantifying the potential latency reduction that can be obtained by enabling Device-to-Device (D2D) communication over out-of-band broadcast links. Following prior work, the Normalized Delivery Time (NDT), a metric that captures the high signal-to-noise ratio worst-case latency, is adopted as the performance criterion of interest. Joint edge caching, downlink transmission, and D2D communication policies based on compress-and-forward are proposed that are shown to be information-theoretically optimal to within a constant multiplicative factor of two for all values of the problem parameters, and to achieve the minimum NDT for a number of special cases. The analysis provides insights on the role of D2D cooperation in improving the delivery latency. Comment: Submitted to the IEEE Transactions on Communications

arXiv.org e-Print Archive

King's Research Portal

The status of US Teraflops-scale projects

Author: Arsenin
Robert D. Mawhinney
Publication venue: Elsevier BV
Publication date: 01/01/1994

The current status of United States projects pursuing Teraflops-scale computing resources for lattice field theory is discussed. Two projects are in existence at this time: the Multidisciplinary Teraflops Project, incorporating the physicists of the QCD Teraflops Collaboration, and a smaller project, centered at Columbia, involving the design and construction of a 0.8 Teraflops computer primarily for QCD. Comment: Contribution to Lattice 94. 7 pages. LaTeX source followed by compressed, uuencoded postscript file of the complete paper. Individual figures available from [email protected]

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

A 90 nm CMOS 16 Gb/s Transceiver for Optical Interconnects

Author: Emami-Neyestanak Azita
Horowitz Mark
Palermo Samuel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2007

Interconnect architectures which leverage high-bandwidth optical channels offer a promising solution to address the increasing chip-to-chip I/O bandwidth demands. This paper describes a dense, high-speed, and low-power CMOS optical interconnect transceiver architecture. Vertical-cavity surface-emitting laser (VCSEL) data rate is extended for a given average current and corresponding reliability level with a four-tap current-summing FIR transmitter. A low-voltage integrating and double-sampling optical receiver front-end provides adequate sensitivity in a power-efficient manner by avoiding linear high-gain elements common in conventional transimpedance-amplifier (TIA) receivers. Clock recovery is performed with a dual-loop architecture which employs baud-rate phase detection and feedback interpolation to achieve reduced power consumption, while high-precision phase spacing is ensured at both the transmitter and receiver through adjustable delay clock buffers. A prototype chip fabricated in 1 V 90 nm CMOS achieves 16 Gb/s operation while consuming 129 mW and occupying 0.105 mm².

Crossref

Caltech Authors

Stencils and problem partitionings: Their influence on the performance of multiple processor systems

Author: Adams L. M.
Patrick M. L.
Reed D. A.

Given a discretization stencil, partitioning the problem domain is an important first step for the efficient solution of partial differential equations on multiple processor systems. Partitions are derived that minimize interprocessor communication when the number of processors is known a priori and each domain partition is assigned to a different processor. This partitioning technique uses the stencil structure to select appropriate partition shapes. For square problem domains, it is shown that non-standard partitions (e.g., hexagons) are frequently preferable to the standard square partitions for a variety of commonly used stencils. The investigation concludes with a formalization of the relationship between partition shape, stencil structure, and architecture, allowing selection of optimal partitions for a variety of parallel systems.

NASA Technical Reports Server

NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features

Author: Ameli F.
Ammendola R.
Biagioni A.
Frezza O.
Lamanna G.
Lo Cicero F.
Lonardo A.
Martinelli M.
Nicolau C.
Paolucci P.S.
Pastorelli E.
Pontisso L.
Rossetti D.
Simeone F.
Simula F.
Sozzi M.
Tosoratto L.
Vicini P.
Publication date: 13/06/2014

While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic behaviour in terms of latency in executing computational kernels as soon as data is available in their internal memories, assessment of the real-time features of a standard GPGPU system needs careful characterization of all subsystems along the data stream path. The networking subsystem turns out to be the most critical one in terms of the absolute value and fluctuations of its response latency. Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network Interface Card (NIC) design featuring a configurable and extensible set of network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler GPU memories. The NaNet design currently supports both standard channels, GbE (1000BASE-T) and 10GbE (10GBASE-R), and custom channels, 34 Gbps APElink and 2.5 Gbps deterministic-latency KM3link, but its modularity allows for a straightforward inclusion of other link technologies. To avoid host OS intervention on the data stream and remove a possible source of jitter, the design includes a network/transport layer offload module with cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division Multiplexing and APElink protocols. After a description of the NaNet architecture and its latency/bandwidth characterization for all supported links, two real-world use cases will be presented: the GPU-based low-level trigger for the RICH detector in the NA62 experiment at CERN and the on-/off-shore data link for the KM3 underwater neutrino telescope.

arXiv.org e-Print Archive

CERN Document Server

Serialized Asynchronous Links for NoC

Author: Al-Hashimi Bashir
Benini Luca
D'Alessandro Crescenzo
Ogg Simon
Valli Enrico
Yakovlev Alex
Publication date: 01/01/2008

This paper proposes an asynchronous serialized link for NoC that can achieve the same levels of performance in terms of flits per second as a synchronous link but with a reduced number of wires in the point-to-point switch links and reduced power consumption. This is achieved by employing serialization in the asynchronous domain, as opposed to the synchronous domain, to facilitate the removal of global clocking on the serial links. Based on transistor-level simulations using 0.12 µm foundry models, it has been shown that it is possible to achieve the same level of performance as a synchronous link but with a 75% reduction in wires and a 65% reduction in power for a 300 MFlit/s link with 8 buffers and a switch clock speed of 300 MHz. Furthermore, the paper presents the design requirements arising from interfacing switches of synchronous NoC and asynchronous serial links.

CiteSeerX

Southampton (e-Prints Soton)

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna