9,354 research outputs found

Communication costs in a multi-tiered MPSoC

Author: Burgwal Marcel D. van de
Smit Gerard J.M.
Publication venue: STW Technology Foundation
Publication date: 01/01/2008

The amount of digital processing required for phased array beamformers is very large. It requires many parallel processors, which can be organized in a multi-tiered structure. Communication costs differ for each of the stages in such an architecture. For example, communication from the antenna front-end to the first processing stages is costly because of the number of connections and the data rate. Furthermore, there is a trade-off between sequential processing, which exploits locality of reference, and parallel processing, which exploits parallelism but adds communication costs. Thus, the optimal architecture depends on the importance that is given to the different measures.

A model is presented to determine the partitioning of a (beamforming) system based on communication costs. It is shown that different solutions can be explored based on the cost model and the incorporated quantitative and qualitative measures. The importance of each measure is subjective and depends on the situation and application. In this work a simple beamforming application is used, optimised for energy efficiency.
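The abstract does not give the cost model itself, so the following is only a minimal sketch of the general idea that the preferred partitioning changes with the importance assigned to each measure. The candidate names, the two measures, their normalised values, and the weighted-sum form are all hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: a weighted-sum cost model over normalised measures.
# Nothing here reproduces the paper's model; all names and numbers are invented
# to show how the chosen weights change which partitioning looks best.

def weighted_cost(measures, weights):
    """Return the weighted sum of the measures named in `weights`."""
    return sum(weights[name] * measures[name] for name in weights)


def pick_partitioning(candidates, weights):
    """Return the name of the candidate with the lowest weighted cost."""
    return min(candidates, key=lambda name: weighted_cost(candidates[name], weights))


# Two made-up candidates, each scored from 0 (cheap) to 1 (expensive).
EXAMPLE_CANDIDATES = {
    "few_large_stages": {"communication": 0.2, "latency": 0.9},
    "many_parallel_stages": {"communication": 0.8, "latency": 0.3},
}

if __name__ == "__main__":
    for weights in ({"communication": 0.8, "latency": 0.2},
                    {"communication": 0.2, "latency": 0.8}):
        print(weights, "->", pick_partitioning(EXAMPLE_CANDIDATES, weights))
```

With these invented numbers, weighting communication heavily favours the sequential candidate, while weighting latency heavily favours the parallel one, which mirrors the trade-off the abstract describes.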

CiteSeerX

University of Twente Research Information

Parallel Toolkit for Measuring the Quality of Network Community Structure

Author: Chen Mingming
Liu Sisi
Szymanski Boleslaw K.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Many networks display community structure, which identifies groups of nodes within which connections are denser than between them. Detecting and characterizing such community structure, known as community detection, is one of the fundamental issues in the study of network systems, and it has received considerable attention in recent years. Numerous techniques have been developed for both efficient and effective community detection. Among them, the most efficient is the label propagation algorithm, whose computational complexity is O(|E|). Although it is linear in the number of edges, the running time is still too long for very large networks, creating the need for parallel community detection. Computing community quality metrics for a community structure is also computationally expensive, both with and without ground truth. However, to date we are not aware of any effort to introduce parallelism for this problem. In this paper, we provide a parallel toolkit to calculate the values of such metrics. We evaluate the parallel algorithms on both a distributed memory machine and a shared memory machine. The experimental results show that they yield a significant performance gain over sequential execution in terms of total running time, speedup, and efficiency. Comment: 8 pages; in Network Intelligence Conference (ENIC), 2014 Europea
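To make the label propagation idea mentioned in the abstract concrete, here is a minimal sequential sketch. It is not the toolkit described in the paper and contains no parallelism. The function name, the adjacency-dictionary input, and the deterministic tie-breaking rule (smallest label wins) are assumptions made for reproducibility; label propagation is usually described with random tie-breaking and a random node order.

```python
from collections import Counter


def label_propagation(adjacency, max_rounds=100):
    """Assign each node the most common label among its neighbours.

    `adjacency` maps each node to an iterable of its neighbours.
    Ties are broken by taking the smallest label (a simplification).
    Each sweep inspects every edge a constant number of times.
    """
    # Start with every node in its own community.
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            neighbours = list(adjacency[node])
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return labels


if __name__ == "__main__":
    # Two disconnected triangles: nodes 0-2 and nodes 3-5.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(label_propagation(graph))
```

The deterministic tie-break keeps the example testable but can merge weakly connected communities that randomised variants would often keep apart, so it should be read as an illustration of the update rule only.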

arXiv.org e-Print Archive

CiteSeerX

Crossref

Partitioning problems in parallel, pipelined and distributed computing

Author: Bokhari S.

The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple-satellite system: partitioning multiple chain-structured parallel programs, multiple arbitrarily structured serial programs, and single tree-structured parallel programs. In addition, the problems of partitioning chain-structured parallel programs across chain-connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
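As a rough illustration of the chain-on-chain case mentioned in the abstract, the sketch below splits a chain of modules into contiguous blocks, one per processor, so that the most heavily loaded processor (the bottleneck) is as light as possible. It uses a plain dynamic program, not the Sum-Bottleneck path algorithm of the paper. The function name, the inputs, and the cost model (a processor is charged its modules' execution times plus the communication cost of each chain link cut at its boundaries) are assumptions for this example.

```python
from functools import lru_cache


def partition_chain(exec_times, comm_costs, processors):
    """Return (bottleneck, cuts) for a contiguous partition of a module chain.

    exec_times[i] is the execution time of module i.
    comm_costs[i] is the cost of the link between modules i and i + 1.
    cuts lists the indices at which a new block starts.
    """
    n = len(exec_times)
    if len(comm_costs) != n - 1:
        raise ValueError("need exactly one link cost per adjacent module pair")
    if not 1 <= processors <= n:
        raise ValueError("processors must be between 1 and the number of modules")

    # Prefix sums give the execution time of any block in O(1).
    prefix = [0]
    for t in exec_times:
        prefix.append(prefix[-1] + t)

    def block_load(start, end):
        """Load of the block holding modules start .. end - 1."""
        load = prefix[end] - prefix[start]
        if start > 0:
            load += comm_costs[start - 1]  # link cut on the left
        if end < n:
            load += comm_costs[end - 1]    # link cut on the right
        return load

    @lru_cache(maxsize=None)
    def best(start, remaining):
        """Best split of modules start .. n - 1 over `remaining` processors."""
        if remaining == 1:
            return block_load(start, n), ()
        result = None
        # Leave at least one module for each of the remaining processors.
        for end in range(start + 1, n - remaining + 2):
            tail_bottleneck, tail_cuts = best(end, remaining - 1)
            candidate = max(block_load(start, end), tail_bottleneck)
            if result is None or candidate < result[0]:
                result = (candidate, (end,) + tail_cuts)
        return result

    return best(0, processors)


if __name__ == "__main__":
    print(partition_chain([4, 2, 3, 5], [1, 1, 1], 2))
    print(partition_chain([4, 2, 3, 5], [1, 10, 1], 2))
```

Raising the cost of the middle link in the second call moves the cut away from it, which shows how communication costs, and not only execution times, shape the partition under this assumed model.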

NASA Technical Reports Server

The communication processor of TUMULT-64

Author: Jansen Pierre G.
Smit Gerard J.M.
Publication venue: North-Holland
Publication date: 01/01/1988

Tumult (Twente University MULTi-processor system) is a modular, extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type, distributed memory, message passing, high performance, real-time and fault tolerant. A distributed real-time operating system has been realized, consisting of a multi-tasking kernel per node, inter-process communication via typed messages, and a distributed file system. This paper first gives a brief description of the system and then discusses the architecture of the communication processor, with emphasis on reducing the communication overhead due to message passing.

University of Twente Research Information

EbbRT: Elastic Building Block Runtime - case studies

Author: Appavoo Jonathan
Cadden James
Krieger Orran
Schatzberg Dan
Publication venue: Computer Science Department, Boston University
Publication date: 01/05/2015

We present a new systems runtime, EbbRT, for cloud-hosted applications. EbbRT takes a different approach to the role operating systems play in cloud computing. It supports stitching application functionality across nodes running commodity OSs and nodes running specialized, application-specific software that executes only what is necessary to accelerate core functions of the application. In doing so, it allows tradeoffs between efficiency, developer productivity, and exploitation of elasticity and scale. EbbRT, as a software model, is a framework for constructing applications as collections of standard application software and Elastic Building Blocks (Ebbs). Elastic Building Blocks are components that encapsulate runtime software objects and are implemented to exploit the raw access, scale, and elasticity of IaaS resources to accelerate critical application functionality. This paper presents the EbbRT architecture, our prototype, and an experimental evaluation of the prototype under three different application scenarios.

Boston University Institutional Repository (OpenBU)

GraphLab: A New Framework for Parallel Machine Learning

Author: Bickson Danny
Gonzalez Joseph
Guestrin Carlos
Hellerstein Joseph M.
Kyrola Aapo
Low Yucheng
Publication date: 01/01/2010

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso, and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

arXiv.org e-Print Archive

CiteSeerX

COMSAT Laboratories' on-board baseband switch development

Author: Inukai Thomas
Paul D. K.
Pontano B. A.
Razdan R.
Redman W. A.

Work performed at COMSAT Laboratories to develop a prototype on-board baseband switch is summarized. The switch design is modular to accommodate different service types, and the architecture features a high-speed optical ring operating at 1 Gbit/s to route input (up-link) channels to output (down-link) channels. The switch is inherently a packet switch, but it can process either circuit-switched or packet-switched traffic. If the traffic arrives at the satellite in circuit-switched mode, the input processor packetizes it and passes it on to the switch. The main advantage of the packet approach lies in its simplified control structure. Details of the switch architecture and design, and the status of its implementation, are presented.

NASA Technical Reports Server