Communication costs in a multi-tiered MPSoC
The amount of digital processing required for phased array beamformers is very large. It requires many parallel processors, which can be organized in a multi-tiered structure. Communication costs differ for each of the stages in such an architecture. For example, communication from the antenna front-end to the first processing stages is costly because of the number of connections and the data rate. Furthermore, there is a trade-off between sequential processing, which exploits locality of reference, and parallel processing, which exploits parallelism but adds communication costs. Thus, the optimal architecture depends on the importance given to the different measures.

A model is presented to determine the partitioning of a (beamforming) system based on communication costs. It is shown that different solutions can be explored based on the cost model and the incorporated quantitative and qualitative measures. Determining the importance of each measure is subjective and depends on the situation and application. In this work, a simple beamforming application optimised for energy efficiency is used.
Parallel Toolkit for Measuring the Quality of Network Community Structure
Many networks display community structure which identifies groups of nodes
within which connections are denser than between them. Detecting and
characterizing such community structure, which is known as community detection,
is one of the fundamental issues in the study of network systems. It has
received considerable attention in recent years. Numerous techniques have
been developed for both efficient and effective community detection. Among
them, the most efficient algorithm is the label propagation algorithm whose
computational complexity is O(|E|). Although it is linear in the number of
edges, the running time is still too long for very large networks, creating the
need for parallel community detection. Also, computing community quality
metrics for community structure is computationally expensive both with and
without ground truth. However, to date we are not aware of any effort to
introduce parallelism for this problem. In this paper, we provide a parallel
toolkit to calculate the values of such metrics. We evaluate the parallel
algorithms on both a distributed-memory machine and a shared-memory machine. The
experimental results show that they yield a significant performance gain over
sequential execution in terms of total running time, speedup, and efficiency.
Comment: 8 pages; in Network Intelligence Conference (ENIC), 2014 Europea
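To make the label propagation idea mentioned in the abstract above concrete, here is a minimal sequential sketch in Python. It is not code from the paper or its toolkit: the function name `label_propagation`, the adjacency-dictionary input, the `max_iters` cap, and the rule of keeping the current label on ties are all assumptions chosen for illustration. Each sweep looks at every adjacency entry once, which is where the cost proportional to the number of edges comes from.

```python
import random
from collections import Counter


def label_propagation(adj, max_iters=100, seed=0):
    """Illustrative label propagation on an undirected graph.

    adj maps each node to a list of its neighbours (hypothetical input format).
    Returns a dict mapping each node to its community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iters):            # cap sweeps; convergence is not guaranteed
        rng.shuffle(nodes)                # asynchronous updates in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue                  # keep the current label on ties
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break                         # no label changed in a full sweep
    return labels


if __name__ == "__main__":
    # Two disjoint triangles: each should end up with one shared label.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(label_propagation(graph))
```

Because each node update reads only its neighbours' labels, the inner loop is the natural unit to distribute across threads or processes, which is the kind of parallelism the abstract argues for.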
Partitioning problems in parallel, pipelined and distributed computing
The problem of optimally assigning the modules of a parallel program over the processors of a multiple-computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple-satellite system: partitioning multiple chain-structured parallel programs, multiple arbitrarily structured serial programs and single tree-structured parallel programs. In addition, the problems of partitioning chain-structured parallel programs across chain-connected systems and across shared-memory (or shared-bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple-computer architectures for a wide range of problems of practical interest.
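As a rough illustration of the chain partitioning problem described above, the following Python sketch splits a chain of modules into contiguous blocks, one block per processor, so that the most heavily loaded processor (the bottleneck) is as light as possible. This is a plain dynamic program under a simplified cost model, not the Sum-Bottleneck path algorithm of the abstract. The names `partition_chain`, `work` and `comm`, and the rule that a block pays the communication cost of each chain link it cuts, are assumptions made for this example.

```python
from itertools import accumulate


def partition_chain(work, comm, k):
    """Split modules 0..n-1 into at most k contiguous blocks.

    work[i]: execution cost of module i (assumed cost model).
    comm[i]: cost paid by both neighbouring blocks if the link between
             module i and module i+1 is cut (assumed cost model).
    Returns (bottleneck, blocks) with blocks as half-open (start, end) pairs.
    """
    n = len(work)
    prefix = [0] + list(accumulate(work))

    def load(i, j):
        # Load of the block holding modules i..j-1, including cut links.
        total = prefix[j] - prefix[i]
        if i > 0:
            total += comm[i - 1]
        if j < n:
            total += comm[j - 1]
        return total

    inf = float("inf")
    # best[p][j]: smallest bottleneck for the first j modules in exactly p blocks.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for p in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(p - 1, j):
                cand = max(best[p - 1][i], load(i, j))
                if cand < best[p][j]:
                    best[p][j], cut[p][j] = cand, i

    # Using fewer processors is allowed when communication dominates.
    p = min(range(1, k + 1), key=lambda q: best[q][n])
    bottleneck = best[p][n]
    blocks, j = [], n
    while p > 0:
        i = cut[p][j]
        blocks.append((i, j))
        j, p = i, p - 1
    return bottleneck, blocks[::-1]


if __name__ == "__main__":
    print(partition_chain([4, 4, 4, 4], [1, 1, 1], 2))
    print(partition_chain([1, 1], [10], 2))
```

In the second call the link is so expensive that keeping both modules on one processor wins, which is the same locality-versus-parallelism trade-off that partitioning models of this kind have to weigh.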
The communication processor of TUMULT-64
Tumult (Twente University MULTi-processor system) is a modular, extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type, distributed memory, message passing, high performance, real-time and fault tolerant. A distributed real-time operating system has been realized, consisting of a multi-tasking kernel per node, inter-process communication via typed messages and a distributed file system. This paper first gives a brief description of the system and then discusses the architecture of the communication processor, with emphasis on reducing the communication overhead due to message passing.
EbbRT: Elastic Building Block Runtime - case studies
We present a new systems runtime, EbbRT, for cloud-hosted applications. EbbRT takes a different approach to the role operating systems play in cloud computing. It supports stitching application functionality across nodes running commodity OSs and nodes running specialized, application-specific software that executes only what is necessary to accelerate core functions of the application. In doing so, it allows tradeoffs between efficiency, developer productivity, and exploitation of elasticity and scale. EbbRT, as a software model, is a framework for constructing applications as collections of standard application software and Elastic Building Blocks (Ebbs). Elastic Building Blocks are components that encapsulate runtime software objects and are implemented to exploit the raw access, scale and elasticity of IaaS resources to accelerate critical application functionality. This paper presents the EbbRT architecture, our prototype, and an experimental evaluation of the prototype under three different application scenarios.
GraphLab: A New Framework for Parallel Machine Learning
Designing and implementing efficient, provably correct parallel machine
learning (ML) algorithms is challenging. Existing high-level parallel
abstractions like MapReduce are insufficiently expressive while low-level tools
like MPI and Pthreads leave ML experts repeatedly solving the same design
challenges. By targeting common patterns in ML, we developed GraphLab, which
improves upon abstractions like MapReduce by compactly expressing asynchronous
iterative algorithms with sparse computational dependencies while ensuring data
consistency and achieving a high degree of parallel performance. We demonstrate
the expressiveness of the GraphLab framework by designing and implementing
parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and
Compressed Sensing. We show that using GraphLab we can achieve excellent
parallel performance on large-scale real-world problems.
COMSAT Laboratories' on-board baseband switch development
Work performed at COMSAT Laboratories to develop a prototype on-board baseband switch is summarized. The switch design is modular to accommodate different service types, and the architecture features a high-speed optical ring operating at 1 Gbit/s to route input (up-link) channels to output (down-link) channels. The switch is inherently a packet switch, but can process either circuit-switched or packet-switched traffic. If the traffic arrives at the satellite in a circuit-switched mode, the input processor packetizes it and passes it on to the switch. The main advantage of the packet approach lies in its simplified control structure. Details of the switch architecture and design, and the status of its implementation, are presented.