
    Scalable Parallel Computers for Real-Time Signal Processing

    We assess the state-of-the-art technology in massively parallel processors (MPPs) and their variations in different architectural platforms. Architectural and programming issues are identified in using MPPs for time-critical applications such as adaptive radar signal processing. We review the enabling technologies, including high-performance CPU chips and system interconnects, distributed memory architectures, and various latency-hiding mechanisms. We characterize the concept of scalability in three areas: resources, applications, and technology. Scalable performance attributes are analytically defined. We then compare MPPs with symmetric multiprocessors (SMPs) and clusters of workstations (COWs) to reveal their capabilities, limits, and effectiveness in signal processing. We evaluate the IBM SP2 at MHPCC, the Intel Paragon at SDSC, the Cray T3D at the Cray Eagan Center, and the Cray T3E and ASCI TeraFLOP system proposed by Intel. On the software and programming side, we evaluate existing parallel programming environments, including the models, languages, compilers, software tools, and operating systems. Some guidelines for program parallelization are provided. We examine data-parallel, shared-variable, message-passing, and implicit programming models. Communication functions and their performance overhead are discussed. Available software tools and communication libraries are also introduced.
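
    For concreteness, here is a minimal sketch of the message-passing model discussed above, written against MPI, a representative communication library for machines of this class. The 4-tap FIR workload and the energy reduction are hypothetical stand-ins for a real radar kernel, not code from the paper.

```cpp
// Minimal message-passing sketch: each rank filters its own block of samples
// locally, then the per-rank results are combined with a reduction on rank 0.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int block = 1 << 16;                   // samples handled by this rank
    std::vector<float> samples(block, 1.0f);     // placeholder input block
    const float taps[4] = {0.25f, 0.25f, 0.25f, 0.25f};

    // Local computation: a 4-tap FIR filter over this rank's block of samples.
    double localEnergy = 0.0;
    for (int i = 3; i < block; ++i) {
        float y = 0.0f;
        for (int k = 0; k < 4; ++k) y += taps[k] * samples[i - k];
        localEnergy += static_cast<double>(y) * y;
    }

    // Communication: combine the per-rank partial results on rank 0.
    double totalEnergy = 0.0;
    MPI_Reduce(&localEnergy, &totalEnergy, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("%d ranks, total output energy = %f\n", nprocs, totalEnergy);

    MPI_Finalize();
    return 0;
}
```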

    Porting Decision Tree Algorithms to Multicore using FastFlow

    The entire computer hardware industry has embraced multicores. For these machines, extreme optimisation of sequential algorithms is no longer sufficient to extract the real machine power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them well suited to parallelisation. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm to multicores. The parallel port requires minimal changes to the original sequential code and achieves up to a 7x speedup on an Intel dual quad-core machine.
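
    As a rough illustration of that natural concurrency, the sketch below evaluates the candidate attributes of a C4.5 split in parallel. It uses plain std::async rather than FastFlow, and computeGain() is a hypothetical placeholder for the real gain-ratio computation, so it only mirrors the structure of the parallelisation, not the paper's actual code.

```cpp
// Parallel attribute evaluation for a single decision-tree split: the gain of
// every candidate attribute is computed concurrently, then the best is picked.
#include <cstddef>
#include <cstdio>
#include <future>
#include <vector>

struct Dataset { /* rows, attribute values, class labels (omitted) */ };

// Hypothetical stand-in for the C4.5 gain-ratio computation.
double computeGain(const Dataset& data, int attr) {
    (void)data;
    return static_cast<double>(attr % 7) / 7.0;   // stub value for illustration
}

// Evaluate every candidate attribute concurrently and return the best one.
int bestAttribute(const Dataset& data, const std::vector<int>& attrs) {
    std::vector<std::future<double>> gains;
    gains.reserve(attrs.size());
    for (int a : attrs)                               // fan out: one task per attribute
        gains.push_back(std::async(std::launch::async,
                                   [&data, a] { return computeGain(data, a); }));

    int best = attrs.front();
    double bestGain = -1.0;
    for (std::size_t i = 0; i < attrs.size(); ++i) {  // fan in: keep the maximum gain
        double g = gains[i].get();
        if (g > bestGain) { bestGain = g; best = attrs[i]; }
    }
    return best;
}

int main() {
    Dataset d;
    std::vector<int> attrs = {0, 1, 2, 3, 4, 5};
    std::printf("best attribute: %d\n", bestAttribute(d, attrs));
    return 0;
}
```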

    Venice: Exploring Server Architectures for Effective Resource Sharing

    Consolidated server racks are quickly becoming the backbone of IT infrastructure for science, engineering, and business alike. These servers are still largely built and organized as if they were distributed, individual entities. Given that many fields increasingly rely on analytics of huge datasets, it makes sense to support flexible resource utilization across servers to improve cost-effectiveness and performance. We introduce Venice, a family of data-center server architectures that builds a strong communication substrate as a first-class resource for server chips. Venice provides a diverse set of resource-joining mechanisms that enable user programs to efficiently leverage non-local resources. To better understand the implications of design decisions about system support for resource sharing, we have constructed a hardware prototype that allows us to more accurately measure end-to-end performance of at-scale applications and to explore tradeoffs among performance, power, and resource-sharing transparency. We present results from our initial studies analyzing these tradeoffs when sharing memory, accelerators, or NICs. We find that it is particularly important to reduce or hide latency, that data-sharing access patterns should match the features of the communication channels employed, and that inter-channel collaboration can be exploited for better performance.
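
    The "reduce or hide latency" finding can be pictured with a generic double-buffering pattern: fetch the next block of remote data while processing the current one. The functions below are hypothetical stand-ins and are not tied to Venice's actual channels or API; the sketch only illustrates the overlap.

```cpp
// Double buffering: fetch block i+1 asynchronously while processing block i,
// so the remote-access latency is hidden behind local work.
#include <future>
#include <vector>

// Hypothetical stand-ins: a blocking remote read and some local computation.
std::vector<char> fetchRemoteBlock(int id) {
    return std::vector<char>(4096, static_cast<char>(id));   // stub payload
}
void process(const std::vector<char>& block) { (void)block; } // stub compute

void streamBlocks(int nBlocks) {
    if (nBlocks <= 0) return;
    auto next = std::async(std::launch::async, fetchRemoteBlock, 0);
    for (int i = 0; i < nBlocks; ++i) {
        std::vector<char> current = next.get();   // wait for block i
        if (i + 1 < nBlocks)                      // kick off the fetch of block i+1
            next = std::async(std::launch::async, fetchRemoteBlock, i + 1);
        process(current);                         // compute overlaps the next fetch
    }
}

int main() {
    streamBlocks(8);
    return 0;
}
```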

    Cluster-based interactive volume rendering with Simian

    Commodity-based computer clusters offer a cost-effective alternative to traditional large-scale, tightly coupled computers as a means to provide high-performance computational and visualization services. The Center for the Simulation of Accidental Fires and Explosions (C-SAFE) at the University of Utah employs such a cluster, and we have begun to experiment with cluster-based visualization services. In particular, we seek to develop an interactive volume rendering tool for navigating and visualizing large-scale scientific datasets. Using Simian, an OpenGL volume renderer, we examine two approaches to cluster-based interactive volume rendering: (1) a "cluster-aware" version of the application that makes explicit use of remote nodes through a message-passing interface, and (2) the unmodified application running atop the Chromium clustered rendering framework. This paper provides a detailed comparison of the two approaches by carefully considering the key issues that arise when parallelizing Simian. These issues include the richness of user interaction; the distribution of volumetric datasets and proxy geometry; and the degree of interactivity provided by the image rendering and compositing schemes. The results of each approach when visualizing two large-scale C-SAFE datasets are given, and we discuss the relative advantages and disadvantages that were considered when developing our cluster-based interactive volume rendering application.
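
    One way a "cluster-aware" renderer can combine partial results is a simple gather-to-root compositing scheme, sketched below with MPI: each node renders its own brick of the volume, the sub-images are gathered on rank 0, and rank 0 composites them back-to-front with the "over" operator. This is an illustrative baseline only; Simian's actual data distribution and compositing schemes may differ, and renderLocalBrick() is a hypothetical stub.

```cpp
#include <cstddef>
#include <mpi.h>
#include <vector>

struct Pixel { float r, g, b, a; };   // premultiplied-alpha RGBA

// Hypothetical stub: render this node's brick of the volume to a sub-image.
std::vector<Pixel> renderLocalBrick(int rank, int w, int h) {
    (void)rank;
    return std::vector<Pixel>(static_cast<std::size_t>(w) * h, Pixel{0, 0, 0, 0});
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int w = 512, h = 512;
    std::vector<Pixel> local = renderLocalBrick(rank, w, h);

    // Gather every node's sub-image on rank 0 (4 floats per pixel).
    std::vector<Pixel> all;
    if (rank == 0) all.resize(static_cast<std::size_t>(w) * h * nprocs);
    MPI_Gather(local.data(), w * h * 4, MPI_FLOAT,
               all.data(),   w * h * 4, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        // Composite back-to-front with the "over" operator, assuming the ranks
        // are already ordered by depth (rank 0 farthest from the viewer).
        std::vector<Pixel> frame(all.begin(), all.begin() + w * h);
        for (int n = 1; n < nprocs; ++n) {
            const Pixel* front = &all[static_cast<std::size_t>(n) * w * h];
            for (int p = 0; p < w * h; ++p) {
                Pixel& back = frame[p];
                back.r = front[p].r + (1.0f - front[p].a) * back.r;
                back.g = front[p].g + (1.0f - front[p].a) * back.g;
                back.b = front[p].b + (1.0f - front[p].a) * back.b;
                back.a = front[p].a + (1.0f - front[p].a) * back.a;
            }
        }
        // frame now holds the composited image, ready for display or output.
    }
    MPI_Finalize();
    return 0;
}
```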

    Fast scalable visualization techniques for interactive billion-particle walkthrough

    This research develops a comprehensive framework for interactive walkthrough of one billion particles in an immersive virtual environment, enabling interrogative visualization of large atomistic simulation data. Combining scientific and engineering approaches, the framework is based on four key techniques: adaptive data compression based on space-filling curves, octree-based visibility and occlusion culling, predictive caching based on machine learning, and scalable data reduction based on parallel and distributed processing. For parallel rendering, the system combines functional parallelism, data parallelism, and temporal parallelism to improve interactivity. The visualization framework will be applicable not only to materials simulation but also to fields such as computational biology, applied mathematics, mechanical engineering, and nanotechnology.
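
    To make the space-filling-curve idea concrete, the sketch below computes Morton (Z-order) keys and sorts particles along the curve, which groups spatially nearby particles into coherent blocks that compress well and map naturally onto an octree. The 10-bit quantization and the choice of the Z-curve are assumptions for illustration; the framework's exact curve and encoding may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Spread the low 10 bits of v so they occupy every third bit position.
static std::uint64_t spreadBits(std::uint32_t v) {
    std::uint64_t x = v & 0x3FF;
    x = (x | (x << 16)) & 0x030000FF;
    x = (x | (x << 8))  & 0x0300F00F;
    x = (x | (x << 4))  & 0x030C30C3;
    x = (x | (x << 2))  & 0x09249249;
    return x;
}

// 30-bit Morton (Z-order) key for a position with coordinates in [0, 1).
std::uint64_t mortonKey(float x, float y, float z) {
    auto quantize = [](float c) { return static_cast<std::uint32_t>(c * 1023.0f); };
    return spreadBits(quantize(x)) |
           (spreadBits(quantize(y)) << 1) |
           (spreadBits(quantize(z)) << 2);
}

struct Particle { float x, y, z; };

// Sort particles along the Z-curve: spatial neighbours become adjacent in
// memory, forming coherent blocks suitable for compression and octree nodes.
void zOrderSort(std::vector<Particle>& particles) {
    std::sort(particles.begin(), particles.end(),
              [](const Particle& a, const Particle& b) {
                  return mortonKey(a.x, a.y, a.z) < mortonKey(b.x, b.y, b.z);
              });
}

int main() {
    std::vector<Particle> particles = {
        {0.90f, 0.10f, 0.50f}, {0.10f, 0.10f, 0.10f}, {0.11f, 0.12f, 0.10f}};
    zOrderSort(particles);   // the two nearby particles end up next to each other
    return 0;
}
```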

    Push-Pull Messaging: a high-performance communication mechanism for commodity SMP clusters

    Push-Pull Messaging is a novel messaging mechanism for high-speed interprocess communication in a cluster of symmetric multiprocessor (SMP) machines. The mechanism exploits the parallelism in SMP nodes by allowing the communication stages of a messaging event to execute on different processors to achieve maximum performance. Push-Pull Messaging further improves communication performance by employing three optimizing techniques in our design: (1) Cross-Space Zero Buffer provides a unified buffer management mechanism to achieve copy-less communication for data transfer among processes within an SMP node. (2) Address Translation Overhead Masking removes the address translation overhead from the critical path in internode communication. (3) Push-and-Acknowledge Overlapping overlaps the push and acknowledge phases to hide the acknowledgement latency. Overall, Push-Pull Messaging effectively utilizes the system resources and improves communication speed. It has been implemented to support high-speed communication for connecting quad Pentium Pro SMPs with 100 Mbit/s Fast Ethernet.
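
    The push-and-acknowledge overlapping idea can be pictured as a sender that keeps pushing messages while a separate path drains the acknowledgements, so acknowledgement latency is hidden behind subsequent pushes. The user-level sketch below only illustrates that overlap; the real mechanism operates inside the messaging layer over Fast Ethernet, and pushMessage()/waitAck() are hypothetical stand-ins.

```cpp
#include <chrono>
#include <thread>

void pushMessage(int seq) {                 // stub: transmit message `seq`
    std::this_thread::sleep_for(std::chrono::microseconds(50));
}
void waitAck(int seq) {                     // stub: block until the ack for `seq` arrives
    std::this_thread::sleep_for(std::chrono::microseconds(80));
}

void sendAll(int nMessages) {
    // The ack-collector thread drains acknowledgements in the background, so
    // the push loop below never stalls waiting for an individual ack.
    std::thread collector([nMessages] {
        for (int s = 0; s < nMessages; ++s) waitAck(s);
    });
    for (int s = 0; s < nMessages; ++s) pushMessage(s);   // push phase proceeds uninterrupted
    collector.join();                       // all pushes and acks are complete here
}

int main() {
    sendAll(32);
    return 0;
}
```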

    Virtualisation and Thin Client: A Survey of Virtual Desktop Environments

    This survey examines some of the leading commercial Virtualisation and Thin Client technologies. Reference is made to a number of academic research sources and to prominent industry specialists and commentators. A basic virtualisation laboratory model is assembled to demonstrate fundamental Thin Client operations and to clarify potential problem areas.