3 research outputs found

Software Performance Modeling in PC Clusters

Author: Baer Wolfgang
Decato Steve
Publication venue: Naval Postgraduate School, Monterey CA
Publication date: 01/09/2000

Execution of coarse-grain parallel programs in PC clusters promises super-computer performance in low-cost hardware environments. However, the overhead associated with data distribution, synchronization, and peripheral access can easily eliminate any performance gain promised by the individual cluster capacity. Application-specific system performance analysis is required both to engineer PC cluster hardware and to evaluate the cost effectiveness of parallelizing software components. This paper presents a distributed system performance model and software analysis methodology suitable for estimating the execution times of large-grain parallel application programs in clusters of PC hardware. The performance model emphasizes the use of application hardware performance results readily available in most systems. These are combined with single-thread application software resource requirements in order to estimate the achievable execution rates in target clusters. A case study of the analysis of a video-realistic battlefield simulator implementation in a PC cluster running under Linux is presented. Benchmark results and performance estimates for specific candidate hardware configurations are calculated and compared with actual results.
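The trade-off described in the abstract above, where distribution, synchronization, and peripheral overhead can cancel the gain from adding nodes, can be illustrated with a toy estimate. This is only a sketch under a simple additive assumption, not the paper's model: the function names, the parameters, and the idea that synchronization cost grows linearly with node count are all hypothetical.

```python
def estimate_cluster_time(single_thread_s, nodes, distribute_s, sync_s_per_node, io_s):
    """Estimate parallel run time in seconds (hypothetical additive model).

    Assumption: compute time divides evenly across nodes, while the
    overhead terms do not shrink as nodes are added.
    """
    compute = single_thread_s / nodes
    overhead = distribute_s + sync_s_per_node * nodes + io_s
    return compute + overhead


def estimate_speedup(single_thread_s, nodes, distribute_s, sync_s_per_node, io_s):
    """Speedup relative to the single-thread run under the same assumptions."""
    parallel = estimate_cluster_time(
        single_thread_s, nodes, distribute_s, sync_s_per_node, io_s
    )
    return single_thread_s / parallel


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    for n in (1, 4, 16, 64):
        t = estimate_cluster_time(100.0, n, 2.0, 0.5, 1.0)
        print(n, round(t, 2), round(estimate_speedup(100.0, n, 2.0, 0.5, 1.0), 2))
```

Under these made-up inputs the estimated time stops improving once the per-node synchronization term outgrows the shrinking compute term, which is the effect the abstract warns about.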

Calhoun, Institutional Archive of the Naval Postgraduate School

Efficient And Scalable Evaluation Of Continuous, Spatio-temporal Queries In Mobile Computing Environments

Author: Cazalas Jonathan M
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2012

A variety of research exists on the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. For this research, we present a two-pronged approach to addressing this problem. Firstly, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit (GPU). We examine a naive CPU-based solution for continuous range-monitoring queries, and we then extend this system using the GPU. Additionally, with mobile communication devices becoming a commodity, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view-oriented approach to the location database, thereby reducing computation costs by exploiting computation sharing amongst queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects while maintaining an acceptable level of performance. Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatio-temporal data streams and the management of moving objects can naturally come together [IlMI10, ChFr03, MoXA04]. For example, the output of a GPS receiver, monitoring the position of a mobile object, is viewed as a data stream of location updates. This data stream of location updates, along with those from the plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time.
For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatio-temporal data streams. Specifically, GEDS employs the computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal range queries and continuous, spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and demonstrates the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly counters and outweighs any associated costs. Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model, detailing the relationship between the CPU and the GPU. From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for such GPU-based applications.
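The naive CPU-based baseline for continuous range-monitoring queries mentioned in the abstract above can be pictured as a nested scan: on every batch of location updates, each query rectangle is checked against each object. The sketch below is an illustration only and is not taken from the thesis; the names `evaluate_range_queries`, `positions`, and `queries`, and the axis-aligned rectangle representation, are assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), hypothetical layout


def evaluate_range_queries(
    positions: Dict[int, Point], queries: Dict[int, Rect]
) -> Dict[int, List[int]]:
    """Return, for each query id, the sorted ids of objects inside its rectangle.

    Cost is O(len(queries) * len(positions)) per evaluation. Every
    (query, object) test is independent of the others, which is what makes
    this kind of workload a candidate for data-parallel execution.
    """
    results: Dict[int, List[int]] = {}
    for qid, (xmin, ymin, xmax, ymax) in queries.items():
        results[qid] = sorted(
            oid
            for oid, (x, y) in positions.items()
            if xmin <= x <= xmax and ymin <= y <= ymax
        )
    return results


if __name__ == "__main__":
    positions = {1: (0.5, 0.5), 2: (2.0, 2.0), 3: (1.0, 1.0)}
    queries = {10: (0.0, 0.0, 1.0, 1.0), 11: (1.5, 1.5, 3.0, 3.0)}
    print(evaluate_range_queries(positions, queries))

    # A location update arrives for object 2; re-evaluating refreshes the answers.
    positions[2] = (0.2, 0.9)
    print(evaluate_range_queries(positions, queries))
```

Re-running the full scan after each update is the simplest way to keep answers current; it is also the bottleneck that motivates sharing computation across queries and offloading the independent tests to parallel hardware.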

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Semi-empirical Multiprocessor Performance Predictions

Author: Lin Sun
Xiaodong Zhang
Zhichen Xu
Publication date: 01/01/1995

This paper presents a multiprocessor performance prediction methodology supported by experimental measurements, which predicts the execution time of large application programs on large parallel architectures based on a small set of sample data. We propose a graph model to describe application program behavior. In order to precisely abstract an architecture model for the prediction, important and implicit architecture parameters are obtained by experiments. We focus on performance predictions of application programs in shared-memory and data-parallel architectures. Real-world applications are implemented using the shared-memory model on the KSR-1 and using the data-parallel model on the CM-5 for performance measurements and prediction validation. We show that experimental measurements provide strong support for performance predictions on multiprocessors with implicit communications and complex memory systems, such as shared-memory and data-parallel systems, while analytical te..

CiteSeerX