Search CORE

3,001 research outputs found

A grid-based infrastructure for distributed retrieval

Author: A. Guttman
D. Kossmann
D.C. Blair
F. Simeoni
I. Foster
J. Callan
J. Risson
J.P. Callan
J.P. Callan
L. Si
L. Si
M. Kobayashi
M. Stonebraker
P. Niblett
R.R. Larson
W. Sun
Y. Manolopoulos
Y.E. Ioannidis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to ‘lift’ the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the ﬁeld of Earth Science

Crossref

University of Strathclyde Institutional Repository

Parallel database operations in heterogeneous environments

Author: Mach Werner
Publication venue
Publication date: 01/01/2009
Field of study

Im Gegensatz zu dem traditionellen Begriff eines Supercomputers, der aus vielen mittels superschneller, lokaler Netzwerkverbindungen miteinander verbundenen Superrechnern besteht, basieren heterogene Computerumgebungen auf "kompletten" Computersystemen, die mit Hilfe eines herkömmlichen Netzwerkanschlusses an private oder öffentliche Netzwerke angeschlossen sind. Der Bereich des Computernetzwerkens hat sich über die letzten drei Jahrzehnte entwickelt und ist, wie viele andere Technologien, in bezug auf Performance, Funktionalität und Verlässlichkeit extrem gewachsen. Zu Beginn des 21.Jahrhunderts zählt das betriebssichere Hochgeschwindigkeitsnetz genauso zur Alltäglichkeit wie Elektrizität, und auch Rechnerressourcen sind, was Verfügbarkeit und universellen Gebrauch anbelangt, ebenso Standard wie elektrischer Strom. Wissenschafter haben für die Verwendung von heterogenen Grids bei verschiedenen rechenintensiven Applikationen eine Architektur von computational Grids konzipiert und darin Modelle aufgesetzt, die zum einen Rechenleistungen defnieren und zum anderen die komplexen Eigenschaften der Grid-Organisation vor den Benutzern verborgen halten. Somit wird die Verwendung für den Benutzer genauso einfach wie es möglich ist elektrischen Strom zu beziehen. Grundsätzlich existiert keine generell akzeptierte Definition für Grids. Einige Wissenschafter bezeichnen sie als hochleistungsfähige verteilte Umgebung. Manche berücksichtigen bei der Definierung auch die geographische Verteilung und ihre Multi-Domain-Eigenschaft. Andere Wissenschafter wiederum definieren Grids über die Anzahl der Ressourcen, die sie verbinden. Parallele Datenbanksysteme haben in den letzten zwei Jahrzehnten große Bedeutung erlangt, da das rechenintensive wissenschaftliche Arbeiten, wie z.B. auf dem Gebiet der Bioinformatik, Strömungslehre und Hochenergie physik die Verarbeitung riesiger verteilter Datensätze erfordert. Diese Tendenz resultierte daraus, dass man von der fehlgeschlagenen Entwicklung hochspezialisierter Datenbankmaschinen zur Verwendung herkömmlicher paralleler Hardware-Architekturen übergegangen ist. Grundsätzlich wird die gleichzeitige Abarbeitung entweder durch verteilte Datenbankoperationen oder durch Datenparallelität gelöst. Im ersten Fall wird ein unterteilter Abfragenabarbeitungsplan durch verschiedene Datenbankoperatoren parallel durchgeführt. Im Fall der Datenparallelität erfolgt eine Unterteilung der Daten, wobei mehrere Prozessoren die gleichen Operationen parallel an Teilen der Daten durchführen. Es liegen genaue Analysen von parallelen Datenbank-Arbeitsvorgängen für sequenzielle Prozessoren vor. Eine Reihe von Publikationen haben dieses Thema abgehandelt und dabei Vorschläge und Analysen für parallele Datenbankmaschinen erstellt. Bis dato existiert allerdings noch keine spezifische Analyse paralleler Algorithmen mit dem Fokus der speziellen Eigenschaften einer "Grid"-Infrastruktur. Der spezifische Unterschied liegt in der Heterogenität von Grid-Ressourcen. In "shared nothing"-Architekturen, wie man sie bei klassischen Supercomputern und Cluster- Systemen vorfindet, sind alle Ressourcen wie z.B. Verarbeitungsknoten, Festplatten und Netzwerkverbindungen angesichts ihrer Leistung, Zugriffszeit und Bandbreite üblicherweise gleich (homogen). Im Gegensatz dazu zeigen Grid-Architekturen heterogene Ressourcen mit verschiedenen Leistungseigenschaften. Der herausfordernde Aspekt dieser Arbeit bestand darin aufzuzeigen, wie man das Problem heterogener Ressourcen löst, d.h. diese Ressourcen einerseits zur Leistungsmaximierung und andererseits zur Definition von Algorithmen einsetzt, um die Arbeitsablauf-Orchestrierung von Datenbankprozessoren zu optimieren. Um dieser Herausforderung gerecht werden zu können, wurde ein mathematisches Modell zur Untersuchung des Leistungsverhaltens paralleler Datenbankoperationen in heterogenen Umgebungen, wie z.B. in Grids, basierend auf generalisierten Multiprozessor- Architekturen entwickelt. Es wurden dabei sowohl die Parameter und deren Einfluss auf die Leistung als auch das Verhalten der Algorithmen in heterogenen Umgebungen beobachtet. Dabei konnte man feststellen, dass kleine Anpassungen an den Algorithmen zur signifikanten Leistungsverbesserung heterogener Umgebungen führen. Weiters wurde eine graphische Darstellung der Knotenkonfiguration entwickelt und ein optimierter Algorithmus, mit dem ein optimaler Knoten zur Ausführung von Datenbankoperationen gefunden werden kann. Diese Ergebnisse zum neuen Algorithmus wurden durch die Implementierung in einer serviceorientierten Architektur (SODA) bestätigt. Durch diese Implementierung konnte die Gültigkeit des Modells und des neu entwickelten optimierten Algorithmus nachgewiesen werden. In dieser Arbeit werden auch die Möglichkeiten für eine brauchbare Erweiterung des vorgestellten Modells gezeigt, wie z.B. für den Einsatz von Leistungskennziffern für Algorithmen zur Findung optimaler Knoten, die Verlässlichkeit der Knoten oder Vorgehensweisen/Lösungsaufgaben zur dynamischen Optimierung von Arbeitsabläufen.In contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus, heterogeneous computing environments rely on "complete" computer nodes (CPU, storage, network interface, etc.) connected to a private or public network by a conventional network interface. Computer networking has evolved over the past three decades, and, like many technologies, has grown exponentially in terms of performance, functionality and reliability. At the beginning of the twenty-first century, high-speed, highly reliable Internet connectivity has become as commonplace as electricity, and computing resources have become as standard in terms of availability and universal use as electrical power. To use heterogeneous Grids for various applications requiring high-processing power, researchers propose the notion of computational Grids where rules are defined relating to both services and hiding the complexity of the Grid organization from the users. Thus, users would find it as easy to use as electrical power. Generally, there is no widely accepted definition of Grids. Some researchers define it as a high-performance distributed environment. Some take into consideration its geographically distributed, multi-domain feature. Others define Grids based on the number of resources they unify. Parallel database systems gained an important role in database research over the past two decades due to the necessity of handling large distributed datasets for scientific computing such as bioinformatics, fluid dynamics and high energy physics (HEP). This was connected with the shift from the (actually failed) development of highly specialized database machines to the usage of conventional parallel hardware architectures. Generally, concurrent execution is employed either by database operator or data parallelism. The first is achieved through parallel execution of a partitioned query execution plan by different operators, while the latter is achieved through parallel execution of the same operation on the partitioned data among multiple processors. Parallel database operation algorithms have been well analyzed for sequential processors. A number of publications have covered this topic which proposed and analyzed these algorithms for parallel database machines. Until now, to the best knowledge of the author, no specific analysis has been done so far on parallel algorithms with a focus on the specific characteristics of a Grid infrastructure. The specific difference lies in the heterogeneous nature of Grid resources. In a "shared nothing architecture", which can be found in classical supercomputers and cluster systems, all resources such as processing nodes, disks and network interconnection have typically homogeneous characteristics as regards to performance, access time and bandwidth. In contrast, in a Grid architecture heterogeneous resources are found that show different performance characteristics. The challenge of this research is to discover the way how to cope with or to exploit this situation to maximize performance and to define algorithms that lead to a solution for an optimized workflow orchestration. To address this challenge, we developed a mathematical model to investigate the performance behavior of parallel database operations in heterogeneous environments, such as a Grid, based on generalized multiprocessor architecture. We also studied the parameters and their influence on the performance as well as the behavior of the algorithms in heterogeneous environments. We discovered that only a small adjustment on the algorithm is necessary to significantly improve the performance for heterogeneous environments. A graphical representation of the node configuration and an optimized algorithm for finding the optimal node configuration for the execution of the parallel binary merge sort have been developed. Finally, we have proved our findings of the new algorithm by implementing it on a service-orientated infrastructure (SODA). The model and our new developed modified algorithms have been verified with the implementation. We also give an outlook of useful extensions to our model e.g. using performance indices, reliability of the nodes and approaches for dynamic optimization of workflow

OTHES

Hierarchical and Adaptive Filter and Refinement Algorithms for Geometric Intersection Computations on GPU

Author: Liu Yiming
Publication venue: e-Publications@Marquette
Publication date: 01/04/2021
Field of study

Geometric intersection algorithms are fundamental in spatial analysis in Geographic Information System (GIS). This dissertation explores high performance computing solution for geometric intersection on a huge amount of spatial data using Graphics Processing Unit (GPU). We have developed a hierarchical filter and refinement system for parallel geometric intersection operations involving large polygons and polylines by extending the classical filter and refine algorithm using efficient filters that leverage GPU computing. The inputs are two layers of large polygonal datasets and the computations are spatial intersection on pairs of cross-layer polygons. These intersections are the compute-intensive spatial data analytic kernels in spatial join and map overlay operations in spatial databases and GIS. Efficient filters, such as PolySketch, PolySketch++ and Point-in-polygon filters have been developed to reduce refinement workload on GPUs. We also showed the application of such filters in speeding-up line segment intersections and point-in-polygon tests. Programming models like CUDA and OpenACC have been used to implement the different versions of the Hierarchical Filter and Refine (HiFiRe) system. Experimental results show good performance of our filter and refinement algorithms. Compared to standard R-tree filter, on average, our filter technique can still discard 76% of polygon pairs which do not have segment intersection points. PolySketch filter reduces on average 99.77% of the workload of finding line segment intersections. Compared to existing Common Minimum Bounding Rectangle (CMBR) filter that is applied on each cross-layer candidate pair, the workload after using PolySketch-based CMBR filter is on average 98% smaller. The execution time of our HiFiRe system on two shapefiles, namely USA Water Bodies (contains 464K polygons) and USA Block Group Boundaries (contains 220K polygons), is about 3.38 seconds using NVidia Titan V GPU

epublications@Marquette

High Performance Computing via High Level Synthesis

Author: Roozmeh Mehdi
Publication venue: Politecnico di Torino
Publication date
Field of study

As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is in particular devoted to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. The OpenCL (Open Computing Language) developed by the Khronos group enables developers to program CPU, GPU and recently FPGAs using functionally portable (but sadly not performance portable) source code which creates new possibilities and challenges both for research and industry. FPGAs have been always used for mid-size designs and ASIC prototyping thanks to their energy efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches are developed and deployed to address this issue and shorten the gap between software and hardware in FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers. Moreover, FPGAs usage in data centers is growing already, regardless of and in addition to their use as computational accelerators, because they can be used as high performance, low power and secure switches inside data-centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications on FPGAs (and ASICs). It synthesizes parallel hardware from a model originally written C-based programming languages .e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application dependent hardware and benefit designers from flexible parallel architecture of FPGAs. Model Based Design (MBD) is a high-level and visual process used to generate implementations that solve mathematical problems through a varied set of IP-blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MATLAB, is a model based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program micro-controllers and FPGAs. This PhD thesis work presents a study using automatic code generator of Simulink to target Xilinx FPGAs using both HDL and C/C++ code to demonstrate capabilities and challenges of high-level synthesis process. To do so, firstly, digital signal processing unit of a real-time radar application is developed using Simulink blocks. Secondly, generated C based model was used for high level synthesis process and finally the implementation cost of HLS is compared to traditional HDL synthesis using Xilinx tool chain. Alternative to model based design approach, this work also presents an analysis on FPGA programming via high-level synthesis techniques for computationally intensive algorithms and demonstrates the importance of HLS by comparing performance-per-watt of GPUs(NVIDIA) and FPGAs(Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generation of high quality RTL from OpenCL model requires stronger hardware background with respect to the MBD approach, however, the availability of a fast and broad design space exploration ability and portability of the OpenCL code, e.g. to CPUs and GPUs, motivates FPGA industry leaders to provide users with OpenCL software development environment which promises FPGA programming in CPU/GPU-like fashion. Our experiments, through extensive design space exploration(DSE), suggest that FPGAs have higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology(28 nm). Moreover, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming much less power at the cost of more expensive devices

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Re-engineering jake2 to work on a grid using the GridGain Middleware

Author: Alexandre Ricardo Jorge da Cruz
Publication venue
Publication date: 01/01/2010
Field of study

With the advent of Massively Multiplayer Online Games (MMOGs), engineers and designers of games came across with many questions that needed to be answered such as, for example, "how to allow a large amount of clients to play simultaneously on the same server?", "how to guarantee a good quality of service (QoS) to a great number of clients?", "how many resources will be necessary?", "how to optimize these resources to the maximum?". A possible answer to these questions relies on the usage of grid computing. Taking into account the parallel and distributed nature of grid computing, we can say that grid computing allows for more scalability in terms of a growing number of players, guarantees shorter communication time between clients and servers, and allows for a better resource management and usage (e.g., memory, CPU, core balancing usage, etc.) than the traditional serial computing model. However, the main focus of this thesis is not about grid computing. Instead, this thesis describes the re-engineering process of an existing multiplayer computer game, called Jake2, by transforming it into a MMOG, which is then put to run on a grid

UBibliorum repositorio digital da ubi

Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis

Author: Paudel Anmol
Publication venue: e-Publications@Marquette
Publication date: 01/04/2022
Field of study

Geo-Spatial computing and data analysis is the branch of computer science that deals with real world location-based data. Computational geometry algorithms are algorithms that process geometry/shapes and is one of the pillars of geo-spatial computing. Real world map and location-based data can be huge in size and the data structures used to process them extremely big leading to huge computational costs. Furthermore, Geo-Spatial datasets are growing on all V’s (Volume, Variety, Value, etc.) and are becoming larger and more complex to process in-turn demanding more computational resources. High Performance Computing is a way to breakdown the problem in ways that it can run in parallel on big computers with massive processing power and hence reduce the computing time delivering the same results but much faster.This dissertation explores different techniques to accelerate the processing of computational geometry algorithms and geo-spatial computing like using Many-core Graphics Processing Units (GPU), Multi-core Central Processing Units (CPU), Multi-node setup with Message Passing Interface (MPI), Cache optimizations, Memory and Communication optimizations, load balancing, Algorithmic Modifications, Directive based parallelization with OpenMP or OpenACC and Vectorization with compiler intrinsic (AVX). This dissertation has applied at least one of the mentioned techniques to the following problems. Novel method to parallelize plane sweep based geometric intersection for GPU with directives is presented. Parallelization of plane sweep based Voronoi construction, parallelization of Segment tree construction, Segment tree queries and Segment tree-based operations has been presented. Spatial autocorrelation, computation of getis-ord hotspots are also presented. Acceleration performance and speedup results are presented in each corresponding chapter

epublications@Marquette