Search CORE

1,235 research outputs found

Computer vision algorithms on reconfigurable logic arrays

Author: A.K. Jain
N.K. Ratha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Future scenarios of parallel computing: Distributed sensor networks

Author: CANTONI VIRGINIO
Lombardi Paolo
LOMBARDI LUCA
Publication venue
Publication date: 01/01/2007
Field of study

Over the past few years, motivated by the accelerating technological convergence of sensing, computing and communications, there has been a growing interest in potential and technological challenges of Wireless Sensor Network. This paper will introduce a wide range of current basic research lines dealing with ad hoc networks of spatially distributed systems, data rate requirements and constraints, real-time fusion and registration of data from distributed sensors, cooperative control, hypothesis generation, and network consensus filtering. This technical domain has matured to the point where a number of industrial products and systems have appeared. The presentation will also describe the state of the art regarding current and soon-to-appear applications

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Modeling Algorithm Performance on Highly-threaded Many-core Architectures

Author: Ma Lin
Publication venue: Washington University Open Scholarship
Publication date: 15/12/2014
Field of study

The rapid growth of data processing required in various arenas of computation over the past decades necessitates extensive use of parallel computing engines. Among those, highly-threaded many-core machines, such as GPUs have become increasingly popular for accelerating a diverse range of data-intensive applications. They feature a large number of hardware threads with low-overhead context switches to hide the memory access latencies and therefore provide high computational throughput. However, understanding and harnessing such machines places great challenges on algorithm designers and performance tuners due to the complex interaction of threads and hierarchical memory subsystems of these machines. The achieved performance jointly depends on the parallelism exploited by the algorithm, the effectiveness of latency hiding, and the utilization of multiprocessors (occupancy). Contemporary work tries to model the performance of GPUs from various aspects with different emphasis and granularity. However, no model considers all of these factors together at the same time. This dissertation presents an analytical framework that jointly addresses parallelism, latency-hiding, and occupancy for both theoretical and empirical performance analysis of algorithms on highly-threaded many-core machines so that it can guide both algorithm design and performance tuning. In particular, this framework not only helps to explore and reduce the runtime configuration space for tuning kernel execution on GPUs, but also reflects performance bottlenecks and predicts how the runtime will trend as the problem and other parameters scale. The framework consists of a pair of analytical models with one focusing on higher-level asymptotic algorithm performance on GPUs and the other one emphasizing lower-level details about scheduling and runtime configuration. Based on the two models, we have conducted extensive analysis of a large set of algorithms. Two analysis provides interesting results and explains previously unexplained data. In addition, the two models are further bridged and combined as a consistent framework. The framework is able to provide an end-to-end methodology for algorithm design, evaluation, comparison, implementation, and prediction of real runtime on GPUs fairly accurately. To demonstrate the viability of our methods, the models are validated through data from implementations of a variety of classic algorithms, including hashing, Bloom filters, all-pairs shortest path, matrix multiplication, FFT, merge sort, list ranking, string matching via suffix tree/array, etc. We evaluate the models\u27 performance across a wide spectrum of parameters, data values, and machines. The results indicate that the models can be effectively used for algorithm performance analysis and runtime prediction on highly-threaded many-core machines

Washington University St. Louis: Open Scholarship

An Architecture for distributed multimedia database systems

Author: Berra P. B.
Chen C.Y.R.
Ghafoor A.
Lin C. C.
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1990
Field of study

In the past few years considerable demand for user oriented multimedia information systems has developed. These systems must provide a rich set of functionality so that new, complex, and interesting applications can be addressed. This places considerable importance on the management of diverse data types including text, images, audio and video. These requirements generate the need for a new generation of distributed heterogeneous multimedia database systems. In this paper we identify a set of functional requirements for a multimedia server considering database management, object synchronization and integration, and multimedia query processing. A generalization of the requirements to a distributed system is presented, and some of our current research and developing activities are discussed

Syracuse University Research Facility and Collaborative Environment

Parallel database operations in heterogeneous environments

Author: Mach Werner
Publication venue
Publication date: 01/01/2009
Field of study

Im Gegensatz zu dem traditionellen Begriff eines Supercomputers, der aus vielen mittels superschneller, lokaler Netzwerkverbindungen miteinander verbundenen Superrechnern besteht, basieren heterogene Computerumgebungen auf "kompletten" Computersystemen, die mit Hilfe eines herkömmlichen Netzwerkanschlusses an private oder öffentliche Netzwerke angeschlossen sind. Der Bereich des Computernetzwerkens hat sich über die letzten drei Jahrzehnte entwickelt und ist, wie viele andere Technologien, in bezug auf Performance, Funktionalität und Verlässlichkeit extrem gewachsen. Zu Beginn des 21.Jahrhunderts zählt das betriebssichere Hochgeschwindigkeitsnetz genauso zur Alltäglichkeit wie Elektrizität, und auch Rechnerressourcen sind, was Verfügbarkeit und universellen Gebrauch anbelangt, ebenso Standard wie elektrischer Strom. Wissenschafter haben für die Verwendung von heterogenen Grids bei verschiedenen rechenintensiven Applikationen eine Architektur von computational Grids konzipiert und darin Modelle aufgesetzt, die zum einen Rechenleistungen defnieren und zum anderen die komplexen Eigenschaften der Grid-Organisation vor den Benutzern verborgen halten. Somit wird die Verwendung für den Benutzer genauso einfach wie es möglich ist elektrischen Strom zu beziehen. Grundsätzlich existiert keine generell akzeptierte Definition für Grids. Einige Wissenschafter bezeichnen sie als hochleistungsfähige verteilte Umgebung. Manche berücksichtigen bei der Definierung auch die geographische Verteilung und ihre Multi-Domain-Eigenschaft. Andere Wissenschafter wiederum definieren Grids über die Anzahl der Ressourcen, die sie verbinden. Parallele Datenbanksysteme haben in den letzten zwei Jahrzehnten große Bedeutung erlangt, da das rechenintensive wissenschaftliche Arbeiten, wie z.B. auf dem Gebiet der Bioinformatik, Strömungslehre und Hochenergie physik die Verarbeitung riesiger verteilter Datensätze erfordert. Diese Tendenz resultierte daraus, dass man von der fehlgeschlagenen Entwicklung hochspezialisierter Datenbankmaschinen zur Verwendung herkömmlicher paralleler Hardware-Architekturen übergegangen ist. Grundsätzlich wird die gleichzeitige Abarbeitung entweder durch verteilte Datenbankoperationen oder durch Datenparallelität gelöst. Im ersten Fall wird ein unterteilter Abfragenabarbeitungsplan durch verschiedene Datenbankoperatoren parallel durchgeführt. Im Fall der Datenparallelität erfolgt eine Unterteilung der Daten, wobei mehrere Prozessoren die gleichen Operationen parallel an Teilen der Daten durchführen. Es liegen genaue Analysen von parallelen Datenbank-Arbeitsvorgängen für sequenzielle Prozessoren vor. Eine Reihe von Publikationen haben dieses Thema abgehandelt und dabei Vorschläge und Analysen für parallele Datenbankmaschinen erstellt. Bis dato existiert allerdings noch keine spezifische Analyse paralleler Algorithmen mit dem Fokus der speziellen Eigenschaften einer "Grid"-Infrastruktur. Der spezifische Unterschied liegt in der Heterogenität von Grid-Ressourcen. In "shared nothing"-Architekturen, wie man sie bei klassischen Supercomputern und Cluster- Systemen vorfindet, sind alle Ressourcen wie z.B. Verarbeitungsknoten, Festplatten und Netzwerkverbindungen angesichts ihrer Leistung, Zugriffszeit und Bandbreite üblicherweise gleich (homogen). Im Gegensatz dazu zeigen Grid-Architekturen heterogene Ressourcen mit verschiedenen Leistungseigenschaften. Der herausfordernde Aspekt dieser Arbeit bestand darin aufzuzeigen, wie man das Problem heterogener Ressourcen löst, d.h. diese Ressourcen einerseits zur Leistungsmaximierung und andererseits zur Definition von Algorithmen einsetzt, um die Arbeitsablauf-Orchestrierung von Datenbankprozessoren zu optimieren. Um dieser Herausforderung gerecht werden zu können, wurde ein mathematisches Modell zur Untersuchung des Leistungsverhaltens paralleler Datenbankoperationen in heterogenen Umgebungen, wie z.B. in Grids, basierend auf generalisierten Multiprozessor- Architekturen entwickelt. Es wurden dabei sowohl die Parameter und deren Einfluss auf die Leistung als auch das Verhalten der Algorithmen in heterogenen Umgebungen beobachtet. Dabei konnte man feststellen, dass kleine Anpassungen an den Algorithmen zur signifikanten Leistungsverbesserung heterogener Umgebungen führen. Weiters wurde eine graphische Darstellung der Knotenkonfiguration entwickelt und ein optimierter Algorithmus, mit dem ein optimaler Knoten zur Ausführung von Datenbankoperationen gefunden werden kann. Diese Ergebnisse zum neuen Algorithmus wurden durch die Implementierung in einer serviceorientierten Architektur (SODA) bestätigt. Durch diese Implementierung konnte die Gültigkeit des Modells und des neu entwickelten optimierten Algorithmus nachgewiesen werden. In dieser Arbeit werden auch die Möglichkeiten für eine brauchbare Erweiterung des vorgestellten Modells gezeigt, wie z.B. für den Einsatz von Leistungskennziffern für Algorithmen zur Findung optimaler Knoten, die Verlässlichkeit der Knoten oder Vorgehensweisen/Lösungsaufgaben zur dynamischen Optimierung von Arbeitsabläufen.In contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus, heterogeneous computing environments rely on "complete" computer nodes (CPU, storage, network interface, etc.) connected to a private or public network by a conventional network interface. Computer networking has evolved over the past three decades, and, like many technologies, has grown exponentially in terms of performance, functionality and reliability. At the beginning of the twenty-first century, high-speed, highly reliable Internet connectivity has become as commonplace as electricity, and computing resources have become as standard in terms of availability and universal use as electrical power. To use heterogeneous Grids for various applications requiring high-processing power, researchers propose the notion of computational Grids where rules are defined relating to both services and hiding the complexity of the Grid organization from the users. Thus, users would find it as easy to use as electrical power. Generally, there is no widely accepted definition of Grids. Some researchers define it as a high-performance distributed environment. Some take into consideration its geographically distributed, multi-domain feature. Others define Grids based on the number of resources they unify. Parallel database systems gained an important role in database research over the past two decades due to the necessity of handling large distributed datasets for scientific computing such as bioinformatics, fluid dynamics and high energy physics (HEP). This was connected with the shift from the (actually failed) development of highly specialized database machines to the usage of conventional parallel hardware architectures. Generally, concurrent execution is employed either by database operator or data parallelism. The first is achieved through parallel execution of a partitioned query execution plan by different operators, while the latter is achieved through parallel execution of the same operation on the partitioned data among multiple processors. Parallel database operation algorithms have been well analyzed for sequential processors. A number of publications have covered this topic which proposed and analyzed these algorithms for parallel database machines. Until now, to the best knowledge of the author, no specific analysis has been done so far on parallel algorithms with a focus on the specific characteristics of a Grid infrastructure. The specific difference lies in the heterogeneous nature of Grid resources. In a "shared nothing architecture", which can be found in classical supercomputers and cluster systems, all resources such as processing nodes, disks and network interconnection have typically homogeneous characteristics as regards to performance, access time and bandwidth. In contrast, in a Grid architecture heterogeneous resources are found that show different performance characteristics. The challenge of this research is to discover the way how to cope with or to exploit this situation to maximize performance and to define algorithms that lead to a solution for an optimized workflow orchestration. To address this challenge, we developed a mathematical model to investigate the performance behavior of parallel database operations in heterogeneous environments, such as a Grid, based on generalized multiprocessor architecture. We also studied the parameters and their influence on the performance as well as the behavior of the algorithms in heterogeneous environments. We discovered that only a small adjustment on the algorithm is necessary to significantly improve the performance for heterogeneous environments. A graphical representation of the node configuration and an optimized algorithm for finding the optimal node configuration for the execution of the parallel binary merge sort have been developed. Finally, we have proved our findings of the new algorithm by implementing it on a service-orientated infrastructure (SODA). The model and our new developed modified algorithms have been verified with the implementation. We also give an outlook of useful extensions to our model e.g. using performance indices, reliability of the nodes and approaches for dynamic optimization of workflow

OTHES

Energy-Aware Data Management on NUMA Architectures

Author: Kissinger Thomas
Publication venue
Publication date: 23/03/2017
Field of study

The ever-increasing need for more computing and data processing power demands for a continuous and rapid growth of power-hungry data center capacities all over the world. As a first study in 2008 revealed, energy consumption of such data centers is becoming a critical problem, since their power consumption is about to double every 5 years. However, a recently (2016) released follow-up study points out that this threatening trend was dramatically throttled within the past years, due to the increased energy efficiency actions taken by data center operators. Furthermore, the authors of the study emphasize that making and keeping data centers energy-efficient is a continuous task, because more and more computing power is demanded from the same or an even lower energy budget, and that this threatening energy consumption trend will resume as soon as energy efficiency research efforts and its market adoption are reduced. An important class of applications running in data centers are data management systems, which are a fundamental component of nearly every application stack. While those systems were traditionally designed as disk-based databases that are optimized for keeping disk accesses as low a possible, modern state-of-the-art database systems are main memory-centric and store the entire data pool in the main memory, which replaces the disk as main bottleneck. To scale up such in-memory database systems, non-uniform memory access (NUMA) hardware architectures are employed that face a decreased bandwidth and an increased latency when accessing remote memory compared to the local memory. In this thesis, we investigate energy awareness aspects of large scale-up NUMA systems in the context of in-memory data management systems. To do so, we pick up the idea of a fine-grained data-oriented architecture and improve the concept in a way that it keeps pace with increased absolute performance numbers of a pure in-memory DBMS and scales up on NUMA systems in the large scale. To achieve this goal, we design and build ERIS, the first scale-up in-memory data management system that is designed from scratch to implement a data-oriented architecture. With the help of the ERIS platform, we explore our novel core concept for energy awareness, which is Energy Awareness by Adaptivity. The concept describes that software and especially database systems have to quickly respond to environmental changes (i.e., workload changes) by adapting themselves to enter a state of low energy consumption. We present the hierarchically organized Energy-Control Loop (ECL), which is a reactive control loop and provides two concrete implementations of our Energy Awareness by Adaptivity concept, namely the hardware-centric Resource Adaptivity and the software-centric Storage Adaptivity. Finally, we will give an exhaustive evaluation regarding the scalability of ERIS as well as our adaptivity facilities

Technische Universität Dresden: Qucosa

PRISMA/DB: A Parallel Main Memory Relational DBMS

Author: Apers P.M.G. (Peter)
Berg C.A. van den
Flokstra J.
Grefen P.W.P.J.
Kersten M.L. (Martin)
Wilschut A.N.
Publication venue: I.E.E.E. Computer Society Press
Publication date: 01/12/1992
Field of study

CWI's Institutional Repository

Graph Pattern Matching on Symmetric Multiprocessor Systems

Author: Krause Alexander
Publication venue
Publication date: 09/10/2020
Field of study

Graph-structured data can be found in nearly every aspect of today's world, be it road networks, social networks or the internet itself. From a processing perspective, finding comprehensive patterns in graph-structured data is a core processing primitive in a variety of applications, such as fraud detection, biological engineering or social graph analytics. On the hardware side, multiprocessor systems, that consist of multiple processors in a single scale-up server, are the next important wave on top of multi-core systems. In particular, symmetric multiprocessor systems (SMP) are characterized by the fact, that each processor has the same architecture, e.g. every processor is a multi-core and all multiprocessors share a common and huge main memory space. Moreover, large SMPs will feature a non-uniform memory access (NUMA), whose impact on the design of efficient data processing concepts should not be neglected. The efficient usage of SMP systems, that still increase in size, is an interesting and ongoing research topic. Current state-of-the-art architectural design principles provide different and in parts disjunct suggestions on which data should be partitioned and or how intra-process communication should be realized. In this thesis, we propose a new synthesis of four of the most well-known principles Shared Everything, Partition Serial Execution, Data Oriented Architecture and Delegation, to create the NORAD architecture, which stands for NUMA-aware DORA with Delegation. We built our research prototype called NeMeSys on top of the NORAD architecture to fully exploit the provided hardware capacities of SMPs for graph pattern matching. Being an in-memory engine, NeMeSys allows for online data ingestion as well as online query generation and processing through a terminal based user interface. Storing a graph on a NUMA system inherently requires data partitioning to cope with the mentioned NUMA effect. Hence, we need to dissect the graph into a disjunct set of partitions, which can then be stored on the individual memory domains. This thesis analyzes the capabilites of the NORAD architecture, to perform scalable graph pattern matching on SMP systems. To increase the systems performance, we further develop, integrate and evaluate suitable optimization techniques. That is, we investigate the influence of the inherent data partitioning, the interplay of messaging with and without sufficient locality information and the actual partition placement on any NUMA socket in the system. To underline the applicability of our approach, we evaluate NeMeSys against synthetic datasets and perform an end-to-end evaluation of the whole system stack on the real world knowledge graph of Wikidata

Technische Universität Dresden: Qucosa

A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters

Author: Buyya Rajkumar
Harwood Aaron
Ranjan Rajiv
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Research interest in Grid computing has grown significantly over the past five years. Management of distributed resources is one of the key issues in Grid computing. Central to management of resources is the effectiveness of resource allocation as it determines the overall utility of the system. The current approaches to superscheduling in a grid environment are non-coordinated since application level schedulers or brokers make scheduling decisions independently of the others in the system. Clearly, this can exacerbate the load sharing and utilization problems of distributed resources due to suboptimal schedules that are likely to occur. To overcome these limitations, we propose a mechanism for coordinated sharing of distributed clusters based on computational economy. The resulting environment, called \emph{Grid-Federation}, allows the transparent use of resources from the federation when local resources are insufficient to meet its users' requirements. The use of computational economy methodology in coordinating resource allocation not only facilitates the QoS based scheduling, but also enhances utility delivered by resources.Comment: 22 pages, extended version of the conference paper published at IEEE Cluster'05, Boston, M

arXiv.org e-Print Archive

CiteSeerX

Crossref