Search CORE


Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

Author: Chakraborty Abhirup
Singh Ajit
Publication date: 24/07/2013

The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared-nothing cluster. We propose a framework, based on a fixed or predefined communication pattern, to distribute the join processing load over the shared-nothing cluster. We consider the various overheads that arise when scaling over a large number of nodes, and propose solution methodologies to cope with these issues. We implement the algorithm over a cluster using a message passing system, and present experimental results showing the effectiveness of the join processing algorithm.

Comment: 11 pages

arXiv.org e-Print Archive

Crossref
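The abstract above does not spell out its fixed communication pattern, so the following is only a minimal Python sketch of the general idea under stated assumptions: a symmetric hash join over two time-based sliding windows (streams R and S, equi-join on an integer key, timestamps assumed non-decreasing), parallelized by hash-partitioning tuples on the join key so that each node only ever needs to see tuples with matching keys. Hash partitioning is a swapped-in, commonly used technique, not the authors' framework, and all class names are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTuple:
    stream: str   # "R" or "S"
    key: int      # equi-join attribute
    ts: float     # arrival timestamp; assumed non-decreasing across the input


class SlidingWindowJoin:
    """Symmetric hash join over two time-based sliding windows of width `window`."""

    def __init__(self, window: float):
        self.window = window
        # Per-stream arrival order (oldest first), used to expire tuples.
        self.arrivals = {"R": deque(), "S": deque()}
        # Per-stream hash index: key -> deque of live tuples, oldest first.
        self.index = {"R": defaultdict(deque), "S": defaultdict(deque)}

    def _expire(self, now: float) -> None:
        for side in ("R", "S"):
            queue, index = self.arrivals[side], self.index[side]
            while queue and queue[0].ts <= now - self.window:
                old = queue.popleft()
                bucket = index[old.key]
                bucket.popleft()  # oldest in arrival order is also oldest in its bucket
                if not bucket:
                    del index[old.key]

    def insert(self, t: StreamTuple) -> list:
        """Expire stale tuples, probe the opposite window, then store t."""
        self._expire(t.ts)
        other = "S" if t.stream == "R" else "R"
        partners = self.index[other].get(t.key, ())
        # Always emit result pairs as (R tuple, S tuple).
        results = [(t, o) if t.stream == "R" else (o, t) for o in partners]
        self.arrivals[t.stream].append(t)
        self.index[t.stream][t.key].append(t)
        return results


class PartitionedJoin:
    """Routes each tuple to one of n independent join operators by hashing its key."""

    def __init__(self, n_nodes: int, window: float):
        self.nodes = [SlidingWindowJoin(window) for _ in range(n_nodes)]

    def insert(self, t: StreamTuple) -> list:
        return self.nodes[hash(t.key) % len(self.nodes)].insert(t)
```

Because tuples with equal keys always land on the same node, each node can run the join independently. The trade-off, which a fixed communication pattern may be designed to avoid, is that skewed keys concentrate load on a single node.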

Scalable storage for a DBMS using transparent distribution

Author: Karlsson J.S.
Kersten M.L. (Martin)
Publication venue: CWI
Publication date: 01/01/1997

Scalable Distributed Data Structures (SDDSs) provide self-managing and self-organizing data storage of potentially unbounded size. This stands in contrast to the common distribution schemas deployed in conventional distributed DBMSs. SDDSs, however, have mostly been used in synthetic scenarios to investigate their properties. In this paper we concentrate on the integration of the LH* SDDS into our efficient and extensible DBMS, called Monet. We show that this merge permits processing very large sets of distributed data. In our implementation we extended the relational algebra interpreter so that access to data, whether distributed or locally stored, is transparent to the user. The on-the-fly optimization of operations (heavily used in Monet) to deploy different strategies and scenarios inside the primary operators associated with an SDDS adds self-adaptiveness to the query system; it dynamically adapts itself to unforeseen situations. We illustrate the performance efficiency through experiments on a network of workstations. The transparent integration of SDDSs opens new perspectives for very large self-managing database systems.

CWI's Institutional Repository
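For readers unfamiliar with LH*, the sketch below shows plain linear hashing (Litwin), the addressing scheme that LH* distributes across servers: keys are placed with h_i(c) = c mod (N * 2^i), a split pointer n marks the next bucket to split, and addresses below n are recomputed with h_{i+1}. It is a single-process sketch under stated assumptions: buckets are in-memory lists, a split is triggered whenever a bucket exceeds a hypothetical `capacity`, and it does not show LH* client images, message forwarding, or how Monet integrates the structure.

```python
class LinearHashFile:
    """Minimal linear hashing; in LH* each bucket would live on a separate server."""

    def __init__(self, initial_buckets: int = 1, capacity: int = 4):
        self.N = initial_buckets
        self.capacity = capacity   # hypothetical split trigger (bucket overflow)
        self.level = 0             # i in h_i
        self.split = 0             # n, the next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _h(self, i: int, key: int) -> int:
        return key % (self.N * 2 ** i)

    def address(self, key: int) -> int:
        a = self._h(self.level, key)
        if a < self.split:                 # bucket a was already split this round
            a = self._h(self.level + 1, key)
        return a

    def insert(self, key: int) -> None:
        a = self.address(key)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > self.capacity:
            self._split_next()             # split bucket n, not necessarily bucket a

    def _split_next(self) -> None:
        old = self.buckets[self.split]
        # The new bucket gets index split + N * 2**level, i.e. the current length.
        self.buckets.append([])
        keep, move = [], []
        for k in old:
            (keep if self._h(self.level + 1, k) == self.split else move).append(k)
        self.buckets[self.split] = keep
        self.buckets[-1] = move
        self.split += 1
        if self.split == self.N * 2 ** self.level:   # round complete
            self.split = 0
            self.level += 1

    def lookup(self, key: int) -> bool:
        return key in self.buckets[self.address(key)]
```

The file grows one bucket at a time without a central directory, which is the property that lets LH* clients compute a server address from just (i, n).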

GRIDKIT: Pluggable overlay networks for Grid computing

Author: A. El-Sayed
A. Grimshaw
A. Rowstron
B. Li
F. Dabek
F. Kon
G. Coulson
H. Balakrishnan
K. Czajkowski
L. Mathy
M. Castro
M. Clark
N. Furmento
N. Parlavantzas
P. Grace
S. Floyd
S. Pallickara
Publication venue: SPRINGER-VERLAG BERLIN
Publication date: 01/01/2004

A 'second generation' approach to the provision of Grid middleware is now emerging, built on service-oriented architecture and web services standards and technologies. However, advanced Grid applications have significant demands that are not addressed by present-day web services platforms. As one prime example, current platforms do not support the rich diversity of communication 'interaction types' demanded by advanced applications (e.g. publish-subscribe, media streaming, peer-to-peer interaction). In this paper we describe the Gridkit middleware, which augments the basic service-oriented architecture to address this particular deficiency. We particularly focus on the communications infrastructure required to support multiple interaction types in a unified, principled and extensible manner, which we present in terms of the novel concept of pluggable overlay networks.

CiteSeerX

Crossref

Lancaster E-Prints

On Efficiency of Distributed Password Recovery

Author: Holkovič Martin
Hranický Radek
Matoušek Petr
Publication venue: (Print) 1558-7215
Publication date: 01/01/2016

One of the major challenges in digital forensics today is data encryption. Following leaked information about unlawful sniffing, many users decided to protect their data with encryption. In criminal cases, forensic experts face the challenge of deciphering a suspect's data that are subject to investigation. A common method for overcoming password-based protection is brute-force password recovery using GPU-accelerated hardware. This approach appears to be expensive. This paper presents an alternative approach using task distribution based on the BOINC platform. The cost, time and energy efficiency of this approach are discussed and compared to the GPU-based solution.

Crossref

Embry-Riddle Aeronautical University
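BOINC-style distribution works by cutting one large job into independent work units. As a hedged illustration (not the paper's implementation and not the BOINC API), the sketch below splits a brute-force keyspace of |charset|^length candidates into index ranges and maps each index to a candidate password. SHA-256 stands in for whatever hash or key derivation a real target uses, and all function names are hypothetical.

```python
import hashlib


def keyspace_size(charset: str, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return len(charset) ** length


def index_to_candidate(index: int, charset: str, length: int) -> str:
    """Decode a keyspace index as a base-len(charset) number, most significant char first."""
    base = len(charset)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    return "".join(reversed(chars))


def work_units(total: int, unit_size: int) -> list:
    """Split [0, total) into half-open index ranges, one per work unit."""
    return [(start, min(start + unit_size, total)) for start in range(0, total, unit_size)]


def crack_unit(start: int, end: int, charset: str, length: int, target_hex: str):
    """What a volunteer client would run: test every candidate in its range."""
    for i in range(start, end):
        candidate = index_to_candidate(i, charset, length)
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None
```

Because each range is independent, a server can hand out, retry, or redundantly validate units without any coordination between clients.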

Building Scientific Clouds: The Distributed, Peer-to-Peer Approach

Author: Vadakedathu Linton
Publication venue: Clemson University Libraries
Publication date: 01/05/2010

The scientific community is constantly growing in size. The increase in personnel and projects has created a requirement for large amounts of storage, CPU power and other computing resources. It has also become necessary to acquire these resources affordably and in a manner that is sensitive to workloads. In this thesis, the author presents a novel approach that provides a communication platform to support such large-scale scientific projects. These resources can be difficult to acquire due to NATs, firewalls and other site-based restrictions and policies. Methods used to overcome these hurdles are discussed in detail, along with other advantages of such a system, which include increased availability of the necessary computing infrastructure, increased grid resource utilization, reduced user dependability, and reduced job execution time. The experiments used local infrastructure on the Clemson University campus as well as resources provided by other federated grid sites.

Clemson University: TigerPrints

Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems

Author: Cheng L.
DeWitt D. J.
Hassan M. Al Hajj
Liu B.
Walton C. B.
Publication date: 03/11/2014

The performance of joins in parallel database management systems is critical for data-intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-the-art PRPD method, we achieve a 16% to 167% performance improvement and 24% to 54% less network communication under different join workloads.

Durham Research Online

Crossref

Mobile object location discovery in unpredictable environments

Author: Ferguson R.I.
Glassey R.
Stevenson G.
Publication date: 01/01/2006

Emerging mobile and ubiquitous computing environments present hard challenges to software engineering. The use of mobile code has been suggested as a natural fit for simplifying software development for these environments. However, discovering the location of mobile code becomes a problem in unpredictable environments when using existing strategies, which were designed with fixed and relatively stable networks in mind. This paper introduces AMOS, a mobile code platform augmented with a structured overlay network. We demonstrate how the location discovery strategy of AMOS has better reliability and scalability properties than existing approaches, with minimal communication overhead. Finally, we demonstrate how AMOS can distribute effort autonomously and fairly throughout a network using probabilistic methods that require no global knowledge of host capabilities.

University of Strathclyde Institutional Repository

An evaluation of PERF joins for a two-way semijoin based algorithm.

Author: Yang Li
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2005

Distributed database systems are increasingly used in place of centralized database systems in the business world, driven by business expansion and advances in network technology. Query optimization provides a strategy for executing each query over the network in the most cost-effective way, with the aim of minimizing transmission cost. Many techniques and algorithms have been proposed to optimize queries, such as semijoin [BC81][BGW+81], 2-way semijoin [KR87], composite semijoin [PC90], hash semijoin [TC92], and PERF join [LR95]. In distributed query processing, the semijoin has been used as an effective operator to reduce the total amount of data transmitted. The 2-way semijoin is an extended version of the semijoin for more cost-effective distributed query processing. PERF joins are 2-way semijoins that use a bit vector during the backward phase. PERF [LR95] is designed to minimize the cost of the backward reduction. It is based on the tuple scan order instead of hashing, so it does not suffer any loss of join information from hash collisions. Algorithm UPSJ and Algorithm CPSJ are proposed based on a 2-way semijoin algorithm, each applying one of two variants of PERF joins. In Algorithm UPSJ, uncompressed PERF joins are combined with 2-way semijoin techniques. In Algorithm CPSJ, compressed PERF joins are applied during the backward processing. Programs were designed to implement both the original and the enhanced algorithms. Several experiments were conducted, and the results showed a considerable improvement from applying the PERF join concept.

Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .Y36. Source: Masters Abstracts International, Volume: 44-03, page: 1419. Thesis (M.Sc.)--University of Windsor (Canada), 2005

Scholarship at UWindsor
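The PERF idea described above can be sketched in a few lines. Assuming relation R at site 1 and S at site 2, joined on attribute `a`, the forward phase ships R's distinct join values in scan order; site 2 reduces S and, instead of shipping its matching values back, returns one bit per received value in the same order, which site 1 applies positionally to reduce R. This is a minimal single-process illustration of the concept, not the thesis's UPSJ or CPSJ algorithms, and the function names are hypothetical.

```python
def forward_phase(r_rows: list, attr: str) -> list:
    """Site 1: ship R's distinct join values in scan (first-occurrence) order."""
    seen, values = set(), []
    for row in r_rows:
        v = row[attr]
        if v not in seen:
            seen.add(v)
            values.append(v)
    return values


def reduce_s_and_build_perf(s_rows: list, attr: str, r_values: list):
    """Site 2: semijoin-reduce S, then build one bit per received value, same order."""
    r_set = set(r_values)
    s_reduced = [row for row in s_rows if row[attr] in r_set]
    s_values = {row[attr] for row in s_reduced}
    # On the wire this would be packed into len(r_values) bits, not a list of bools.
    perf_bits = [v in s_values for v in r_values]
    return s_reduced, perf_bits


def backward_phase(r_rows: list, attr: str, r_values: list, perf_bits: list) -> list:
    """Site 1: keep R rows whose value's position is set; exact, so no false positives."""
    matched = {v for v, bit in zip(r_values, perf_bits) if bit}
    return [row for row in r_rows if row[attr] in matched]
```

The backward message costs one bit per distinct value sent forward, regardless of how wide the join values are, which is where the saving over shipping S's matching values back comes from.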

Just-in-time Data Distribution for Analytical Query Processing

Author: Groffen F.E. (Fabian)
Ivanova M.G. (Milena)
Kersten M.L. (Martin)
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/09/2012

Distributed processing commonly requires data to be spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy and is reused, where possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free, and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale, MapReduce-type settings.

CWI's Institutional Repository
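The abstract above does not give the exact bidding rule, so the following is a hypothetical sketch of one plausible reading: each worker bids the number of bytes that would still have to be shipped to it for a query's data partitions, the master assigns the query to the lowest bidder (ties broken by worker name), and the shipped partitions stay cached at that worker for reuse. Partition ids, sizes, and all names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    cached: set = field(default_factory=set)   # partition ids already shipped here

    def bid(self, needed: set, sizes: dict) -> int:
        """Hypothetical bid: bytes that would still have to be shipped to run here."""
        return sum(sizes[p] for p in needed - self.cached)


def schedule(needed: set, workers: list, sizes: dict):
    """Master: pick the lowest bidder, ship only what it lacks, keep it cached there."""
    winner = min(workers, key=lambda w: (w.bid(needed, sizes), w.name))
    shipped = needed - winner.cached
    winner.cached |= shipped   # need-to-know shipment, reused by later queries
    return winner.name, sum(sizes[p] for p in shipped)
```

Under this rule, queries touching the same partitions gravitate toward the worker that already holds them, so repeated queries ship little or no data.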