Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1
We propose an efficient and scalable architecture for processing generalized
graph-pattern queries as they are specified by the current W3C recommendation
of the SPARQL 1.1 "Query Language" component. Specifically, the class of
queries we consider consists of sets of SPARQL triple patterns with labeled
property paths. From a relational perspective, this class resolves to
conjunctive queries of relational joins with additional graph-reachability
predicates. For the scalable, i.e., distributed, processing of such
queries over very large RDF collections, we develop a suitable partitioning and
indexing scheme, which allows us to shard the RDF triples over an entire
cluster of compute nodes and to process an incoming SPARQL query over all of
the relevant graph partitions (and thus compute nodes) in parallel. Unlike most
prior works in this field, we specifically aim at the unified optimization and
distributed processing of queries consisting of both relational joins and
graph-reachability predicates. All communication among the compute nodes is
established via a proprietary, asynchronous communication protocol based on the
Message Passing Interface (MPI).
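To make the query class above concrete, here is a minimal single-process sketch of a conjunctive query that combines a labeled property path (a reachability predicate) with an ordinary triple-pattern join. It assumes hash partitioning of triples by subject as one possible sharding scheme; the toy triples, the `shard`, `adjacency`, `reachable`, and `colleagues_reachable` helpers, and the predicate names are all hypothetical and do not reproduce the paper's partitioning, indexing, or MPI-based protocol.

```python
import zlib
from collections import defaultdict, deque

# Hypothetical toy data: (subject, predicate, object) triples, short names instead of IRIs.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "worksAt", "acme"),
    ("dave", "worksAt", "acme"),
]

def shard(triples, num_nodes):
    """Hash-partition triples by subject (an assumed scheme, not the paper's)."""
    shards = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        shards[zlib.crc32(s.encode()) % num_nodes].append((s, p, o))
    return shards

def adjacency(shards, predicate):
    """Collect edges with one label; in a cluster each node would scan its own shard."""
    adj = defaultdict(list)
    for part in shards:
        for s, p, o in part:
            if p == predicate:
                adj[s].append(o)
    return adj

def reachable(adj, start):
    """Nodes reachable from `start` via one or more edges, i.e. the path `pred+`."""
    seen = set()
    queue = deque(adj.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(adj.get(node, []))
    return seen

def colleagues_reachable(shards, person, org):
    """Evaluate { person :knows+ ?y . ?y :worksAt org } as reachability plus a join."""
    knows = adjacency(shards, "knows")
    works_at = adjacency(shards, "worksAt")
    return sorted(y for y in reachable(knows, person) if org in works_at.get(y, []))
```

For example, `colleagues_reachable(shard(TRIPLES, 3), "alice", "acme")` returns `["dave"]`: the path `knows+` reaches bob, carol, and dave, and only dave also matches the `worksAt` pattern. Optimizing the join and the reachability predicate together, rather than as separate phases as here, is the problem the abstract targets.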
Cube-4 - A Scalable Architecture for Real-Time Volume Rendering
We present Cube-4, a special-purpose volume rendering architecture
that is capable of rendering high-resolution (e.g., 1024^3)
datasets at 30 frames per second. The underlying algorithm, called
slice-parallel ray-casting, uses tri-linear interpolation of samples
between data slices for parallel and perspective projections. The
architecture uses a distributed, interleaved memory, several parallel
processing pipelines, and an innovative parallel dataflow scheme
that requires no global communication, except at the pixel level.
This leads to local, fixed bandwidth interconnections and has the
benefits of high memory bandwidth, real-time data input, modularity,
and scalability. We have simulated the architecture and have
implemented a working prototype of the complete hardware on a
configurable custom hardware machine. Our results indicate true
real-time performance for high-resolution datasets and linear scalability
of performance with the number of processing pipelines.
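The interpolation step named above can be sketched as follows. This is a minimal illustration of standard trilinear interpolation, assuming a volume stored as nested lists indexed `volume[z][y][x]` and sample coordinates that are non-negative and inside the grid; the function name `trilinear` is hypothetical, and Cube-4's slice-parallel scheduling, interleaved memory, and compositing are not modeled.

```python
def trilinear(volume, x, y, z):
    """Sample volume[z][y][x] at fractional (x, y, z) by trilinear interpolation."""
    def lerp(a, b, t):
        return a + (b - a) * t

    x0, y0, z0 = int(x), int(y), int(z)
    # Clamp the upper neighbours so samples on the last voxel layer stay in bounds.
    x1 = min(x0 + 1, len(volume[0][0]) - 1)
    y1 = min(y0 + 1, len(volume[0]) - 1)
    z1 = min(z0 + 1, len(volume) - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    # Bilinear interpolation within data slice z0 ...
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c0 = lerp(c00, c10, fy)
    # ... and within data slice z1 ...
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    c1 = lerp(c01, c11, fy)
    # ... then linear interpolation between the two slices.
    return lerp(c0, c1, fz)
```

Because the scheme is exact for linear data, a 2x2x2 volume with values `x + 2*y + 4*z` sampled at `(0.5, 0.5, 0.5)` gives `3.5`. Each sample touches only two adjacent slices, which is consistent with the local, slice-to-slice dataflow the abstract describes.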
Decentralization and Mechanism Design for Online Machine Scheduling
We study the online version of the classical parallel machine scheduling problem to minimize the total weighted completion time from a new perspective: we assume a strategic setting, where the data of each job j, namely its release date r(j), its processing time p(j), and its weight w(j), is known only to the job itself and not to the system. Furthermore, we assume a decentralized setting, where the jobs themselves choose the machine on which they are processed. We study this setting from the perspective of algorithmic mechanism design and present a polynomial-time decentralized online scheduling mechanism that induces rational jobs to select their machines in such a way that the resulting schedule is 3.281-competitive. The mechanism deploys an online payment scheme that induces rational jobs to report their private data truthfully: with respect to release dates and processing times, truthful reporting is a dominant strategy equilibrium, whereas truthfully reporting the weights is a myopic best response equilibrium. We also show that the local scheduling policy used in the mechanism cannot be extended to a mechanism where truthful reports with respect to weights constitute a dominant strategy equilibrium.
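The objective above, the total weighted completion time sum of w(j) C(j), can be illustrated on a single machine. The sketch below assumes a nonpreemptive local policy that, whenever the machine is free, starts the released job with the largest ratio w(j)/p(j) (Smith's WSPT rule). This rule, the `(name, r, p, w)` tuple layout, and the function name `wspt_schedule` are assumptions made for illustration; the abstract does not specify the mechanism's local policy or its payment scheme, and neither is reproduced here.

```python
def wspt_schedule(jobs):
    """Nonpreemptive single-machine schedule; jobs are (name, r, p, w) tuples.

    Returns (completion_times, total_weighted_completion_time).
    """
    pending = sorted(jobs, key=lambda j: j[1])  # order by release date r(j)
    t, i, ready, completion = 0.0, 0, [], {}
    while i < len(pending) or ready:
        # Release every job whose release date has passed.
        while i < len(pending) and pending[i][1] <= t:
            ready.append(pending[i])
            i += 1
        if not ready:
            t = pending[i][1]  # machine idles until the next release
            continue
        job = max(ready, key=lambda j: j[3] / j[2])  # largest w/p first
        ready.remove(job)
        t += job[2]
        completion[job[0]] = t
    total = sum(w * completion[name] for name, _, _, w in jobs)
    return completion, total
```

For jobs `("a", 0, 2, 1)`, `("b", 0, 1, 3)`, and `("c", 4, 1, 1)`, job b runs first (ratio 3), then a, the machine idles until time 4, and c finishes at 5, for a total of 1·3 + 3·1 + 1·5 = 11.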
A unifying framework for rigid multibody dynamics and serial and parallel computational issues
A unifying framework for various formulations of the dynamics of open-chain rigid multibody systems is discussed, and their suitability for serial and parallel processing is assessed. The framework is based on the derivation of intrinsic, i.e., coordinate-free, equations of the algorithms, which provides a suitable abstraction and permits a distinction to be made between the computational redundancy in the intrinsic and extrinsic equations. A spatial notation is used that allows the various algorithms to be derived in a common setting and thus clarifies the relationships among them. Three classes of algorithms for the solution of the dynamics problem, viz., O(n), O(n^2), and O(n^3), are investigated. The derivation begins with the O(n^3) algorithms, which are based on the explicit computation of the mass matrix and provide insight into the underlying basis of the O(n) algorithms. From a computational perspective, the optimal choice of a coordinate frame for the projection of the intrinsic equations is discussed, and the serial computational complexity of the different algorithms is evaluated. The three classes of algorithms are also analyzed for suitability for parallel processing. It is shown that the problem belongs to the class NC, with time and processor bounds of O(log^2 n) and O(n^4), respectively. However, the algorithm that achieves these bounds is not stable. The fastest stable parallel algorithm is shown to achieve a computational complexity of O(n) with O(n^2) processors, and it results from the parallelization of the O(n^3) serial algorithm.
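The O(n^3) class mentioned above starts from an explicitly computed mass matrix M(q). Once M and a right-hand side (applied torques minus Coriolis, centrifugal, and gravity terms) are known, the joint accelerations follow from the linear system M·q̈ = rhs, and solving it by Cholesky factorization is itself an O(n^3) step because M is symmetric positive definite. The sketch below shows only that solve; building M and the right-hand side for an actual chain is assumed to happen elsewhere, and the function names are hypothetical.

```python
import math

def cholesky(M):
    """Lower-triangular L with M = L L^T for symmetric positive definite M; O(n^3)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def solve_accelerations(M, rhs):
    """Solve M qdd = rhs via Cholesky, forward substitution, then back substitution."""
    n = len(M)
    L = cholesky(M)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = rhs
        y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    qdd = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T qdd = y
        qdd[i] = (y[i] - sum(L[k][i] * qdd[k] for k in range(i + 1, n))) / L[i][i]
    return qdd
```

With the made-up 2x2 matrix `[[4.0, 2.0], [2.0, 3.0]]` and right-hand side `[8.0, 8.0]`, the result is approximately `[1.0, 2.0]`. The O(n) algorithms avoid forming and factoring M altogether, which is why the abstract treats the classes separately.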
An Optimized Model for MapReduce Based on Hadoop
To address the waste of computing resources caused by the sequential control in the running mechanism of the MapReduce model on the Hadoop platform, the Fork/Join framework is introduced into this model to make full use of the CPU resources of each node. From the perspective of fine-grained parallel data processing, and in combination with Fork/Join, a parallel, multi-threaded framework, this paper optimizes the MapReduce model and puts forward a MapReduce+Fork/Join programming model: a distributed and parallel architecture on the Hadoop platform that combines coarse-grained and fine-grained parallelism, supporting two levels of parallelism on both shared-memory and distributed-memory machines. A test was conducted on a Hadoop cluster of four nodes. The experimental results show that this model improves the performance and efficiency of the whole system and that it is suitable not only for data-intensive tasks but also for compute-intensive tasks. It is an effective optimization and improvement of the MapReduce model for big data processing.
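The two levels of parallelism described above can be sketched in miniature. Below, the coarse level runs one map task per input split (one per node in a real cluster), and the fine level forks each map task's split into small pieces, counts them on a thread pool, and joins the partial results. This is a Python analogue with a hypothetical word-count workload and a hypothetical `THRESHOLD`; the paper works with Hadoop and Java's Fork/Join framework, which this code does not use. In CPython, threads will not speed up CPU-bound counting because of the GIL, so the sketch shows only the structure.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 2  # hypothetical leaf size for the fork step

def split(lines, threshold):
    """Fork step: recursively halve the input until each piece is small enough."""
    if len(lines) <= threshold:
        return [lines]
    mid = len(lines) // 2
    return split(lines[:mid], threshold) + split(lines[mid:], threshold)

def count_words(lines):
    """Leaf task: count words in a small piece of the input."""
    return Counter(word for line in lines for word in line.split())

def map_task(split_lines, pool):
    """Fine-grained level: one map task fans out over local threads, then joins."""
    futures = [pool.submit(count_words, piece) for piece in split(split_lines, THRESHOLD)]
    partial = Counter()
    for f in futures:
        partial.update(f.result())  # join step: merge partial counts
    return partial

def run_job(splits, threads_per_node=4):
    """Coarse-grained level: one map task per input split, then a summing reduce."""
    result = Counter()
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        for s in splits:
            result.update(map_task(s, pool))
    return result
```

For example, `run_job([["a b", "b c", "a"], ["c c"]])` yields counts of 2 for `a`, 2 for `b`, and 3 for `c`. Only leaf tasks are submitted to the pool, so no pooled thread waits on another pooled task, which avoids the deadlock that naive recursive submission to a bounded pool can cause.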
Exploiting stream parallelism of MRI reconstruction using GrPPI over multiple back-ends
Proceedings of: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, 14-17 May 2019
In recent years, online processing of data streams has been established as a major computing paradigm. This is mainly due to two reasons: first, more and more data that need to be processed are generated in near real time; second, there is a need for efficient parallel applications. However, these areas pose a tough challenge to traditional data-analysis techniques, which have been forced to evolve toward a stream perspective. In this work we present a comparative study of a stream-aware multi-stage application, which has been implemented using GrPPI, a generic and reusable parallel pattern interface for C++ applications. We demonstrate the benefits of using this interface in terms of programmability, performance, and scalability.
This work was supported by the EU project “ASPIDE: Exascale Programming Models for Extreme Data Processing” under grant 80109
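The multi-stage stream pattern studied above can be sketched as a pipeline whose stage functions stay fixed while the execution back-end changes. The code below is a Python analogue, not GrPPI's C++ API. It runs the same stages either sequentially with generators or with one thread per stage connected by FIFO queues. The stage functions `scale` and `reduce_frame` are hypothetical placeholders for real MRI reconstruction steps.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker

def scale(frame):  # placeholder preprocessing stage
    return [2 * v for v in frame]

def reduce_frame(frame):  # placeholder "reconstruction" stage
    return sum(frame)

def sequential_pipeline(items, *stages):
    """Sequential back-end: chain lazy generators, one per stage."""
    def stage(fn, stream):
        for item in stream:
            yield fn(item)
    stream = iter(items)
    for fn in stages:
        stream = stage(fn, stream)
    return list(stream)

def threaded_pipeline(items, *stages):
    """Threaded back-end: each stage runs in its own thread, linked by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate end-of-stream downstream
                return
            outbox.put(fn(item))

    threads = [threading.Thread(target=run, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:  # source stage: queues are unbounded, so this never blocks
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (item := queues[-1].get()) is not _DONE:  # sink stage
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Both back-ends return `[6, 14, 10]` for frames `[[1, 2], [3, 4], [5]]` with stages `scale` and `reduce_frame`. Each stage is a single thread reading a FIFO queue, so output order matches input order. Keeping the stage code independent of the back-end is the programmability benefit the abstract attributes to a pattern interface.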
Fast parallel algorithms for a broad class of nonlinear variational diffusion approaches
Variational segmentation and nonlinear diffusion approaches have been very active research areas in the fields of image processing and computer vision in recent years. In the present paper, we review recent advances in the development of efficient numerical algorithms for these approaches. The performance of parallel implementations of these algorithms on general-purpose hardware is assessed. A mathematically clear connection between variational models and nonlinear diffusion filters is presented that allows one approach to be interpreted as an approximation of the other, and vice versa. Numerical results confirm that, depending on the parametrization, this approximation can be made quite accurate. Our results provide a perspective for uniform implementations of both nonlinear variational models and diffusion filters on parallel architectures.
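As an illustration of the diffusion side only, the sketch below performs one explicit time step of 1D nonlinear diffusion, du/dt = d/dx(g(|u_x|) u_x), with reflecting (Neumann) boundaries and grid spacing 1. The Perona-Malik-type diffusivity g(s) = 1 / (1 + s²/λ²), the explicit scheme, and the function names are assumptions for this example; the paper's algorithms and parametrization are not reproduced. Because g ≤ 1, a time step `tau <= 0.5` keeps this explicit scheme stable.

```python
def diffusivity(grad, lam):
    """Perona-Malik-type diffusivity (assumed): small across large gradients."""
    return 1.0 / (1.0 + (grad / lam) ** 2)

def diffuse_step(u, tau, lam):
    """One explicit step of 1D nonlinear diffusion with Neumann boundaries."""
    n = len(u)
    # flux[i] lives between u[i-1] and u[i]; flux[0] and flux[n] stay 0 (no flow out).
    flux = [0.0] * (n + 1)
    for i in range(1, n):
        grad = u[i] - u[i - 1]
        flux[i] = diffusivity(grad, lam) * grad
    # Each new value depends only on its two neighbouring fluxes.
    return [u[i] + tau * (flux[i + 1] - flux[i]) for i in range(n)]
```

Since the boundary fluxes are zero and interior fluxes telescope, the sum of `u` is preserved, and a sharp spike such as `[0, 0, 10, 0, 0]` is only slightly smoothed because the large gradient makes g small. Every update reads only neighbouring values, which is what makes such schemes straightforward to parallelize on the architectures the abstract mentions.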