Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
The availability of a large number of processing nodes in a parallel and
distributed computing environment enables sophisticated real-time processing
over high-speed data streams, as required by many emerging applications.
Sliding window stream joins are among the most important operators in a stream
processing system. In this paper, we consider the issue of parallelizing a
sliding window stream join operator over a shared-nothing cluster. We propose a
framework, based on a fixed or predefined communication pattern, to distribute
the join processing load over the shared-nothing cluster. We consider the various
overheads incurred when scaling over a large number of nodes, and propose
methods to cope with them. We implement the algorithm over a
cluster using a message passing system, and present experimental results
showing the effectiveness of the join processing algorithm.
Comment: 11 page
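As a rough illustration of the operator this abstract is about, the following is a minimal sketch of a symmetric, time-based sliding window equi-join, with tuples routed to nodes by hashing the join key. It is not the paper's framework: the abstract describes a fixed or predefined communication pattern, whereas hash routing is swapped in here only because it is simple to show. The names WindowJoin, route and process are hypothetical, the nodes are simulated as in-process objects rather than message-passing processes, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class WindowJoin:
    """Symmetric time-based sliding window equi-join over streams R and S."""

    def __init__(self, window):
        self.window = window  # window length, in the same units as timestamps
        self.buffers = {"R": deque(), "S": deque()}

    def insert(self, side, ts, key, value):
        """Add a tuple to one stream and return its matches from the other."""
        other = "S" if side == "R" else "R"
        buf = self.buffers[other]
        # Expire tuples of the opposite stream that have left the window.
        while buf and buf[0][0] < ts - self.window:
            buf.popleft()
        # Probe the opposite window; results are always ordered (R value, S value).
        matches = [
            (value, v) if side == "R" else (v, value)
            for (_, k, v) in buf
            if k == key
        ]
        self.buffers[side].append((ts, key, value))
        return matches


def route(key, num_nodes):
    """Pick the node responsible for an integer join key (assumed scheme)."""
    return key % num_nodes


def process(nodes, side, ts, key, value):
    """Send a tuple to the node owning its key and run the local join there."""
    return nodes[route(key, len(nodes))].insert(side, ts, key, value)


if __name__ == "__main__":
    nodes = [WindowJoin(window=10) for _ in range(4)]
    process(nodes, "R", 0, 7, "r0")
    print(process(nodes, "S", 5, 7, "s5"))
```

Because equal keys always land on the same simulated node, each node can join its share of the two streams independently; the cost of that simplicity is sensitivity to key skew, which is one of the scaling overheads a real deployment would need to handle.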
CLP-based protein fragment assembly
The paper investigates a novel approach, based on Constraint Logic
Programming (CLP), to predict the 3D conformation of a protein via fragment
assembly. The fragments are extracted by a preprocessor, also developed for this
work, from a database of known protein structures that clusters and classifies
the fragments according to similarity and frequency. The problem of assembling
fragments into a complete conformation is mapped to a constraint solving
problem and solved using CLP. The constraint-based model uses a medium
discretization degree Cα-side chain centroid protein model that offers
efficiency and a good approximation for space filling. The approach adapts
existing energy models to the protein representation used and applies a large
neighboring search strategy. The results show the feasibility and efficiency
of the method. The declarative nature of the solution allows future
extensions to be included, e.g., different size fragments for better accuracy.
Comment: special issue dedicated to ICLP 201
BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
We introduce BriskStream, an in-memory data stream processing system (DSPS)
specifically designed for modern shared-memory multicore architectures.
BriskStream's key contribution is an execution plan optimization paradigm,
namely RLAS, which takes the relative location (i.e., NUMA distance) of each pair
of producer-consumer operators into consideration. We propose a
branch-and-bound-based approach with three heuristics to resolve the resulting nontrivial
optimization problem. The experimental evaluations demonstrate that BriskStream
yields much higher throughput and better scalability than existing DSPSs on
multi-core architectures when processing different types of workloads.
Comment: To appear in SIGMOD'1