Parallel Mining of Association Rules Using a Lattice Based Approach
The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in databases. One such pattern is the association rule extracted from these transactions. Parallel algorithms are required for mining association rules because the databases that store the transactions are very large. In this paper we present a parallel, lattice-based algorithm for mining association rules. Dynamic Distributed Rule Mining (DDRM) partitions the lattice into sublattices that are assigned to processors for the identification of frequent itemsets. Experimental results show that DDRM utilizes the processors efficiently and performs better than the prefix-based and partition algorithms, which use a static approach to assign classes to processors. DDRM scales well and shows good speedup.
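A minimal, sequential sketch of the lattice view (an illustration, not the authors' DDRM implementation): frequent itemsets are enumerated level by level, and each prefix-based equivalence class corresponds to a sublattice that a parallel scheme could hand to a processor.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate frequent itemsets level by level (Apriori-style).

    Each prefix-based equivalence class of itemsets corresponds to a
    sublattice that a lattice-partitioning scheme could assign to a
    processor; here everything runs sequentially for clarity.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        keys = sorted(level, key=sorted)
        candidates = []
        for a, b in combinations(keys, 2):
            u = a | b
            if len(u) == k + 1 and u not in candidates:
                candidates.append(u)
        k += 1
    return frequent
```

With transactions `[{'a','b'}, {'a','c'}, {'a','b','c'}, {'b','c'}]` and a support threshold of 2, all singletons and pairs are frequent but `{'a','b','c'}` (support 1) is not.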
Incremental Processing and Optimization of Update Streams
In recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring that monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. Generally, in order to support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in these applications.
Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved with incremental view maintenance techniques if we cast the problems as incremental view maintenance problems over data streams. We focus on two challenges in incremental processing of update streams that are not addressed in existing work on stream query processing: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we develop a working prototype system, Aspen, an end-to-end stream processing system that has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of Aspen, over a variety of benchmark workloads such as the TPC-H and Linear Road benchmarks.
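As an illustration of the incremental-view-maintenance idea (a simplified sketch, not the Aspen system), consider maintaining a grouped count over a stream of insert/delete updates: instead of recomputing the view from scratch, each update is applied as a delta.

```python
class IncrementalCountView:
    """Maintain COUNT(*) GROUP BY key incrementally over an update stream.

    Each update is an (op, key) pair with op in {'+', '-'} for
    insert/delete.  Applying only the delta, rather than recomputing
    the view, is the core of casting stream query processing as an
    incremental view maintenance problem.
    """
    def __init__(self):
        self.counts = {}

    def apply(self, op, key):
        delta = 1 if op == '+' else -1
        new = self.counts.get(key, 0) + delta
        if new:
            self.counts[key] = new
        else:
            # Groups with count zero disappear from the view.
            self.counts.pop(key, None)
        return dict(self.counts)
```

Each `apply` call costs O(1), independent of how many updates the stream has already delivered.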
Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms
This thesis presents design and implementation approaches for parallel algorithms of computer algebra. We use algorithmic skeletons as well as further approaches, such as data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and for a special class of parallel loops that we call 'repeated computation with a possibility of premature termination'. We also introduce a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms, for which our arithmetic provides a generic parallelisation approach.
The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language, and spares us from using two different languages, one for the implementation and one for the interface, in our implementation of computer algebra algorithms.
Further, this thesis presents methods for the evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation; we call it the 'parallel penalty'. The other is the sequential execution time. For the estimation, we predict both components separately using statistical methods. This enables very confident estimates while using drastically fewer measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We have also used existing estimation methods.
We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen's matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution, which leads to a further fast multiplication algorithm. Especially for our implementation of Strassen's algorithm, we designed and implemented a divide and conquer skeleton based on actors. For the parallel fast Fourier transform we not only used new divide and conquer skeletons but also developed a map-and-transpose skeleton, which enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction, an approach known from the literature. We also performed execution time estimations of our divide and conquer programs.
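The Karatsuba scheme mentioned above can be sketched as follows (a plain sequential Python version for illustration; the thesis implements it with Eden skeletons, where the three recursive products are the independent subtasks a divide and conquer skeleton evaluates in parallel):

```python
def karatsuba(x, y):
    """Karatsuba integer multiplication: three recursive products
    instead of four, giving roughly O(n^1.585) digit operations."""
    if x < 10 or y < 10:          # base case: a single-digit factor
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    hi_x, lo_x = x >> n, x & ((1 << n) - 1)
    hi_y, lo_y = y >> n, y & ((1 << n) - 1)
    # The three recursive calls are independent of each other,
    # which is what a divide and conquer skeleton exploits.
    a = karatsuba(hi_x, hi_y)                # product of high parts
    b = karatsuba(lo_x, lo_y)                # product of low parts
    c = karatsuba(hi_x + lo_x, hi_y + lo_y)  # product of the sums
    # x*y = a*2^(2n) + (c - a - b)*2^n + b
    return (a << (2 * n)) + ((c - a - b) << n) + b
```

The middle coefficient `c - a - b` is what saves the fourth multiplication compared with the schoolbook method.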
This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, like parMap, farm and workpool, with a premature termination property. We use this to implement the so-called 'parallel repeated computation', a special form of a speculative parallel loop. We have implemented two probabilistic primality tests, the Rabin–Miller test and the Jacobi sum test, and parallelised both with our approach. For the Jacobi sum test, we showed formally that it can be implemented in parallel, analysed the task distribution and load balancing, determined suitable configurations, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. The parallelisation of the Jacobi sum test and our generic parallelisation scheme for repeated computation are original contributions.
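For illustration, here is a standard sequential Rabin–Miller (Miller–Rabin) test. Each witness round is an independent computation, which is exactly what makes the speculative 'parallel repeated computation' with premature termination applicable: the rounds can run on separate processing elements, and all of them can stop as soon as any round finds a compositeness witness.

```python
import random

def miller_rabin(n, rounds=20):
    """Probabilistic Rabin-Miller primality test.  Each round checks
    one random witness; rounds are independent, so they parallelise as
    a repeated computation with premature termination."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):          # cheap trial division first
        if n % p == 0:
            return n == p
    # write n - 1 = 2^s * d with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witness found: premature termination
    return True           # probably prime
```

A composite number survives one round with probability at most 1/4, so 20 rounds leave an error probability below 4^-20.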
The data parallel arithmetic is defined not only for integers, which is already known, but also for rationals. We handle the common factors of the numerator or denominator with the modulus in a novel manner; this is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised determinant computation using Gaussian elimination. As before, we performed a task distribution analysis and estimated the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables the parallelisation of entire classes of computer algebra algorithms.
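The multiple-residue idea can be illustrated for the integer case (a sketch of the general technique, not the thesis's rational arithmetic): compute the determinant modulo several primes independently, then recombine the residues with the Chinese Remainder Theorem. The per-modulus computations are what the data parallel arithmetic distributes across processing elements.

```python
from functools import reduce

def det_mod(matrix, p):
    """Determinant over Z/pZ by Gaussian elimination (p prime).
    Each modulus is an independent computation, which is what makes
    residue arithmetic data parallel."""
    m = [[x % p for x in row] for row in matrix]
    n, det = len(m), 1
    for i in range(n):
        pivot = next((r for r in range(i, n) if m[r][i]), None)
        if pivot is None:
            return 0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det % p            # row swap flips the sign
        det = det * m[i][i] % p
        inv = pow(m[i][i], p - 2, p)  # Fermat inverse, p prime
        for r in range(i + 1, n):
            f = m[r][i] * inv % p
            for c in range(i, n):
                m[r][c] = (m[r][c] - f * m[i][c]) % p
    return det % p

def crt(residues, moduli):
    """Recombine residues via the Chinese Remainder Theorem,
    lifting to the symmetric range to recover negative values."""
    M = reduce(lambda a, b: a * b, moduli)
    x = 0
    for r, p in zip(residues, moduli):
        Mi = M // p
        x += r * Mi * pow(Mi, p - 2, p)
    x %= M
    return x - M if x > M // 2 else x
```

For a 2x2 matrix with determinant 5, the residues mod 7 and mod 11 are both 5, and `crt` recovers 5 from them; the symmetric lift also recovers negative determinants.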
Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms.
Managing distributed flexible manufacturing systems
For several years, research has focused on several aspects of manufacturing, from the individual
processes towards the management of virtual enterprises. However, aspects such as coordination and control still present relevant problems in industry and remain challenging areas of research.
The application of advanced technologies and informational tools by itself does not guarantee the success of control and integration applications. In order to achieve a high degree of integration and efficiency, it is necessary to match these technologies and tools with models that describe the existing knowledge and functionality in the system and allow a correct understanding of its behaviour. In a global market with wide competition, manufacturing systems present requirements that lead to distributed, self-organised, co-operative and heterogeneous control applications.
A Distributed Flexible Manufacturing System (DFMS) is a goal-driven and data-directed dynamic
system which is designed to provide an effective operation sequence for the products to fulfil the
production goals, to meet real-time requirements and to optimally allocate resources.
In this work, a layered approach for modelling such production systems is first proposed. According to this representation, a DFMS may be seen as a multi-layer resource graph such that vertices on a layer represent interacting resources, and a layer at level l is represented by a node in the layer at level (l-1). Two models are then developed concerning two relevant managerial issues in DFMS: the task mapping problem and the task scheduling with multiple shared resources problem.
The task mapping problem concerns the balanced partition of a given set of jobs and the assignment of the parts to the resources of the manufacturing system. We study the case in which the jobs are fairly homogeneous and have no precedence constraints, but need some communication to be coordinated. Thus, assigning jobs to different parts causes a relevant communication effort between those parts, increasing the managerial complexity. We show that the standard models usually used to formally represent this problem are inadequate. Through some graph theoretical results we relate the problem to the well-known hypergraph partitioning problem and briefly survey the best techniques to solve it. A new formulation of the problem is then presented. Some considerations on an improved version of the formulation permit the computation of a good lower bound on the optimal solution in the case of hypergraph bisection.
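A minimal sketch of the hypergraph view (a hypothetical encoding for illustration, not the thesis's formulation): jobs are vertices, each communication requirement among a group of jobs is a hyperedge, and the cost of a bisection is the number of hyperedges whose jobs land on both resources, since each such hyperedge forces communication between them.

```python
def bisection_cut(hyperedges, part):
    """Cut cost of a 2-way partition of a hypergraph.

    hyperedges: iterable of vertex sets (each set is a group of jobs
                that must communicate to stay coordinated).
    part:       dict mapping each vertex to 0 or 1 (its resource).

    A hyperedge is cut when its vertices fall in both parts; each cut
    hyperedge represents communication that must cross between the
    two resources.
    """
    cut = 0
    for edge in hyperedges:
        sides = {part[v] for v in edge}
        if len(sides) > 1:
            cut += 1
    return cut
```

Note that a single hyperedge over three jobs counts once, whereas a plain-graph model with pairwise edges would count it multiple times; this mismatch is one reason pairwise-graph models misrepresent the problem.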
The task scheduling with multiple shared resources problem is addressed for a robotic cell. We study the general problem of sequencing multiple jobs, where each job consists of multiple ordered tasks and task execution requires the simultaneous usage of several resources. NP-completeness results are given. A heuristic with a guaranteed approximation result is designed and evaluated in several production scenarios.