8 research outputs found
Frequent itemset mining in big data with effective single scan algorithms
© 2013 IEEE. This paper considers frequent itemsets mining in transactional databases. It introduces a new accurate single scan approach for frequent itemset mining (SSFIM), a heuristic as an alternative approach (EA-SSFIM), as well as a parallel implementation on Hadoop clusters (MR-SSFIM). EA-SSFIM and MR-SSFIM target sparse and big databases, respectively. The proposed approach (in all its variants) requires only one scan to extract the candidate itemsets, and it has the advantage to generate a fixed number of candidate itemsets independently from the value of the minimum support. This accelerates the scan process compared with existing approaches while dealing with sparse and big databases. Numerical results show that SSFIM outperforms the state-of-the-art FIM approaches while dealing with medium and large databases. Moreover, EA-SSFIM provides similar performance as SSFIM while considerably reducing the runtime for large databases. The results also reveal the superiority of MR-SSFIM compared with the existing HPC-based solutions for FIM using sparse and big databases
Exploring Decomposition for Solving Pattern Mining Problems
This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of the CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results from the experimental evaluation show that the CBPM provides a reduction in both the runtime and memory usage. Also, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves significant speedup of up to 552× on a single GPU using big transaction databases.publishedVersio
Representation learning in complex data via pattern discovery
This study proposes effective methods to learn meaningful representations for complex data such as sequences and graphs. It combines two important techniques in data mining and machine learning: pattern discovery and representation learning. The proposed methods can be applied to different real-world problems including healthcare analysis, business marketing, and bioinformatic
A Frequent Pattern Conjunction Heuristic for Rule Generation in Data Streams
This paper introduces a new and expressive algorithm for inducing descriptive rule-sets from streaming data in real-time in order to describe frequent patterns explicitly encoded in the stream. Data Stream Mining (DSM) is concerned with the automatic analysis of data streams in real-time. Rapid flows of data challenge the state-of-the art processing and communication infrastructure, hence the motivation for research and innovation into real-time algorithms that analyse data streams on-the-fly and can automatically adapt to concept drifts. To date, DSM techniques have largely focused on predictive data mining applications that aim to forecast the value of a particular target feature of unseen data instances, answering questions such as whether a credit card transaction is fraudulent or not. A real-time, expressive and descriptive Data Mining technique for streaming data has not been previously established as part of the DSM toolkit. This has motivated the work reported in this paper, which has resulted in developing and validating a Generalised Rule Induction (GRI) tool, thus producing expressive rules as explanations that can be easily understood by human analysts. The expressiveness of decision models in data streams serves the objectives of transparency, underpinning the vision of `explainable AI’ and yet is an area of research that has attracted less attention despite being of high practical importance. The algorithm introduced and described in this paper is termed Fast Generalised Rule Induction (FGRI). FGRI is able to induce descriptive rules incrementally for raw data from both categorical and numerical features. FGRI is able to adapt rule-sets to changes of the pattern encoded in the data stream (concept drift) on the fly as new data arrives and can thus be applied continuously in real-time. The paper also provides a theoretical, qualitative and empirical evaluation of FGRI
Rileggere il marketing. Strategie informative e Gestione della Conoscenza
Il libro analizza alcuni temi emergenti nel campo del marketing legati al trattamento dell'informazione, allo sviluppo delle nuove tecnologie, alla comunicazione sociale e all'etica. Il quadro teorico di riferimento è quelle delle ecologie del valore ovvero sistemi emergenti che producono valore attraverso l'interazione in rete estese di imprese e/o di persone
Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems
Current FPGAs can implement large systems because of the high density of
reconfigurable logic resources in a single chip. FPGAs are comprehensive devices
that combine flexibility and high performance in the same platform compared to
other platform such as General-Purpose Processors (GPPs) and Application Specific
Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by
introducing Dynamic Partial Reconfiguration (DPR) feature, which allows for
changing the functionality of part of the system while other parts are functioning.
FPGAs became an important platform for digital image processing applications
because of the aforementioned features. They can fulfil the need of efficient and
flexible platforms that execute imaging tasks efficiently as well as the reliably with
low power, high performance and high flexibility. The use of FPGAs as accelerators
for image processing outperforms most of the current solutions. Current FPGA
solutions can to load part of the imaging application that needs high computational
power on dedicated reconfigurable hardware accelerators while other parts are
working on the traditional solution to increase the system performance. Moreover,
the use of the DPR feature enhances the flexibility of image processing further by
swapping accelerators in and out at run-time. The use of fault mitigation techniques
in FPGAs enables imaging applications to operate in harsh environments following
the fact that FPGAs are sensitive to radiation and extreme conditions.
The aim of this thesis is to present a platform for efficient implementations of
imaging tasks. The research uses FPGAs as the key component of this platform and
uses the concept of DPR to increase the performance, flexibility, to reduce the power
dissipation and to expand the cycle of possible imaging applications. In this context,
it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP)
stages, the core part of most imaging devices. The thesis has a number of novel
concepts. The first novel concept is the use of FPGA hardware environment and
DPR feature to increase the parallelism and achieve high flexibility. The concept also
increases the performance and reduces the power consumption and area utilisation.
Based on this concept, the following implementations are presented in this thesis: An
implementation of Adams Hamilton Demosaicing algorithm for camera colour
interpolation, which exploits the FPGA parallelism to outperform other equivalents.
In addition, an implementation of Automatic White Balance (AWB), another IPP
stage that employs DPR feature to prove the mentioned novelty aspects. Another
novel concept in this thesis is presented in chapter 6, which uses DPR feature to
develop a novel flexible imaging system that requires less logic and can be
implemented in small FPGAs. The system can be employed as a template for any
imaging application with no limitation. Moreover, discussed in this thesis is a novel
reliable version of the imaging system that adopts novel techniques including
scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to
detect and correct errors using the Internal Configuration Access Port (ICAP)
primitive. These techniques exploit the datapath-based nature of the implemented
imaging system to improve the system's overall reliability. The thesis presents a
proposal for integrating the imaging system with the Robust Reliable Reconfigurable
Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the
system. The proposal shows the suitability of the proposed DPR imaging system to
be used as part of the core system of autonomous cars because of its unbounded
flexibility. These novel works are presented in a number of publications as shown in section
1.3 later in this thesis