810 research outputs found

    An efficient parallel method for mining frequent closed sequential patterns

    Get PDF
    Mining frequent closed sequential pattern (FCSPs) has attracted a great deal of research attention, because it is an important task in sequences mining. In recently, many studies have focused on mining frequent closed sequential patterns because, such patterns have proved to be more efficient and compact than frequent sequential patterns. Information can be fully extracted from frequent closed sequential patterns. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP) using multi-core processor architecture for mining FCSPs from large databases. The pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce execution time for mining frequent closed sequential patterns. This approach overcomes the problems of parallel mining such as overhead of communication, synchronization, and data replication. It also solves the load balance issues of the workload between the processors with a dynamic mechanism that re-distributes the work, when some processes are out of work to minimize the idle CPU time.Web of Science5174021739

    A New Data Layout For Set Intersection on GPUs

    Full text link
    Set intersection is the core in a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can, for some computational problems, be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, "BatMap", that is particularly well suited for parallel processing, and is compact even for sparse data. Frequent itemset mining is one of the most important applications of set intersection. As a case-study on the potential of BatMaps we focus on frequent pair mining, which is a core special case of frequent itemset mining. The main finding is that our method is able to achieve speedups over both Apriori and FP-growth when the number of distinct items is large, and the density of the problem instance is above 1%. Previous implementations of frequent itemset mining on GPU have not been able to show speedups over the best single-threaded implementations.Comment: A version of this paper appears in Proceedings of IPDPS 201

    Porting Decision Tree Algorithms to Multicore using FastFlow

    Full text link
    The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.Comment: 18 pages + cove

    Exploiting Data Skew for Improved Query Performance

    Full text link
    Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution with a small number of very common words, and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines. The net result is better spatial locality, and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude

    Frequent itemset mining on multiprocessor systems

    Get PDF
    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web-mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow up to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, there have been many frequent-itemset mining algorithms proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures force the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradations. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors almost stopped increasing. Algorithms should therefore exploit the large number of available threads and also the other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining twofold: we (1) compress the datasets being mined because they must be kept in main memory during several mining invocations and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show a good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding is repeatedly required for loading and mining the datasets, we reduce its costs by providing parallel encodings that achieve high throughputs for both tasks. For a memory-efficient representation of the mining algorithms’ intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data’s size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Our algorithms are already single-threaded often up an order of magnitude faster than existing highly optimized algorithms and further scale almost linear on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that are used for mining of other types of itemsets

    Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms

    Get PDF
    With the continued development of computation and communication technologies, we are overwhelmed with electronic data. Ubiquitous data in governments, commercial enterprises, universities and various organizations records our decisions, transactions and thoughts. The data collection rate is undergoing tremendous increase. And there is no end in sight. On one hand, as the volume of data explodes, the gap between the human being\u27s understanding of the data and the knowledge hidden in the data will be enlarged. The algorithms and techniques, collectively known as data mining, are emerged to bridge the gap. The data mining algorithms are usually data-compute intensive. On the other hand, the overall computing system performance is not increasing at an equal rate. Consequently, there is strong requirement to design special computing systems to accelerate data mining applications. FPGAs based High Performance Reconfigurable Computing(HPRC) system is to design optimized hardware architecture for a given problem. The increased gate count, arithmetic capability, and other features of modern FPGAs now allow researcher to implement highly complicated reconfigurable computational architecture. In contrast with ASICs, FPGAs have the advantages of low power, low nonrecurring engineering costs, high design flexibility and the ability to update functionality after shipping. In this thesis, we first design the architectures for data intensive and data-compute intensive applications respectively. Then we present a general HPRC framework for data mining applications: Frequent Pattern Mining(FPM) is a data-compute intensive application which is to find commonly occurring itemsets in databases. We use systolic tree architecture in FPGA hardware to mimic the internal memory layout of FP-growth algorithm while achieving higher throughput. The experimental results demonstrate that the proposed hardware architecture is faster than the software approach. Sparse Matrix-Vector Multiplication(SMVM) is a data-intensive application which is an important computing core in many applications. We present a scalable and efficient FPGA-based SMVM architecture which can handle arbitrary matrix sizes without preprocessing or zero padding and can be dynamically expanded based on the available I/O bandwidth. The experimental results using a commercial FPGA-based acceleration system demonstrate that our reconfigurable SMVM engine is more efficient than existing state-of-the-art, with speedups over a highly optimized software implementation of 2.5X to 6.5X, depending on the sparsity of the input benchmark. Accelerating Text Classification Using SMVM is performed in Convey HC-1 HPRC platform. The SMVM engines are deployed into multiple FPGA chips. Text documents are represented as large sparse matrices using Vector Space Model(VSM). The k-nearest neighbor algorithm uses SMVM to perform classification simultaneously on multiple FPGAs. Our experiment shows that the classification in Convey HC-1 is several times faster compared with the traditional computing architecture. MapReduce Reconfigurable Framework for Data Mining Applications is a pipelined and high performance framework for FPGA design based on the MapReduce model. Our goal is to lessen the FPGA programmer burden while minimizing performance degradation. The designer only need focus on the mapper and reducer modules design. We redesigned the SMVM architecture using the MapReduce Framework. The manual VHDL code is only 15 percent of that used in the customized architecture

    On Parallelizing Matrix Multiplication by the Column-Row Method

    Full text link
    We consider the problem of sparse matrix multiplication by the column row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent ” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor independently of the specific structure of the outer product. We show guarantees on the work done by each processor, and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix, and inter-processor communication. Motivated by observations on real data that often the absolute values of the entries in the product adhere to a power law, we combine our approach with frequent items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix. As a case study we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0, 1}integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.
    corecore