620 research outputs found

    Seed selection for information cascade in multilayer networks

    Full text link
    Information spreading is an interesting field in the domain of online social media. In this work, we are investigating how well different seed selection strategies affect the spreading processes simulated using independent cascade model on eighteen multilayer social networks. Fifteen networks are built based on the user interaction data extracted from Facebook public pages and tree of them are multilayer networks downloaded from public repository (two of them being Twitter networks). The results indicate that various state of the art seed selection strategies for single-layer networks like K-Shell or VoteRank do not perform so well on multilayer networks and are outperformed by Degree Centrality

    Constraint-based sequence mining using constraint programming

    Full text link
    The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as general framework for this task. We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms. Experiments demonstrate the flexibility towards constraint-based settings and compare the approach to existing methods.Comment: In Integration of AI and OR Techniques in Constraint Programming (CPAIOR), 201

    RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

    Full text link
    Initially, a number of frequent itemset mining (FIM) algorithms have been designed on the Hadoop MapReduce, a distributed big data processing framework. But, due to heavy disk I/O, MapReduce is found to be inefficient for such highly iterative algorithms. Therefore, Spark, a more efficient distributed data processing framework, has been developed with in-memory computation and resilient distributed dataset (RDD) features to support the iterative algorithms. On the Spark RDD framework, Apriori and FP-Growth based FIM algorithms have been designed, but Eclat-based algorithm has not been explored yet. In this paper, RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework is proposed with its five variants. The proposed algorithms are evaluated on the various benchmark datasets, which shows that RDD-Eclat outperforms the Spark-based Apriori by many times. Also, the experimental results show the scalability of the proposed algorithms on increasing the number of cores and size of the dataset.Comment: 16 pages, 6 figures, ICCNCT 201

    Prefix-Projection Global Constraint for Sequential Pattern Mining

    Full text link
    Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they are all suffering from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but they are not still effective because of their encoding. In this paper, we propose a global constraint based on the projected databases principle which remedies to this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets

    Discovering sequential rental patterns by fleet tracking

    Full text link
    © Springer International Publishing Switzerland 2015. As one of the most well-known methods on customer analysis, sequential pattern mining generally focuses on customer business transactions to discover their behaviors. However in the real-world rental industry, behaviors are usually linked to other factors in terms of actual equipment circumstance. Fleet tracking factors, such as location and usage, have been widely considered as important features to improve work performance and predict customer preferences. In this paper, we propose an innovative sequential pattern mining method to discover rental patterns by combining business transactions with the fleet tracking factors. A novel sequential pattern mining framework is designed to detect the effective items by utilizing both business transactions and fleet tracking information. Experimental results on real datasets testify the effectiveness of our approach

    Sparsest factor analysis for clustering variables: a matrix decomposition approach

    Get PDF
    We propose a new procedure for sparse factor analysis (FA) such that each variable loads only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for certain number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices and their least squares function is minimized through specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA
    corecore