Seed selection for information cascade in multilayer networks
Information spreading is an interesting field in the domain of online social media. In this work, we investigate how different seed selection strategies affect spreading processes simulated using the independent cascade model on eighteen multilayer social networks. Fifteen networks are built from user interaction data extracted from Facebook public pages, and three are multilayer networks downloaded from a public repository (two of them being Twitter networks). The results indicate that various state-of-the-art seed selection strategies for single-layer networks, such as K-Shell or VoteRank, do not perform as well on multilayer networks and are outperformed by Degree Centrality.
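The cascade process described above can be sketched as a simple simulation: starting from a seed set, each newly activated node gets one chance to activate each inactive neighbour with some probability. The sketch below uses a toy single-layer graph, a uniform activation probability, and degree-based seeding; the graph, probability, and seed count are illustrative assumptions, not the paper's actual multilayer setup.

```python
import random
from collections import defaultdict

def independent_cascade(adj, seeds, p, rng=random.Random(0)):
    """One run of the independent cascade model.

    adj: dict mapping node -> list of neighbours
    seeds: initially active nodes
    p: uniform per-edge activation probability (an assumption; real
       studies often use heterogeneous probabilities)
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new_frontier = []
        for node in frontier:
            for nbr in adj[node]:
                # each newly active node gets exactly one chance
                # to activate each still-inactive neighbour
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    new_frontier.append(nbr)
        frontier = new_frontier
    return active

def degree_seeds(adj, k):
    """Degree Centrality seeding: pick the k highest-degree nodes."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]

# toy undirected graph
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

seeds = degree_seeds(adj, 2)
spread = independent_cascade(adj, seeds, p=0.5)
print(seeds, len(spread))
```

Comparing strategies then amounts to swapping `degree_seeds` for another selector (e.g. K-Shell or VoteRank) and averaging the spread size over many simulation runs.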
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
Comment: In Integration of AI and OR Techniques in Constraint Programming
(CPAIOR), 201
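The exists-embedding constraint mentioned above decides whether a pattern occurs as a (not necessarily contiguous) subsequence of an input sequence. A minimal sketch of that embedding test and the resulting support count, outside any CP solver (the database and pattern are made-up examples):

```python
def embeds(pattern, seq):
    """True if pattern occurs as a (non-contiguous) subsequence of seq."""
    it = iter(seq)
    # `sym in it` advances the iterator, so symbols must appear in order
    return all(sym in it for sym in pattern)

def support(pattern, database):
    """Number of input sequences in which the pattern embeds."""
    return sum(embeds(pattern, seq) for seq in database)

db = ["abcb", "bacb", "ab", "ccb"]
print(support("ab", db))  # 3: embeds in "abcb", "bacb", and "ab"
```

A CP formulation posts this embedding condition as a global constraint per input sequence, alongside user constraints on the pattern itself.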
RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework
Initially, a number of frequent itemset mining (FIM) algorithms were designed on Hadoop MapReduce, a distributed big data processing framework. However, due to heavy disk I/O, MapReduce has been found inefficient for such highly iterative algorithms. Therefore, Spark, a more efficient distributed data processing framework, was developed with in-memory computation and resilient distributed dataset (RDD) features to support iterative algorithms. Apriori- and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm had not yet been explored. In this paper, RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, is proposed along with its five variants. The proposed algorithms are evaluated on various benchmark datasets, and the results show that RDD-Eclat outperforms Spark-based Apriori by many times. The experimental results also show the scalability of the proposed algorithms as the number of cores and the size of the dataset increase.
Comment: 16 pages, 6 figures, ICCNCT 201
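Eclat works on a vertical database layout: each item maps to the set of transaction ids (its tidset), and the support of an itemset is the size of the intersection of its items' tidsets. A minimal serial sketch of this idea follows; the paper's contribution is parallelising it over Spark RDDs, which is not shown here, and the toy transactions are made up.

```python
from itertools import combinations

def eclat(transactions, min_sup):
    """Serial Eclat sketch via tidset intersection (levelwise)."""
    # vertical layout: item -> set of ids of transactions containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {frozenset([i]): t
                for i, t in tidsets.items() if len(t) >= min_sup}
    result = dict(frequent)
    while frequent:
        nxt = {}
        for (a, ta), (b, tb) in combinations(frequent.items(), 2):
            union = a | b
            if len(union) == len(a) + 1:  # extend by exactly one item
                t = ta & tb  # support of the union = |intersection|
                if len(t) >= min_sup and union not in nxt:
                    nxt[union] = t
        result.update(nxt)
        frequent = nxt
    return {tuple(sorted(k)): len(v) for k, v in result.items()}

txns = [{"a", "b", "c"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]
freq = eclat(txns, min_sup=2)
print(freq)
```

A Spark version would distribute the candidate itemsets (or equivalence classes of prefixes) across RDD partitions and intersect tidsets in parallel.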
Prefix-Projection Global Constraint for Sequential Pattern Mining
Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they all suffer from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but they are still not effective because of their encodings. In this paper, we propose a global constraint based on the projected-databases principle which remedies this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.
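The projected-databases principle underlying the proposed global constraint can be illustrated simply: to extend a prefix by a symbol, keep only the suffix of each sequence after that symbol's first occurrence, and drop sequences that do not contain it. A minimal sketch with a made-up string database:

```python
def project(database, symbol):
    """Project each sequence on `symbol`: keep the suffix after its
    first occurrence; sequences without `symbol` are dropped."""
    projected = []
    for seq in database:
        idx = seq.find(symbol)
        if idx != -1:
            projected.append(seq[idx + 1:])
    return projected

db = ["abcb", "bacb", "ab", "ccb"]
proj = project(db, "a")
print(proj)  # ['bcb', 'cb', 'b']

# support of the pattern "ab" = number of 'a'-projected suffixes
# that still contain 'b'
support_ab = sum("b" in s for s in proj)
print(support_ab)  # 3
```

Mining proceeds by recursively projecting the projected database, so each prefix extension only scans the (shrinking) suffixes rather than the full sequences.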
Discovering sequential rental patterns by fleet tracking
© Springer International Publishing Switzerland 2015. As one of the most well-known methods for customer analysis, sequential pattern mining generally focuses on customers' business transactions to discover their behaviors. However, in the real-world rental industry, behaviors are usually linked to other factors tied to actual equipment circumstances. Fleet tracking factors, such as location and usage, have been widely considered important features for improving work performance and predicting customer preferences. In this paper, we propose an innovative sequential pattern mining method to discover rental patterns by combining business transactions with fleet tracking factors. A novel sequential pattern mining framework is designed to detect effective items by utilizing both business transactions and fleet tracking information. Experimental results on real datasets confirm the effectiveness of our approach.
Follow the blue bird: A study on threat data published on Twitter
Open Source Intelligence (OSINT) has attracted the interest of cybersecurity practitioners due to its completeness and timeliness. In particular, Twitter has proven to be a discussion hub regarding the latest vulnerabilities and exploits. In this paper, we present a study comparing vulnerability databases among themselves and against Twitter. Although there is evidence of OSINT's advantages, no methodological studies have addressed the quality and benefits of the available sources. We compare the publishing dates of more than nine thousand vulnerabilities in the sources considered. We show that NVD is neither the most timely nor the most complete vulnerability database, that Twitter provides timely and impactful security alerts, and that using diverse OSINT sources yields better completeness and timeliness of vulnerabilities; we also provide insights on how to capture cybersecurity-relevant tweets.
Sparsest factor analysis for clustering variables: a matrix decomposition approach
We propose a new procedure for sparse factor analysis (FA) in which each variable loads only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
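The sparsest loading structure described above (one nonzero per row) can be illustrated directly. The sketch below builds such a matrix from a hypothetical factor assignment and hypothetical loading values; it does not implement SSFA's least-squares estimation, only the loading pattern the method targets:

```python
def sparsest_loading_matrix(assignment, loadings):
    """Build an SSFA-style loading matrix: variable j loads only its
    assigned factor. assignment[j] is the factor index of variable j;
    loadings[j] is its single nonzero loading. (Illustrative only;
    SSFA estimates both by minimizing a least squares function.)"""
    p = len(assignment)          # number of variables
    m = max(assignment) + 1      # number of common factors
    L = [[0.0] * m for _ in range(p)]
    for j, (f, w) in enumerate(zip(assignment, loadings)):
        L[j][f] = w
    return L

# five variables clustered into two factors (made-up values)
L = sparsest_loading_matrix([0, 0, 1, 1, 1], [0.9, 0.8, 0.7, 0.85, 0.6])
# exactly one nonzero per row: the sparsest possible loading pattern
print(L)
```

Reading off the clustering is then trivial: variables sharing the same nonzero column form one cluster.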