Seed selection for information cascade in multilayer networks
Information spreading is an interesting field in the domain of online social media. In this work, we investigate how different seed selection strategies affect spreading processes simulated using the independent cascade model on eighteen multilayer social networks. Fifteen networks are built from user interaction data extracted from Facebook public pages, and three are multilayer networks downloaded from a public repository (two of them being Twitter networks). The results indicate that various state-of-the-art seed selection strategies for single-layer networks, such as K-Shell or VoteRank, do not perform as well on multilayer networks and are outperformed by Degree Centrality.
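The cascade process described above can be sketched as a simple simulation: starting from a seed set, each newly activated node gets one chance to activate each inactive neighbour with some probability. The sketch below uses a toy single-layer graph, a uniform activation probability, and degree-based seeding; the graph, probability, and seed count are illustrative assumptions, not the paper's actual multilayer setup.

```python
import random
from collections import defaultdict

def independent_cascade(adj, seeds, p, rng=random.Random(0)):
    """One run of the independent cascade model.

    adj: dict mapping node -> list of neighbours
    seeds: initially active nodes
    p: uniform per-edge activation probability (an assumption; real
       studies often use heterogeneous probabilities)
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new_frontier = []
        for node in frontier:
            for nbr in adj[node]:
                # each newly active node gets exactly one chance
                # to activate each still-inactive neighbour
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    new_frontier.append(nbr)
        frontier = new_frontier
    return active

def degree_seeds(adj, k):
    """Degree Centrality seeding: pick the k highest-degree nodes."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]

# toy undirected graph
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

seeds = degree_seeds(adj, 2)
spread = independent_cascade(adj, seeds, p=0.5)
print(seeds, len(spread))
```

Comparing strategies then amounts to swapping `degree_seeds` for another selector (e.g. K-Shell or VoteRank) and averaging the spread size over many simulation runs.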
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
Comment: In Integration of AI and OR Techniques in Constraint Programming
(CPAIOR), 201
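The exists-embedding constraint mentioned above decides whether a pattern occurs as a (not necessarily contiguous) subsequence of an input sequence. A minimal sketch of that embedding test and the resulting support count, outside any CP solver (the database and pattern are made-up examples):

```python
def embeds(pattern, seq):
    """True if pattern occurs as a (non-contiguous) subsequence of seq."""
    it = iter(seq)
    # `sym in it` advances the iterator, so symbols must appear in order
    return all(sym in it for sym in pattern)

def support(pattern, database):
    """Number of input sequences in which the pattern embeds."""
    return sum(embeds(pattern, seq) for seq in database)

db = ["abcb", "bacb", "ab", "ccb"]
print(support("ab", db))  # 3: embeds in "abcb", "bacb", and "ab"
```

A CP formulation posts this embedding condition as a global constraint per input sequence, alongside user constraints on the pattern itself.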
RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework
Initially, a number of frequent itemset mining (FIM) algorithms were designed on Hadoop MapReduce, a distributed big data processing framework. However, due to heavy disk I/O, MapReduce has been found inefficient for such highly iterative algorithms. Therefore, Spark, a more efficient distributed data processing framework, was developed with in-memory computation and resilient distributed dataset (RDD) features to support iterative algorithms. Apriori- and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm had not yet been explored. In this paper, RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, is proposed along with its five variants. The proposed algorithms are evaluated on various benchmark datasets, and the results show that RDD-Eclat outperforms Spark-based Apriori by many times. The experimental results also show the scalability of the proposed algorithms as the number of cores and the size of the dataset increase.
Comment: 16 pages, 6 figures, ICCNCT 201
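Eclat works on a vertical database layout: each item maps to the set of transaction ids (its tidset), and the support of an itemset is the size of the intersection of its items' tidsets. A minimal serial sketch of this idea follows; the paper's contribution is parallelising it over Spark RDDs, which is not shown here, and the toy transactions are made up.

```python
from itertools import combinations

def eclat(transactions, min_sup):
    """Serial Eclat sketch via tidset intersection (levelwise)."""
    # vertical layout: item -> set of ids of transactions containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {frozenset([i]): t
                for i, t in tidsets.items() if len(t) >= min_sup}
    result = dict(frequent)
    while frequent:
        nxt = {}
        for (a, ta), (b, tb) in combinations(frequent.items(), 2):
            union = a | b
            if len(union) == len(a) + 1:  # extend by exactly one item
                t = ta & tb  # support of the union = |intersection|
                if len(t) >= min_sup and union not in nxt:
                    nxt[union] = t
        result.update(nxt)
        frequent = nxt
    return {tuple(sorted(k)): len(v) for k, v in result.items()}

txns = [{"a", "b", "c"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]
freq = eclat(txns, min_sup=2)
print(freq)
```

A Spark version would distribute the candidate itemsets (or equivalence classes of prefixes) across RDD partitions and intersect tidsets in parallel.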
Prefix-Projection Global Constraint for Sequential Pattern Mining
Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they all suffer from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but they are still not effective because of their encodings. In this paper, we propose a global constraint based on the projected-databases principle which remedies this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.
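The projected-databases principle underlying the proposed global constraint can be illustrated simply: to extend a prefix by a symbol, keep only the suffix of each sequence after that symbol's first occurrence, and drop sequences that do not contain it. A minimal sketch with a made-up string database:

```python
def project(database, symbol):
    """Project each sequence on `symbol`: keep the suffix after its
    first occurrence; sequences without `symbol` are dropped."""
    projected = []
    for seq in database:
        idx = seq.find(symbol)
        if idx != -1:
            projected.append(seq[idx + 1:])
    return projected

db = ["abcb", "bacb", "ab", "ccb"]
proj = project(db, "a")
print(proj)  # ['bcb', 'cb', 'b']

# support of the pattern "ab" = number of 'a'-projected suffixes
# that still contain 'b'
support_ab = sum("b" in s for s in proj)
print(support_ab)  # 3
```

Mining proceeds by recursively projecting the projected database, so each prefix extension only scans the (shrinking) suffixes rather than the full sequences.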
Discovering sequential rental patterns by fleet tracking
© Springer International Publishing Switzerland 2015. As one of the most well-known methods for customer analysis, sequential pattern mining generally focuses on customers' business transactions to discover their behaviors. However, in the real-world rental industry, behaviors are usually linked to other factors tied to actual equipment circumstances. Fleet tracking factors, such as location and usage, have been widely considered important features for improving work performance and predicting customer preferences. In this paper, we propose an innovative sequential pattern mining method to discover rental patterns by combining business transactions with fleet tracking factors. A novel sequential pattern mining framework is designed to detect effective items by utilizing both business transactions and fleet tracking information. Experimental results on real datasets confirm the effectiveness of our approach.
Follow the blue bird: A study on threat data published on Twitter
Open Source Intelligence (OSINT) has attracted the interest of cybersecurity practitioners due to its completeness and timeliness. In particular, Twitter has proven to be a discussion hub regarding the latest vulnerabilities and exploits. In this paper, we present a study comparing vulnerability databases among themselves and against Twitter. Although there is evidence of OSINT's advantages, no methodological studies have addressed the quality and benefits of the available sources. We compare the publishing dates of more than nine thousand vulnerabilities in the sources considered. We show that NVD is neither the most timely nor the most complete vulnerability database, that Twitter provides timely and impactful security alerts, and that using diverse OSINT sources yields better completeness and timeliness of vulnerabilities; we also provide insights on how to capture cybersecurity-relevant tweets.
Sparsest factor analysis for clustering variables: a matrix decomposition approach
We propose a new procedure for sparse factor analysis (FA) in which each variable loads only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
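The sparsest loading structure described above (one nonzero per row) can be illustrated directly. The sketch below builds such a matrix from a hypothetical factor assignment and hypothetical loading values; it does not implement SSFA's least-squares estimation, only the loading pattern the method targets:

```python
def sparsest_loading_matrix(assignment, loadings):
    """Build an SSFA-style loading matrix: variable j loads only its
    assigned factor. assignment[j] is the factor index of variable j;
    loadings[j] is its single nonzero loading. (Illustrative only;
    SSFA estimates both by minimizing a least squares function.)"""
    p = len(assignment)          # number of variables
    m = max(assignment) + 1      # number of common factors
    L = [[0.0] * m for _ in range(p)]
    for j, (f, w) in enumerate(zip(assignment, loadings)):
        L[j][f] = w
    return L

# five variables clustered into two factors (made-up values)
L = sparsest_loading_matrix([0, 0, 1, 1, 1], [0.9, 0.8, 0.7, 0.85, 0.6])
# exactly one nonzero per row: the sparsest possible loading pattern
print(L)
```

Reading off the clustering is then trivial: variables sharing the same nonzero column form one cluster.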