1,875 research outputs found

    Extending local features with contextual information in graph kernels

    Full text link
    Graph kernels are usually defined in terms of simpler kernels over local substructures of the original graphs. Different kernels consider different types of substructures. However, in some cases they have similar predictive performance, probably because the substructures can be interpreted as approximations of the subgraphs they induce. In this paper, we propose to associate with each feature a piece of information about the context in which the feature appears in the graph. A substructure appearing in two different graphs will match only if it appears with the same context in both graphs. We propose a kernel based on this idea that considers trees as substructures, and where the contexts are features too. The kernel is inspired by the framework in [6], although it is not part of it. We give an efficient algorithm for computing the kernel and show promising results on real-world graph classification datasets. Comment: To appear in ICONIP 201
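    A minimal sketch of the matching idea described above, assuming each graph has already been reduced to a multiset of (substructure, context) pairs; the pair extraction and all names are illustrative, not the paper's algorithm.

    # Illustrative only: two occurrences of a substructure match only when both
    # the feature and its context coincide; the kernel counts such matches.
    from collections import Counter

    def context_kernel(pairs_g1, pairs_g2):
        """Count matching (feature, context) pairs between two graphs."""
        c1, c2 = Counter(pairs_g1), Counter(pairs_g2)
        return sum(min(c1[p], c2[p]) for p in c1.keys() & c2.keys())

    # The feature "A-B" occurs in both graphs, but only one occurrence shares
    # the same context, so only that occurrence contributes to the kernel.
    g1 = [("A-B", "ctx1"), ("A-B", "ctx2"), ("B-C", "ctx1")]
    g2 = [("A-B", "ctx1"), ("B-C", "ctx3")]
    print(context_kernel(g1, g2))  # -> 1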

    Approximate Minimum Diameter

    Full text link
    We study the minimum diameter problem for a set of inexact points. By inexact, we mean that the precise location of the points is not known. Instead, the location of each point is restricted to a continuous region (imprecise model) or a finite set of points (indecisive model). Given a set of inexact points in one of these models, we wish to provide a lower bound on the diameter of the real points. In the first part of the paper, we focus on the indecisive model. We present an $O(2^{1/\epsilon^d} \cdot \epsilon^{-2d} \cdot n^3)$-time approximation algorithm of factor $(1+\epsilon)$ for finding the minimum diameter of a set of points in $d$ dimensions. This improves the previously proposed algorithms for this problem substantially. Next, we consider the problem in the imprecise model. In $d$-dimensional space, we propose a polynomial-time $\sqrt{d}$-approximation algorithm. In addition, for $d=2$, we define the notion of $\alpha$-separability and use our algorithm for the indecisive model to obtain a $(1+\epsilon)$-approximation algorithm for a set of $\alpha$-separable regions in time $O(2^{1/\epsilon^2} \cdot \frac{n^3}{\epsilon^{10} \sin(\alpha/2)^3})$.
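    For intuition, a brute-force baseline for the indecisive model (hypothetical helper names; the paper's $(1+\epsilon)$-approximation exists precisely to avoid this exponential enumeration of possible worlds):

    # Each inexact point is a finite set of candidate locations; enumerate every
    # selection of one candidate per point and keep the smallest diameter.
    # Exponential in the number of points -- illustration only.
    from itertools import product
    from math import dist, inf

    def min_diameter_indecisive(point_sets):
        best = inf
        for world in product(*point_sets):
            diam = max(dist(p, q) for i, p in enumerate(world)
                       for q in world[i + 1:])
            best = min(best, diam)
        return best

    sets = [[(0, 0), (5, 0)], [(1, 1), (6, 1)], [(0, 2)]]
    print(min_diameter_indecisive(sets))  # -> 2.0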

    The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness

    Full text link
    Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. In this paper we adapt existing bio-surveillance algorithms to detect localised spikes in Twitter activity corresponding to real events with a high level of confidence. We then develop a methodology to automatically summarise these events, both by providing the tweets that fully describe the event and by linking to highly relevant news articles. We apply our methods to outbreaks of illness and events strongly affecting sentiment. In both case studies we are able to detect events verifiable by third-party sources and produce high-quality summaries.
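    A toy spike detector in the spirit of simple surveillance baselines (a moving-window z-score); this is an illustrative stand-in, not the bio-surveillance algorithm adapted in the paper.

    # Flag time steps whose count exceeds the recent window mean by more than
    # `threshold` standard deviations.
    from statistics import mean, stdev

    def detect_spikes(counts, window=7, threshold=3.0):
        spikes = []
        for t in range(window, len(counts)):
            baseline = counts[t - window:t]
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma > 0 and (counts[t] - mu) / sigma > threshold:
                spikes.append(t)
        return spikes

    hourly_mentions = [3, 4, 2, 5, 3, 4, 3, 4, 30, 5]
    print(detect_spikes(hourly_mentions))  # -> [8]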

    Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases

    Get PDF
    Uncertain sequence databases are widely used to model data with inaccurate or imprecise timestamps in many real-world applications. In this paper, we use uniform distributions to model uncertain timestamps and adopt possible-world semantics to interpret temporal uncertain databases. We design an incremental approach to manage temporal uncertainty efficiently, which is integrated into the classic pattern-growth SPM algorithm to mine uncertain sequential patterns. Extensive experiments show that our algorithm performs well in both efficiency and scalability.
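    One primitive such an approach needs is the probability that one uncertain timestamp precedes another; a Monte Carlo sketch under the uniform-distribution assumption (the paper's incremental method computes this kind of quantity efficiently, which this toy version does not attempt):

    # Estimate P(t_A < t_B) when each timestamp is uniform on an interval,
    # by sampling possible worlds.
    import random

    def prob_precedes(interval_a, interval_b, samples=100_000, seed=0):
        rng = random.Random(seed)
        hits = sum(rng.uniform(*interval_a) < rng.uniform(*interval_b)
                   for _ in range(samples))
        return hits / samples

    # Event A is observed somewhere in [2, 6], event B somewhere in [4, 8].
    print(prob_precedes((2, 6), (4, 8)))  # close to the exact value 0.875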

    Conformative Filtering for Implicit Feedback Data

    Full text link
    Implicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and is domain independent. However, there is a lack of negative examples. Previous work tackles this problem by assuming that users are not interested, or are less interested, in the unconsumed items. Those assumptions are often severely violated, since non-consumption can be due to factors like unawareness or lack of resources. Therefore, non-consumption by a user does not always mean disinterest or irrelevance. In this paper, we propose a novel method called Conformative Filtering (CoF) to address the issue. The motivating observation is that if there is a large group of users who share the same taste and none of them have consumed an item before, then it is likely that the item is not of interest to the group. We perform multidimensional clustering on implicit feedback data using hierarchical latent tree analysis (HLTA) to identify user 'taste' groups, and make recommendations for a user based on her memberships in the groups and on the past behavior of the groups. Experiments on two real-world datasets from different domains show that CoF has superior performance compared to several common baselines.
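    A simplified sketch of the group-based scoring idea, assuming group memberships are already given (the actual CoF obtains them with HLTA); the matrix names and the scoring rule are illustrative.

    # Score an item for a user by the consumption rate of that item within the
    # groups the user belongs to; items untouched by a large like-minded group
    # score near zero and therefore act as implicit negatives.
    import numpy as np

    def group_scores(feedback, memberships):
        """feedback: users x items (binary); memberships: users x groups."""
        group_size = memberships.sum(axis=0) + 1e-9
        group_rate = memberships.T @ feedback / group_size[:, None]
        return memberships @ group_rate  # users x items

    feedback = np.array([[1, 0, 0],
                         [1, 1, 0],
                         [0, 1, 0]])
    memberships = np.array([[1, 0],
                            [1, 0],
                            [0, 1]])
    print(group_scores(feedback, memberships).round(2))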

    Mining Uncertain Sequential Patterns in Iterative MapReduce

    Get PDF
    This paper proposes a sequential pattern mining (SPM) algorithm for large-scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and scalability issues. In this paper, we develop an efficient approach to manage data uncertainty in SPM and design an iterative MapReduce framework to execute the uncertain SPM algorithm in parallel. We conduct extensive experiments on both synthetic and real uncertain datasets, and the experimental results show that our algorithm is efficient and scalable.
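    A toy, single-machine driver showing the iterative shape of such a pipeline; plain Python loops stand in for the map and reduce stages of the actual distributed framework, and support counting here is deterministic rather than probabilistic, so this is only a structural illustration.

    from collections import defaultdict

    def is_subsequence(pattern, sequence):
        it = iter(sequence)
        return all(item in it for item in pattern)

    def mine(sequences, min_support, max_len=3):
        items = sorted({x for s in sequences for x in s})
        frequent, result = [()], []
        for _ in range(max_len):
            counts = defaultdict(int)
            for seq in sequences:            # "map": emit candidate counts per sequence
                for prefix in frequent:
                    for item in items:
                        cand = prefix + (item,)
                        if is_subsequence(cand, seq):
                            counts[cand] += 1
            frequent = [p for p, c in counts.items()  # "reduce": filter by support
                        if c >= min_support]
            if not frequent:
                break
            result.extend(frequent)
        return result

    db = [("a", "b", "c"), ("a", "c"), ("a", "b")]
    print(mine(db, min_support=2))  # -> [('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c')]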

    Measuring Relations Between Concepts In Conceptual Spaces

    Full text link
    The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by regions in this space. Our recent mathematical formalization of this framework is capable of representing correlations between different domains in a geometric way. In this paper, we extend our formalization by providing quantitative mathematical definitions for the notions of concept size, subsethood, implication, similarity, and betweenness. This considerably increases the representational power of our formalization by introducing measurable ways of describing relations between concepts. Comment: Accepted at SGAI 2017 (http://www.bcs-sgai.org/ai2017/). The final publication is available at Springer via https://doi.org/10.1007/978-3-319-71078-5_7. arXiv admin note: substantial text overlap with arXiv:1707.05165, arXiv:1706.0636
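    To make the flavour of such measures concrete, a heavily simplified sketch in which a concept is just an axis-aligned box (the actual formalization uses much richer regions); only size and subsethood are illustrated, and the function names are invented.

    import numpy as np

    def size(box):
        """Volume of an axis-aligned box given as (lower corner, upper corner)."""
        lo, hi = np.asarray(box[0], float), np.asarray(box[1], float)
        return float(np.prod(np.maximum(hi - lo, 0.0)))

    def subsethood(a, b):
        """Fraction of concept a's volume that lies inside concept b."""
        lo = np.maximum(np.asarray(a[0], float), np.asarray(b[0], float))
        hi = np.minimum(np.asarray(a[1], float), np.asarray(b[1], float))
        return size((lo, hi)) / size(a)

    apple = ((0.2, 0.3), (0.6, 0.7))
    fruit = ((0.0, 0.0), (1.0, 1.0))
    print(size(apple), subsethood(apple, fruit))  # apple has volume 0.16 and lies entirely within fruit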

    A review on corpus annotation for arabic sentiment analysis

    Get PDF
    Mining publicly available data for meaning and value is an important research direction within social media analysis. Before collected textual data can be analyzed automatically, manual effort is needed for a machine learning algorithm to classify text successfully; this pertains to annotating the text by adding labels to each data entry. Arabic is one of the languages in which sentiment analysis research is growing rapidly, despite limited resources and scarce annotated corpora. In this paper, we review the annotation process carried out in these studies. A total of 27 papers published between 2010 and 2016 were reviewed.

    How to combine visual features with tags to improve movie recommendation accuracy?

    Get PDF
    Previous works have shown the effectiveness of using stylistic visual features, indicative of the movie style, in content-based movie recommendation. However, they have mainly focused on a particular recommendation scenario, i.e., when a new movie is added to the catalogue and no information is available for that movie (New Item scenario). Yet the stylistic visual features can also be used when other sources of information are available (Existing Item scenario). In this work, we address the second scenario and propose a hybrid technique that exploits not only the typical content available for the movies (e.g., tags), but also the stylistic visual content extracted from the movie files, and fuses them by applying a fusion method called Canonical Correlation Analysis (CCA). Our experiments on a large catalogue of 13K movies have shown very promising results, which indicate a considerable improvement of the recommendation quality by using a proper fusion of the stylistic visual features with other types of features.
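    An illustrative fusion along these lines with scikit-learn's CCA; the random matrices stand in for the real tag and stylistic-visual feature matrices, and concatenating the two projected views is one reasonable fusion choice, not necessarily the paper's exact one.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    tag_feats    = rng.normal(size=(200, 50))   # movies x tag features (stand-in)
    visual_feats = rng.normal(size=(200, 20))   # movies x visual features (stand-in)

    cca = CCA(n_components=10)
    cca.fit(tag_feats, visual_feats)
    tag_c, visual_c = cca.transform(tag_feats, visual_feats)

    fused = np.hstack([tag_c, visual_c])        # shared-space representation
    print(fused.shape)                          # -> (200, 20)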

    Sparsest factor analysis for clustering variables: a matrix decomposition approach

    Get PDF
    We propose a new procedure for sparse factor analysis (FA) such that each variable loads on only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
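    A naive alternating sketch of the one-nonzero-per-row constraint (the actual SSFA minimizes a full FA least-squares criterion and re-parameterizes the factor scores with a QR decomposition; this toy version only assigns each variable to the single factor that best reconstructs it):

    import numpy as np

    def sparsest_loadings(X, k, iters=50, seed=0):
        n, p = X.shape
        F = np.random.default_rng(seed).normal(size=(n, k))   # factor scores
        for _ in range(iters):
            L = np.zeros((p, k))
            for j in range(p):                       # loading step: one nonzero per row
                w = (F * X[:, [j]]).sum(0) / ((F ** 2).sum(0) + 1e-12)
                errs = ((X[:, [j]] - F * w) ** 2).sum(0)
                best = int(errs.argmin())
                L[j, best] = w[best]
            F = X @ L @ np.linalg.pinv(L.T @ L)      # factor step: least-squares update
        return L

    X = np.random.default_rng(1).normal(size=(100, 6))
    X[:, :3] += X[:, [0]]                            # make the first three variables correlated
    L = sparsest_loadings(X, k=2)
    print((np.abs(L) > 1e-8).astype(int))            # nonzero pattern: one factor per variable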