
DeepWalk: Online Learning of Social Representations

Author: Al-Rfou R.
Bottou L.
Dean J.
Hinton G. E.
Kondor R. I.
Krizhevsky A.
Macskassy S. A.
Mikolov T.
Mikolov T.
Morin F.
Neville J.
Recht B.
Vishwanathan S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/06/2014

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection. Comment: 10 pages, 5 figures, 4 tables
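The idea of treating truncated random walks as sentences can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the parameters `walks_per_vertex` and `walk_length`, and the toy graph are all assumptions made for illustration. The sketch only produces the walk "corpus"; training a word-embedding model on those walks is not shown.

```python
import random


def truncated_random_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Return a list of random walks, each a list of vertex labels.

    `adjacency` maps a vertex to the list of its neighbours (hypothetical
    input format chosen for this sketch).
    """
    rng = random.Random(seed)
    vertices = sorted(adjacency)
    walks = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # new random order of start vertices per pass
        for start in vertices:
            walk = [start]
            # Truncate the walk at a fixed length, or earlier at a dead end.
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy undirected graph, invented for this example.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = truncated_random_walks(toy_graph, walks_per_vertex=2, walk_length=5)
for walk in walks:
    print(" ".join(walk))  # each walk reads like a short "sentence" of vertices
```

Each printed line plays the role of one sentence, with vertices as words, so any model that learns word vectors from a text corpus could in principle be applied to this output.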

arXiv.org e-Print Archive

Crossref

The effect of interfirm financial transactions on the credit risk of small and medium-sized enterprises

Author: Macskassy S. A.
Petrone D.
Probst P.
Probst P.
van Buuren S.
Publication venue: 'Wiley'
Publication date: 01/01/2019

© 2019 The Authors. Despite the recognized importance of interfirm financial links in determining a company's performance, only a few studies have incorporated proxies for interfirm links in credit risk models, and none of these use real financial transactions. We estimate a credit risk model for small and medium-sized enterprises, augmented with information on observed interfirm financial transactions. We exploit a novel data set on about 60,000 companies based in the UK and their financial transactions over the years 2015 and 2016. We develop several network-augmented credit risk models and compare their prediction performance with that of a conventional credit risk model that includes only a set of financial ratios. We find that augmenting a default risk model with information on the transaction network makes a significant contribution to increasing the default prediction power of risk models built specifically for small and medium-sized enterprises. Our results may help bankers and credit scoring agencies to improve the credit scoring of these companies, ultimately reducing their propensity to apply excessive lending restrictions. Engineering and Physical Sciences Research Council (grant EP/L021250/1)

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Brunel University Research Archive

A bias/variance decomposition for models using collective inference

Author: D. Heckerman
David Jensen
G. James
J. Friedman
Jennifer Neville
L. Getoor
L. Goodman
P. Domingos
R. Duda
R. Holte
S. Geman
S. Macskassy
Publication venue: 'Springer Science and Business Media LLC'

Crossref

From Popularity Prediction to Ranking Online News

Author: A Clauset
F Wu
G Szabo
JH Friedman
K Järvelin
M Cha
M Mitzenmacher
P Van Mieghem
Q Wu
R Crane
S Fortunato
SA Macskassy
TY Liu
Y Freund
Z Dezsö
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/01/2014

News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly enjoyed by mobile users and massively spread through online social platforms. As a result, there is an increased interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications in the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise attention that a piece of content will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.

Crossref

INRIA a CCSD electronic archive server

Lifted graphical models: a survey

Author: Angelika Kimmig
B Ahmadi
B London
BL Richards
C Yanover
CM Bishop
D Heckerman
D Heckerman
D Koller
D Poole
D Suciu
EM Airoldi
FR Kschischang
H Khosravi
HR Lourenço
J Besag
J Neville
J Pearl
J Pearl
JR Quinlan
L Getoor
L Getoor
L Raedt De
L Raedt De
L Raedt De
L Raedt De
L Tierney
Lilyana Mihalkova
Lise Getoor
LR Rabiner
M Otterlo Van
M Richardson
MJ Wainwright
MP Wellman
N Lavrac̆
N Taghipour
O Schulte
O Schulte
P Damien
P Domingos
P Sen
P Spirtes
R Salvo Braz de
R Salvo Braz de
S Macskassy
S Muggleton
S Muggleton
S Natarajan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2015

Lifted graphical models provide a language for expressing dependencies between different types of entities, their attributes, and their diverse relations, as well as techniques for probabilistic reasoning in such multi-relational domains. In this survey, we review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries over such models. We also review work in learning lifted graphical models from data. There is a growing need for statistical relational models (whether they go by that name or another), as we are inundated with data which is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field

Crossref

Online Research @ Cardiff

Beyond tissueInfo: functional prediction using tissue expression profile similarity searches

Author: Adams
Alfarano
Allocco
Amzallag
Aragues
Baldo Oliva
Ben-Hur
Boguski
Bortoluzzi
Brenner
Brown
Campagne
Cans
Chatr-aryamontri
Chen
Christoffels
Cockell
Daniel Aguilar
Derakhshan
Espadaler
Ewing
Fabien Campagne
Ferguson
Greco
Hao
Haridas
Hibbs
Huttenhower
Jansen
Kanehisa
Kasprzyk
Kerrien
Lane
Lee
Lee
Lucy Skrabanek
Macskassy
Marchese
Marra
Max
Mewes
Mishra
Nilsson
O’Dowd
Pei
Salwinski
Schena
Schuler
Sengupta
Shklar
Skrabanek
Steven S. Gross
Uehara
Urbich
Velculescu
von Mering
Publication venue: Oxford University Press

We present and validate tissue expression profile similarity searches (TEPSS), a computational approach to identify transcripts that share similar tissue expression profiles to one or more transcripts in a group of interest. We evaluated TEPSS for its ability to discriminate between pairs of transcripts coding for interacting proteins and non-interacting pairs. We found that ordering protein–protein pairs by TEPSS score produces sets significantly enriched in reported pairs of interacting proteins [interacting versus non-interacting pairs, Odds-ratio (OR) = 157.57, 95% confidence interval (CI) (36.81–375.51) at 1% coverage, employing a large dataset of about 50 000 human protein interactions]. When used with multiple transcripts as input, we find that TEPSS can predict non-obvious members of the cytosolic ribosome. We used TEPSS to predict S-nitrosylation (SNO) protein targets from a set of brain proteins that undergo SNO upon exposure to physiological levels of S-nitrosoglutathione in vitro. While some of the top TEPSS predictions have been validated independently, several of the strongest SNO TEPSS predictions await experimental validation. Our data indicate that TEPSS is an effective and flexible approach to functional prediction. Since the approach does not use sequence similarity, we expect that TEPSS will be useful for various gene discovery applications. TEPSS programs and data are distributed at http://icb.med.cornell.edu/crt/tepss/index.xml

Crossref

PubMed Central

Structured machine learning: the next ten years

Author: A. Amini
A. Fern
A. Fern
A. Paes
A. Rosenfeld
A. Tamaddoni-Nezhad
A. Tamaddoni-Nezhad
B. Milch
C. Bryant
C. Parker
C. Parker
D. Bertsekas
D. Lowd
D. Poole
E. Shapiro
F. DiMaio
G. DeJong
G. E. Hinton
G. Plotkin
H. Daumé III
H. Pasula
I. Tsochantaridis
J. Cussens
J. Cussens
J. Duchi
J. Kubica
J. Leathwick
J. Neville
J. Nocedal
J. Quinlan
K. Crammer
K. Kersting
K. Kersting
L. Getoor
L. Getoor
L. Raedt De
Lise Getoor
M. Reid
M. Richardson
M. Wellman
N. Friedman
N. Lavrač
P. Domingos
P. Finn
P. Winston
Pedro Domingos
Prasad Tadepalli
R. Fikes
R. King
S. Colton
S. Dz̆eroski
S. Kok
S. Kok
S. Macskassy
S. Muggleton
S. Muggleton
S. Muggleton
S. Muggleton
S. Muggleton
S. Muggleton
S. Wrobel
Stephen Muggleton
T. G. Dietterich
T. G. Evans
T. Gärtner
T. M. Mitchell
T. Sato
Thomas G. Dietterich
V. Costa
Y. Anzai
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Survey of Text Classification Algorithms

Author: A Blum
A Dayanik
A McCallum
A McCallum
A Weigand
AP Dempster
B Liu
C Apte
C Cortes
CC Aggarwal
D Boley
D Chickering
D Hardin
D Hull
D Jensen
D Johnson
D Lewis
D Lewis
D Lewis
G-R Xue
H Drucker
H Li
H Raghavan
H Schutze
J Zhang
JR Quinlan
K Myers
K Nigam
L Breiman
L Brieman
L Cai
LS Larkey
M Aizerman
M Craven
M Craven
M Ruiz
N Littlestone
N Slonim
N Slonim
P Domingos
P Howland
P Howland
P Long
R Bekkerman
R El-Yaniv
R Fisher
R Iyer
R Schapire
R Shapire
S Basu
S Chakrabarti
S Chakrabarti
S Chakraborti
S Deerwester
S Dumais
S Dumais
S Gopal
S Lam
S Zhu
SA Macskassy
SE Robertson
SM Weiss
T Salles
TM Cover
V Castelli
V Sindhwani
V Vapnik
W Cohen
W Cooper
W Lam
Y Li
Y Yang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Moderated Class membership Interchange in Iterative Multi relational Graph Classifier

Author: A. Galstyan
B. Liu
D. Jensen
G. Xue
M. Bieliková
S. Chakrabarti
S. Macskassy
S.A. Macskassy
Publication date: 01/01/2010

Organizing information resources into classes helps significantly in searching the massive volumes of online documents available through the Web or other information sources such as electronic mail, digital libraries, and corporate databases. Existing classification methods are often based only on the document's own content, i.e. its attributes. Considering relations in the web document space brings better results. We adopt multi-relational classification that interconnects attribute-based classifiers with iterative optimization based on relational heterogeneous graph structures, so that different types of instances and various relation types can be classified together. We establish a moderated class membership spreading mechanism in multi-relational graphs and compare the impact of various levels of regulation in a collective inference classifier. Experiments based on large-scale graphs originating from the MAPEKUS research project data set (web portals of scientific libraries) demonstrate that moderated class membership spreading significantly increases the accuracy of the relational classifier (by up to 10%) and protects instances with heterophilic neighborhoods from being misclassified.

CiteSeerX

Crossref