1,852 research outputs found

A Survey of Probabilistic Models for Relational Data

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Graphs in machine learning: an introduction

Author: Latouche Pierre
Rossi Fabrice
Publication date: 22/04/2015

Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields, from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and unsupervised, data can be relational (augmented with one or several global graphs) as described above, or graph-valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc.

1 Real networks

One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 1930s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in the social sciences networks are used to represent relational ties between actors [66, 56, 36, 34]. Other examples include power grids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the subject of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network i

arXiv.org e-Print Archive

HAL-Paris1

Hal-Diderot

Transforming Graph Representations for Statistical Relational Learning

Author: Aha David W.
McDowell Luke K.
Neville Jennifer
Rossi Ryan A.
Publication date: 01/01/2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed

arXiv.org e-Print Archive

CiteSeerX

A Survey of Probabilistic Models for Relational Data

Author: Koutsourelakis P S
Publication venue: Lawrence Livermore National Laboratory
Publication date: 13/10/2006

Traditional data mining methodologies have focused on "flat" data, i.e., a collection of identically structured entities, assumed to be independent and identically distributed. However, many real-world datasets are innately relational in that they consist of multi-modal entities and multi-relational links (where each entity- or link-type is characterized by a different set of attributes). Link structure is an important characteristic of a dataset and should not be ignored in modeling efforts, especially when statistical dependencies exist between related entities. These dependencies can in fact significantly improve the accuracy of inference and prediction results, if the relational structure is appropriately leveraged (Figure 1). The need for models that can incorporate relational structure has been accentuated by new technological developments which allow us to easily track, store, and make accessible large amounts of data. Recently, there has been a surge of interest in statistical models for dealing with richly interconnected, heterogeneous data, fueled largely by information mining of web/hypertext data, social networks, bibliographic citation data, epidemiological data and communication networks. Graphical models have a natural formalism for representing complex relational data and for predicting the underlying evolving system in a dynamic framework. The present survey provides an overview of probabilistic methods and techniques that have been developed over the last few years for dealing with relational data. Particular emphasis is paid to approaches pertinent to the research areas of pattern recognition, group discovery, entity/node classification, and anomaly detection. We start with supervised learning tasks, where two basic modeling approaches are discussed: discriminative and generative. Several discriminative techniques are reviewed and performance results are presented. Generative methods are discussed in a separate survey. 
A special section is devoted to latent variable models due to their unique characteristics and usefulness in static and dynamic frameworks and in both supervised and unsupervised learning processes. Section 4 contains a brief discussion of unsupervised learning techniques with an emphasis on computational efficiency and large networks. Finally, section 5 discusses performance metrics with an emphasis on classification problems

UNT Digital Library

On Member Labelling in Social Networks

Author: Corchuelo Gil Rafael
Jiménez Aguirre Patricia
Reina Quintero Antonia María
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Software agents are increasingly used to search for experts, recommend resources, assess opinions, and perform other similar tasks in the context of social networks, which requires accurate information describing the features of the members of the network. Unfortunately, many member profiles are incomplete, which has motivated many authors to work on automatic member labelling, that is, on techniques that can infer the null features of a member from his or her neighbourhood. Current proposals are based on local or global approaches; the former compute predictors from local neighbourhoods, whereas the latter analyse social networks as a whole. Their main problem is that they tend to be inefficient and their effectiveness degrades significantly as the percentage of null labels increases. In this paper, we present Katz, which is a novel hybrid proposal to solve the member labelling problem using neural networks. Our experiments prove that it outperforms other proposals in the literature in terms of both effectiveness and efficiency.

Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Economía, Industria y Competitividad TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-E; Ministerio de Economía y Competitividad TIN2011-15497-E; Ministerio de Economía y Competitividad TIN2013-40848-

Crossref

idUS. Depósito de Investigación Universidad de Sevilla