Search CORE

54 research outputs found

Transforming Graph Representations for Statistical Relational Learning

Author: Aha David W.
McDowell Luke K.
Neville Jennifer
Rossi Ryan A.
Publication date: 01/01/2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.

arXiv.org e-Print Archive

CiteSeerX
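
Task (i) for links, predicting their existence, can be illustrated with a classic neighborhood-overlap heuristic. The Python sketch below is not the article's method; it assumes an undirected, unweighted graph given as an edge list and scores each currently unlinked pair by the Jaccard coefficient of the two endpoints' neighbor sets. The function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard coefficient |N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both sets are empty."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def rank_missing_links(edges, top_k=3):
    """Score every currently unlinked node pair and return the top_k candidates."""
    adj = build_adjacency(edges)
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only pairs with no observed link are candidates
            scored.append((jaccard(adj, u, v), u, v))
    # Highest score first; break ties by node name for deterministic output.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v, round(s, 3)) for s, u, v in scored[:top_k]]


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
    print(rank_missing_links(edges))
```

With the example edges, the pair (a, d) shares two of its three combined neighbors and ranks first. Jaccard is only one of many link-existence scores; it is used here because it needs nothing beyond the observed topology.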

Preference learning and similarity learning perspectives on personalized recommendation

Author: LE Duy Dung
Publication venue: Singapore Management University
Publication date: 01/09/2019

Institutional Knowledge at Singapore Management University


On Applications of Relational Data

Author: Khemmarat Samamon
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015

With advances in technology and the popularity of the Internet, a large amount of data is being generated and collected. Much of it is relational data, which describe how people and things, or entities, are related to one another. For example, data from sales transactions on e-commerce websites tell us which customers buy or view which products. Analyzing the known relationships in relational data can help us discover knowledge that benefits businesses, organizations, and our daily lives. For instance, learning which products are commonly bought together allows businesses to recommend products to customers and increase their sales. Hidden or new relationships can also be inferred from relational data. In addition, based on the connections among entities, we can approximate the level of relatedness between two entities even when their relationship is hard to observe or quantify. This research aims to explore novel applications of relational data that will help improve our lives in various aspects, such as business operations, the experience of using online services, and health care services. Applying relational data in any domain raises two common challenges. First, the data can be massive, yet many applications require results within a short time. Second, relational data are often noisy and incomplete; many relationships are extracted automatically from text resources and are therefore prone to errors. Our goal is not only to propose novel applications of relational data but also to develop techniques and algorithms that make such applications practical. This work addresses three novel applications of relational data. First, we use relational data to improve user experiences in online video sharing services. Second, we propose the use of relational data to find entities that are closely related to one another; such problems arise in various domains, such as product recommendation and query suggestion. Third, we propose the use of relational data to assist medical practitioners in drug prescription. For these applications, we introduce several techniques and algorithms that address the aforementioned challenges. Our approaches are evaluated extensively to demonstrate their effectiveness. The approaches proposed in this work can be used not only in the specific applications we discuss but also to facilitate and promote the use of relational data in other application domains.

ScholarWorks@UMass Amherst
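
The co-purchase example in this abstract (learning which products are commonly bought together) can be sketched as plain pair counting over transactions. This is an illustrative assumption, not the thesis's technique: each basket is assumed to be a list of product IDs, and the hypothetical `recommend` function ranks partner products by raw co-occurrence count, breaking ties alphabetically.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of distinct products."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("x", "y") and ("y", "x") share one key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def recommend(baskets, product, top_k=2):
    """Return up to top_k products most often bought together with `product`."""
    partners = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == product:
            partners[b] += n
        elif b == product:
            partners[a] += n
    # Most frequent partner first; ties broken alphabetically for determinism.
    ranked = sorted(partners.items(), key=lambda t: (-t[1], t[0]))
    return [p for p, _ in ranked[:top_k]]


if __name__ == "__main__":
    baskets = [
        ["bread", "butter", "milk"],
        ["bread", "butter"],
        ["bread", "jam"],
        ["milk", "cereal"],
    ]
    print(recommend(baskets, "bread"))
```

Raw counts favor popular items; a real system would typically normalize (for example by lift or confidence) and, as the abstract notes, must cope with data far too large to enumerate all pairs naively.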

Graph based Anomaly Detection and Description: A Survey

Author: Danai Koutra
Hanghang Tong
Leman Akoglu
Publication date: 28/04/2014

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, the ubiquity of graph data has recently brought techniques for structured graph data into focus. Because objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.

arXiv.org e-Print Archive

CiteSeerX

Social, Structured and Semantic Search

Author: Bonaque Raphaël
Cautis Bogdan
Goasdoué François
Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 15/03/2016

Social content such as blogs, tweets, and news is a rich source of interconnected information. We identify a set of requirements for the meaningful exploitation of such rich content and present a new data model, called S3, which is the first to satisfy them. S3 captures social relationships between users, and between users and content, as well as the structure and semantics present in rich social content. We provide the first top-k keyword search algorithm that takes into account the social, structured, and semantic dimensions, and we formally establish its termination and correctness. Experiments on real social networks demonstrate the efficiency and qualitative advantage of our algorithm through the joint exploitation of the social, structured, and semantic dimensions of S3.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Polytechnique

HAL-Rennes 1

Alignment of multi-cultural knowledge repositories

Author: Boldyrev Natalia
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2017

The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution consists of two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to computing alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high-quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.

Universaar

Acronym

MPG.PuRe

Proceedings of the 2019 International Conference on Management of Data

Publication date: 30/06/2019

CWI's Institutional Repository

Welcome to Sigmod 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!

Author: Ailamaki A. (Anastasia)
Boncz P.A. (Peter)
Manegold S. (Stefan)
Publication date: 30/06/2019

CWI's Institutional Repository

Hypergraph-based optimisations for scalable graph analytics and learning

Author: Haldar Aparajita

Graph-structured data has the benefit of capturing inter-connectivity (topology) and heterogeneous knowledge (node/edge features) simultaneously. Hypergraphs may glean even more information, reflecting complex non-pairwise relationships and additional metadata. Graph- and hypergraph-based partitioners can model the workload or communication patterns of analytics and learning algorithms, enabling data-parallel scalability while preserving solution quality. Hypergraph-based optimisations remain under-explored for graph neural networks (GNNs), which have more complex access patterns than analytics workloads. Furthermore, special optimisations are needed when representing dynamic graph topologies and learning incrementally from streaming data. This thesis explores hypergraph-based optimisations for several scalable graph analytics and learning tasks. First, a hypergraph sampling approach is presented that supports large-scale dynamic graphs when modelling information cascades. Next, hypergraph partitioning is applied to scale approximate similarity search by caching the computed features of replicated vertices. Moving from analytics to learning tasks, a data-parallel GNN training algorithm is developed using hypergraph-based construction and partitioning. Its communication scheme allows scalable distributed full-batch GNN training on static graphs. Sparse adjacency patterns are captured to perform non-blocking asynchronous communications, yielding considerable speedups (10x relative to a single-machine state-of-the-art baseline) in environments with limited memory and bandwidth. Distributing GNNs using the hypergraph approach, compared to the graph approach, halves the running time and achieves 15% lower message volume. A new stochastic hypergraph sampling strategy further improves communication efficiency in distributed mini-batch GNN training. The final contribution is the design of streaming partitioners to handle dynamic data within a dataflow framework. This online partitioning pipeline allows complex graph or hypergraph streams to be processed asynchronously. It facilitates low-latency distributed GNNs through replication and caching. Overall, the hypergraph-based optimisations in this thesis enable the development of scalable dynamic graph applications.

Warwick Research Archives Portal Repository
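
Why a hypergraph model can report lower message volume than a graph model can be made concrete with the standard connectivity-minus-one metric. The sketch below is an illustration under stated assumptions, not the thesis's partitioner: it models full-batch neighbor aggregation on an undirected graph, builds one net per vertex containing the vertex and its neighbors, and counts one message per remote part that needs that vertex's features. The naive graph estimate (a hypothetical comparison point) charges one message per cut edge in each direction. All function names are hypothetical.

```python
def build_nets(edges):
    """Map each vertex v to its net {v} | N(v) for an undirected edge list."""
    nets = {}
    for u, v in edges:
        nets.setdefault(u, {u}).add(v)
        nets.setdefault(v, {v}).add(u)
    return nets


def hypergraph_volume(edges, part):
    """Connectivity-minus-one metric: sum over nets of (number of parts spanned - 1).

    Under the assumed model, vertex v's feature vector is sent exactly once to
    every remote part that holds at least one neighbor of v.
    """
    nets = build_nets(edges)
    return sum(len({part[x] for x in net}) - 1 for net in nets.values())


def edge_cut_volume(edges, part):
    """Graph-model estimate: one message per cut edge in each direction."""
    return 2 * sum(1 for u, v in edges if part[u] != part[v])


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]  # hub 0 with three leaves
    part = {0: 0, 1: 1, 2: 1, 3: 1}  # hub alone in part 0, leaves in part 1
    print(hypergraph_volume(star, part), edge_cut_volume(star, part))
```

On this star the hypergraph count is 4 while the per-edge estimate is 6: the hub's features cross to part 1 once rather than once per leaf, which is the kind of overcounting the hypergraph model avoids. The toy numbers are not meant to reproduce the thesis's reported results.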