54 research outputs found
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
On Applications of Relational Data
With advances in technology and the popularity of the Internet, large amounts of data are being generated and collected. Many of these data are relational: they describe how people and things, or entities, are related to one another. For example, data from sales transactions on e-commerce websites tell us which customers buy or view which products. Analyzing the known relationships in relational data can help us discover knowledge that benefits businesses, organizations, and our lives. For instance, learning which products are commonly bought together allows businesses to recommend products to customers and increase their sales. Hidden or new relationships can also be inferred from relational data. In addition, based on the connections among entities, we can approximate the level of relatedness between two entities, even when their relationship is hard to observe or quantify.
This research aims to explore novel applications of relational data that will help to improve our lives in various respects, such as improving business operations, experiences with online services, and health care services. Applying relational data in any domain raises two common challenges. First, the data can be massive, yet many applications require results within a short time. Second, relational data are often noisy and incomplete. Many relationships are extracted automatically from text resources, and hence they are prone to errors. Our goal is not only to propose novel applications of relational data but also to develop techniques and algorithms that facilitate such applications and make them practical. This work addresses three novel applications of relational data. First, we use relational data to improve user experiences in online video sharing services. Second, we propose the use of relational data to find entities that are closely related to one another. Such problems arise in various domains, such as product recommendation and query suggestion. Third, we propose the use of relational data to assist medical practitioners in drug prescription. For these applications, we introduce several techniques and algorithms to address the aforementioned challenges in using relational data. Our approaches are evaluated extensively to demonstrate their effectiveness. The approaches proposed in this work can be used not only in the specific applications we discuss but also to facilitate and promote the use of relational data in other application domains.
Graph based Anomaly Detection and Description: A Survey
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data are becoming ubiquitous, and techniques for structured graph data have recently come into focus. Because objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the "why", of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
Social, Structured and Semantic Search
Social content such as blogs, tweets, and news is a rich source of interconnected information. We identify a set of requirements for the meaningful exploitation of such rich content, and present a new data model, called S3, which is the first to satisfy them. S3 captures social relationships between users, and between users and content, but also the structure present in rich social content, as well as its semantics. We provide the first top-k keyword search algorithm that takes into account the social, structured, and semantic dimensions, and we formally establish its termination and correctness. Experiments on real social networks demonstrate the efficiency and qualitative advantage of our algorithm through the joint exploitation of the social, structured, and semantic dimensions of S3.
Alignment of multi-cultural knowledge repositories
The ability to interconnect multiple knowledge repositories within a single framework is a key asset for various use cases such as document retrieval and question answering. However, independently created repositories are inherently heterogeneous, reflecting their diverse origins. Thus, there is a need to align concepts and entities across knowledge repositories. A limitation of prior work is the assumption of high affinity between the repositories at hand, in terms of structure and terminology. The goal of this dissertation is to develop methods for constructing and curating alignments between multi-cultural knowledge repositories. The first contribution is a system, ACROSS, for reducing the terminological gap between repositories. The second contribution is two alignment methods, LILIANA and SESAME, that cope with structural diversity. The third contribution, LAIKA, is an approach to compute alignments between dynamic repositories. Experiments with a suite of Web-scale knowledge repositories show high-quality alignments. In addition, the application benefits of LILIANA and SESAME are demonstrated by use cases in search and exploration.
Hypergraph-based optimisations for scalable graph analytics and learning
Graph-structured data has the benefit of capturing inter-connectivity (topology) and heterogeneous knowledge (node/edge features) simultaneously. Hypergraphs may glean even more information reflecting complex non-pairwise relationships and additional metadata. Graph- and hypergraph-based partitioners can model workload or communication patterns of analytics and learning algorithms, enabling data-parallel scalability while preserving the solution quality. Hypergraph-based optimisations remain under-explored for graph neural networks (GNNs), which have complex access patterns compared to analytics workloads. Furthermore, special optimisations are needed when representing dynamic graph topologies and learning incrementally from streaming data. This thesis explores hypergraph-based optimisations for several scalable graph analytics and learning tasks. First, a hypergraph sampling approach is presented that supports large-scale dynamic graphs when modelling information cascades. Next, hypergraph partitioning is applied to scale approximate similarity search, by caching the computed features of replicated vertices. Moving from analytics to learning tasks, a data-parallel GNN training algorithm is developed using hypergraph-based construction and partitioning. Its communication scheme allows scalable distributed full-batch GNN training on static graphs. Sparse adjacency patterns are captured to perform non-blocking asynchronous communications for considerable speedups (10x over a single-machine state-of-the-art baseline) in limited memory and bandwidth environments. Distributing GNNs using the hypergraph approach, compared to the graph approach, halves the running time and achieves 15% lower message volume. A new stochastic hypergraph sampling strategy further improves communication efficiency in distributed mini-batch GNN training. The final contribution is the design of streaming partitioners to handle dynamic data within a dataflow framework.
This online partitioning pipeline allows complex graph or hypergraph streams to be processed asynchronously. It facilitates low-latency distributed GNNs through replication and caching. Overall, the hypergraph-based optimisations in this thesis enable the development of scalable dynamic graph applications.