304 research outputs found

Explicit probabilistic models for databases and networks

Author: De Bie Tijl
Publication date: 01/01/2009

Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, to which the found patterns can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation-intensive randomization approaches in estimating the statistical significance of data mining results. Here, we introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they also have the benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we have chosen to demonstrate it in particular for databases and networks. Comment: Submitte
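
To make the phrase "can be represented explicitly" concrete, here is a short worked form of a generic MaxEnt model. This is a sketch in assumed notation (the constraint functions f_k, targets c_k, multipliers \lambda_k, \lambda_i, \mu_j and the binary matrix D are illustrative symbols, not taken from the abstract), and the row/column-sum constraints are one plausible choice of prior information for a binary database.

```latex
% Generic MaxEnt problem over outcomes x with K expectation constraints
\max_{P}\; -\sum_{x} P(x)\log P(x)
\quad \text{s.t.} \quad
\sum_{x} P(x)\, f_k(x) = c_k \;\; (k = 1,\dots,K),
\qquad \sum_{x} P(x) = 1 .

% Setting the derivative of the Lagrangian to zero gives an explicit exponential form
P(x) = \frac{1}{Z(\lambda)} \exp\!\Big(\sum_{k=1}^{K} \lambda_k f_k(x)\Big),
\qquad
Z(\lambda) = \sum_{x} \exp\!\Big(\sum_{k=1}^{K} \lambda_k f_k(x)\Big).

% Assumed example: binary database D \in \{0,1\}^{n \times m} with constraints on the
% expected row sums and column sums. The exponent becomes \sum_{i,j} (\lambda_i + \mu_j) D_{ij},
% so the model factorizes into independent Bernoulli entries:
P(D_{ij} = 1) = \frac{\exp(\lambda_i + \mu_j)}{1 + \exp(\lambda_i + \mu_j)} .
```

Because the distribution has this closed form, significance estimates can in principle be computed from it directly rather than by sampling randomized data sets.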

arXiv.org e-Print Archive

Explore Bristol Research

Formalising the subjective interestingness of a linear projection of a data set: two examples

Author: De Bie Tijl
Publication date: 01/01/2014

Ghent University Academic Bibliography

Conditional network embeddings

Authors: De Bie Tijl; Kang Bo; Lijffijt Jefrey
Publication date: 01/01/2019

Network Embeddings (NEs) map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that 'similar' nodes are mapped onto nearby points, such that the NE can be used for purposes such as link prediction (if 'similar' means being 'more likely to be connected') or classification (if 'similar' means 'being more likely to have the same label'). In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. To overcome this, we introduce a conceptual innovation to the NE literature and propose to create Conditional Network Embeddings (CNEs): embeddings that maximally add information with respect to given structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. We demonstrate that CNEs are superior for link prediction and multi-label classification when compared to state-of-the-art methods, and this without adding significant mathematical or computational complexity. Finally, we illustrate the potential of CNE for network visualization.
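
The generic strategy the abstract describes (a loss that penalizes large distances for similar nodes and small distances for dissimilar nodes) can be illustrated with a toy loss. This is a minimal sketch and not the CNE objective: the function name `embedding_loss`, the squared-distance and hinge terms, and the `margin` parameter are all assumptions chosen for illustration.

```python
import math


def embedding_loss(embedding, similar_pairs, dissimilar_pairs, margin=1.0):
    """Toy embedding loss (illustrative only, not the CNE objective).

    embedding        -- dict mapping node -> tuple of coordinates
    similar_pairs    -- node pairs that should end up close together
    dissimilar_pairs -- node pairs that should be at least `margin` apart
    """
    def dist(u, v):
        return math.dist(embedding[u], embedding[v])

    # Similar nodes are penalized by their squared distance.
    pull = sum(dist(u, v) ** 2 for u, v in similar_pairs)
    # Dissimilar nodes are penalized only when closer than the margin.
    push = sum(max(0.0, margin - dist(u, v)) ** 2 for u, v in dissimilar_pairs)
    return pull + push


# Hypothetical 2-D embedding of three nodes.
embedding = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (3.0, 4.0)}

good = embedding_loss(embedding, [("a", "b")], [("a", "c")])
bad = embedding_loss(embedding, [("a", "c")], [("a", "b")])
```

In this example the first call scores an embedding that agrees with the similarity labels and the second scores one that contradicts them, so the second loss is larger.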

Ghent University Academic Bibliography

ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions

Authors: De Bie Tijl; Kang Bo; Lijffijt Jefrey
Publication date: 01/01/2019

Networks are powerful data structures, but are challenging to work with for conventional machine learning methods. Network Embedding (NE) methods attempt to resolve this by learning vector representations for the nodes, for subsequent use in downstream machine learning tasks. Link Prediction (LP) is one such downstream machine learning task that is an important use case and popular benchmark for NE methods. Unfortunately, while NE methods perform exceedingly well at this task, they are lacking in transparency as compared to simpler LP approaches. We introduce ExplaiNE, an approach to offer counterfactual explanations for NE-based LP methods, by identifying existing links in the network that explain the predicted links. ExplaiNE is applicable to a broad class of NE algorithms. An extensive empirical evaluation for the NE method 'Conditional Network Embedding' in particular demonstrates its accuracy and scalability.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

The boundary coefficient: a vertex measure for visualizing and finding structure in weighted graphs

Authors: De Bie Tijl; Saeys Yvan; Vandaele Robin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Ghent University Academic Bibliography

Quantifying and minimizing risk of conflict in social networks

Authors: Chen Xi; De Bie Tijl; Lijffijt Jefrey
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Controversy, disagreement, conflict, polarization and opinion divergence in social networks have been the subject of much recent research. In particular, researchers have addressed the question of how such concepts can be quantified given people’s prior opinions, and how they can be optimized by influencing the opinion of a small number of people or by editing the network’s connectivity. Here, rather than optimizing such concepts given a specific set of prior opinions, we study whether they can be optimized in the average case and in the worst case over all sets of prior opinions. In particular, we derive the worst-case and average-case conflict risk of networks, and we propose algorithms for optimizing these. For some measures of conflict, these are non-convex optimization problems with many local minima. We provide a theoretical and empirical analysis of the nature of some of these local minima, and show how they are related to existing organizational structures. Empirical results show how a small number of edits to a network quickly decreases its conflict risk, both average-case and worst-case. Furthermore, they show that minimizing average-case conflict risk often does not reduce worst-case conflict risk. Minimizing worst-case conflict risk, on the other hand, while computationally more challenging, is generally effective at minimizing both worst-case and average-case conflict risk.

Crossref

Ghent University Academic Bibliography

Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?

Authors: De Bie Tijl; Lijffijt Jefrey; Mara Alexandru
Publication date: 01/01/2020

Network embedding methods map a network's nodes to vectors in an embedding space, in such a way that these representations are useful for estimating some notion of similarity or proximity between pairs of nodes in the network. The quality of these node representations is then showcased through results of downstream prediction tasks. Commonly used benchmark tasks such as link prediction, however, present complex evaluation pipelines and an abundance of design choices. This, together with a lack of standardized evaluation setups, can obscure the real progress in the field. In this paper, we aim to shed light on the state of the art of network embedding methods for link prediction and show, using a consistent evaluation pipeline, that only limited progress has been made in recent years. The newly conducted benchmark that we present here, including 17 embedding methods, also shows that many approaches are outperformed even by simple heuristics. Finally, we argue that standardized evaluation tools can repair this situation and boost future progress in this field.
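
For readers unfamiliar with what a "simple heuristic" for link prediction looks like, the sketch below scores unconnected node pairs by common neighbours and by Jaccard similarity. The abstract does not say which heuristics the benchmark used, so these two are assumed as typical examples, and all function names and the toy graph are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_candidates(adj, score):
    """Score every unconnected node pair and sort from highest to lowest score."""
    candidates = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
ranked = rank_candidates(adj, common_neighbours)
```

Here `ranked` lists the missing edges in order of predicted likelihood; pairs that share the neighbour `c` or `d` come before pairs with no shared neighbour.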

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

Subjectively interesting connecting trees

Authors: Adriaens Florian; De Bie Tijl; Lijffijt Jefrey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Ghent University Academic Bibliography

DeBayes: a Bayesian method for debiasing network embeddings

Authors: Buyl Maarten; De Bie Tijl
Publication date: 01/01/2020

As machine learning algorithms are increasingly deployed for high-impact automated decision making, ethical and increasingly also legal standards demand that they treat all individuals fairly, without discrimination based on their age, gender, race or other sensitive traits. In recent years much progress has been made on ensuring fairness and reducing bias in standard machine learning settings. Yet, for network embedding, with applications in vulnerable domains ranging from social network analysis to recommender systems, current options remain limited both in number and performance. We thus propose DeBayes: a conceptually elegant Bayesian method that is capable of learning debiased embeddings by using a biased prior. Our experiments show that these representations can then be used to perform link prediction that is significantly fairer in terms of popular metrics such as demographic parity and equalized opportunity.
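
The two fairness metrics named in the abstract can be sketched in their common binary-prediction form. This is an assumed, simplified formulation (one sensitive group label per prediction, gap measured as the spread between group rates); the abstract does not specify how the metrics are adapted to node pairs in link prediction, and all names below are hypothetical.

```python
def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rate between any two groups.

    preds  -- list of 0/1 predictions
    groups -- list of group labels, one per prediction
    """
    by_group = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(p)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)


def equal_opportunity_gap(preds, labels, groups):
    """Same gap, restricted to examples whose true label is positive."""
    positives = [(p, g) for p, y, g in zip(preds, labels, groups) if y == 1]
    return demographic_parity_gap(
        [p for p, _ in positives], [g for _, g in positives]
    )


# Hypothetical predictions for four examples from two groups.
groups = ["x", "x", "y", "y"]
dp_unfair = demographic_parity_gap([1, 1, 0, 0], groups)
dp_fair = demographic_parity_gap([1, 0, 1, 0], groups)
eo = equal_opportunity_gap([1, 0, 1, 1], [1, 1, 1, 0], groups)
```

A gap of 0 means every group receives positive predictions at the same rate; larger values indicate a bigger disparity between groups.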

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Inherent Limitations of AI Fairness

Authors: Buyl Maarten; De Bie Tijl
Publication date: 13/12/2022

As the real-world impact of Artificial Intelligence (AI) systems has been steadily growing, so too have these systems come under increasing scrutiny. In particular, the study of AI fairness has rapidly developed into a rich field of research with links to computer science, social science, law, and philosophy. Though many technical solutions for measuring and achieving AI fairness have been proposed, their model of AI fairness has been widely criticized in recent years for being misleading and unrealistic. In our paper, we survey these criticisms of AI fairness and identify key limitations that are inherent to the prototypical paradigm of AI fairness. By carefully outlining the extent to which technical solutions can realistically help in achieving AI fairness, we aim to provide readers with the background necessary to form a nuanced opinion on developments in the field of fair AI. This delineation also provides research opportunities for non-AI solutions peripheral to AI systems in supporting fair decision processes.

arXiv.org e-Print Archive