1,379 research outputs found
Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
their inference in a manner that prevents overfitting, and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group - a renowned Portuguese hotel chain. An efficiency ranking is established from these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology allows to discriminate between measurement error and systematic inefficiencies in the estimation process enabling to investigate the main inefficiency causes. Several suggestions concerning efficiency improvement are undertaken for each hotel studied.info:eu-repo/semantics/publishedVersio
Unsupervised robust nonparametric learning of hidden community properties
We consider learning of fundamental properties of communities in large noisy
networks, in the prototypical situation where the nodes or users are split into
two classes according to a binary property, e.g., according to their opinions
or preferences on a topic. For learning these properties, we propose a
nonparametric, unsupervised, and scalable graph scan procedure that is, in
addition, robust against a class of powerful adversaries. In our setup, one of
the communities can fall under the influence of a knowledgeable adversarial
leader, who knows the full network structure, has unlimited computational
resources and can completely foresee our planned actions on the network. We
prove strong consistency of our results in this setup with minimal assumptions.
In particular, the learning procedure estimates the baseline activity of normal
users asymptotically correctly with probability 1; the only assumption being
the existence of a single implicit community of asymptotically negligible
logarithmic size. We provide experiments on real and synthetic data to
illustrate the performance of our method, including examples with adversaries.Comment: Experiments with new types of adversaries adde
Proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering
These are the online proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering (WITMSE), which was held in the Trippenhuis, Amsterdam, in August 2012
Flexible estimation of temporal point processes and graphs
Handling complex data types with spatial structures, temporal dependencies, or discrete values, is generally a challenge in statistics and machine learning. In the recent years, there has been an increasing need of methodological and theoretical work to analyse non-standard data types, for instance, data collected on protein structures, genes interactions, social networks or physical sensors. In this thesis, I will propose a methodology and provide theoretical guarantees for analysing two general types of discrete data emerging from interactive phenomena, namely temporal point processes and graphs.
On the one hand, temporal point processes are stochastic processes used to model event data, i.e., data that comes as discrete points in time or space where some phenomenon occurs. Some of the most successful applications of these discrete processes include online messages, financial transactions, earthquake strikes, and neuronal spikes. The popularity of these processes notably comes from their ability to model unobserved interactions and dependencies between temporally and spatially distant events. However, statistical methods for point processes generally rely on estimating a latent, unobserved, stochastic intensity process. In this context, designing flexible models and consistent estimation methods is often a challenging task.
On the other hand, graphs are structures made of nodes (or agents) and edges (or links), where an edge represents an interaction or relationship between two nodes. Graphs are ubiquitous to model real-world social, transport, and mobility networks, where edges can correspond to virtual exchanges, physical connections between places, or migrations across geographical areas. Besides, graphs are used to represent correlations and lead-lag relationships between time series, and local dependence between random objects. Graphs are typical examples of non-Euclidean data, where adequate distance measures, similarity functions, and generative models need to be formalised. In the deep learning community, graphs have become particularly popular within the field of geometric deep learning.
Structure and dependence can both be modelled by temporal point processes and graphs, although predominantly, the former act on the temporal domain while the latter conceptualise spatial interactions. Nonetheless, some statistical models combine graphs and point processes in order to account for both spatial and temporal dependencies. For instance, temporal point processes have been used to model the birth times of edges and nodes in temporal graphs. Moreover, some multivariate point processes models have a latent graph parameter governing the pairwise causal relationships between the components of
the process. In this thesis, I will notably study such a model, called the Hawkes model, as well as graphs evolving in time.
This thesis aims at designing inference methods that provide flexibility in the contexts of temporal point processes and graphs. This manuscript is presented in an integrated format, with four main chapters and two appendices. Chapters 2 and 3 are dedicated to the study of Bayesian nonparametric inference methods in the generalised Hawkes point process model. While Chapter 2 provides theoretical guarantees for existing methods, Chapter 3 also proposes, analyses, and evaluates a novel variational Bayes methodology. The other main chapters introduce and study model-free inference approaches for two estimation problems on graphs, namely spectral methods for the signed graph clustering problem in Chapter 4, and a deep learning algorithm for the network change point detection task on temporal graphs in Chapter 5.
Additionally, Chapter 1 provides an introduction and background preliminaries on point processes and graphs. Chapter 6 concludes this thesis with a summary and critical thinking on the works in this manuscript, and proposals for future research. Finally, the appendices contain two supplementary papers. The first one, in Appendix A, initiated after the COVID-19 outbreak in March 2020, is an application of a discrete-time Hawkes model to COVID-related deaths counts during the first wave of the pandemic. The second work, in Appendix B, was conducted during an internship at Amazon Research in 2021, and proposes an explainability method for anomaly detection models acting on multivariate time series
- …