Statistical modelling approaches with Bayesian tensor factorisations
We propose a flexible nonparametric Bayesian model for univariate and multivariate time series of count data based on conditional tensor factorisations. Our models can be viewed as infinite state space Markov chains of known maximal order with non-linear serial dependence or, with the introduction of appropriate latent variables, as Bayesian hierarchical models with conditionally independent Poisson distributed observations. Inference about the important lags and their complex interactions is achieved via Markov chain Monte Carlo. When the observed counts are large, we deal with the resulting computational complexity of the model by performing an initial analysis on a training set of the data that is not used further in inference and prediction. Our methodology is illustrated using simulation experiments and real-world data. Our Bayesian tensor factorisation model can perform well in inference and prediction on count time series that tend to be non-linear and, at the same time, can handle Markov chains of linear or log-linear count data. Moreover, it can capture higher-order interactions among the lags, and hence the maximal order, in time series where the actual order of the Markov chain and the serial dependence are unknown.
Beyond the arithmetic mean: extensions of spectral clustering and semi-supervised learning for signed and multilayer graphs via matrix power means
In this thesis we present extensions of spectral clustering and semi-supervised learning to signed and multilayer graphs. These extensions are based on a one-parameter family of matrix functions called matrix power means. In the scalar case, this family has the arithmetic, geometric and harmonic means as particular cases. We study the effectiveness of this family of matrix functions through versions of the stochastic block model suited to signed and multilayer graphs. We provide provable properties in expectation and further identify regimes where the state of the art fails whereas our approach provably performs well. Some of the settings that we analyze are as follows: first, the case where each layer presents a reliable approximation to the overall clustering; second, the case where a single layer has information about the clusters whereas the remaining layers are potentially just noise; third, the case where each layer has only partial information but all layers together reveal global information about the underlying clustering structure. We present extensive numerical verifications of all our results and provide matrix-free numerical schemes. With these numerical schemes we are able to show that our proposed approach based on matrix power means scales to large sparse signed and multilayer graphs. Finally, we evaluate our methods on real-world datasets. For instance, we show that our approach consistently identifies clustering structure in a real signed network where previous approaches failed. This further verifies that our methods are competitive with the state of the art.
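The scalar case mentioned in the abstract can be illustrated with a minimal sketch. The function name `power_mean` and the example values are illustrative assumptions, not taken from the thesis; the matrix version, which would replace positive numbers with symmetric positive definite matrices, is not shown here.

```python
import math

def power_mean(values, p):
    """Scalar power mean m_p of strictly positive numbers (illustrative helper)."""
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    n = len(values)
    if p == 0:
        # The limit p -> 0 of the power mean is the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)

values = [1.0, 4.0]
arithmetic = power_mean(values, 1)   # p = 1: arithmetic mean
geometric = power_mean(values, 0)    # p -> 0: geometric mean
harmonic = power_mean(values, -1)    # p = -1: harmonic mean
```

For these two values the three means are ordered harmonic ≤ geometric ≤ arithmetic, which reflects the general fact that the power mean is non-decreasing in p.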
Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel
The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the
state-of-the-art Bayesian relational methods suitable for learning the complex
hidden structure underlying the network data. However, the current formulation
of MMSB suffers from the following two issues: (1) prior information (e.g.,
entities' community structural information) cannot be well embedded in the
modelling; (2) community evolution cannot be well described in the
literature. Therefore, we propose a non-parametric fragmentation coagulation
based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs
entity-based clustering to capture the community information for entities and
linkage-based clustering to derive the group information for links
simultaneously. Besides, the proposed model infers the network structure and
models community evolution, manifested by appearances and disappearances of
communities, using the discrete fragmentation coagulation process (DFCP). By
integrating the community structure with the group compatibility matrix we
derive a generalized version of MMSB. An efficient Gibbs sampling scheme with
Pólya-Gamma (PG) approach is implemented for posterior inference. We validate
our model on synthetic and real-world data. Comment: AAAI 202
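As background for the abstract above, the generative process of the standard MMSB (not the fcMMSB extension, and not its Gibbs sampler) can be sketched as follows. All function names, the Dirichlet parameter, and the block matrix values are hypothetical choices made for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    # A Dirichlet draw obtained by normalising independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    # Inverse-CDF sampling of an index from a probability vector.
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1

def sample_mmsb(n_nodes, alpha, block_matrix, seed=0):
    """Draw memberships and a directed adjacency matrix from a standard MMSB."""
    rng = random.Random(seed)
    # Each node gets its own mixed-membership vector over communities.
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            # Sender and receiver each pick a community for this interaction.
            z_sender = sample_categorical(memberships[i], rng)
            z_receiver = sample_categorical(memberships[j], rng)
            # The link is Bernoulli with the compatibility of the two picks.
            prob = block_matrix[z_sender][z_receiver]
            adjacency[i][j] = 1 if rng.random() < prob else 0
    return memberships, adjacency

# Hypothetical two-community compatibility matrix (assortative structure).
block_matrix = [[0.9, 0.05], [0.05, 0.8]]
memberships, adjacency = sample_mmsb(6, [0.5, 0.5], block_matrix, seed=1)
```

The per-pair community draws are what make memberships "mixed": a node can act as a member of different communities in different interactions.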
Spectral clustering via adaptive layer aggregation for multi-layer networks
One of the fundamental problems in network analysis is detecting community
structure in multi-layer networks, of which each layer represents one type of
edge information among the nodes. We propose integrative spectral clustering
approaches based on effective convex layer aggregations. Our aggregation
methods are strongly motivated by a delicate asymptotic analysis of the
spectral embedding of weighted adjacency matrices and the downstream k-means
clustering, in a challenging regime where community detection consistency is
impossible. In fact, the methods are shown to estimate the optimal convex
aggregation, which minimizes the mis-clustering error under some specialized
multi-layer network models. Our analysis further suggests that clustering using
Gaussian mixture models is generally superior to the commonly used k-means in
spectral clustering. Extensive numerical studies demonstrate that our adaptive
aggregation techniques, together with Gaussian mixture model clustering, make
the new spectral clustering remarkably competitive compared to several
widely used methods. Comment: 71 page
Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data
How do humans move around in urban space, and how does their movement change when the city undergoes terrorist attacks? How do users behave in Massive Open Online Courses (MOOCs), and how do they differ when some of them earn certificates while others do not? In which areas of the court do elite players, such as Stephen Curry and LeBron James, like to take their shots over the course of a game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is the same mystery of multi-aspect mining, i.e., how can we mine and interpret hidden patterns from a dataset that simultaneously reveal the associations, or changes in the associations, among various aspects of the data (e.g., a shot can be described by three aspects: player, time of the game, and area of the court)? Solving this problem could open the gates to a deep understanding of the mechanisms underlying many real-world phenomena. While much of the research in multi-aspect mining contributes a broad range of innovations to the mining part, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions such as what users require of patterns, how good the patterns are, and how to read them have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to be difficult for them to comprehend.
This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data. Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need of understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications.
Deep Probabilistic Graphical Modeling
Probabilistic graphical modeling (PGM) provides a framework for formulating an interpretable generative process of data and expressing uncertainty about unknowns. This makes PGM very useful for understanding phenomena underlying data and for decision making. PGM has seen great success in domains where interpretable inferences are key, e.g., marketing, medicine, neuroscience, and social science. However, PGM tends to lack flexibility, which has hindered its use for modeling large-scale, high-dimensional, complex data and for performing tasks that require flexibility (e.g., in vision and language applications).
Deep learning (DL) is another framework for modeling and learning from data that has seen great empirical success in recent years. DL is very powerful and offers great flexibility, but it lacks the interpretability and calibration of PGM.
This thesis develops deep probabilistic graphical modeling (DPGM). DPGM consists in leveraging DL to make PGM more flexible. DPGM brings about new methods for learning from data that exhibit the advantages of both PGM and DL.
We use DL within PGM to build flexible models endowed with an interpretable latent structure. One family of models we develop extends exponential family principal component analysis (EF-PCA) using neural networks to improve predictive performance while enforcing the interpretability of the latent factors. Another model class we introduce enables accounting for long-term dependencies when modeling sequential data, which is a challenge when using purely DL or PGM approaches. This model class for sequential data was successfully applied to language modeling, unsupervised document representation learning for sentiment analysis, conversation modeling, and patient representation learning for hospital readmission prediction. Finally, DPGM successfully solves several outstanding problems of probabilistic topic models.
Leveraging DL within PGM also brings about new algorithms for learning with complex data. For example, we develop entropy-regularized adversarial learning, a learning paradigm that deviates from the traditional maximum likelihood approach used in PGM. From the DL perspective, entropy-regularized adversarial learning provides a solution to the long-standing mode collapse problem of generative adversarial networks.
A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium
When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
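To make the CAR side of the comparison concrete, here is a minimal sketch of the precision matrix of a proper CAR model in its common parameterisation Q = τ(D − ρW), where W is the binary site adjacency matrix and D the diagonal matrix of neighbor counts. The abstract does not state which CAR parameterisation the authors use, so this form, the function name `car_precision`, and the three-site example are assumptions; the DAGAR construction is not shown.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a proper CAR model (assumed form)."""
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])  # number of neighbours of site i
        for j in range(n):
            if i == j:
                precision[i][j] = tau * degree
            else:
                precision[i][j] = -tau * rho * adjacency[i][j]
    return precision

# Hypothetical map of three sites in a row: site 1 borders sites 0 and 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
Q = car_precision(adjacency, rho=0.5)
```

In this parameterisation ρ controls the strength of spatial dependence but does not itself equal a correlation between neighboring sites, which is the interpretability gap the abstract attributes to the CAR model.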
A Statistical Approach to the Alignment of fMRI Data
Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. Because anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., the combination of spatially distant voxels is penalized. Real applications show an improvement in classification and in the interpretability of the results compared with various functional alignment methods.
Community detection with node attributes in multilayer networks
Community detection in networks is commonly performed using information about interactions between nodes. Recent advances have been made to incorporate multiple types of interactions, thus generalizing standard methods to multilayer networks. Often, though, one can access additional information about individual nodes, in the form of attributes or covariates. A relevant question is thus how to properly incorporate this extra information in such frameworks. Here we develop a method that incorporates both the topology of interactions and node attributes to extract communities in multilayer networks. We propose a principled probabilistic method that does not assume any a priori correlation structure between attributes and communities but rather infers this from data. This leads to an efficient algorithmic implementation that exploits the sparsity of the dataset and can be used to perform several inference tasks; we provide an open-source implementation of the code online. We demonstrate our method on both synthetic and real-world data and compare performance with methods that do not use any attribute information. We find that including node information helps in predicting missing links or attributes. It also leads to more interpretable community structures and allows the quantification of the impact of the node attributes given as input.