
58 research outputs found

Novel semi-metrics for multivariate change point analysis and anomaly detection

Author: Azizi Lamiae
Chan Jennifer
James Nick
Menzies Max
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

This paper proposes a new method for determining similarity and anomalies between time series, most effective in practice on large collections of (likely related) time series, by measuring distances between the structural breaks within such a collection. We introduce a class of semi-metric distance measures, which we term MJ distances. These semi-metrics provide an advantage over existing options such as the Hausdorff and Wasserstein metrics. We prove they have desirable properties, including better sensitivity to outliers, while experiments on simulated data demonstrate that they uncover similarity within collections of time series more effectively. Semi-metrics carry a potential disadvantage: without the triangle inequality, they may not satisfy a "transitivity property of closeness." We analyse this failure with proof and introduce a computational method to investigate it, with which we demonstrate that our semi-metrics violate transitivity infrequently and mildly. Finally, we apply our methods to cryptocurrency and measles data, introducing a judicious application of eigenvalue analysis.

Comment: Accepted manuscript. Minor edits since v2. Equal contribution from first two authors
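In this approach each time series is summarized by the set of its detected change points, and distances are computed between these finite sets. The sketch below is a minimal illustration, assuming one-dimensional change-point locations and an averaged nearest-point form for an MJ_p distance (our reading of the construction; the exact definition and normalization should be checked against the paper). It contrasts that form with the Hausdorff distance: a single spurious change point dominates the Hausdorff value but is averaged down by MJ_p. The last lines show the triangle inequality failing, which is the "transitivity" issue the abstract refers to. The function names are illustrative.

```python
from typing import Sequence


def nearest_distance(x: float, points: Sequence[float]) -> float:
    """Distance from x to the closest element of a non-empty point set."""
    return min(abs(x - p) for p in points)


def hausdorff(S: Sequence[float], T: Sequence[float]) -> float:
    """Hausdorff distance: worst-case nearest-point distance in either direction."""
    return max(max(nearest_distance(s, T) for s in S),
               max(nearest_distance(t, S) for t in T))


def mj_distance(S: Sequence[float], T: Sequence[float], p: float = 1.0) -> float:
    """Assumed MJ_p form: p-th power nearest-point distances averaged over
    each set (each direction weighted 1/2), then the p-th root."""
    from_S = sum(nearest_distance(s, T) ** p for s in S) / (2 * len(S))
    from_T = sum(nearest_distance(t, S) ** p for t in T) / (2 * len(T))
    return (from_S + from_T) ** (1.0 / p)


# Change points of two series that agree except for one spurious break at t=100.
S = [0, 10, 20]
T = [0, 10, 20, 100]
print(hausdorff(S, T))    # 80: dominated by the single outlier
print(mj_distance(S, T))  # 10.0: the outlier is averaged over |T| = 4 points

# The triangle inequality can fail for this semi-metric.
A, B, C = [0], [0, 2], [2]
print(mj_distance(A, C), mj_distance(A, B) + mj_distance(B, C))  # 2.0 1.0
```

Here d(A, C) = 2.0 exceeds d(A, B) + d(B, C) = 1.0, so two sets can each be close to a third set without being close to each other.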

arXiv.org e-Print Archive

Sydney eScholarship

University of Melbourne Institutional Repository

Multilayer Networks for Text Analysis with Multiple Data Types

Author: Altmann Eduardo G.
Azizi Lamiae
Gerlach Martin
Hyland Charles C.
Peixoto Tiago P.
Tao Yuanming
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2021

We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of data, we propose a novel framework based on Multilayer Networks and Stochastic Block Models. The main innovation of our approach over other techniques is that it applies the same non-parametric probabilistic framework to the different data sources simultaneously. The key difference from other multilayer complex networks is the strong imbalance between the layers, with the average degree of different node types scaling differently with system size. We show that the latter observation is due to generic properties of text, such as Heaps' law, and strongly affects the inference of communities. We present and discuss the performance of our method on different datasets (hundreds of Wikipedia documents, thousands of scientific papers, and thousands of emails), showing that taking into account multiple types of information provides a more nuanced view of topic and document clusters and increases the ability to predict missing links.

Comment: 17 pages, 6 figures
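The framework treats each data type as its own layer of a bipartite network built around the document nodes. The sketch below only illustrates that layout and the degree imbalance between node types; the toy corpus and the helpers `build_layers` and `average_degrees` are hypothetical, and the stochastic block model inference itself is not shown. On real corpora, Heaps' law means the vocabulary grows sublinearly with corpus size, so the average degree of word nodes keeps increasing while that of document nodes stays tied to document length.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_layers(docs: Dict[str, Tuple[List[str], List[str]]]) -> Tuple[List[Edge], List[Edge]]:
    """Split a corpus into a document-word layer (one edge per token, so
    repeated words give multi-edges) and a document-metadata layer
    (one edge per distinct tag)."""
    word_layer: List[Edge] = []
    meta_layer: List[Edge] = []
    for doc_id, (tokens, tags) in docs.items():
        word_layer.extend((doc_id, w) for w in tokens)
        meta_layer.extend((doc_id, t) for t in sorted(set(tags)))
    return word_layer, meta_layer


def average_degrees(layer: List[Edge]) -> Tuple[float, float]:
    """Average degree of document nodes and of the other node type in one layer."""
    doc_deg: Dict[str, int] = defaultdict(int)
    other_deg: Dict[str, int] = defaultdict(int)
    for doc, node in layer:
        doc_deg[doc] += 1
        other_deg[node] += 1
    return len(layer) / len(doc_deg), len(layer) / len(other_deg)


# Hypothetical toy corpus: doc_id -> (tokens, metadata tags).
corpus = {
    "d1": (["markov", "field", "risk", "risk"], ["epidemiology"]),
    "d2": (["markov", "chain", "risk"], ["epidemiology", "statistics"]),
    "d3": (["network", "topic", "markov"], ["networks"]),
}
words, meta = build_layers(corpus)
print(average_degrees(words))  # documents vs. word types
print(average_degrees(meta))   # documents vs. tags
```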

arXiv.org e-Print Archive

Directory of Open Access Journals

Spatial risk mapping for rare disease with hidden Markov fields and variational EM

Author: Abrial David
Azizi Lamiae
Charras-Garrido Myriam
Doyle Senan
Forbes Florence
Publication venue: INRIA
Publication date: 18/03/2011

We recast the disease mapping problem of automatically classifying geographical units into risk classes as a clustering task using a discrete hidden Markov model and Poisson class-dependent distributions. The designed hidden Markov prior is non-standard and consists of a variation of the Potts model in which the interaction parameter can depend on the risk classes. The model parameters are estimated using an EM algorithm and the mean-field approximation. This overcomes the intractability of the standard EM algorithm in this spatial context and provides a computationally efficient alternative to more intensive simulation-based Markov chain Monte Carlo (MCMC) procedures. We then focus on dealing with very low risk values and small numbers of observed cases and population sizes. We address the problem of finding good initial parameter values in this context and develop a new initialization strategy appropriate for spatial Poisson mixtures when the classes are poorly separated, as encountered in animal disease risk analysis. Using both simulated and real data, we compare this strategy to other standard strategies and show that it performs well in many situations.
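A minimal sketch of mean-field EM for a Potts-type prior with class-dependent interaction parameters β_k and Poisson emissions y_i ~ Poisson(E_i λ_k), where E_i is the expected count of region i. It assumes the β_k are held fixed (the paper's model also handles them), a generic neighbor list defines the spatial graph, and a fixed number of sweeps replaces a convergence check; names such as `mean_field_em` are illustrative and not taken from the paper's code. The E-step sets q_i(k) ∝ Poisson(y_i; E_i λ_k) · exp(β_k Σ_{j∈N(i)} q_j(k)); the M-step sets λ_k to the q-weighted ratio of observed to expected counts.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(y: int, mu: float) -> float:
    """log P(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def mean_field_em(y: Sequence[int], expected: Sequence[float],
                  neighbors: Sequence[Sequence[int]],
                  lambdas: Sequence[float], betas: Sequence[float],
                  n_iter: int = 20, n_sweeps: int = 5
                  ) -> Tuple[List[List[float]], List[float]]:
    n, K = len(y), len(lambdas)
    lam = list(lambdas)
    q = [[1.0 / K] * K for _ in range(n)]  # uniform start for q_i(k)
    for _ in range(n_iter):
        # E-step: sequential mean-field sweeps over the regions.
        for _ in range(n_sweeps):
            for i in range(n):
                logits = [poisson_logpmf(y[i], expected[i] * lam[k])
                          + betas[k] * sum(q[j][k] for j in neighbors[i])
                          for k in range(K)]
                top = max(logits)  # subtract the max for numerical stability
                w = [math.exp(v - top) for v in logits]
                total = sum(w)
                q[i] = [wk / total for wk in w]
        # M-step: weighted Poisson MLE for each class risk lambda_k.
        for k in range(K):
            num = sum(q[i][k] * y[i] for i in range(n))
            den = sum(q[i][k] * expected[i] for i in range(n))
            # Floor keeps lambda_k > 0 so that log(mu) stays defined.
            lam[k] = max(num / den, 1e-9) if den > 0 else lam[k]
    return q, lam


# Six regions on a line: three low-risk, three high-risk (hypothetical data).
y = [1, 0, 2, 30, 28, 35]
expected = [10.0] * 6
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)]
q, lam = mean_field_em(y, expected, neighbors, lambdas=[0.5, 2.0], betas=[1.0, 1.0])
labels = [max(range(2), key=lambda k: qi[k]) for qi in q]
print(labels, [round(v, 2) for v in lam])
```

Sequential (site-by-site) updates are used so that each region immediately sees its neighbors' refreshed probabilities, which is the usual way to iterate mean-field fixed-point equations.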

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer

Author: A Lacourt
AE Gelfand
B Pesch
C Tarnaud
CE Antoniak
D Consonni
D Dahl
D Luce
David I Hastie
DI Ohlssen
H Ishwaran
H Zhang
Isabelle Stücker
J Molitor
J Peto
JH Lubin
JS Liu
L Breiman
L Kaufman
Lamiae Azizi
M Abrahamowicz
M Kalli
M Papathomas
MD Ritchie
P Papaspiliopoulos
PJ Green
R Goel
R Peto
RF MacLehose
SC Lemon
SG Walker
Silvia Liverani
SW Thurston
Sylvia Richardson
W Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/10/2013

A common characteristic of environmental epidemiology is the multi-dimensional nature of exposure patterns, which are frequently reduced to a cumulative exposure for simplicity of analysis. By adopting a flexible Bayesian clustering approach, we explore the risk function linking exposure history to disease. This approach is applied here to study the relationship between different smoking characteristics and lung cancer in the framework of a population-based case-control study.
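The reduction to a cumulative exposure can be made concrete with pack-years, a standard smoking summary (packs smoked per day multiplied by years smoked, with 20 cigarettes per pack). The sketch below uses a hypothetical `SmokingProfile` record whose fields are illustrative rather than the study's actual covariates. It shows two quite different smoking histories that collapse to the same cumulative value, which is the information a clustering approach on the full profile can retain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmokingProfile:
    """Hypothetical multi-dimensional exposure profile for one subject."""
    cigarettes_per_day: float
    years_smoked: float
    years_since_quitting: float  # 0 for current smokers


def pack_years(p: SmokingProfile, cigarettes_per_pack: int = 20) -> float:
    """Cumulative exposure: packs per day multiplied by years smoked."""
    return p.cigarettes_per_day / cigarettes_per_pack * p.years_smoked


heavy_short = SmokingProfile(cigarettes_per_day=40, years_smoked=15, years_since_quitting=10)
light_long = SmokingProfile(cigarettes_per_day=20, years_smoked=30, years_since_quitting=0)

# Same cumulative exposure, different profiles.
print(pack_years(heavy_short), pack_years(light_long))  # 30.0 30.0
print(heavy_short == light_long)                        # False
```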

Crossref

Springer - Publisher Connector

HAL-Inserm

PubMed Central

Queen Mary Research Online

Brunel University Research Archive

HAL UVSQ

Hidden Markov random fields for risk mapping in epidemiology (Champs aléatoires de Markov cachés pour la cartographie du risque en épidémiologie)

Author: Azizi Lamiae
Publication venue: HAL CCSD
Publication date: 13/12/2011

The analysis of the geographical variations of a disease and their representation on a map is an important step in epidemiology. The goal is to identify homogeneous regions in terms of disease risk and to gain better insights into the mechanisms underlying the spread of the disease. We recast the disease mapping problem of automatically classifying geographical units into risk classes as a clustering task using a discrete hidden Markov model and Poisson class-dependent distributions. The designed hidden Markov prior is non-standard and consists of a variation of the Potts model in which the interaction parameter can depend on the risk classes. The model parameters are estimated using an EM algorithm and the mean-field approximation. This overcomes the intractability of the standard EM algorithm in this spatial context and provides a computationally efficient alternative to more intensive simulation-based Markov chain Monte Carlo (MCMC) procedures. We then focus on dealing with very low risk values and small numbers of observed cases and population sizes. We address the problem of finding good initial parameter values in this context and develop a new initialization strategy appropriate for spatial Poisson mixtures when the classes are poorly separated, as encountered in animal disease risk analysis. We illustrate the performance of the proposed methodology on animal epidemiological datasets provided by INRA.
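Disease-mapping counts are usually compared with expected counts through the standardized ratio y_i / E_i of observed to expected cases (often called the SMR or SIR). As a point of comparison only, the sketch below shows a generic baseline for starting values, not the new initialization strategy developed in the thesis: class risks are initialized as the means of equal-count quantile groups of these ratios, and could seed λ_k in an EM algorithm. The function names and data are hypothetical.

```python
from typing import List, Sequence


def standardized_ratios(observed: Sequence[int], expected: Sequence[float]) -> List[float]:
    """Observed / expected counts per region (SMR/SIR)."""
    return [o / e for o, e in zip(observed, expected)]


def quantile_init(values: Sequence[float], n_classes: int) -> List[float]:
    """Mean of each of n_classes equal-count groups of the sorted values,
    giving non-decreasing starting risks lambda_1 <= ... <= lambda_K."""
    if len(values) < n_classes:
        raise ValueError("need at least one region per class")
    ordered = sorted(values)
    n = len(ordered)
    inits = []
    for k in range(n_classes):
        group = ordered[k * n // n_classes:(k + 1) * n // n_classes]
        inits.append(sum(group) / len(group))
    return inits


ratios = standardized_ratios([1, 0, 2, 30, 28, 35], [10.0] * 6)
print(quantile_init(ratios, 2))  # roughly [0.1, 3.1]
```

With very low risks and poorly separated classes, such quantile-based values can sit close together, which is the situation the abstract identifies as calling for a dedicated initialization strategy.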

Thèses en Ligne

Hal - Université Grenoble Alpes

Hidden Markov random fields for risk mapping in epidemiology

Author: Azizi Lamiae
Publication date: 13/12/2011

The analysis of the geographical variations of a disease and their representation on a map is an important step in epidemiology. The goal is to identify homogeneous regions in terms of disease risk and to gain better insights into the mechanisms underlying the spread of the disease. We recast the disease mapping problem of automatically classifying geographical units into risk classes as a clustering task using a discrete hidden Markov model and Poisson class-dependent distributions. The designed hidden Markov prior is non-standard and consists of a variation of the Potts model in which the interaction parameter can depend on the risk classes. The model parameters are estimated using an EM algorithm and the mean-field approximation. This overcomes the intractability of the standard EM algorithm in this spatial context and provides a computationally efficient alternative to more intensive simulation-based Markov chain Monte Carlo (MCMC) procedures. We then focus on dealing with very low risk values and small numbers of observed cases and population sizes. We address the problem of finding good initial parameter values in this context and develop a new initialization strategy appropriate for spatial Poisson mixtures when the classes are poorly separated, as encountered in animal disease risk analysis. We illustrate the performance of the proposed methodology on animal epidemiological datasets provided by INRA.

Theses.fr