Exploring the segmentation space for the assessment of multiple change-point models
This paper addresses the retrospective or off-line multiple change-point detection problem. Methods for exploring the space of possible segmentations of a sequence for a fixed number of change points may be divided into two categories: (i) enumeration of segmentations, (ii) summary of the possible segmentations in change-point or segment profiles. Concerning the first category, a forward dynamic programming algorithm for computing the top L most probable segmentations and a forward-backward algorithm for sampling segmentations are derived. Concerning the second category, a forward-backward dynamic programming algorithm and a smoothing-type forward-backward algorithm for computing two types of change-point and segment profiles are derived. The proposed methods are mainly useful for exploring the space of possible segmentations for successive numbers of change points and provide a set of assessment tools for multiple change-point models. We show using examples that the proposed methods may help to compare alternative multiple change-point models (e.g. Gaussian model with piecewise constant variances or global variance), predict supplementary change points, highlight overestimation of the number of change points and summarize the uncertainty concerning the location of change points.
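The dynamic programming idea mentioned in the abstract can be illustrated with a much simpler case than the one the paper treats. The sketch below only finds the single best segmentation for a fixed number of change points, assuming a Gaussian model with piecewise constant mean and a global variance, so that maximizing the likelihood amounts to minimizing the residual sum of squares. It is not the paper's top-L, sampling, or profile algorithm; the function name `best_segmentation` and its parameters are hypothetical.

```python
import numpy as np

def best_segmentation(y, n_changes):
    """Least-squares optimal segmentation with exactly n_changes change points.

    Assumption: Gaussian segments with piecewise constant mean and a
    global variance, so the cost of a segment is its residual sum of squares.
    Returns (sorted change-point positions, total cost); a change point at
    position t means a new segment starts at index t.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    n_seg = n_changes + 1
    if not 1 <= n_seg <= n:
        raise ValueError("need between 1 and len(y) segments")

    # Prefix sums make every segment cost an O(1) computation.
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean.
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # best[k, t]: minimal cost of splitting y[:t] into k + 1 segments.
    best = np.full((n_seg, n + 1), np.inf)
    arg = np.zeros((n_seg, n + 1), dtype=int)
    for t in range(1, n + 1):
        best[0, t] = cost(0, t)
    for k in range(1, n_seg):
        for t in range(k + 1, n + 1):
            for u in range(k, t):  # u is the start of the last segment
                c = best[k - 1, u] + cost(u, t)
                if c < best[k, t]:
                    best[k, t] = c
                    arg[k, t] = u

    # Backtrack from the end of the sequence to recover the change points.
    change_points = []
    t = n
    for k in range(n_seg - 1, 0, -1):
        t = arg[k, t]
        change_points.append(t)
    return sorted(change_points), float(best[n_seg - 1, n])
```

The recursion is quadratic in the sequence length for each number of segments, which is the usual cost of exact segmentation by dynamic programming.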
Estimating hidden semi-Markov chains from discrete sequences.
This article addresses the estimation of hidden semi-Markov chains from nonstationary discrete sequences. Hidden semi-Markov chains are particularly useful to model the succession of homogeneous zones or segments along sequences. A discrete hidden semi-Markov chain is composed of a nonobservable state process, which is a semi-Markov chain, and a discrete output process. Hidden semi-Markov chains generalize hidden Markov chains and enable the modeling of various durational structures. From an algorithmic point of view, a new forward-backward algorithm is proposed whose complexity is similar to that of the Viterbi algorithm in terms of sequence length (quadratic in the worst case in time and linear in space). This opens the way to the maximum likelihood estimation of hidden semi-Markov chains from long sequences. This statistical modeling approach is illustrated by the analysis of branching and flowering patterns in plants.
Hidden hybrid Markov/semi-Markov chains.
Models that combine Markovian states with implicit geometric state occupancy distributions and semi-Markovian states with explicit state occupancy distributions are investigated. This type of model retains the flexibility of hidden semi-Markov chains for the modeling of short or medium size homogeneous zones along sequences but also enables the modeling of long zones with Markovian states. The forward-backward algorithm, which in particular enables an efficient implementation of the E-step of the EM algorithm, and the Viterbi algorithm for the restoration of the most likely state sequence are derived. It is also shown that macro-states, i.e. series-parallel networks of states with common observation distribution, are not a valid alternative to semi-Markovian states but may be useful at a more macroscopic level to combine Markovian states with semi-Markovian states. This statistical modeling approach is illustrated by the analysis of branching and flowering patterns in plants.
Slope heuristics for multiple change-point models
With regard to multiple change-point models, much effort has been devoted to the selection of the number of change points. However, the proposed approaches are either dedicated to specific segment models or give unsatisfactory results for short or medium length sequences. We propose to apply the slope heuristic, a recently proposed non-asymptotic penalized likelihood criterion, for selecting the number of change points. In particular, we apply the data-driven slope estimation method, the key point being to define a relevant penalty shape. The proposed approach is illustrated using two benchmark data sets.
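As a rough illustration of data-driven slope estimation, the sketch below fits a straight line to the maximized log-likelihood as a function of the penalty shape over the most complex models, then selects the model that maximizes the log-likelihood penalized by twice the estimated slope times the shape. The abstract states that defining a relevant penalty shape is the key point and does not give it, so the shape values are left as an input here; the function name `select_by_slope` and the parameter `n_fit` are hypothetical.

```python
import numpy as np

def select_by_slope(loglik, shape, n_fit):
    """Select a model index with a data-driven slope heuristic sketch.

    loglik[k]: maximized log-likelihood of model k (k change points).
    shape[k]:  penalty shape of model k (an assumption supplied by the caller).
    n_fit:     number of the most complex models used to estimate the slope
               (an assumption; in practice this choice needs care).
    Returns (selected index, estimated slope).
    """
    loglik = np.asarray(loglik, dtype=float)
    shape = np.asarray(shape, dtype=float)
    # For over-parameterized models the log-likelihood is assumed to grow
    # roughly linearly with the penalty shape; estimate that slope.
    slope, _intercept = np.polyfit(shape[-n_fit:], loglik[-n_fit:], 1)
    # Slope heuristic: penalty = 2 * estimated slope * penalty shape.
    criterion = loglik - 2.0 * slope * shape
    return int(np.argmax(criterion)), float(slope)
```

For example, with log-likelihoods `[-100, -60, -30, -29, -28, -27, -26]`, a shape equal to the number of change points, and `n_fit=4`, the estimated slope is about 1 and the criterion peaks at two change points.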
Heuristique de pente pour les modèles de détection de ruptures multiples (Slope heuristic for multiple change-point detection models)
With regard to the retrospective multiple change-point detection problem, much effort has been devoted in recent years to the selection of the number of change points. However, the proposed approaches are either dedicated to specific models (e.g. Gaussian change in the mean model) or give unsatisfactory results for short or medium length sequences. We propose to apply the slope heuristic, a recently proposed non-asymptotic penalized likelihood criterion, for selecting the number of change points. In particular, we apply the data-driven slope estimation method, the key point being to define a relevant penalty shape. The proposed approach is illustrated using two benchmark data sets for multiple change-point models.
A new specification of generalized linear models for categorical data
Regression models for categorical data are specified in heterogeneous ways.
We propose to unify the specification of such models. This allows us to define
the family of reference models for nominal data. We introduce the notion of
reversible models for ordinal data that distinguishes adjacent and cumulative
models from sequential ones. The combination of the proposed specification with
the definition of reference and reversible models and various invariance
properties leads to a new view of regression models for categorical data.
Comment: 31 pages, 13 figures
Partitioned conditional generalized linear models for categorical data
In categorical data analysis, several regression models have been proposed
for hierarchically-structured response variables, e.g. the nested logit model.
But they have been formally defined for only two or three levels in the
hierarchy. Here, we introduce the class of partitioned conditional generalized
linear models (PCGLMs) defined for any number of levels. The hierarchical
structure of these models is fully specified by a partition tree of categories.
Using the genericity of the (r,F,Z) specification, the PCGLM can handle
nominal, ordinal but also partially-ordered response variables.
Comment: 25 pages, 13 figures