15 research outputs found
Model Selection Criteria for Segmented Time Series from a Bayesian Approach to Information Compression
The principle that the simplest model capable of describing observed phenomena should also correspond to the best description has long been a guiding rule of inference. In this paper, a Bayesian approach to formally implementing this principle is used to develop model selection criteria for detecting structural change in financial and economic time series. The derived criteria allow for multiple structural breaks and seek the optimal model order and parameter choices within regimes. Comparative simulations against other popular information-based model selection criteria are performed, and the derived criteria are applied to example financial and economic time series.
Keywords: complexity theory; segmentation; break points; change points; model selection; model choice.
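As a rough illustration of information-based break detection, the sketch below compares a no-break Gaussian model against every single mean-shift model and keeps whichever scores lowest. It uses the standard BIC as a stand-in, not the criteria derived in the paper (the abstract does not state them). The function names `gaussian_bic`, `rss`, and `best_single_break`, and the parameter counts, are assumptions made for this example.

```python
import math


def rss(values):
    """Residual sum of squares around the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gaussian_bic(total_rss, n, n_params):
    """BIC for a Gaussian model, up to an additive constant."""
    # Floor the RSS so a perfectly flat segment does not produce log(0).
    return n * math.log(max(total_rss, 1e-12) / n) + n_params * math.log(n)


def best_single_break(series, min_len=2):
    """Return (break_index, bic); break_index is None if no break wins.

    Assumption for this sketch: each regime is a constant mean with a
    shared variance, and the break location counts as one parameter.
    """
    n = len(series)
    best_break = None
    best_bic = gaussian_bic(rss(series), n, 2)  # one mean + one variance
    for t in range(min_len, n - min_len + 1):
        total = rss(series[:t]) + rss(series[t:])
        bic = gaussian_bic(total, n, 4)  # two means + variance + location
        if bic < best_bic:
            best_break, best_bic = t, bic
    return best_break, best_bic


series = [0.1, -0.2, 0.0, 0.2, -0.1, 5.1, 4.9, 5.0, 5.2, 4.8]
break_index, score = best_single_break(series)
print(break_index)
```

Extending this to multiple breaks or to autoregressive models within regimes would mean searching over more segmentations and adding a penalty term for each extra parameter.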
Application of nonlinear analysis methods for building a stock market monitoring system
The aim of this work is to solve the problem of smoothing the laminarity measure of recurrence quantification analysis in order to build a stock market monitoring system.
Forecasting substantially non-stationary multifactor time series, using the example of investment abroad by Russian non-bank corporations
This paper attempts to develop an algorithm for forecasting future values of macroeconomic indicators that takes into account the non-stationarity of the processes when the model structure changes, using the volume of investment abroad by Russian non-bank corporations as an example.
Stability Selection of the Number of Clusters
Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm to multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real-world samples.
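The idea of comparing clusterings trained on independent samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's criteria. It uses a simple pairwise-disagreement distance on a validation sample rather than the correlation measure proposed in the abstract. The 1-D k-means with deterministic initialisation and the names `kmeans_1d`, `assign`, and `instability` are hypothetical helpers for this example.

```python
from itertools import combinations


def assign(point, centroids):
    """Clustering function: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: abs(point - centroids[j]))


def kmeans_1d(points, k, iters=50):
    """Plain 1-D k-means with deterministic, evenly spaced initial centroids."""
    pts = sorted(points)
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            groups[assign(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def instability(sample_a, sample_b, validation, k):
    """Fraction of validation pairs on which the two clusterings disagree
    about whether the pair belongs to the same cluster."""
    labels_a = [assign(v, kmeans_1d(sample_a, k)) for v in validation]
    labels_b = [assign(v, kmeans_1d(sample_b, k)) for v in validation]
    pairs = list(combinations(range(len(validation)), 2))
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


sample_a = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
sample_b = [0.1, 0.3, 0.5, 10.1, 10.3, 10.5]
validation = [0.05, 0.25, 0.45, 10.05, 10.25, 10.45]
print(instability(sample_a, sample_b, validation, k=2))
```

In a cross-validation setting, one would average this quantity over many random splits for each candidate k and prefer the k with the lowest average instability.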
A spectroscopy of texts for effective clustering
For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: "How can we effectively estimate the natural number of clusters in a given text collection?" We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.
Behavioural motifs of larval Drosophila melanogaster and Caenorhabditis elegans
I present a novel method for the unsupervised discovery of behavioural motifs in larval
Drosophila melanogaster and Caenorhabditis elegans. Most current approaches to
behavioural annotation suffer from the requirement of training data. As a result, automated
programs carry the same observational biases as the humans who have annotated
the data. The key novel element of my work is that it does not require training data;
rather, behavioural motifs are discovered from the data itself. The method is based on
an eigenshape representation of posture. Hence, my approach is called the eigenshape
annotator (ESA).
First, I examine the annotation consistency for a specific behaviour, the Omega turn
of C. elegans, and find significant inconsistency in both expert annotation and the various
Omega turn detection algorithms. This finding highlights the need for unbiased
tools to study behaviour.
A behavioural motif is defined as a particular sequence of postures that recurs frequently.
In ESA, posture is represented by an eigenshape time series, and motifs are
discovered in this representation. To find motifs, the time series is segmented, and the
resulting segments are then clustered. The result is a set of self-similar time series
segments, i.e. motifs. The advantage of this novel framework over the popular sliding
windows approaches is twofold. First, it does not rely on the "closest neighbours" definition
of motifs, by which every motif has exactly two instances. Second, it does not
require the assumption of exactly equal length for motifs of the same class.
Behavioural motifs discovered using the segmentation-clustering framework are
used as the basis of the ESA annotator. ESA is fully probabilistic, therefore avoiding
rigid threshold values and allowing classification uncertainty to be quantified. I apply
eigenshape annotation to both larval Drosophila and C. elegans, and produce a close
match to hand annotation of behavioural states. However, many behavioural events
cannot be unambiguously classified. By comparing the results to eigenshape annotation
of an artificial agent's behaviour, I argue that the ambiguity is due to greater
continuity between behavioural states than is generally assumed for these organisms.