233 research outputs found
The use of positive and negative equivalence constraints in model-based clustering
Cluster analysis is a popular technique in statistics and computer science whose objective is to group similar observations into relatively distinct groups known as clusters. Semi-supervised model-based clustering assumes that some additional information about group memberships is available
ClickClust: An R Package for Model-Based Clustering of Categorical Sequences
The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by means of Markov chains. Clustering categorical sequences is an important problem with many applications; one of the best known is grouping sequences of visited sites or web pages, also known as clickstreams, which helps discover common navigation patterns and routes taken by users. This popular application is recognized in the package title ClickClust. The paper discusses methodological and algorithmic foundations of the package based on finite mixtures of Markov models. The number of Markov chain states can often be large, leading to high-dimensional transition probability matrices. The resulting large number of model parameters can severely degrade clustering performance. As a remedy to this problem, backward and forward selection algorithms are proposed for grouping states. This extends the original clustering problem to a biclustering framework. Other capabilities of ClickClust include estimation of the variance-covariance matrix of the model parameter estimates, prediction of future states visited, and construction of a display called a click-plot that helps illustrate the obtained clustering solutions. All available functions and the utility of the package are thoroughly discussed and illustrated with multiple examples
Some theoretical contributions to the evaluation and assessment of finite mixture models with applications
This dissertation develops theory and methodology for the evaluation and assessment of finite mixture models. New methods are developed for simulating finite mixture models that satisfy a pre-specified level of complexity, defined through the notion of pairwise overlap. Corresponding software is publicly available at CRAN. This dissertation also develops methodology for assessing significance in finite mixture models, with applications to model-based unsupervised and semi-supervised clustering frameworks. The dissertation concludes with an application of finite mixture models to two-dimensional gel electrophoresis
Assessing Significance in Finite Mixture Models
A new method is proposed to quantify significance in finite mixture models. The basis for this new methodology is an approach that calculates the p-value for testing a simpler model against a more complicated one in a way that is able to obviate the failure of regularity conditions for likelihood ratio tests. The developed testing procedure allows for pairwise comparison of any two mixture models with failure to reject the null hypothesis implying insignificant likelihood improvement under the more complex model. This leads to a comprehensive tool called a quantitation map which displays significance and quantitatively summarizes all model comparisons. This map can be used, among other applications, to decide on the best among a set of candidate mixture models. The performance of the procedure is illustrated on some classification datasets and a comprehensive simulation study. The methodology is also applied to a study of voting preferences of senators in the 109th US Congress. Although the development of our testing strategy is based on large-sample theory, we note that it has impressive performance even in cases with moderate sample sizes
Finite mixture models and model-based clustering
Finite mixture models have a long history in statistics, having been used to model population heterogeneity, to generalize distributional assumptions, and, more recently, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed
CARP: Software for Fishing Out Good Clustering Algorithms
This paper presents the CLUSTERING ALGORITHMS' REFEREE PACKAGE or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities
Formalization of the subject domain of management of the publishing and printing industry
An information-logical model of the publishing and printing industry (PPI) is presented. It contains a scheme for building a bank of analytical data on the PPI, together with structured databases
A time of work and achievements (on the 80th anniversary of the Ukrainian Academy of Printing)
Based on an analysis of archival sources, individual editions, and publications in periodicals and serials, the paper presents the organizational structure of the Ukrainian Academy of Printing and the specialties in which it trained specialists from 1930 to 2010