233 research outputs found

    The use of positive and negative equivalence constraints in model-based clustering

    Cluster analysis is a popular technique in statistics and computer science whose objective is to group similar observations into relatively distinct groups known as clusters. Semi-supervised model-based clustering assumes that some additional information about group memberships is available
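
Positive equivalence constraints (pairs known to share a group) are transitive, so a common preprocessing step in constrained clustering is to merge them into "chunklets" and then check negative constraints (pairs known to belong to different groups) against those chunklets. The Python sketch below illustrates only that step. It reflects typical practice rather than the method of this paper, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_chunklets(n: int, positive: List[Tuple[int, int]]) -> List[int]:
    """Return a chunklet label for each of n observations.

    Positive (must-link) constraints are merged transitively with a
    union-find structure, so every connected set of linked observations
    shares one label; unconstrained observations become singletons.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in positive:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Relabel the roots as consecutive integers 0, 1, 2, ...
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def violated_negatives(chunklet: List[int],
                       negative: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Negative (cannot-link) pairs that fall inside the same chunklet."""
    return [(a, b) for a, b in negative if chunklet[a] == chunklet[b]]
```

For example, `build_chunklets(5, [(0, 1), (1, 2)])` groups observations 0, 1 and 2 together and leaves 3 and 4 as singletons; any negative constraint linking two of 0, 1, 2 would then be reported as inconsistent.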

    ClickClust: An R Package for Model-Based Clustering of Categorical Sequences

    The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. As a special kind of time series, categorical sequences, also known as categorical time series, exhibit a time-dependent nature and are traditionally modeled by means of Markov chains. Clustering categorical sequences is an important problem with multiple applications; one of the best known is grouping sequences of sites or web pages, known as clickstreams, which helps discover common navigation patterns and routes taken by users. This popular application is reflected in the package title, ClickClust. The paper discusses the methodological and algorithmic foundations of the package, which are based on finite mixtures of Markov models. The number of Markov chain states can often be large, leading to high-dimensional transition probability matrices, and the resulting large number of model parameters can severely affect clustering performance. As a remedy, backward and forward selection algorithms are proposed for grouping states, which extends the original clustering problem to a biclustering framework. Other capabilities of ClickClust include estimation of the variance-covariance matrix of the model parameter estimates, prediction of future states visited, and construction of a display named click-plot that helps illustrate the obtained clustering solutions. All available functions and the utility of the package are thoroughly discussed and illustrated on multiple examples
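
The core idea of a finite mixture of Markov models can be sketched as the EM algorithm applied to per-sequence transition count matrices. This is a minimal, hypothetical Python illustration, not ClickClust's implementation. The function names, the random soft initialization, the tiny pseudo-count, and the omission of initial-state probabilities are all assumptions, and the state-grouping (biclustering) step is not included.

```python
import math
import random
from typing import List, Sequence, Tuple

Counts = List[List[int]]  # counts[a][b] = number of a -> b transitions


def transition_counts(seq: Sequence[int], n_states: int) -> Counts:
    """Tally first-order transitions in one categorical sequence."""
    c = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        c[a][b] += 1
    return c


def _normalize(w: List[float]) -> List[float]:
    total = sum(w)
    return [v / total for v in w]


def em_markov_mixture(data: List[Counts], k: int, n_iter: int = 100, seed: int = 0
                      ) -> Tuple[List[float], List[List[List[float]]], List[List[float]]]:
    """EM for a k-component mixture of first-order Markov chains (sketch).

    Initial-state probabilities are ignored for brevity (an assumption).
    Returns mixing proportions, one transition matrix per component, and
    each sequence's posterior component-membership probabilities.
    """
    rng = random.Random(seed)
    n, s = len(data), len(data[0])
    # Random soft assignments break the symmetry between components.
    z = [_normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(n)]
    tau: List[float] = []
    P: List[List[List[float]]] = []
    for _ in range(n_iter):
        # M-step: mixing proportions, then posterior-weighted transition counts
        # normalized by row (a tiny pseudo-count keeps probabilities positive).
        tau = [sum(z[i][j] for i in range(n)) / n for j in range(k)]
        P = []
        for j in range(k):
            m = [[1e-6 + sum(z[i][j] * data[i][a][b] for i in range(n))
                  for b in range(s)] for a in range(s)]
            P.append([_normalize(row) for row in m])
        # E-step: log-likelihood of each sequence under each component,
        # turned into posteriors with the log-sum-exp trick for stability.
        z = []
        for c in data:
            logs = [math.log(max(tau[j], 1e-300))
                    + sum(c[a][b] * math.log(P[j][a][b])
                          for a in range(s) for b in range(s))
                    for j in range(k)]
            top = max(logs)
            z.append(_normalize([math.exp(v - top) for v in logs]))
    return tau, P, z
```

EM only reaches a local maximum, so the result can depend on `seed`; in practice several random starts are usually compared.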

    Some theoretical contributions to the evaluation and assessment of finite mixture models with applications

    This dissertation develops theory and methodology for the evaluation and assessment of finite mixture models. New methods are developed for simulating finite mixture models that satisfy a pre-specified level of complexity, defined through the notion of pairwise overlap; the corresponding software is publicly available on CRAN. The dissertation also develops methodology for assessing significance in finite mixture models, with applications to model-based unsupervised and semi-supervised clustering frameworks. It concludes with an application of finite mixture models to two-dimensional gel electrophoresis
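
One way to formalize pairwise overlap, assumed here, is as the sum of two misclassification probabilities: ω_{j|i} is the probability that an observation drawn from component i is assigned by the Bayes rule to component j, and ω_ij = ω_{j|i} + ω_{i|j}. The hypothetical Python sketch below computes this for univariate Gaussian components by numerical integration. The grid width and resolution are arbitrary choices, and this is not the dissertation's simulation algorithm.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def misclassification(i: int, j: int, taus: Sequence[float],
                      mus: Sequence[float], sigmas: Sequence[float],
                      n_grid: int = 200_000) -> float:
    """omega_{j|i} = P(tau_i*phi_i(X) < tau_j*phi_j(X)) for X ~ N(mu_i, sigma_i^2).

    Midpoint-rule integration of phi_i over the region that the Bayes rule
    assigns to component j, on a grid wide enough to hold almost all mass.
    """
    lo = min(mus) - 12 * max(sigmas)
    hi = max(mus) + 12 * max(sigmas)
    h = (hi - lo) / n_grid
    total = 0.0
    for step in range(n_grid):
        x = lo + (step + 0.5) * h
        fi = normal_pdf(x, mus[i], sigmas[i])
        if taus[i] * fi < taus[j] * normal_pdf(x, mus[j], sigmas[j]):
            total += fi * h
    return total


def pairwise_overlap(i: int, j: int, taus: Sequence[float],
                     mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """omega_ij = omega_{j|i} + omega_{i|j}."""
    return (misclassification(i, j, taus, mus, sigmas)
            + misclassification(j, i, taus, mus, sigmas))
```

For two equally weighted N(0, 1) and N(2, 1) components the decision boundary is x = 1, so each misclassification probability is 1 - Φ(1) ≈ 0.159 and the overlap is about 0.317; widely separated components give an overlap near 0.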

    Assessing Significance in Finite Mixture Models

    A new method is proposed to quantify significance in finite mixture models. It is based on an approach that calculates the p-value for testing a simpler model against a more complicated one while circumventing the failure of the regularity conditions required by likelihood ratio tests. The resulting testing procedure allows pairwise comparison of any two mixture models, where failure to reject the null hypothesis implies that the more complex model does not significantly improve the likelihood. This leads to a comprehensive tool, called a quantitation map, which displays significance and quantitatively summarizes all model comparisons. Among other applications, the map can be used to choose the best among a set of candidate mixture models. The performance of the procedure is illustrated on several classification datasets and in a comprehensive simulation study. The methodology is also applied to a study of the voting preferences of senators in the 109th US Congress. Although our testing strategy is developed from large-sample theory, it performs impressively even with moderate sample sizes
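
For context, the textbook statistic this abstract refers to compares the maximized log-likelihoods of nested mixtures with K_0 < K_1 components. The block below states it and explains why the usual chi-square approximation breaks down. It is background only, with assumed notation, and is not the paper's proposed procedure.

```latex
\ell_K(\hat\vartheta_K) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \hat\pi_k \, f(x_i; \hat\theta_k),
\qquad
\Lambda = 2\left\{ \ell_{K_1}(\hat\vartheta_{K_1}) - \ell_{K_0}(\hat\vartheta_{K_0}) \right\}.

% Under regularity conditions, Wilks' theorem would give \Lambda \sim \chi^2_d
% asymptotically, with d the difference in the number of free parameters.
% For mixtures, the K_0-component null lies on the boundary (\pi_k = 0) or where
% two components coincide (\theta_k = \theta_l), so the parameters are not
% identifiable there and this chi-square approximation does not hold in general.
```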

    Finite mixture models and model-based clustering

    Finite mixture models have a long history in statistics: they have been used to model population heterogeneity and to generalize distributional assumptions, and, more recently, they have provided a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends and open problems in the area are also discussed
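
As a minimal reminder of the notation such reviews typically use (the symbols here are assumed, not taken from the paper), a K-component finite mixture and the model-based clustering rule built on it can be written as:

```latex
f(x; \vartheta) = \sum_{k=1}^{K} \pi_k \, f(x; \theta_k),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1,

\tau_{ik} = \frac{\pi_k \, f(x_i; \theta_k)}{\sum_{l=1}^{K} \pi_l \, f(x_i; \theta_l)},
\qquad
\hat{z}_i = \arg\max_{k} \tau_{ik}.
```

Each observation is assigned to the component with the largest estimated posterior probability \tau_{ik}, which is how a fitted mixture yields a clustering.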

    CARP: Software for Fishing Out Good Clustering Algorithms

    This paper presents the Clustering Algorithms’ Referee Package (CARP), an open-source, GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating the performance of such algorithms is important, and CARP addresses this need by generating datasets of varying clustering complexity and by assessing how well the algorithm under study classifies each dataset relative to the true grouping. This paper briefly describes the software and its capabilities
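
Assessing an algorithm "relative to the true grouping" requires an agreement index between the true and estimated partitions. The Python sketch below uses the Hubert-Arabie adjusted Rand index as one such index. Whether CARP reports this particular measure is an assumption here, and the function name is hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs of items to compare
    # Numbers of item pairs within each contingency-table cell, row, and column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate partitions, e.g. both all-singletons
    return (sum_cells - expected) / (max_index - expected)
```

A value of 1 means the two partitions are identical up to relabeling, while values near 0 indicate agreement no better than chance; the index is label-invariant, which is what comparing a clustering with a known grouping requires.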

    Formalization of the subject domain of publishing and printing industry management

    ΠŸΡ€Π΅Π΄ΡΡ‚Π°Π²Π»Π΅Π½Π° Ρ–Π½Ρ„ΠΎΡ€ΠΌΠ°Ρ†Ρ–ΠΉΠ½ΠΎ-Π»ΠΎΠ³Ρ–Ρ‡Π½Π° модСль Π²ΠΈΠ΄Π°Π²Π½ΠΈΡ‡ΠΎ-ΠΏΠΎΠ»Ρ–Π³Ρ€Π°Ρ„Ρ–Ρ‡Π½ΠΎΡ— Π³Π°Π»ΡƒΠ·Ρ– (Π’ΠŸΠ“), Ρ‰ΠΎ ΠΌΡ–ΡΡ‚ΠΈΡ‚ΡŒ схСму формування Π±Π°Π½ΠΊΡƒ Π°Π½Π°Π»Ρ–Ρ‚ΠΈΡ‡Π½ΠΈΡ… Π΄Π°Π½ΠΈΡ… Π’ΠŸΠ“ Ρ‚Π° структуровані Π±Π°Π·ΠΈ Π΄Π°Π½ΠΈΡ….Presented information and the logical model of publishing and printing industry (PΠ I), which contains a scheme for generating analytical data bank PΠ I and structured databases.ΠŸΡ€Π΅Π΄ΡΡ‚Π°Π²Π»Π΅Π½Π° ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΠΎΠ½Π½ΠΎ-логичСская модСль ΠΈΠ·Π΄Π°Ρ‚Π΅Π»ΡŒΡΠΊΠΎ-полиграфичСской отрасли (ИПО), которая содСрТит схСму формирования Π±Π°Π½ΠΊΠ° аналитичСских Π΄Π°Π½Π½Ρ‹Ρ… ИПО ΠΈ структурированныС Π±Π°Π·Ρ‹ Π΄Π°Π½Π½Ρ‹Ρ…

    A time of work and achievements (on the 80th anniversary of the Ukrainian Academy of Printing)

    Based on an analysis of processed archival sources, separate editions, and publications in periodical and serial editions, the paper presents the organizational structure and the specialities in which specialists were trained at the Ukrainian Academy of Printing during 1930-2010