31 research outputs found

    Discovering Clusters in Motion Time-Series Data

    A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches to HMM-based clustering employ a k-means formulation, in which each sequence is assigned to exactly one HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than one HMM with some probability, so the hard decision about sequence class membership can be deferred until such a decision is actually required. Experiments with simulated data demonstrate the benefit of the EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.

    Office of Naval Research (N000140310108, N000140110444); National Science Foundation (IIS-0208876, CAREER Award 0133825)
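    The key contrast between the k-means and EM formulations is hard versus soft assignment. A minimal sketch of the soft-assignment E-step, assuming the per-sequence log-likelihoods under each HMM have already been computed (the toy numbers below are illustrative, not from the paper):

```python
import numpy as np

def soft_assign(loglik, weights):
    """E-step responsibilities: P(component k | sequence i).

    loglik  : (n_seq, n_comp) log P(sequence i | HMM k)
    weights : (n_comp,) mixture weights, summing to 1
    """
    log_post = loglik + np.log(weights)              # unnormalized log posterior
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical log-likelihoods for 3 sequences under 2 HMM components.
loglik = np.array([[-10.0, -12.0],   # clearly component 0
                   [-15.0, -11.0],   # clearly component 1
                   [-13.0, -13.2]])  # ambiguous: soft membership matters here
resp = soft_assign(loglik, np.array([0.5, 0.5]))

# A k-means-style formulation would commit each sequence to the argmax now;
# the EM formulation keeps the graded membership until a decision is needed.
hard = resp.argmax(axis=1)
```

    For the ambiguous third sequence the responsibilities stay close to 0.5/0.5, which is exactly the information a hard k-means assignment would discard.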

    On the Feature Discovery for App Usage Prediction in Smartphones

    With the increasing number of mobile Apps being developed, Apps are now closely integrated into daily life. In this paper, we develop a framework to predict the mobile Apps that are most likely to be used given the current device status of a smartphone. Such an App usage prediction framework is a crucial prerequisite for fast App launching, intelligent user experience, and power management of smartphones. By analyzing real App usage log data, we discover two kinds of features: the Explicit Feature (EF), from the readings of built-in sensors, and the Implicit Feature (IF), from App usage relations. The IF is derived by constructing the proposed App Usage Graph (abbreviated as AUG), which models App usage transitions. In light of the AUG, we are able to discover usage relations among Apps. Since users may have different usage behaviors on their smartphones, we further propose a personalized feature selection algorithm: applying the minimum description length (MDL) principle, we select those features that need the least length to describe the training data. Personalized feature selection successfully reduces both the log size and the prediction time. Finally, we adopt the kNN classification model to predict App usage. Note that only the features selected by the proposed personalized feature selection algorithm need to be kept, which in turn reduces the prediction time and avoids the curse of dimensionality when using the kNN classifier. We conduct a comprehensive experimental study on a real mobile App usage dataset. The results demonstrate the effectiveness of the proposed framework and its predictive capability for App usage prediction.

    Comment: 10 pages, 17 figures, ICDM 2013 short paper
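    The final prediction step is a plain kNN vote over logged contexts. A self-contained sketch under assumed explicit features (normalized hour of day and battery level; the feature schema and App names are illustrative, not the paper's):

```python
import numpy as np

# Hypothetical usage log: explicit features (hour/24, battery level) -> App launched.
X = np.array([[0.35, 0.9],   # morning, full battery
              [0.36, 0.8],
              [0.80, 0.3],   # evening, low battery
              [0.82, 0.2]])
y = np.array(["mail", "mail", "video", "video"])

def knn_predict(x, X, y, k=3):
    """Majority vote among the k nearest logged device contexts."""
    dist = np.linalg.norm(X - x, axis=1)          # Euclidean distance to each log entry
    nearest = y[np.argsort(dist)[:k]]             # labels of the k closest contexts
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[counts.argmax()]

# A new morning context with high battery should look like the "mail" sessions.
pred = knn_predict(np.array([0.34, 0.85]), X, y, k=3)
```

    Keeping only the MDL-selected features shrinks the dimension of `X`, which is what makes this distance computation both fast and robust.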

    Minimum Description Length Principle in Discriminating Marginal Distributions

    2010 Mathematics Subject Classification: 94A17, 62B10, 62F03.

    In this paper the MDL principle is explored for discriminating between a model with normal marginal distributions and a model with Student-t marginal distributions. The shape complexity of a distribution is defined using insights from the closed-form solution for the model complexity of the normal distribution. An optimised numerical approach for the Student-t distribution is devised, with the aim of extending it to the fat-tailed distributions commonly found in econometric time series.
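    The mechanics of such a comparison can be sketched with a crude two-part code: data cost (negative log-likelihood) plus a (k/2) log n parameter cost. This is only a BIC-style stand-in for the paper's shape-complexity term, and the fixed Student-t parameters below are assumptions; which model wins depends on the data.

```python
import math
import random

def nll_normal(x):
    """Negative log-likelihood of a normal model at its MLE."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return 0.5 * n * (math.log(2 * math.pi * var) + 1)

def nll_student_t(x, df, mu, s):
    """Negative log-likelihood of a location-scale Student-t with fixed parameters."""
    c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
         - 0.5 * math.log(df * math.pi) - math.log(s))
    ll = sum(c - (df + 1) / 2 * math.log(1 + ((v - mu) / s) ** 2 / df) for v in x)
    return -ll

def two_part_mdl(nll, n_params, n):
    """Crude two-part code length: data cost plus (k/2) log n parameter cost."""
    return nll + 0.5 * n_params * math.log(n)

random.seed(0)
# Synthetic fat-tailed sample: normal draws with occasional inflated scale,
# a stand-in for the heavy tails common in econometric series.
x = [random.gauss(0, 1) * (3 if random.random() < 0.1 else 1) for _ in range(500)]

mdl_norm = two_part_mdl(nll_normal(x), 2, len(x))
mdl_t = two_part_mdl(nll_student_t(x, df=3, mu=0.0, s=1.0), 3, len(x))
# The model with the shorter total description length is preferred.
```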

    MDL Convergence Speed for Bernoulli Sequences

    The Minimum Description Length principle for online sequence estimation/prediction in a proper learning setup is studied. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is finitely bounded, implying convergence with probability one, and (b) it additionally specifies the convergence speed. For MDL, in general one can only prove loss bounds which are finite but exponentially larger than those for Bayes mixtures. We show that this is the case even if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. We discuss the application to machine learning tasks such as classification and hypothesis testing, and the generalization to countable classes of i.i.d. models.

    Comment: 28 pages
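    The setup can be made concrete with a small simulation: a discrete class of Bernoulli parameters with prior weights, the two-part MDL plug-in prediction at each step, and the accumulated square loss. The grid-of-tenths class and uniform weights are illustrative assumptions, not the paper's construction.

```python
import math
import random

# A countable (here finite) model class of Bernoulli parameters with prior weights.
thetas = [i / 10 for i in range(1, 10)]
weights = {th: 1 / len(thetas) for th in thetas}   # uniform prior over the class

def mdl_estimate(heads, n):
    """Two-part MDL estimate: minimize -log w(theta) - log P(data | theta)."""
    def code_len(th):
        return (-math.log(weights[th])
                - heads * math.log(th) - (n - heads) * math.log(1 - th))
    return min(thetas, key=code_len)

random.seed(1)
true_p = 0.7          # the true parameter is inside the model class
sq_loss = 0.0
heads = 0
for t in range(1, 2001):
    pred = mdl_estimate(heads, t - 1) if t > 1 else 0.5
    sq_loss += (pred - true_p) ** 2                # instantaneous square loss
    heads += random.random() < true_p              # observe the next bit

# With the true parameter in the class, the cumulative square loss stays bounded,
# which is the finite-total-loss property the abstract refers to.
```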

    Correlating Integrative Complexity With System Modularity

    Modularity is the degree to which a system is made up of relatively independent but interacting elements. Modularization is not necessarily a means of reducing the intrinsic complexity of a system; rather, it is a means of effectively redistributing the total complexity across the system. A high degree of modularization enables reductionist strategies of system development and, by enabling design encapsulation, is an effective mechanism for redistributing complexity so that it can be better managed by system developers. In this paper, we introduce a complexity attribution framework that enables a consistent complexity accounting and management procedure, show that integrative complexity has a strong inverse relationship with system modularity, and discuss the implications for complexity management in engineered system design and development.

    Korea (South). Ministry of Education, Science and Technology (MEST) (National Research Foundation of Korea. NRF-2016R1D1A1A09916273)

    Hierarchical modularity: Decomposition of function structures with the minimal description length principle

    In engineering design and analysis, complex systems often need to be decomposed into a hierarchical combination of simpler subsystems, so formal, computable methods for hierarchically decomposing complex structures are needed. Since graph structures are commonly used as modeling tools in engineering practice, this paper presents a method to hierarchically decompose graph structures. The Minimal Description Length (MDL) principle is introduced as a measure for comparing different decompositions. The best hierarchical decomposition is found by evolutionary computation, with newly defined crossover and mutation operators for tree structures. Results on abstract graphs without attributes and on a real function structure show that the technique is promising.
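    The MDL comparison of decompositions can be illustrated with a toy two-part code for a graph given a partition: state one edge density per block pair, then code each potential edge under that density. The coding scheme below is a simple stochastic-block-style assumption for illustration, not the paper's exact code.

```python
import math
from itertools import combinations

def block_code_length(nodes, edges, partition):
    """Two-part code: a density per block pair (model cost) plus each
    potential edge coded with its block pair's Bernoulli density (data cost)."""
    pair_stats = {}  # (block_a, block_b) -> (edges_present, pairs_total)
    for u, v in combinations(nodes, 2):
        key = tuple(sorted((partition[u], partition[v])))
        present, total = pair_stats.get(key, (0, 0))
        pair_stats[key] = (present + ((u, v) in edges or (v, u) in edges), total + 1)
    bits = 0.0
    for present, total in pair_stats.values():
        p = present / total
        if 0 < p < 1:
            bits += -(present * math.log2(p) + (total - present) * math.log2(1 - p))
        # densities of exactly 0 or 1 cost no data bits under this crude code
        bits += 0.5 * math.log2(total + 1)  # rough cost of stating the density
    return bits

# Toy function structure: two tightly coupled subsystems joined by a single edge.
nodes = list(range(6))
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
modular = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}  # respects the clusters
mixed = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}    # ignores them

mdl_modular = block_code_length(nodes, edges, modular)
mdl_mixed = block_code_length(nodes, edges, mixed)
# The decomposition with the shorter description length is preferred,
# which is the selection criterion an evolutionary search would optimize.
```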

    Integrative Complexity: An Alternative Measure for System Modularity

    Complexity and modularity are important inherent properties of a system. Complexity has to do with the individual elements of the system and their connective relationships, while modularity is the degree to which a system is made up of relatively independent but interacting elements, with each module typically carrying an isolated set of functionality. Modularization is not necessarily a means of reducing the intrinsic complexity of the system but is a mechanism for complexity redistribution, which can be better managed by enabling design encapsulation. In this paper, the notion of integrative complexity (IC) is proposed, and the corresponding metric is proposed as an alternative metric for modularity from a complexity management viewpoint. Using several engineered systems from different application domains, it is also demonstrated that there is a strong negative correlation between IC and system modularity. This leads to the conclusion that IC can be used as an alternative metric for the modularity assessment of system architectures.

    Korea (South). Ministry of Education, Science and Technology (MEST) (National Research Foundation of Korea. Grant NRF2016R1D1A1A09916273)

    Optimal reference sequence selection for genome assembly using minimum description length principle

    Reference-assisted assembly uses a reference sequence as a model to assist the assembly of a novel genome. The standard method for identifying the best reference sequence counts the number of reads that align to each candidate reference and chooses the reference with the highest count. This article explores the use of the minimum description length (MDL) principle and two of its variants, two-part MDL and sophisticated MDL, for identifying the optimal reference sequence for genome assembly. Comparing the proposed MDL-based scheme with the standard method leads to the conclusion that counting the number of reads of the novel genome present in the reference sequence is not a sufficient criterion. The proposed MDL scheme therefore subsumes the standard method of counting the reads that align to the reference, and additionally examines the model, the reference sequence itself, when identifying the optimal reference. The proposed MDL-based scheme not only provides a sufficient criterion for identifying the optimal reference sequence but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome.
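    The intuition that MDL scores both the model (the reference) and the data (the reads) can be sketched with a toy two-part code. The 2-bits-per-base cost, exact-substring "alignment", and the candidate references below are illustrative assumptions, not the article's actual scheme.

```python
import math

def description_length(reads, reference):
    """Crude two-part MDL: bits to state the reference (2 bits/base) plus bits
    per read: a position pointer if it aligns exactly, raw bases otherwise."""
    bits = 2.0 * len(reference)                  # model cost: the reference itself
    for read in reads:
        if read in reference:                    # exact-match alignment (toy criterion)
            bits += math.log2(len(reference))    # pointer to the alignment position
        else:
            bits += 2.0 * len(read)              # code the unaligned read base by base
    return bits

reads = ["ACGT", "CGTA", "TTAA", "ACGT"]
ref_good = "ACGTACGTTAA"   # hypothetical reference covering all reads
ref_poor = "GGGGCCCCGGGG"  # hypothetical reference covering none

dl_good = description_length(reads, ref_good)
dl_poor = description_length(reads, ref_poor)
# The reference yielding the shorter total code is the better model. Because the
# reference's own length is charged as model cost, a bloated reference is
# penalized even when every read aligns, unlike pure read counting.
```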