100 research outputs found

    Enhancing the selection of a model-based clustering with external qualitative variables

    In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that were not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a model and a number of clusters which both fit the data well and take advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.

    Model selection for clustering in the presence of an external classification (Sélection de modèle pour la classification en présence d'une classification externe)

    In unsupervised classification of data, it is often useful to interpret the resulting classification in light of a partition of the individuals that is known a priori and was obtained from information other than the available data. We propose an approach based on the mixture model that selects a classification model and a number of classes so as to produce a classification that both fits the data well and is strongly associated with the a priori partition. This approach uses the integrated joint likelihood of the data and of the two classifications involved. Note that the a priori partition enters only in the model selection phase, not in the construction of the classification, which is carried out in the usual way by maximum likelihood. Illustrations will be given, and the decoupling of the estimation and model selection steps will be discussed.

    Test them all, is it worth it? Assessing configuration sampling on the JHipster Web development stack

    Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them if the effort required to do so remains acceptable. Not only this: we believe there is a lot to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first ever endeavour to test all possible configurations of the industry-strength, open source configurable software system JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights for a total of 4,376 hours (182 days) CPU time. We find that 35.70% of configurations fail and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise): (1) are more effective at finding faults than the 12 default configurations used in the JHipster continuous integration; (2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster’s lead developers.

    The demand for money in developing countries: Assessing the role of financial innovation

    Traditional specifications of money demand have been commonly plagued by persistent overprediction, implausible parameter estimates, and highly autocorrelated errors. This paper argues that some of those problems stem from the failure to account for the impact of financial innovation. We estimate money demand for ten developing countries employing various proxies for the innovation process and provide an assessment of the relative importance of this variable. We find that financial innovation plays an important role in determining money demand and its fluctuations, and that the importance of this role increases with the rate of inflation.

    Combining Mixture Components for Clustering

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K; these clusterings can be compared on substantive grounds. We illustrate the method with simulated data and a flow cytometry dataset.
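
    The hierarchical combining step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the formula, so the sketch assumes the criterion is the entropy of the posterior membership probabilities, that merging two components adds their posterior probabilities, and that each step merges the pair giving the lowest resulting entropy. The function names and the `post` matrix (one row per observation, one column per component) are hypothetical, and the BIC-based fit that would produce `post` is not shown.

```python
import math


def entropy(post):
    """Entropy of a soft clustering: -sum_i sum_k t_ik * log(t_ik) (assumed criterion)."""
    return -sum(t * math.log(t) for row in post for t in row if t > 0.0)


def merge_columns(post, j, k):
    """Return the posteriors after merging components j and k (probabilities add)."""
    merged = []
    for row in post:
        new_row = [t for idx, t in enumerate(row) if idx not in (j, k)]
        new_row.append(row[j] + row[k])  # merged component goes last
        merged.append(new_row)
    return merged


def combine_hierarchically(post):
    """Greedily merge the pair of components whose merge gives the lowest entropy.

    Returns one posterior matrix per number of clusters: K, K-1, ..., 1.
    """
    solutions = [post]
    while len(post[0]) > 1:
        n = len(post[0])
        candidates = [
            merge_columns(post, j, k) for j in range(n) for k in range(j + 1, n)
        ]
        # The current entropy is fixed, so the lowest entropy after merging
        # is the same as the largest entropy decrease.
        post = min(candidates, key=entropy)
        solutions.append(post)
    return solutions


if __name__ == "__main__":
    # Hypothetical posteriors: components 0 and 1 overlap, component 2 is separate.
    post = [
        [0.5, 0.5, 0.0],
        [0.6, 0.4, 0.0],
        [0.0, 0.0, 1.0],
        [0.05, 0.05, 0.9],
    ]
    for solution in combine_hierarchically(post):
        print(len(solution[0]), "clusters, entropy", round(entropy(solution), 3))
```

    In this toy input the two overlapping components are merged first, which matches the situation the abstract describes, where one non-Gaussian cluster is represented by several Gaussian components.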

    Test them all, is it worth it? Assessing configuration sampling on the JHipster Web development stack

    Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them, if the effort required to do so remains acceptable. Not only this: we believe there is a lot to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first ever endeavour to test all possible configurations of an industry-strength, open source configurable software system, JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights for a total of 4,376 hours (182 days) CPU time. We find that 35.70% of configurations fail and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise): (1) are more effective at finding faults than the 12 default configurations used in the JHipster continuous integration; (2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster's lead developers. Comment: Submitted to Empirical Software Engineering

    A Vision for Behavioural Model-Driven Validation of Software Product Lines

    The Software Product Lines (SPLs) paradigm promises faster development cycles and increased quality by systematically reusing software assets. This paradigm considers a family of systems, each of which can be obtained by a selection of features in a variability model. Though essential, providing Quality Assurance (QA) techniques for SPLs has long been perceived as a very difficult challenge, owing to the combinatorics induced by variability, and very few techniques were available. Recently, important progress has been made by the model-checking and testing communities to address this QA challenge, though in a very disparate way. We present our vision for a unified framework combining model-checking and testing approaches applied to behavioural models of SPLs. Our vision relies on Featured Transition Systems (FTSs), an extension of transition systems supporting variability. This vision is also based on model-driven technologies to support practical SPL modelling and orchestrate various QA scenarios. We illustrate such scenarios on a vending machine SPL.