    Exchangeable Variable Models

    A sequence of random variables is exchangeable if its joint distribution is invariant under permutations of the variables. We introduce exchangeable variable models (EVMs), a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable EVMs is optimal under zero-one loss for a large class of functions, including parity and threshold functions, and strictly subsumes existing tractable independence-based model families. Extensive experiments show that EVMs outperform state-of-the-art classifiers such as SVMs, as well as probabilistic models based solely on independence assumptions. Comment: ICML 201
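
    The definition of exchangeability above can be illustrated with a small sketch. It is not the paper's construction: the function names, the weights, and the two class-conditional distributions are hypothetical, chosen only to show that a binary distribution depending solely on the number of ones is permutation invariant, and that two such class-conditional distributions can represent parity exactly.

```python
from itertools import permutations, product
from math import comb, isclose

def exchangeable_pmf(x, count_weights):
    """P(x) for a binary tuple x, depending only on the number of ones in x.

    count_weights[k] is the total probability of having exactly k ones;
    it is shared equally among the comb(n, k) tuples with k ones.
    """
    n, k = len(x), sum(x)
    return count_weights[k] / comb(n, k)

n = 4
# Hypothetical distribution over the number of ones (0..n); sums to 1.
count_weights = [0.1, 0.2, 0.4, 0.2, 0.1]

# The probabilities of all 2**n tuples sum to one.
total = sum(exchangeable_pmf(x, count_weights) for x in product((0, 1), repeat=n))

# The probability of x is unchanged by any permutation of the variables.
x = (1, 0, 0, 1)
invariant = all(
    isclose(exchangeable_pmf(x, count_weights),
            exchangeable_pmf(tuple(x[i] for i in perm), count_weights))
    for perm in permutations(range(n))
)

# Hypothetical class-conditional distributions: class 1 puts mass only on
# odd counts, class 0 only on even counts (equal class priors assumed).
odd_weights = [0.0, 0.5, 0.0, 0.5, 0.0]
even_weights = [1 / 3, 0.0, 1 / 3, 0.0, 1 / 3]

def predict_parity(x):
    """Pick the class whose exchangeable distribution gives x more mass."""
    return int(exchangeable_pmf(x, odd_weights) > exchangeable_pmf(x, even_weights))

parity_correct = all(
    predict_parity(x) == sum(x) % 2 for x in product((0, 1), repeat=n)
)
```

    A fully independent model with one Bernoulli parameter per variable cannot separate parity in this way, which is the kind of gap the abstract refers to.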

    Clustering of Symbolic Data based on Affinity Coefficient: Application to a Real Data Set

    Copyright © 2013 Walter de Gruyter GmbH. In this paper, we illustrate an application of Ascendant Hierarchical Cluster Analysis (AHCA) to complex data taken from the literature (interval data), based on the weighted generalized affinity coefficient standardized by the method of Wald and Wolfowitz. The probabilistic aggregation criteria used belong to a parametric family of methods under the probabilistic approach to AHCA known as the VL methodology. Finally, we compare the results of our approach with those obtained by other authors.

    A Global Approach to the Comparison of Clustering Results

    Copyright © 2012 Walter de Gruyter GmbH. Knowledge discovery in Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies adopted in the initial stage of the analysis. We present a global approach for evaluating the quality of clustering results and for comparing different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method that facilitates evaluation of the quality of the partitions by revealing the similarities and differences between partitions, as well as the behaviour of the elements within them. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (a similarity coefficient) to complex (symbolic) data, combined with 26 clustering algorithms (classic and probabilistic). Finally, we discuss the results obtained and the contribution of this approach to a better understanding of the structure of the data.

    Domestic violence in the Autonomous Region of the Azores: a socio-criminal study

    This edition results entirely from the project carried out between January 2009 and February 2010 by the Centro de Estudos Sociais of the Universidade dos Açores, entitled Estudo Sócio-criminal sobre a Violência Doméstica na Região Autónoma dos Açores (Socio-criminal Study on Domestic Violence in the Autonomous Region of the Azores). The research was funded by the Ministério da Administração Interna, through the Direcção-Geral de Administração Interna (DGAI), and its general aim was to update and deepen the frame of reference for knowledge about domestic violence in the Autonomous Region of the Azores. The exceptional work carried out by the team of researchers coordinated by Professors Gilberta Rocha and Piedade Lalanda resulted in a final report whose length, as anticipated, does not lend itself to wide publication. It was therefore agreed from the outset that the research report would be made available in digital format for consultation on the web (on the sites of the DGAI and of the University itself), and that a more concise bilingual version (in Portuguese and English) would be published in print and subsequently disseminated to the scientific and technical community, as well as to the Security Forces. For the purposes of this publication, Doctor António Manuel Marques, of the Escola Superior de Saúde of the Instituto Politécnico de Setúbal, was asked to systematize the exceptional study prepared by the Centro de Estudos Sociais of the Universidade dos Açores and to collaborate, together with the DGAI, on the edition of the bilingual version. Thanks are due to all those who gave their best to both the research and the present edition, in a spirit of collaborative work that deserves to be encouraged.

    Cluster Analysis of Business Data

    In this work, classical and probabilistic hierarchical clustering models are used to look for typologies of variables in classical data, typologies of groups of individuals in a classical three-way data table, and typologies of groups of individuals in a symbolic data table. The data come from a business questionnaire designed to evaluate the quality of, and customer satisfaction with, the services provided by an automobile company. The Ascendant Hierarchical Cluster Analysis (AHCA) is based, respectively, on the basic affinity coefficient and on extensions of this coefficient, obtained from the weighted generalized affinity coefficient, for a classical three-way data table and for a symbolic data table. The probabilistic aggregation criteria used, under the probabilistic approach named VL methodology (V for Validity, L for Linkage), rely essentially on probabilistic notions to define the comparative functions. The validation of the obtained partitions is based on the global statistics of levels (STAT).
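
    As a rough illustration of the basic affinity coefficient mentioned above, the sketch below assumes its commonly cited form: the sum, over the rows of a table of non-negative counts, of the square roots of the products of the two columns' relative frequencies. The abstract does not state the formula, so this form, the function name `basic_affinity` and the example columns are assumptions; the weighted generalized and standardized variants are not covered.

```python
from math import sqrt

def basic_affinity(col_a, col_b):
    """Affinity between two columns of non-negative counts (assumed form).

    Each column is first turned into a profile of relative frequencies;
    the coefficient is the sum of sqrt(p_i * q_i) over the rows.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of rows")
    total_a, total_b = sum(col_a), sum(col_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each column must have a positive total")
    return sum(
        sqrt((a / total_a) * (b / total_b)) for a, b in zip(col_a, col_b)
    )

# Hypothetical columns of a count table (rows = individuals).
proportional = basic_affinity([1, 2, 3], [2, 4, 6])   # identical profiles
disjoint = basic_affinity([5, 0, 0], [0, 3, 4])       # no shared support
partial = basic_affinity([4, 1, 0], [1, 1, 3])        # some overlap
```

    Under this assumed form the coefficient lies between 0 and 1, reaching 1 when the two profiles coincide and 0 when they share no support, which is what makes it usable as a similarity measure in an agglomerative procedure.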

    Quality evaluation of a selected partition: An approach based on resampling methods

    The aim of this work on cluster analysis is to provide a methodology to analyse and assess the quality of a selected partition (the best partition according to several validation indexes). In the proposed approach, the stability and consistency of the selected (original) partition are evaluated by comparing it with each of the partitions, with the same number of clusters as the original one, obtained by resampling. Special emphasis is given to an index defined as a linear combination of four indicators, which measures the agreement between the original partition and each partition (and/or the whole set of partitions) obtained from the resampled data. The application of these indexes is exemplified using a set of real data, and the main conclusions are summarized and discussed. CICS.UAc/CICS.NOVA.UAc, UID/SOC/04647/2013; this paper was produced with support from FCT/MEC through National Funds and, when applicable, co-financed by FEDER within the partnership agreement PT2020.
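
    The comparison between an original partition and partitions obtained by resampling can be sketched as follows. The abstract does not define its four-indicator index, so this example substitutes the plain Rand index as a stand-in agreement measure; the function names and the example labels are hypothetical, and the resampled partitions are assumed to have been produced already by some clustering run.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which two partitions agree.

    A pair agrees when both partitions put it in the same cluster,
    or both put it in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def stability(original, resampled_partitions):
    """Mean Rand index between the original partition and each resample.

    original: dict mapping element -> cluster label.
    resampled_partitions: list of dicts over the resampled elements only;
    the comparison is restricted to the elements present in each resample.
    """
    scores = []
    for part in resampled_partitions:
        elements = sorted(part)
        scores.append(rand_index([original[e] for e in elements],
                                 [part[e] for e in elements]))
    return sum(scores) / len(scores)

# Hypothetical original partition of five elements into two clusters.
original = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
# Hypothetical partitions obtained on two resamples of four elements each.
resamples = [
    {0: 0, 1: 0, 2: 1, 3: 1},   # same grouping as the original
    {1: 0, 2: 0, 3: 1, 4: 1},   # element 2 grouped with 1 instead of 3, 4
]
score = stability(original, resamples)
```

    Cluster labels need not match across partitions, since the Rand index only looks at whether pairs of elements are grouped together.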

    On clustering interval data with different scales of measures: experimental results

    This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC). Symbolic Data Analysis can be defined as the extension of standard data analysis to more complex data tables. We illustrate the application of Ascendant Hierarchical Cluster Analysis (AHCA) to a symbolic data set with a known structure from the automobile industry (the car data set), in which objects are described by variables whose values are intervals of real numbers (interval variables). The AHCA of thirty-three car models, described by eight interval variables with different scales of measure, was based on the weighted generalized affinity coefficient standardized by the method of Wald and Wolfowitz. We applied three probabilistic aggregation criteria within the VL methodology (V for Validity, L for Linkage). Moreover, we compare the results with those obtained by other authors, and with the a priori partition into four clusters defined by the category (Utilitarian, Berlina, Sporting and Luxury) to which each car belongs. We used the global statistics of levels (STAT) to evaluate the obtained partitions.