Probabilistic models for classification of bioacoustic data
Probabilistic models have been successfully applied to a wide variety of problems, including information retrieval, computer vision, bioinformatics and speech processing. They allow us to encode our assumptions about the data in an elegant fashion and to perform machine learning tasks such as classification and clustering in a principled manner. Probabilistic models for bioacoustic data help in identifying interesting patterns in the data (for instance, the species-specific vocabulary), as well as in species identification (classification) in recordings where the label is not available. The focus of this thesis is to develop efficient inference techniques for existing models, as well as probabilistic models tailored to bioacoustic data. First, we develop inference algorithms for the supervised latent Dirichlet allocation (LDA) model. We present collapsed variational Bayes, collapsed Gibbs sampling and maximum-a-posteriori (MAP) inference for parameter estimation and classification in supervised LDA. We provide an empirical evaluation of the trade-off between computational complexity and classification performance of these inference methods on audio classification (species identification in this context) as well as image classification and document classification tasks. Next, we present novel probabilistic models for bird sound recordings that can capture temporal structure at different hierarchical levels and model additional information such as the duration and frequency of vocalizations. We present a non-parametric density estimation technique for parameter estimation and show that the MAP classifier for our models can be interpreted as a weighted nearest neighbor classifier. We provide an experimental comparison between the proposed models and a support vector machine based approach, using bird sound recordings from the Cornell Macaulay library.
Keywords: classification, species identification, probabilistic models, bioacoustic
Exchangeable Variable Models
A sequence of random variables is exchangeable if its joint distribution is
invariant under variable permutations. We introduce exchangeable variable
models (EVMs) as a novel class of probabilistic models whose basic building
blocks are partially exchangeable sequences, a generalization of exchangeable
sequences. We prove that a family of tractable EVMs is optimal under zero-one
loss for a large class of functions, including parity and threshold functions,
and strictly subsumes existing tractable independence-based model families.
Extensive experiments show that EVMs outperform state-of-the-art classifiers
such as SVMs and probabilistic models that are based solely on independence
assumptions. Comment: ICML 201
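The definition of exchangeability in the abstract above can be checked by brute force on a small example. The following is a minimal sketch, not the EVM construction from the paper: the helper names (`count_based_joint`, `is_exchangeable`) and the weights are hypothetical. It builds a joint distribution over three binary variables that depends only on the number of ones (here concentrated on odd parity) and verifies that it is invariant under every permutation of the variables, while a distribution that singles out the first variable is not.

```python
from itertools import permutations, product


def count_based_joint(n, weight_by_count):
    """Joint over {0,1}^n whose probability depends only on the number of ones."""
    raw = {x: weight_by_count[sum(x)] for x in product((0, 1), repeat=n)}
    total = sum(raw.values())
    return {x: w / total for x, w in raw.items()}


def is_exchangeable(joint, n, tol=1e-12):
    """Return True if the joint is invariant under every permutation of the variables."""
    for perm in permutations(range(n)):
        for x, p in joint.items():
            permuted = tuple(x[i] for i in perm)
            if abs(joint[permuted] - p) > tol:
                return False
    return True


n = 3
# Hypothetical weights: mass only on assignments with an odd number of ones (parity).
parity_joint = count_based_joint(n, weight_by_count=[0.0, 1.0, 0.0, 1.0])
# For contrast: the probability depends on the value of the first variable only.
skewed_joint = {x: (0.2 if x[0] == 1 else 0.05) for x in product((0, 1), repeat=n)}

print(is_exchangeable(parity_joint, n))
print(is_exchangeable(skewed_joint, n))
```

Because parity depends only on the count of ones, a count-based joint can represent it exactly, which gives some intuition for why the abstract lists parity among the functions such models handle.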
Clustering of Symbolic Data based on Affinity Coefficient: Application to a Real Data Set
Copyright © 2013 Walter de Gruyter GmbH. In this paper, we illustrate an application of Ascendant Hierarchical Cluster Analysis (AHCA) to complex (interval) data taken from the literature, based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. The probabilistic aggregation criteria used belong to a parametric family of methods under the probabilistic approach of AHCA, named VL methodology. Finally, we compare the results achieved using our approach with those obtained by other authors.
A global Approach to the Comparison of Clustering Results
Copyright © 2012 Walter de Gruyter GmbH. The discovery of knowledge in Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of cluster analysis. We present a global approach for evaluating the quality of clustering results and comparing different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the results obtained and the contribution of this approach to gaining better knowledge of the structure of the data.
A violência doméstica na Região Autónoma dos Açores : estudo sócio-criminal
This edition results entirely from the project carried out between January 2009 and February 2010 by the Centro de Estudos Sociais of the Universidade dos Açores, entitled Estudo Sócio-criminal sobre a Violência Doméstica na Região Autónoma dos Açores (Socio-criminal Study on Domestic Violence in the Autonomous Region of the Azores). The research was funded by the Ministério da Administração Interna (Ministry of Internal Administration), through the Direcção-Geral de Administração Interna (DGAI), and its general aim was to update and deepen the frame of reference for knowledge about domestic violence in the Autonomous Region of the Azores. The exceptional work carried out by the team of researchers coordinated by Professors Gilberta Rocha and Piedade Lalanda resulted in a final report whose length, as anticipated, makes wide publication impracticable. It was therefore agreed from the outset that the research report would be made available in digital format, for consultation on the web (on the websites of the DGAI and of the University itself), and that a more concise, bilingual version (in Portuguese and English) would be published in print and subsequently disseminated to the scientific and technical community, as well as to the Security Forces. For this publication, Doctor António Manuel Marques, of the Escola Superior de Saúde of the Instituto Politécnico de Setúbal, was asked to systematize the exceptional study prepared by the Centro de Estudos Sociais of the Universidade dos Açores and to collaborate, jointly with the DGAI, on the edition of the respective bilingual version. Thanks are due to all those who gave their best to carry out both the research and the present edition, in a spirit of collaborative work that deserves to be encouraged.
Cluster Analysis of Business Data
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge. In this work, classical as well as probabilistic hierarchical clustering models are used to look for typologies of variables in classical data, typologies of groups of individuals in a classical three-way data table, and typologies of groups of individuals in a symbolic data table. The data come from a business questionnaire designed to evaluate the quality of, and satisfaction with, the services an automobile company provides to its customers. The Ascendant Hierarchical Cluster Analysis (AHCA) is based, respectively, on the basic affinity coefficient and on extensions of this coefficient for the cases of a classical three-way data table and a symbolic data table, obtained from the weighted generalized affinity coefficient. The probabilistic aggregation criteria used, under the probabilistic approach named VL methodology (V for Validity, L for Linkage), rely essentially on probabilistic notions for the definition of the comparative functions. The validation of the obtained partitions is based on the global statistics of levels (STAT).
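The abstract above names the basic affinity coefficient without stating it. As a rough illustration only, the sketch below assumes its commonly cited form for two frequency profiles: the sum, over categories, of the square root of the product of the two relative frequencies. The function name `basic_affinity` and the example counts are hypothetical, and the weighted generalized and standardized variants used in the papers are not reproduced here.

```python
import math


def basic_affinity(counts_a, counts_b):
    """Assumed form: sum over categories of sqrt(p_i * q_i), where p and q are
    the relative frequencies derived from the two count profiles."""
    if len(counts_a) != len(counts_b):
        raise ValueError("profiles must have the same number of categories")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("profiles must have a positive total count")
    return sum(
        math.sqrt((a / total_a) * (b / total_b))
        for a, b in zip(counts_a, counts_b)
    )


# Hypothetical count profiles for three variables over the same three categories.
var_x = [2, 4, 6]
var_y = [1, 2, 3]   # same proportions as var_x
var_z = [9, 0, 0]   # concentrated on the first category

print(basic_affinity(var_x, var_y))
print(basic_affinity(var_x, var_z))
```

Under this assumed form the coefficient lies between 0 and 1: it reaches 1 when the two profiles have identical proportions and 0 when their supports do not overlap, which is what makes it usable as a similarity measure in hierarchical clustering.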
Quality evaluation of a selected partition : An approach based on resampling methods
The aim of this work on cluster analysis is to provide a methodology to analyse and assess the quality of a selected partition (the best partition according to several validation indexes). In the proposed approach, the stability and consistency of the selected (original) partition are evaluated by comparing it with each of the partitions (with the same number of clusters as the original one) obtained by resampling. Special emphasis is given to an index defined as a linear combination of four indicators, which allows evaluating the agreement between the original partition and each of the partitions (and/or the set of partitions) obtained from the resampled data. The application of these indexes is exemplified using a set of real data, and the main conclusions are summarized and discussed. CICS.UAc/CICS.NOVA.UAc, UID/SOC/04647/2013; this paper was produced with support from the FCT/MEC through National Funds and, when applicable, co-financed by the FEDER within the partnership agreement PT2020.
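The abstract above does not specify the four indicators or their weights, so the comparison step can only be sketched with a stand-in. The example below swaps in the plain Rand index (the fraction of element pairs on which two partitions agree) and averages it over a set of resampled partitions. The function names and the example partitions are hypothetical, and the sketch assumes the resampled partitions have already been restricted to the same elements as the original one.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of element pairs treated the same way (together or apart) by both partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same elements")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two elements")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def mean_agreement(original, resampled_partitions):
    """Average Rand index between the original partition and each resampled partition."""
    scores = [rand_index(original, p) for p in resampled_partitions]
    return sum(scores) / len(scores)


# Hypothetical cluster labels for six elements.
original = [0, 0, 0, 1, 1, 1]
resampled = [
    [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 1, 1, 1, 1],  # one element moved to the other cluster
]

print(mean_agreement(original, resampled))
```

A pair-counting index is used here because it ignores how clusters are named, so a resampled partition that recovers the same groups under different labels still scores as full agreement.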
On clustering interval data with different scales of measures : experimental results
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. The Attribution-NonCommercial (CC BY-NC) license lets others remix, tweak, and build upon the work non-commercially; the new works must also acknowledge the original and be non-commercial. Symbolic Data Analysis can be defined as the extension of standard data analysis to more complex data tables. We illustrate the application of the Ascendant Hierarchical Cluster Analysis (AHCA) to a symbolic data set (with a known structure) in the field of the automobile industry (car data set), in which objects are described by variables whose values are intervals of the real data set (interval variables). The AHCA of thirty-three car models, described by eight interval variables (with different scales of measure), was based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. We applied three probabilistic aggregation criteria in the scope of the VL methodology (V for Validity, L for Linkage). Moreover, we compare the achieved results with those obtained by other authors, and with an a priori partition into four clusters defined by the category (Utilitarian, Berlina, Sporting and Luxury) to which each car belongs. We used the global statistics of levels (STAT) to evaluate the obtained partitions.