
11,303 research outputs found

Outlier detection for multivariate categorical data

Author: Barnett
Besag
Box
Brown
Edmondson
Fernandez
Fienberg
Fuchs
Giron
Haberman
Holmes
Holmes
Hope
Kuhnt
Kuhnt
Mebane
Merriam
Mollie
Mosteller
Plummer M
Puig
Puig
Puig
Riba
Rousseeuw
Shahan
Simonoff
Stamatatos
Yick
Zhao
Publication venue: 'Wiley'
Publication date: 01/01/2018

This is an Accepted Manuscript of an article published by Wiley in “Quality and Reliability Engineering International” on 6 June 2018, available online: https://onlinelibrary.wiley.com/doi/abs/10.1002/qre.2339 The detection of outlying rows in a contingency table is tackled from a Bayesian perspective, by adapting the framework adopted by Box and Tiao for normal models to multinomial models with random effects. The solution assumes a two-component mixture model of two multinomial continuous mixtures, one for the nonoutlier rows and one for the outlier rows. The method starts by estimating the distributional characteristics of the nonoutlier rows, and then performs a cluster analysis to identify which rows belong to the outlier group and which do not. The method applies to any type of contingency table and, in particular, could be used in the analysis of multivariate categorical control charts. Here, the use of the method is illustrated through a simulated example and by applying it to help identify heterogeneities of style among the acts in the plays of the First Folio edition of Shakespeare's drama. Peer Reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC
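
To make the idea of an "outlying row" in a contingency table concrete, the following is a minimal sketch of a much simpler screen than the Bayesian mixture model described in the abstract above: each row's Pearson chi-square deviation from the pooled column profile. This is a swapped-in classical technique, not the authors' method. The table values, the function names and the cutoff of 9.21 are hypothetical and chosen only for illustration.

```python
def row_chi_square(table):
    """Pearson chi-square deviation of each row from the pooled column profile."""
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    pooled = [c / grand_total for c in col_totals]  # pooled column proportions
    scores = []
    for row in table:
        n = sum(row)  # row total
        # Sum of (observed - expected)^2 / expected over the columns of this row.
        score = sum(
            (obs - n * p) ** 2 / (n * p)
            for obs, p in zip(row, pooled)
            if p > 0
        )
        scores.append(score)
    return scores


def flag_rows(table, threshold):
    """Indices of rows whose deviation exceeds a user-chosen cutoff."""
    return [i for i, s in enumerate(row_chi_square(table)) if s > threshold]


# Hypothetical 4 x 3 table of counts: three similar rows and one that differs.
table = [
    [30, 50, 20],
    [28, 52, 20],
    [32, 48, 20],
    [60, 20, 20],
]

scores = row_chi_square(table)
flagged = flag_rows(table, threshold=9.21)  # hypothetical cutoff
print(scores)
print(flagged)
```

Note that the pooled profile here is computed from all rows, including any outlying ones, so a strong outlier also inflates the scores of ordinary rows. The abstract's approach of first estimating the characteristics of the nonoutlier rows addresses exactly this kind of contamination.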

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Author: Havinga P.J.M.
Meratnia N.
Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

University of Twente Research Information

Sensitivity and robustness in MDS configurations for mixed-type data: a study of the economic crisis impact on socially vulnerable Spanish people

Author: Aurea Grané
Rosario Romera

Multidimensional scaling (MDS) techniques were initially proposed to produce pictorial representations of distance, dissimilarity or proximity data. Sensitivity and robustness assessment of multivariate methods is essential if inferences are to be drawn from the analysis. To our knowledge, the literature related to MDS for mixed-type data, including variables measured at continuous level besides categorical ones, is quite scarce. The main motivation of this work was to analyze the stability and robustness of MDS configurations as an extension of a previous study on a real data set, coming from a panel-type analysis designed to assess the economic crisis impact on Spanish people who were in situations of high risk of being socially excluded. The main contributions of the paper on the treatment of MDS configurations for mixed-type data are: (i) to propose a joint metric based on distance matrices computed for continuous, multi-scale categorical and/or binary variables, (ii) to introduce a systematic analysis on the sensitivity of MDS configurations and (iii) to present a systematic search for robustness and identification of outliers through a new procedure based on geometric variability notions. Keywords: Gower distance, MDS configurations, Mixed-type data, Outliers identification, Related metric scaling, Survey data

Research Papers in Economics
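
Since the entry above lists Gower distance among its keywords, a minimal sketch of the classical Gower dissimilarity for mixed-type records may help illustrate how continuous and categorical variables can be combined into a single distance. This is the textbook coefficient, not the joint metric proposed in the paper. The records, variable kinds and function name are hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    """Classical Gower dissimilarity between two mixed-type records.

    kinds[k] is "num" or "cat"; ranges[k] is the observed range of
    numeric variable k and is ignored for categorical variables.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference, in [0, 1].
            total += abs(x - y) / rng if rng > 0 else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical survey records: (age, income, employment status, owns home).
records = [
    (25, 1200.0, "employed", True),
    (40, 1800.0, "unemployed", False),
    (55, 1200.0, "employed", True),
]
kinds = ["num", "num", "cat", "cat"]

# Observed range of each numeric column; None for categorical columns.
ranges = [
    (max(col) - min(col)) if kind == "num" else None
    for col, kind in zip(zip(*records), kinds)
]

# Full pairwise dissimilarity matrix, usable as input to an MDS routine.
D = [[gower_distance(r, s, kinds, ranges) for s in records] for r in records]
for row in D:
    print(row)
```

Each variable contributes a value between 0 and 1, so no single variable dominates because of its measurement scale. The resulting matrix is symmetric with a zero diagonal.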

Automatic Bayesian Density Analysis

Author: Ghahramani Zoubin
Kersting Kristian
Molina Alejandro
Peharz Robert
Valera Isabel
Vergari Antonio
Publication date: 01/01/2019

Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data. Comment: In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

arXiv.org e-Print Archive

TUbiblio

Pure OAI Repository

MPG.PuRe

Association for the Advancement of Artificial Intelligence: AAAI Publications