Multivariate Outlier Mining Using Cluster Analysis: Case Study - National Health Interview Survey

Abstract

Outlier mining is a fundamental issue in many statistical analyses, especially in multivariate cases, because outliers may exert undue influence on the outcomes of an analysis. In most cases, it is a major challenge to reveal the pattern of the outliers and their outlyingness. Several approaches and methods exist for detecting anomalous data points, but no single method is perfect for every data set, especially when the dimension and volume of the data are high. In this thesis, I review distance-based clustering methods for multivariate outlier mining and demonstrate their usefulness in a medical setting. Specifically, I discuss hierarchical clustering and the multivariate methods of determining the appropriate cluster(s). After mining the multivariate outliers, I examine and describe the characteristics of the variables for those outliers. Finally, I demonstrate the application of these methods using the National Health Interview Survey (NHIS) 2008 database for the purpose of studying adolescent obesity.
