Multivariate Outlier Mining Using Cluster Analysis: Case Study - National Health Interview Survey

Sharker, Md Monir Hossain

Authors: Md Monir Hossain Sharker
Publication date: 1 January 2010
Publisher: Duquesne Scholarship Collection

Abstract

Outlier mining is a fundamental issue in many statistical analyses, especially in multivariate cases. Outliers may exert undue influence on the outcomes of an analysis. In most cases, it is a big challenge to reveal the pattern of the outliers and their outlyingness. There are several approaches and methods for detecting anomalous data points, but no single method is perfect for every data set, especially when the data dimension and volume are high. In this thesis, I review distance-based clustering methods for multivariate outlier mining and demonstrate their usefulness in a medical setting. Specifically, I discuss hierarchical clustering and the multivariate methods of determining the appropriate cluster(s). After mining the multivariate outliers, I examine and describe the characteristics of the variables for those outliers. Finally, I demonstrate the application of these methods using the National Health Interview Survey (NHIS) 2008 database for the purposes of studying adolescent obesity.
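
The general idea of using distance-based hierarchical clustering to flag multivariate outliers can be sketched as follows. This is only an illustrative sketch, not the procedure used in the thesis: the average-linkage choice, the number of clusters, the minimum cluster size, and the function name `flag_small_cluster_outliers` are all assumptions made for the example, and the data are synthetic rather than drawn from NHIS. Observations that end up in very small clusters are treated as candidate outliers.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def flag_small_cluster_outliers(X, n_clusters=3, min_size=3, method="average"):
    """Flag observations that fall into clusters smaller than min_size.

    All parameter defaults are assumptions chosen for illustration.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable so no single scale dominates the distances.
    scaled = (X - X.mean(axis=0)) / X.std(axis=0)
    # Build the agglomerative tree on Euclidean distances.
    tree = linkage(scaled, method=method, metric="euclidean")
    # Cut the tree into at most n_clusters groups (labels start at 1).
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Cluster sizes, indexed by label; small clusters are candidate outliers.
    sizes = np.bincount(labels)
    return sizes[labels] < min_size, labels


# Synthetic example: 100 typical points plus 2 points placed far away.
rng = np.random.default_rng(0)
bulk = rng.normal(0.0, 1.0, size=(100, 3))
far = np.array([[10.0, 10.0, 10.0], [-9.0, 11.0, -10.0]])
X = np.vstack([bulk, far])

is_outlier, labels = flag_small_cluster_outliers(X)
print("flagged rows:", np.flatnonzero(is_outlier))
```

In practice the flagged rows would then be inspected variable by variable, in the spirit of the abstract's step of describing the characteristics of the variables for the outliers. The cut level and size threshold strongly affect what is flagged, so they would need to be justified for any real data set.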

Available Versions

Duquesne University: Digital Commons

oai:dsc.duq.edu:etd-2195

Last time updated on 30/10/2019