Visualising Incomplete Data with Subset Multiple Correspondence Analysis
Determining the cause of missing values is challenging, but it is an important task in order to select correct analysis techniques for missing data. This paper presents a new approach to identifying the missing data mechanism (MDM) by applying cluster analysis to biplots of data having missing observations. Subset multiple correspondence analysis (sMCA) enables an isolated analysis of a chosen subset while preserving the scaffolding of the original data set. Multivariate categorical data sets are frequently represented in a coded dummy matrix, referred to as an indicator matrix. Additional category levels can be created in the indicator matrix to account for the unobserved information, which has the advantage of not forfeiting any observed information. The extended indicator matrix easily partitions a data set into observed and unobserved subsets. sMCA biplots are used for the visual exploration of the subsets. Configurations of the incomplete subsets enable the recognition of non-response patterns which could aid in the identification of a particular MDM. The missing at random (MAR) MDM refers to missing responses that depend on the observed information and is expected to be identified by patterns and groupings occurring in the incomplete sMCA biplot. The missing completely at random (MCAR) MDM states that all observations have the same probability of not being captured, which could be identified by a random cloud of points in the incomplete sMCA biplot. The partitioning around medoids (pam) clustering technique is used to establish the number of available clusters in an incomplete sMCA biplot. A simulation study confirmed that there is a difference in the number of sufficient clusters that can be identified from MAR and MCAR simulated data sets. A real data set is also explored and its MDM is identified using the results of the simulation study as guidelines.
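The extended indicator matrix described above can be sketched outside the paper's own tooling. A minimal pandas illustration (the data and variable names are hypothetical; `get_dummies` with `dummy_na=True` is one way to add an extra category level per variable for missing responses):

```python
import pandas as pd

# Toy categorical data with missing responses (hypothetical example).
df = pd.DataFrame({
    "colour": ["red", "blue", None, "red"],
    "size":   ["S", None, "L", "M"],
})

# Extended indicator matrix: one dummy column per observed category level,
# plus an extra "missing" level per variable, so no observed information is lost.
Z = pd.get_dummies(df, dummy_na=True)

# Partition the rows into complete and incomplete subsets,
# which can then be explored separately (e.g. via sMCA biplots).
incomplete = df.isna().any(axis=1)
Z_observed, Z_missing = Z[~incomplete], Z[incomplete]
```

Here the two variables yield seven indicator columns (three for `colour`, four for `size`, each including a missing level), and the row partition separates the two incomplete records from the two complete ones.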
Properties of individual differences scaling and its interpretation
Indscal models consider symmetric matrices \[ B_k = X W_k X^{\prime} \] for \[ k = 1, \ldots, K \], where \[ X : n \times R \] is a compromise matrix termed the group average and \[ W_k : R \times R \] is a diagonal matrix of weights given by the \[ k \]th individual to the \[ R \], specified in advance, columns of \[ X \]; non-negative weights are preferred and usually imposed. We propose a new two-phase alternating least squares (ALS) algorithm, which emphasizes the two main components (group average and weighting parameters) of the Indscal model and specifically helps with the interpretation of the model. Furthermore, it has thrown new light on the properties of the converged solution that would be satisfied by any algorithm minimizing the basic Indscal criterion: \[ \min \sum_{k=1}^{K} \| B_k - X W_k X^{\prime} \|^2, \] where the minimization is over \[ X \] and the \[ W_k \]. The new algorithm has also proved to be a useful tool in unravelling the algebraic understanding of the role played by parameter constraints and their interpretation in variants of the Indscal model. The proposed analysis focusses on Indscal, but the approach may be of more widespread interest, especially in the field of multidimensional data analysis. A major issue is that simultaneous least-squares estimates of the parameters may be found without imposing constraints. However, the group average and individual weighting parameters may not be estimated uniquely without imposing some subjective constraint that could encourage misleading interpretations. We encourage the use of the linear constraints \[ \sum_{k=1}^{K} \mathbf{1}^{\prime} W_k = \mathbf{1}^{\prime}, \] as they enable a comparison of the weights obtained (i) within a group and (ii) between the same item drawn from two or more groups. However, it is easy to exchange one system of constraints for another in a post- or pre-analysis. The new two-phase ALS algorithm (i) computes, for fixed \[ X : n \times R \], the weights \[ W_k \] subject to \[ \sum_{k=1}^{K} \mathbf{1}^{\prime} W_k = \mathbf{1}^{\prime} \], and then (ii) keeping the \[ W_k \] fixed, it updates \[ X \].
At convergence, the estimates of \[ X : n \times R \] and the \[ W_k \] will apply to all algorithms that minimize the Indscal criterion. Furthermore, we show that only at convergence does an analysis-of-variance property hold on the demarcation region between over- and under-fitting. When the analysis-of-variance is valid, its validity extends over the whole matrix domain, over trace operations, and to individual matrix elements. The optimization process is unusual in that optima and local optima occur on the edges of what seem to be closely related to Heywood cases in factor analysis.
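Phase (i) of the two-phase update has a simple closed form: for fixed \[ X \], each individual's diagonal weights solve an ordinary least-squares problem in the vectorized \[ B_k \], since \[ X W_k X^{\prime} = \sum_r w_{kr} x_r x_r^{\prime} \]. A minimal numpy sketch of this phase (illustrative only; the function name is ours, and the rescaling shown is one way to impose the \[ \sum_k \mathbf{1}^{\prime} W_k = \mathbf{1}^{\prime} \] constraint, not necessarily the paper's):

```python
import numpy as np

def indscal_phase1(B_list, X):
    """For fixed X (n x R), fit diagonal weights w_k minimizing
    ||B_k - X diag(w_k) X'||^2, then rescale so the weights for each
    dimension sum to one across individuals, absorbing the scale into X."""
    n, R = X.shape
    # Design matrix: column r is vec(x_r x_r'), because
    # X diag(w) X' = sum_r w_r x_r x_r'.
    M = np.column_stack([np.outer(X[:, r], X[:, r]).ravel() for r in range(R)])
    W = np.array([np.linalg.lstsq(M, B.ravel(), rcond=None)[0] for B in B_list])
    s = W.sum(axis=0)              # per-dimension column sums across individuals
    # Dividing w_kr by s_r and scaling x_r by sqrt(s_r) leaves each
    # fitted B_k unchanged while satisfying the sum-to-one constraint.
    return W / s, X * np.sqrt(s)
```

The rescaling illustrates the indeterminacy discussed in the abstract: the fit is invariant to trading scale between \[ X \] and the \[ W_k \], which is why some constraint must be imposed before the weights can be compared.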
Flexible graphical assessment of experimental designs in R: The vdg package
The R package vdg provides a flexible interface for producing various graphical summaries of the prediction variance associated with specific linear model specifications and experimental designs. These methods include variance dispersion graphs, fraction of design space plots and quantile plots, which can assist in choosing between a catalogue of candidate experimental designs. Instead of the restrictive optimization methods used in traditional software to explore design regions, vdg utilizes sampling methods to introduce more flexibility. The package takes advantage of R’s modern graphical abilities via ggplot2 (Wickham 2009), adds facilities for using a variety of distance methods, allows for more flexible model specifications and incorporates quantile regression to help with model comparison.
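The quantity these graphical summaries are built on can be illustrated outside R. A hedged numpy sketch of the scaled prediction variance \(N \, x^{\prime}(F^{\prime}F)^{-1}x\) behind variance dispersion and fraction-of-design-space plots (the toy design, sampling scheme, and function name here are our own choices for illustration, not vdg's API):

```python
import numpy as np

def scaled_prediction_variance(F, X0):
    """SPV = N * x'(F'F)^{-1} x for each row x of X0,
    where F is the N-run model matrix of the design under study."""
    N = F.shape[0]
    info_inv = np.linalg.inv(F.T @ F)
    return N * np.einsum("ij,jk,ik->i", X0, info_inv, X0)

# Toy example: 2^2 factorial design, first-order model
# (intercept + two main effects).
F = np.array([[1, -1, -1],
              [1, -1,  1],
              [1,  1, -1],
              [1,  1,  1]], dtype=float)

# Sample the square design region uniformly, as vdg's sampling-based
# approach does in place of restrictive optimization over the region.
rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, size=(1000, 2))
X0 = np.column_stack([np.ones(len(pts)), pts])
spv = scaled_prediction_variance(F, X0)

# Summarizing SPV by its quantiles over the sampled region gives the
# ingredients of a fraction-of-design-space plot.
fds = np.quantile(spv, [0.25, 0.5, 0.75])
```

For this orthogonal design \(F^{\prime}F = 4I\), so the SPV at a point \((u, v)\) reduces to \(1 + u^2 + v^2\), ranging from 1 at the centre to 3 at the corners of the region.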