7 research outputs found

    A Genealogy of Correspondence Analysis: Part 2 - The Variants

    In 2012, a comprehensive historical and genealogical discussion of correspondence analysis was published in the Australian and New Zealand Journal of Statistics. That genealogy consisted of more than 270 key books and articles and focused on the historical development of correspondence analysis, a statistical tool that provides the analyst with a visual inspection of the association between two or more categorical variables. In this new genealogy, we provide a brief overview of over 30 variants of correspondence analysis that now exist outside of the traditional approaches used to analyse the association between two or more categorical variables. It comprises a bibliography of more than 300 books and articles that were not included in the 2012 bibliography and highlights the growth in the development of correspondence analysis across all areas of research.

    Biplot logístico para datos nominales y ordinales (Logistic biplot for nominal and ordinal data)

    There are numerous techniques suited to nominal data. Some address the problem posed by this type of data from the point of view of Factor Analysis, whose objective is to obtain latent factors that explain the correlation between the variables. Others rely on non-parametric approaches to explore the similarities between individuals, such as Principal Coordinates Analysis (PCoA) or Multidimensional Scaling (MDS). However, there is a lack of general exploratory techniques that allow the simultaneous representation of individuals and variables, apart from Multiple Correspondence Analysis (MCA), which is based on the chi-squared distance, a distance that is not always the most adequate for describing similarities between individuals and correlations between variables. For binary data, Vicente-Villardón et al. [2006] proposed a representation based on logistic responses, called the "Logistic Biplot", which is linear, and studied the geometry of this type of biplot. When the data set contains nominal variables with more than two categories, linear biplots and even binary logistic biplots are no longer adequate. This thesis solves that problem by extending the earlier concept into what is called the "Nominal Logistic Biplot (NLB)", a procedure that reduces the dimension of the original space and, at the same time, serves as an exploratory technique. Nominal logistic biplots represent the rows of the data matrix as points in a reduced-dimensional space (usually 2 or 3 dimensions) and the variables as prediction regions (convex polygons). The main advantage of the NLB is that the biplot is interpreted in terms of distances: for each individual, the category predicted for a variable is the one closest to it in the biplot. In this way, these biplots extend both Multiple Correspondence Analysis and Latent Trait Analysis, in the sense that they provide a graphical representation for LTA similar to the one obtained with MCA.
    When the data contain ordinal variables, linear, binary, and nominal logistic biplots are likewise not adequate; in that situation Categorical Principal Component Analysis (CATPCA) or IRT models for ordinal variables would be more appropriate. The biplot concept is therefore extended to this kind of data, resulting in a method called the Ordinal Logistic Biplot (OLB). The row scores are computed under the assumption that they have ordinal logistic response surfaces over the chosen dimensions, and the column parameters produce logistic response surfaces that, projected onto the space spanned by the row scores, define a linear biplot. A proportional odds model is used, yielding a multidimensional model known in the IRT literature as the graded response model. The geometry of such representations is studied, and computational algorithms are implemented for estimating the parameters and the prediction directions. The OLB extends both CATPCA and IRT, since it offers a graphical representation for IRT similar to the biplot obtained with CATPCA.
    Finally, if the data matrix contains categorical variables of any type, the algorithms have been adapted to build the biplot taking into account the characteristics of each variable and its associated geometry, so that the thesis covers the representation of categorical data as a whole. The use of the described procedures is made possible by the implementation of three public R packages, one for each situation, which are applied to several real data sets in this study.
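
    As a rough illustration of the proportional-odds (graded response) formulation mentioned above, the sketch below evaluates the cumulative logistic response surfaces of a single ordinal variable over a 2-dimensional row-score space and predicts, for each individual, the category with the highest probability. It is a minimal sketch in Python rather than R; the function name and all parameter values are hypothetical and are not taken from the thesis or from its accompanying R packages.

        import numpy as np

        def graded_response_probs(A, b_j, d_j):
            """Category probabilities for one ordinal variable under a
            proportional-odds (graded response) model.

            A   : (n, q) row scores in the reduced q-dimensional space
            b_j : (q,)   slope (direction) parameters of the variable
            d_j : (K-1,) increasing thresholds for the K-1 cumulative logits

            Cumulative model: P(x_i <= k) = 1 / (1 + exp(-(d_j[k] - A[i] @ b_j))).
            Category probabilities are differences of consecutive cumulative ones.
            """
            eta = A @ b_j                                               # (n,) linear predictor
            cum = 1.0 / (1.0 + np.exp(-(d_j[None, :] - eta[:, None])))  # (n, K-1)
            cum = np.hstack([cum, np.ones((len(eta), 1))])              # P(x <= K) = 1
            return np.diff(np.hstack([np.zeros((len(eta), 1)), cum]), axis=1)  # (n, K)

        # Hypothetical example: 2-D row scores for four individuals and one
        # four-category ordinal variable with assumed slopes and thresholds.
        A = np.array([[-1.2, 0.3], [0.1, -0.5], [0.8, 0.9], [2.0, -1.0]])
        b_j = np.array([1.5, 0.7])        # assumed direction of the variable in the biplot
        d_j = np.array([-1.0, 0.5, 2.0])  # assumed ordered thresholds

        P = graded_response_probs(A, b_j, d_j)
        print(P.round(3))                 # each row sums to 1
        print(P.argmax(axis=1) + 1)       # most probable category per individual

    Along the variable's direction b_j, the predicted category changes at the points where two consecutive categories become equally probable, which is what produces the prediction regions referred to in the abstract.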

    Weighted Euclidean biplots

    We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
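
    A minimal sketch of the overall procedure, assuming a centred cases-by-variables matrix X and a target dissimilarity matrix D: nonnegative variable weights are fitted so that the weighted Euclidean distances between rows approximate D, and the column-weighted matrix is then decomposed by SVD to give row and column biplot coordinates. The paper estimates the weights with a majorization algorithm; here a generic bounded least-squares routine from scipy stands in for it, and all names and example data are illustrative only.

        import numpy as np
        from scipy.optimize import least_squares

        def weighted_euclidean_biplot(X, D, n_dims=2):
            """Fit nonnegative variable weights so that weighted Euclidean distances
            between the rows of X approximate the target dissimilarities D, then
            return biplot coordinates from an SVD of the column-weighted matrix."""
            n, p = X.shape
            iu = np.triu_indices(n, k=1)                     # unique row pairs
            sq_diff = (X[:, None, :] - X[None, :, :]) ** 2   # (n, n, p) squared differences
            sq_diff, target = sq_diff[iu], D[iu]             # one row per pair

            def residuals(w):
                return np.sqrt(sq_diff @ w) - target         # weighted distances vs. target

            # Nonnegative weights, started at 1 (ordinary Euclidean distance);
            # the original method uses majorization instead of this generic solver.
            w = least_squares(residuals, x0=np.ones(p), bounds=(0.0, np.inf)).x

            U, s, Vt = np.linalg.svd(X * np.sqrt(w), full_matrices=False)
            rows = U[:, :n_dims] * s[:n_dims]                # principal row coordinates
            cols = Vt[:n_dims, :].T                          # variable directions
            return w, rows, cols

        # Hypothetical use: approximate city-block dissimilarities between the rows
        rng = np.random.default_rng(0)
        X = rng.normal(size=(10, 4))
        X -= X.mean(axis=0)
        D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
        w, rows, cols = weighted_euclidean_biplot(X, D)
        print(np.round(w, 3))

    The fitted weights indicate how strongly each variable must count for Euclidean geometry to mimic the chosen dissimilarity, echoing the implicit differential weighting in correspondence analysis and in PCA of standardized data mentioned above.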
