
    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes along different analysis axes (dimensions) and at different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example is used to (i) discuss the potential benefits of the modeling output for cube exploration and mining, (ii) show how OLAP queries can be answered approximately, and (iii) illustrate the strengths and limitations of these modeling approaches.
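
As a rough illustration of the log-linear side of this abstract, the sketch below fits the simplest log-linear model, mutual independence, to a small three-way cube. It then answers an aggregate query from the fitted cells and locates the cell with the largest Pearson residual as a candidate outlier. The cube values, the function name `fit_independence`, and the choice of the independence model are assumptions made for illustration; the paper itself searches for a parsimonious model that may include interaction terms.

```python
import numpy as np

def fit_independence(cube):
    """Fit the mutual-independence log-linear model to a 3-way count cube.

    Under this model log m_ijk = u + u_i + u_j + u_k, and the maximum
    likelihood fitted counts are products of the three one-way margins.
    """
    cube = np.asarray(cube, dtype=float)
    n = cube.sum()
    a = cube.sum(axis=(1, 2))  # margin of dimension 0
    b = cube.sum(axis=(0, 2))  # margin of dimension 1
    c = cube.sum(axis=(0, 1))  # margin of dimension 2
    fitted = np.einsum("i,j,k->ijk", a, b, c) / n**2
    # Pearson residuals: large absolute values point to poorly fitted cells.
    residuals = (cube - fitted) / np.sqrt(fitted)
    return fitted, residuals

# Hypothetical 2 x 3 x 4 cube of counts (not data from the paper).
cube = np.arange(1, 25).reshape(2, 3, 4)
fitted, residuals = fit_independence(cube)

# Approximate answer to an aggregate query: total for member 0 of
# dimension 0 and member 1 of dimension 1, summed over dimension 2.
exact = cube[0, 1, :].sum()
approx = fitted[0, 1, :].sum()

# Index of the cell that deviates most from the model.
worst = np.unravel_index(np.abs(residuals).argmax(), residuals.shape)
print(exact, approx, worst)
```

In this sketch the model is described by the three one-way margins (2 + 3 + 4 numbers) rather than all 24 cells, which is the sense in which a parsimonious model compresses the cube; richer models trade more parameters for a closer fit.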

    Multiple Correspondence Analysis & the Multilogit Bilinear Model

    Multiple Correspondence Analysis (MCA) is a dimension reduction method that plays a large role in the analysis of tables with categorical nominal variables, such as survey data. Though it is usually motivated and derived using geometric considerations, we prove that it amounts to a single proximal Newton step of a natural bilinear exponential family model for categorical data, the multinomial logit bilinear model. We compare and contrast the behavior of MCA with that of the model on simulations and discuss new insights into the properties of both exploratory multivariate methods and their cognate models. One main conclusion is that we recommend approximating the multilogit model parameters using MCA. Indeed, estimating the parameters of the model is not a trivial task, whereas MCA has the great advantage of being easily solved by singular value decomposition and of scaling to large data.
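
To make the "solved by singular value decomposition" remark concrete, here is a minimal sketch of MCA computed as correspondence analysis of the one-hot indicator matrix. The function name `mca`, the integer coding of the input, and the toy data are assumptions for illustration; the sketch does not implement the multilogit bilinear model or the proximal Newton argument from the paper.

```python
import numpy as np

def mca(codes, n_components=2):
    """MCA via SVD of the standardized indicator matrix.

    codes: (n, q) integer array, one column per categorical variable.
    Returns row coordinates, category coordinates, and principal inertias.
    """
    codes = np.asarray(codes)
    n, q = codes.shape
    # Build the one-hot indicator matrix Z, one block per variable.
    blocks = []
    for j in range(q):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, j][:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column (category) masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)

    k = n_components
    row_coords = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    col_coords = (Vt[:k].T * s[:k]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, s**2

# Hypothetical survey: 6 respondents, two questions with 2 and 3 levels.
codes = np.array([[0, 0], [0, 1], [1, 2], [1, 0], [0, 1], [1, 2]])
rows, cats, inertias = mca(codes)
print(rows.shape, cats.shape, inertias.sum())
```

A useful sanity check is that the total inertia of the indicator matrix equals J/q - 1, where J is the total number of categories and q the number of variables.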

    Conditional symmetry model as a better alternative to Symmetry Model for rater agreement measure

    In almost all life and social science research, subjects are classified into categories by raters, interviewers or observers. Many approaches have been proposed by various authors for analyzing the data or results obtained from these raters. Symmetry and conditional symmetry models are designed for square tables like those arising from raters' results. The conditional symmetry model, which has an extra parameter for the off-diagonal cells, extends the symmetry model, which it contains as a special case. In this work, we examined the effect of the extra parameter introduced by the conditional symmetry model, relative to the symmetry model, on the structure of agreement as well as on model fit. A generalized linear model (GLM) approach was used to fit the loglinear forms of these models to empirical examples. We observed that the conditional symmetry model, owing to its extra parameter, gave a tremendous improvement in the significance level of the test statistics over its symmetry counterpart; hence the conditional symmetry model is better for rater agreement modelling that requires a symmetric table.
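
The comparison described in this abstract can be sketched with the closed-form maximum likelihood fits of the two models, rather than the GLM fitting the authors report using. The table below is hypothetical, and the function names are assumptions for illustration. Under symmetry the fitted off-diagonal counts are (n_ij + n_ji)/2; under conditional symmetry the cells above the diagonal are tau times their mirror cells, with tau estimated as the ratio of the upper-triangle total to the lower-triangle total.

```python
import numpy as np

def g2(observed, fitted):
    """Likelihood-ratio statistic G^2, skipping empty observed cells."""
    mask = observed > 0
    return 2.0 * np.sum(observed[mask] * np.log(observed[mask] / fitted[mask]))

def fit_symmetry(table):
    n = np.asarray(table, dtype=float)
    k = n.shape[0]
    fitted = (n + n.T) / 2.0          # diagonal cells are fitted exactly
    return fitted, g2(n, fitted), k * (k - 1) // 2

def fit_conditional_symmetry(table):
    n = np.asarray(table, dtype=float)
    k = n.shape[0]
    upper = np.triu(np.ones((k, k), dtype=bool), 1)
    lower = upper.T
    tau = n[upper].sum() / n[lower].sum()   # the extra parameter
    pair = n + n.T
    fitted = n.copy()                        # diagonal cells fitted exactly
    fitted[upper] = tau * pair[upper] / (tau + 1.0)
    fitted[lower] = pair[lower] / (tau + 1.0)
    return fitted, tau, g2(n, fitted), (k + 1) * (k - 2) // 2

# Hypothetical 3 x 3 table of two raters' classifications.
table = np.array([[20, 10, 5],
                  [3, 30, 8],
                  [1, 2, 25]])

_, g2_sym, df_sym = fit_symmetry(table)
_, tau, g2_cs, df_cs = fit_conditional_symmetry(table)
print(g2_sym, df_sym, g2_cs, df_cs, tau)
```

Because symmetry is the special case tau = 1, the conditional symmetry fit can never have a larger G^2; the difference between the two statistics, on one degree of freedom, tests whether the extra parameter is needed.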

    Visualization of the Significant Explicative Categories using Catanova Method and Non-Symmetrical Correspondence Analysis for Evaluation of Passenger Satisfaction

    ANalysis Of VAriance (ANOVA) is a method for decomposing the total variation of the observations into the sum of variations due to different factors and a residual component. When the data are nominal, the usual approach of treating the total variation in the response variable as a measure of dispersion about the mean is not well defined. Light and Margolin (1971) proposed CATegorical ANalysis Of VAriance (CATANOVA) to analyze categorical data. Onukogu (1985) extended the CATANOVA method to two-way classified nominal data. The components (sums of squares) are, however, not orthogonal. Singh (1996) developed a CATANOVA procedure that gives orthogonal sums of squares and defined test statistics and their asymptotic null distributions. To study which explanatory categories are influential factors for the response variable, we propose applying Non-Symmetrical Correspondence Analysis (D'Ambra and Lauro, 1989) to the significant components. Finally, we illustrate the analysis numerically with a practical example.
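
For orientation, the one-way decomposition of Light and Margolin (1971) can be sketched as follows. The function name `catanova` and the example tables are assumptions for illustration, and the sketch covers only the original one-way case, not the two-way extensions of Onukogu or Singh or the Non-Symmetrical Correspondence Analysis step.

```python
import numpy as np

def catanova(table):
    """One-way CATANOVA for an I x G table of counts.

    Rows are response categories, columns are groups (factor levels).
    Returns TSS, WSS, BSS, the C statistic, and its degrees of freedom.
    """
    n_ij = np.asarray(table, dtype=float)
    n_cat, n_grp = n_ij.shape
    n = n_ij.sum()
    row_tot = n_ij.sum(axis=1)   # response category totals
    col_tot = n_ij.sum(axis=0)   # group sizes

    # Total variation of the nominal response.
    tss = n / 2.0 - (row_tot**2).sum() / (2.0 * n)
    # Variation within groups.
    wss = n / 2.0 - ((n_ij**2).sum(axis=0) / col_tot).sum() / 2.0
    bss = tss - wss

    # C = (n - 1)(I - 1) BSS / TSS is asymptotically chi-squared with
    # (I - 1)(G - 1) degrees of freedom under the null hypothesis.
    c_stat = (n - 1.0) * (n_cat - 1.0) * bss / tss
    df = (n_cat - 1) * (n_grp - 1)
    return tss, wss, bss, c_stat, df

# Hypothetical tables: identical response profiles in both groups,
# and groups that are perfectly separated by the response.
print(catanova([[10, 20], [30, 60]]))
print(catanova([[10, 0], [0, 10]]))
```

When every group has the same response profile, the between-group sum of squares is zero; when the groups are perfectly separated, all of the variation is between groups and BSS/TSS equals 1.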

    A review of some models for the analysis of contingency tables : a thesis presented in partial fulfilment of the requirements for the degree of Master of Arts in Statistics at Massey University

    Some models proposed for the analysis of contingency tables are reviewed and illustrated with examples. These include standard loglinear models; models suitable for ordinal categorical variables, such as ordinal loglinear, log-multiplicative and logit models, and models based on an underlying distribution for the response; and models for incomplete and square tables. Estimation methods and inference are also discussed.