Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
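As a rough illustration of the log-linear side of this abstract, the sketch below fits the simplest log-linear model (mutual independence of the three dimensions) to a small cube, uses the fitted values to answer an aggregate query approximately, and flags the cell with the largest Pearson residual as a candidate outlier. The choice of the independence model, the function names, and the 2 x 2 x 2 counts are assumptions made for illustration; the abstract does not say which log-linear model the authors select.

```python
import numpy as np

def independence_fit(cube):
    """Fitted counts under the mutual-independence log-linear model [A][B][C].

    Assumed model for illustration: m_ijk = n_i.. * n_.j. * n_..k / n^2.
    """
    cube = np.asarray(cube, dtype=float)
    n = cube.sum()
    a = cube.sum(axis=(1, 2))  # one-way margin of dimension A
    b = cube.sum(axis=(0, 2))  # one-way margin of dimension B
    c = cube.sum(axis=(0, 1))  # one-way margin of dimension C
    return np.einsum("i,j,k->ijk", a, b, c) / n**2

def pearson_residuals(cube, fitted):
    """Pearson residuals (observed - fitted) / sqrt(fitted), cell by cell."""
    return (np.asarray(cube, dtype=float) - fitted) / np.sqrt(fitted)

# Hypothetical 2 x 2 x 2 cube of counts (illustrative data only).
cube = np.array([[[20, 10], [30, 15]],
                 [[40, 20], [60, 90]]], dtype=float)

fitted = independence_fit(cube)
residuals = pearson_residuals(cube, fitted)

# Approximate answer to the aggregate query "total where A = 0".
approx_total_a0 = fitted[0].sum()
exact_total_a0 = cube[0].sum()

# Cell with the largest absolute residual: a candidate outlier.
outlier_cell = np.unravel_index(np.abs(residuals).argmax(), cube.shape)
```

Under this model the one-way margins are reproduced exactly, so queries that aggregate to a single dimension lose nothing, while cell-level queries are only approximated. Only the three margin vectors need to be stored, which is where the compression comes from.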
Multiple Correspondence Analysis & the Multilogit Bilinear Model
Multiple Correspondence Analysis (MCA) is a dimension reduction method which
plays a large role in the analysis of tables with categorical nominal variables
such as survey data. Though it is usually motivated and derived using geometric
considerations, we prove that it in fact amounts to a single proximal Newton
step of a natural bilinear exponential family model for categorical data: the
multinomial logit bilinear model. We compare and contrast the behavior of MCA
with that of the model on simulations and discuss new insights on the
properties of both exploratory multivariate methods and their cognate models.
One main conclusion is that we recommend approximating the multilogit
model parameters using MCA. Indeed, estimating the parameters of the model is
not a trivial task, whereas MCA has the great advantage of being easily solved
by singular value decomposition and of scaling to large data sets.
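The claim that MCA is easily solved by singular value decomposition can be sketched as follows, using the standard indicator-matrix formulation (correspondence analysis of the one-hot table). The function name `mca` and the tiny coded data set are hypothetical. The sketch covers only the SVD computation, not the multilogit bilinear model or the proximal Newton step discussed in the abstract, and it assumes every category level is observed at least once.

```python
import numpy as np

def mca(codes, n_components=2):
    """MCA of an (n x q) array of category codes via SVD of the indicator matrix."""
    codes = np.asarray(codes)
    n, q = codes.shape

    # Build the one-hot indicator matrix, one block of columns per variable.
    blocks = []
    for j in range(q):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, j][:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses (all equal to 1/n)
    c = P.sum(axis=0)      # column masses

    # Standardized residuals from the independence fit r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)

    k = n_components
    row_coords = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]     # principal coordinates
    col_coords = (Vt[:k].T * sv[:k]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv[:k] ** 2                 # last item: inertias

# Hypothetical survey: 6 respondents, 2 questions with 2 and 3 answer levels.
codes = np.array([[0, 0], [0, 1], [1, 2], [1, 0], [0, 1], [1, 2]])
row_coords, col_coords, inertia = mca(codes)
```

A single dense SVD is used here for clarity; on large data a truncated or randomized SVD would be the natural substitute.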
Conditional symmetry model as a better alternative to Symmetry Model for rater agreement measure
In almost all life and social science research, subjects are classified into categories by raters, interviewers or observers. Many approaches have been proposed for analyzing the data obtained from these raters. The symmetry and conditional symmetry models are designed for square tables such as those arising from raters' results. The conditional symmetry model has an extra parameter for the off-diagonal cells, so the symmetry model is a special case of it. In this work, we examine the effect of this extra parameter on the structure of agreement and on model fit, relative to the symmetry model. A generalized linear model (GLM) approach was used to fit the loglinear forms of these models to empirical examples. We observed that the extra parameter of the conditional symmetry model gave a substantial improvement in the significance level of the test statistics over its symmetry counterpart; hence the conditional symmetry model is better for rater agreement modelling, which requires a symmetric table.
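To make the comparison concrete, the sketch below fits both models to a square rater table using their closed-form maximum likelihood estimates and compares the likelihood-ratio statistics G². It uses the usual textbook definitions: symmetry sets m_ij = m_ji, and conditional symmetry sets m_ij = τ m_ji for i < j. The abstract's authors fit the same models through a GLM instead, and the function names and the 3 x 3 table are hypothetical.

```python
import math

def symmetry_fit(table):
    """Fitted counts under symmetry: off-diagonal pairs are averaged."""
    k = len(table)
    return [[table[i][j] if i == j else (table[i][j] + table[j][i]) / 2
             for j in range(k)] for i in range(k)]

def conditional_symmetry_fit(table):
    """Fitted counts under conditional symmetry, m_ij = tau * m_ji for i < j."""
    k = len(table)
    upper = sum(table[i][j] for i in range(k) for j in range(i + 1, k))
    lower = sum(table[i][j] for i in range(k) for j in range(i))
    tau = upper / lower  # MLE of the extra parameter
    fitted = [[float(table[i][j]) for j in range(k)] for i in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            pair = table[i][j] + table[j][i]
            fitted[i][j] = tau * pair / (tau + 1)
            fitted[j][i] = pair / (tau + 1)
    return fitted, tau

def deviance(table, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum n * log(n / m); empty cells add 0."""
    return 2 * sum(n * math.log(n / m)
                   for row_n, row_m in zip(table, fitted)
                   for n, m in zip(row_n, row_m) if n > 0)

# Hypothetical 3 x 3 table for two raters, constructed so that each cell above
# the diagonal is exactly twice its mirror cell below the diagonal.
table = [[50, 12, 4],
         [6, 40, 10],
         [2, 5, 30]]

k = len(table)
g2_sym = deviance(table, symmetry_fit(table))
cs_fitted, tau = conditional_symmetry_fit(table)
g2_cs = deviance(table, cs_fitted)
df_sym = k * (k - 1) // 2   # degrees of freedom, symmetry
df_cs = df_sym - 1          # one fewer, for the extra parameter
```

Because symmetry is nested in conditional symmetry (τ = 1), the difference `g2_sym - g2_cs` is a one-degree-of-freedom test of whether the extra parameter is needed.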
Visualization of the Significant Explicative Categories using Catanova Method and Non-Symmetrical Correspondence Analysis for Evaluation of Passenger Satisfaction
ANalysis Of VAriance (ANOVA) is a method to decompose the total variation of the observations into a sum of variations due to different factors and a residual component. When the data are nominal, the usual approach of treating the total variation in the response variable as a measure of dispersion about the mean is not well defined. Light and Margolin (1971) proposed CATegorical ANalysis Of VAriance (CATANOVA) to analyze categorical data. Onukogu (1985) extended the CATANOVA method to two-way classified nominal data. The components (sums of squares) are, however, not orthogonal. Singh (1996) developed a CATANOVA procedure that gives orthogonal sums of squares and defined test statistics and their asymptotic null distributions. In order to study which explanatory categories are influential factors for the response variable, we propose to apply Non-Symmetrical Correspondence Analysis (D'Ambra and Lauro, 1989) to the significant components. Finally, we illustrate the analysis numerically with a practical example.
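The one-way decomposition of Light and Margolin can be sketched as below, for a table whose rows are response categories and whose columns are groups. The sketch covers only that original one-way case, not the two-way extension, Singh's orthogonal decomposition, or the Non-Symmetrical Correspondence Analysis step. The function name and the example counts are hypothetical.

```python
def catanova(table):
    """One-way CATANOVA for a table of counts.

    Rows are response categories, columns are groups (the explanatory factor).
    Returns the total, within-group and between-group sums of squares,
    the C statistic and its degrees of freedom.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)

    # Total variation of the nominal response.
    tss = n / 2 - sum(t * t for t in row_totals) / (2 * n)
    # Variation within groups.
    wss = n / 2 - 0.5 * sum(
        sum(table[i][j] ** 2 for i in range(n_rows)) / col_totals[j]
        for j in range(n_cols)
    )
    bss = tss - wss  # variation explained by the grouping factor

    # C = (n - 1)(I - 1) * BSS / TSS, asymptotically chi-square with
    # (I - 1)(J - 1) degrees of freedom under the null hypothesis.
    c_stat = (n - 1) * (n_rows - 1) * bss / tss
    df = (n_rows - 1) * (n_cols - 1)
    return {"tss": tss, "wss": wss, "bss": bss, "c_stat": c_stat, "df": df}

# Hypothetical counts: satisfaction level (rows) by travel class (columns).
satisfaction = [[30, 18, 12],
                [25, 30, 20],
                [10, 22, 33]]
result = catanova(satisfaction)
```

The ratio `bss / tss` plays the role of an R² for nominal data: it is 0 when every group has the same response distribution and 1 when the group determines the response exactly.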
A review of some models for the analysis of contingency tables : a thesis presented in partial fulfilment of the requirements for the degree of Master of Arts in Statistics at Massey University
Some models proposed for the analysis of contingency tables are reviewed and illustrated with examples.
These include standard loglinear models; models which are suitable for ordinal categorical variables such as ordinal loglinear, log multiplicative and logit models, and models based on an underlying distribution for the response; and models for incomplete and square tables.
Estimation methods and inference are also discussed.