4 research outputs found

    Rank and null space calculations using matrix decomposition without column interchanges

    The most widely used stable methods for the numerical determination of the rank of a matrix A are the singular value decomposition and the QR algorithm with column interchanges. Here two algorithms are presented which determine rank and nullity in a numerically stable manner without using column interchanges. One algorithm makes use of the condition estimator of Cline, Moler, Stewart, and Wilkinson and, relative to alternative stable algorithms, is particularly efficient for sparse matrices. The second algorithm is important in the case that one wishes to test for rank and nullity while sequentially adding columns to a matrix.
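
    As a loose illustration of the setting of the second algorithm, the sketch below tests for rank and nullity while columns are added to a matrix one at a time, with no column interchanges. It is only a plain numpy sketch: the smallest singular value of the triangular factor is computed exactly here, whereas the point of the abstract's algorithms is to replace that step with a cheap condition estimate.

        # Sketch only: exact smallest singular value stands in for the
        # Cline-Moler-Stewart-Wilkinson condition estimator used in the paper.
        import numpy as np

        def incremental_rank(A, tol=1e-10):
            """Decide, column by column, how many columns of A are independent."""
            m, n = A.shape
            rank, nullity = 0, 0
            kept = np.empty((m, 0))
            for j in range(n):
                candidate = np.hstack([kept, A[:, j:j + 1]])
                R = np.linalg.qr(candidate, mode='r')         # QR without column pivoting
                smallest = np.linalg.svd(R, compute_uv=False)[-1]
                if smallest > tol * np.linalg.norm(A):         # new column is independent
                    kept = candidate
                    rank += 1
                else:                                          # new column enlarges the null space
                    nullity += 1
            return rank, nullity

        A = np.array([[1., 2., 3.],
                      [4., 5., 9.],
                      [7., 8., 15.]])                          # third column = first + second
        print(incremental_rank(A))                             # (2, 1)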

    Integrated smoothed location model and data reduction approaches for multi variables classification

    The Smoothed Location Model is a classification rule that handles mixtures of continuous and binary variables simultaneously. The rule discriminates between groups in a parametric form using the conditional distribution of the continuous variables given each pattern of the binary variables. To conduct a practical classification analysis, the objects must first be sorted into the cells of a multinomial table generated from the binary variables, and the parameters in each cell are then estimated from the sorted objects. In many situations, however, the estimated parameters are poor when the number of binary variables is large relative to the sample size: a large number of binary variables creates many empty multinomial cells, leading to a severe sparsity problem and ultimately to exceedingly poor performance of the constructed rule; in the worst case, the rule cannot be constructed at all. To overcome these shortcomings, this study proposes new strategies for extracting adequate variables that contribute to optimum performance of the rule. Two combinations of extraction techniques are introduced, namely 2PCA and PCA+MCA, with new cut points for the eigenvalue and total variance explained, to determine adequate extracted variables that lead to a minimum misclassification rate. The outcomes of these extraction techniques are used to construct smoothed location models, producing two new classification approaches called 2PCALM and 2DLM. Numerical evidence from simulation studies shows no significant difference in misclassification rate between the extraction techniques for normal and non-normal data, although both proposed approaches are slightly affected by non-normal data and severely affected by highly overlapping groups. Investigations on several real data sets show that the two approaches are competitive with, and better than, other existing classification methods. The overall findings reveal that both proposed approaches can be considered improvements to the location model and alternatives to other classification methods, particularly in handling mixed variables with a large number of binary variables.
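
    The cell-sorting step described above is easy to picture with a toy sketch: each object goes into the multinomial cell indexed by its binary pattern, and parameters are estimated only in occupied cells. The code is a hypothetical illustration, not the thesis's 2PCALM/2DLM implementation; it merely shows why the 2**b cells empty out quickly as the number b of binary variables grows.

        # Hypothetical sketch of the multinomial-cell sorting step; names are
        # illustrative and do not come from the thesis.
        import numpy as np

        def cell_index(binary_row):
            """Map a 0/1 pattern to its multinomial cell number (binary encoding)."""
            return int("".join(map(str, binary_row)), 2)

        def cell_means(X_cont, X_bin):
            b = X_bin.shape[1]
            means = {}
            for cell in range(2 ** b):
                rows = [i for i in range(len(X_bin)) if cell_index(X_bin[i]) == cell]
                if rows:                                   # occupied cell: estimate parameters
                    means[cell] = X_cont[rows].mean(axis=0)
                # empty cells get no estimate -> the sparsity problem noted above
            return means

        rng = np.random.default_rng(0)
        X_bin = rng.integers(0, 2, size=(30, 5))           # 5 binary variables -> 32 cells
        X_cont = rng.normal(size=(30, 2))                  # 2 continuous variables
        print(f"{len(cell_means(X_cont, X_bin))} of 32 cells occupied")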

    Adaptive regression and model selection in data mining problems

    Data Mining is a new and rapidly evolving area that deals with extracting structure from massive commercial and scientific data sets, and regression analysis is one of its major techniques. The data sets encountered in Data Mining are often characterized by a large number of attributes (variables) as well as data records. This imposes two major requirements on the regression analysis tools used in Data Mining: first, in order to produce accurate and parsimonious models exhibiting the most important features of the problem at hand, they should be able to perform model selection adaptively; second, the cost of running such tools has to be reasonably low. Most modern regression tools fail to meet these requirements. This thesis is intended to contribute to the improvement of existing methodologies as well as to propose new approaches. We focus on two regression estimation techniques. The first, called Probing Least Absolute Squares Modelling (PLASM), is a generalization of the Least Absolute Shrinkage and Selection Operator (LASSO) of R. Tibshirani, which minimizes the residual sum of squares subject to the l1-norm of the regression coefficients being less than a constant. LASSO has been shown to enjoy the stability of ridge regression coupled with the ability to carry out model selection. In our approach, PLASM, we replace the constraint employed in LASSO with a different one. PLASM allows for an arbitrary grouping of basis functions in a model and includes LASSO as a special case. The implication of the new constraint is that PLASM can perform model selection in terms of groups of basis functions, which turns out to be very useful in a number of data-analytic problems. For example, in additive modelling the dimensionality of the PLASM minimization problem is much smaller than that of LASSO and is independent (at least explicitly) of the number of data points, which makes it suitable for use in the Data Mining context. The second tool considered in this thesis is the Multivariate Adaptive Regression Splines (MARS) procedure developed by J. Friedman. In our version of MARS, called BMARS, we use B-splines instead of truncated power basis functions. Because B-splines have compact support, we can introduce a new strategy whereby at any moment the algorithm builds a model using B-splines of a certain scale only, switching to splines of a smaller scale once the fitting ability of the current splines has been exhausted. We also discuss a parallel version of BMARS and an application of the algorithm to the processing of a large commercial data set. The results of the numerical experiments demonstrate that, while being considerably more efficient, BMARS produces models competitive with those of the original MARS.
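
    As a small, self-contained illustration of the LASSO idea that PLASM generalizes (this is not the thesis's PLASM code), the sketch below fits the penalized form of the l1 constraint using scikit-learn's Lasso purely as a stand-in; the l1 penalty drives most coefficients exactly to zero, which is the model-selection behaviour discussed above.

        # Stand-in illustration of LASSO only; PLASM's grouped constraint is not shown.
        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(1)
        X = rng.normal(size=(200, 10))
        beta_true = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only 2 active predictors
        y = X @ beta_true + rng.normal(scale=0.5, size=200)

        fit = Lasso(alpha=0.1).fit(X, y)
        print(np.round(fit.coef_, 2))        # most coefficients are shrunk exactly to zero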