
    A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis

    Due to the increasing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a viable process for extracting essential information from data, so that high-dimensional data can be represented in a more condensed form with much lower dimensionality, both improving classification accuracy and reducing computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. A stand-alone method utilizes a single criterion, from either a supervised or an unsupervised perspective; a hybrid method integrates both criteria. Compared with the variety of stand-alone dimensionality reduction methods, the hybrid approach is promising because it simultaneously takes advantage of the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach: (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of performance on noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both structural risk (the supervised criterion) from the Support Vector Machine (SVM) and data independence (the unsupervised criterion) from Independent Component Analysis (ICA). The projection from the SVM directly contributes to classification performance improvement from the supervised perspective, whereas the maximum independence among features obtained by ICA constructs a projection that indirectly improves classification accuracy through a better intrinsic data representation from the unsupervised perspective. For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates part of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid dimensionality reduction method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In the proposed approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation. Experimental results show that the proposed approaches achieve higher classification performance, with better robustness, at relatively lower dimensions than conventional methods on high-dimensional datasets.
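    As a rough illustration of the orthogonality idea described above, the sketch below (Python with scikit-learn) combines a linear SVM's hyperplane normal with FastICA unmixing directions via Gram-Schmidt orthogonalization. The data set, component count, and tolerance are illustrative assumptions; the dissertation's actual joint optimization is not reproduced here.

        # A minimal sketch, assuming scikit-learn: orthogonalize ICA directions
        # against an SVM direction to form a hybrid projection. Illustrative only.
        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.decomposition import FastICA
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import LinearSVC

        X, y = load_breast_cancer(return_X_y=True)   # stand-in data set
        X = StandardScaler().fit_transform(X)

        # Supervised direction: the separating-hyperplane normal of a linear SVM.
        w_svm = LinearSVC(C=1.0, dual=False).fit(X, y).coef_[0]
        w_svm /= np.linalg.norm(w_svm)

        # Unsupervised directions: ICA unmixing vectors (maximal independence).
        W_ica = FastICA(n_components=5, whiten="unit-variance",
                        random_state=0).fit(X).components_

        # Gram-Schmidt: make each ICA direction orthogonal to what came before,
        # dropping near-redundant vectors (a crude stand-in for redundancy removal).
        basis = [w_svm]
        for w in W_ica:
            for b in basis:
                w = w - (w @ b) * b
            norm = np.linalg.norm(w)
            if norm > 1e-8:
                basis.append(w / norm)

        P = np.vstack(basis)          # rows are the hybrid projection directions
        print((X @ P.T).shape)        # data in the reduced hybrid subspace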

    Linear components of quadratic classifiers

    This is a pre-print of an article published in Advances in Data Analysis and Classification; the final authenticated version is available online at https://doi.org/10.1007/s11634-018-0321-6
    We obtain a decomposition of any quadratic classifier in terms of products of hyperplanes. These hyperplanes can be viewed as relevant linear components of the quadratic rule (with respect to the underlying classification problem). As an application, we introduce the associated multidirectional classifier: a piecewise linear classification rule induced by the approximating products. Such a classifier is useful for determining linear combinations of the predictor variables with the ability to discriminate. We also show that this classifier can be used as a tool to reduce the dimension of the data and to help identify the variables most important for classifying new elements. Finally, we illustrate with a real data set the use of these linear components to construct oblique classification trees. This research was supported by the Spanish MCyT grant MTM2016-78751-
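    The abstract does not spell out the decomposition, but a hedged sketch of the underlying idea is straightforward: for a two-class QDA-style rule, the quadratic coefficient matrix is A = (S1^-1 - S0^-1)/2, and the leading eigenvectors of A serve as candidate linear components for discrimination and dimension reduction. The data set and the number of retained directions below are illustrative assumptions, not the paper's construction.

        # A hedged sketch: extract discriminating directions from the quadratic
        # term of a two-class QDA-style rule via eigendecomposition. Illustrative.
        import numpy as np
        from sklearn.datasets import load_wine

        X, y = load_wine(return_X_y=True)
        X0, X1 = X[y == 0], X[y == 1]            # stand-in two-class problem

        S0 = np.cov(X0, rowvar=False)
        S1 = np.cov(X1, rowvar=False)
        A = 0.5 * (np.linalg.inv(S1) - np.linalg.inv(S0))   # quadratic coefficients

        # Eigenvectors of the symmetric matrix A, ordered by |eigenvalue|:
        # the directions along which the quadratic rule curves the most.
        vals, vecs = np.linalg.eigh(A)
        order = np.argsort(-np.abs(vals))
        components = vecs[:, order[:2]]          # two candidate linear components

        mask = np.isin(y, [0, 1])
        print((X[mask] @ components).shape)      # data projected onto them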

    The importance of selecting the optimal number of principal components for fault detection using principal component analysis

    Fault detection and isolation are the two fundamental building blocks of process monitoring, and accurate, efficient process monitoring increases plant availability and utilization. Principal component analysis (PCA) is one of the statistical techniques used for fault detection, and determining the number of principal components (PCs) to retain plays a large role in detecting a fault with the PCA technique. This dissertation focuses on methods of determining the number of PCs to retain for accurate and effective fault detection in a laboratory thermal system. The SNR method of determining the number of PCs, a relatively recent method, is compared with two commonly used methods for the same purpose, the CPV and scree test methods.
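    For concreteness, a minimal sketch of the CPV criterion and a simple SPE/Q fault statistic follows; the 90% CPV cutoff, the synthetic data, and the empirical 99th-percentile control limit are illustrative assumptions, not the dissertation's laboratory setup.

        # A minimal sketch: choose the number of PCs by cumulative percent
        # variance (CPV), then flag faults with the SPE/Q statistic. Illustrative.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # stand-in data
        X = (X - X.mean(0)) / X.std(0)           # mean-center and scale

        # PCA via SVD of the standardized data.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        var = s**2 / (X.shape[0] - 1)

        # CPV rule: smallest k whose cumulative variance share reaches 90 %.
        cpv = np.cumsum(var) / var.sum()
        k = int(np.searchsorted(cpv, 0.90) + 1)
        print(f"PCs retained by CPV: {k}")

        # SPE/Q statistic: squared reconstruction error in the residual subspace.
        P = Vt[:k].T
        Q = np.sum((X - X @ P @ P.T) ** 2, axis=1)
        limit = np.percentile(Q, 99)             # illustrative empirical limit
        print("samples flagged as faulty:", int(np.sum(Q > limit)))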

    Software quality and reliability prediction using Dempster-Shafer theory

    As software systems are increasingly deployed in mission-critical applications, accurate quality and reliability predictions are becoming a necessity. Most accurate prediction models require extensive testing effort, implying increased cost and slowing down the development life cycle. We developed two novel statistical models based on Dempster-Shafer theory, which provide accurate predictions from relatively small data sets of direct and indirect software reliability and quality predictors. The models are flexible enough to incorporate information generated throughout the development life cycle to improve prediction accuracy. Our first contribution is an original algorithm for building Dempster-Shafer belief networks using prediction logic. This model has been applied to software quality prediction. We demonstrated that the prediction accuracy of Dempster-Shafer belief networks is higher than that achieved by logistic regression, discriminant analysis, random forests, and the algorithms in two machine learning software packages, See5 and WEKA, and the difference in performance is statistically significant. Our second contribution is also based on a practical extension of Dempster-Shafer theory. The major limitation of Dempster's rule and other known rules of evidence combination is their inability to handle information coming from correlated sources. Motivated by the inherently high correlations between early life-cycle predictors of software reliability, we extended Murphy's rule of combination to account for these correlations. When used as part of a methodology that fuses various software reliability prediction systems, this rule provided more accurate predictions than previously reported methods. In addition, we proposed an algorithm that defines the upper and lower bounds of the belief function of the combination results. To demonstrate its generality, we successfully applied it in the design of the Online Safety Monitor, which fuses multiple correlated, time-varying estimations of the convergence of neural network learning in an intelligent flight control system.
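    As background for the combination rules discussed above, the sketch below implements classical Dempster's rule for two mass functions over a small frame of discernment. The frame and mass values are made-up illustrations; the dissertation's correlation-aware extension of Murphy's rule is not reproduced here.

        # A minimal sketch of classical Dempster's rule of combination.
        from itertools import product

        def dempster_combine(m1, m2):
            """Combine two mass functions given as {frozenset: mass} dicts."""
            combined, conflict = {}, 0.0
            for (a, wa), (b, wb) in product(m1.items(), m2.items()):
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + wa * wb
                else:
                    conflict += wa * wb        # mass falling on the empty set
            if conflict >= 1.0:
                raise ValueError("total conflict: sources are incompatible")
            return {s: w / (1.0 - conflict) for s, w in combined.items()}

        # Two hypothetical sources of evidence about a module being faulty.
        F, C = frozenset({"faulty"}), frozenset({"clean"})
        FC = F | C                               # the whole frame, i.e., ignorance
        m1 = {F: 0.6, FC: 0.4}
        m2 = {F: 0.5, C: 0.2, FC: 0.3}
        print(dempster_combine(m1, m2))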

    CM55: special prime-field elliptic curves almost optimizing den Boer's reduction between Diffie-Hellman and discrete logs

    Using the Pohlig-Hellman algorithm, den Boer reduced the discrete logarithm problem to the Diffie-Hellman problem in groups of an order whose prime factors are each one plus a smooth number. This report reviews some related general conjectural lower bounds on the Diffie-Hellman problem in elliptic curve groups that relax the smoothness condition into a more commonly satisfied condition. The report focuses on elliptic curve parameters defined over a prime field of size 9+55(2^288), whose special form may provide some efficiency advantages over random fields of similar size. The curve has a point of Proth prime order 1+55(2^286), which helps to nearly optimize the den Boer reduction. The curve is constructed using the CM method; it has cofactor 4, trace 6, and fundamental discriminant -55. This report also tries to consolidate the variety of ways of deciding between elliptic curves (or other algorithms) given the efficiency and security of each.
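    The stated parameters can be sanity-checked directly: with field prime p = 9 + 55(2^288), point order n = 1 + 55(2^286), cofactor 4, and trace 6, the identity #E = p + 1 - t = 4n must hold, and n has the Proth form k(2^m) + 1 with odd k < 2^m. A short check, assuming sympy for primality testing, might look like this:

        # Sanity-check the CM55 parameters as stated in the abstract.
        from sympy import isprime

        p = 9 + 55 * 2**288                 # field size
        n = 1 + 55 * 2**286                 # order of the base point
        h, t = 4, 6                         # cofactor and trace

        assert isprime(p), "field size should be prime"
        assert isprime(n), "point order should be a (Proth) prime"
        assert 55 % 2 == 1 and 55 < 2**286  # Proth form k*2^m + 1, odd k < 2^m
        assert h * n == p + 1 - t           # curve order = cofactor * point order
        print("parameters are consistent")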

    Multivariate Analysis in Management, Engineering and the Sciences

    Recently, statistical knowledge has become an important requirement and occupies a prominent position in the exercise of many professions. In the real world, processes generate large volumes of data and are naturally multivariate; as such, they require proper treatment, and under these conditions it is difficult or practically impossible to use univariate statistical methods. The wide application of multivariate techniques, and the need to spread them more fully in academia and business, justify the creation of this book. The objective is to demonstrate interdisciplinary applications that identify patterns, trends, associations, and dependencies in the areas of Management, Engineering, and the Sciences. The book is addressed to both practicing professionals and researchers in the field.

    Model Selection Techniques for Kernel-Based Regression Analysis Using Information Complexity Measure and Genetic Algorithms

    In statistical modeling, an overparameterized model leads to poor generalization on unseen data points. This issue calls for a model selection technique that appropriately chooses the form and parameters of the proposed model and the independent variables retained for the modeling. Model selection is particularly important for linear and nonlinear statistical models, which can easily be overfitted. Recently, support vector machines (SVMs), also known as kernel-based methods, have drawn much attention as the next generation of nonlinear modeling techniques. The model selection issues for SVMs include the selection of the kernel, its parameters, and the optimal subset of independent variables. In the current literature, k-fold cross-validation is the model selection method most widely used for SVMs by machine learning researchers. However, cross-validation is computationally intensive, since one has to fit the model k times. This dissertation introduces a model selection criterion based on the information complexity (ICOMP) measure for kernel-based regression analysis and its applications. ICOMP penalizes both the lack of fit and the complexity of the model in order to choose an optimal model with good generalization properties. ICOMP provides a single index for each model and does not require any validation data; it is computationally efficient and has been successfully applied to various linear model selection problems. In this dissertation, we bring ICOMP to nonlinear kernel-based modeling. Specifically, the dissertation proposes ICOMP and its various forms for kernel ridge regression, kernel partial least squares regression, kernel principal component analysis, kernel principal component regression, relevance vector regression, relevance vector logistic regression, and classification problems. The model selection tasks achieved by the proposed criterion include choosing the form of the kernel function, the parameters of the kernel function, the ridge parameter, the number of latent variables, the number of principal components, and the optimal subset of input variables, all in a simultaneous fashion for intelligent data mining. The performance of the proposed method is tested on simulated benchmark data sets as well as real data sets. The predictive performance of the proposed model selection criteria is comparable to, and often better than, that of cross-validation, which is far more costly to compute. The dissertation also combines a genetic algorithm (GA) with ICOMP for variable subsetting, which significantly decreases computational time compared to an exhaustive search of all possible subsets. The GA procedure is shown to be robust and performs well in repeated simulation examples. This dissertation therefore provides researchers with an alternative, computationally efficient model selection approach for data analysis using kernel methods.
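    Since the abstract does not give the criterion's exact form, the sketch below shows one simplified ICOMP-style score for kernel ridge regression: a lack-of-fit term plus Bozdogan's C1 entropic complexity of a covariance surrogate, minimized here over a small grid rather than by a genetic algorithm. All concrete choices (the RBF kernel, the surrogate Sigma = inv(K + lambda*I), the grid) are illustrative assumptions, not the dissertation's criterion.

        # A hedged sketch of an ICOMP-style score for kernel ridge regression:
        # lack of fit + 2*C1, with C1(S) = (s/2) log(tr(S)/s) - (1/2) log|S|.
        import numpy as np

        def rbf_kernel(X, Z, gamma):
            d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d)

        def icomp_krr(X, y, gamma, lam):
            n = len(y)
            K = rbf_kernel(X, X, gamma)
            M = K + lam * np.eye(n)
            alpha = np.linalg.solve(M, y)        # kernel ridge coefficients
            resid = y - K @ alpha
            lack_of_fit = n * np.log(resid @ resid / n)
            Sigma = np.linalg.inv(M)             # illustrative covariance surrogate
            _, logdet = np.linalg.slogdet(Sigma)
            c1 = 0.5 * n * np.log(np.trace(Sigma) / n) - 0.5 * logdet
            return lack_of_fit + 2.0 * c1

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(80, 1))
        y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

        # Model selection: pick the (gamma, lambda) pair minimizing the score.
        grid = [(g, l) for g in (0.1, 1.0, 10.0) for l in (1e-3, 1e-1, 1.0)]
        print("selected (gamma, lambda):", min(grid, key=lambda gl: icomp_krr(X, y, *gl)))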