A Scalable and Extensible Framework for Superposition-Structured Models
In many learning tasks, structured models usually lead to better interpretability and higher generalization performance. In recent years, however, simple structured models such as the lasso have frequently proved insufficient. Accordingly, there has been considerable work on "superposition-structured" models, in which multiple structural constraints are imposed simultaneously. To solve these "superposition-structured" statistical models efficiently, we develop a framework based on a proximal Newton-type method. Employing the smoothed conic dual approach with the L-BFGS updating formula, we propose a scalable and extensible proximal quasi-Newton (SEP-QN) framework. Empirical analysis on various datasets shows that our framework is potentially powerful and achieves a super-linear convergence rate when optimizing some popular "superposition-structured" statistical models such as the fused sparse group lasso.
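The abstract's framework targets composite objectives with several structural penalties at once. As a minimal illustration of the proximal machinery such methods build on (not the SEP-QN method itself), here is a proximal-gradient sketch for the plain lasso; the function names, step-size choice, and synthetic data are all illustrative assumptions:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad_lasso(A, b, lam, step, n_iter=500):
    """Proximal gradient descent for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)        # gradient of the smooth least-squares part
        x = soft_threshold(x - step * grad, step * lam)  # prox step on the penalty
    return x

# Illustrative synthetic problem: sparse ground truth, noiseless observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = largest eigenvalue of A^T A
x_hat = prox_grad_lasso(A, b, lam=0.1, step=step)
```

A proximal Newton-type method such as SEP-QN replaces the fixed gradient step above with a quasi-Newton (L-BFGS) scaling of the smooth part, which is what yields the super-linear convergence the abstract reports.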
Clustering based feature selection using Partitioning Around Medoids (PAM)
High-dimensional data contains a large number of features and therefore demands immense computational resources, in both space and time. Several studies indicate that not all features of high-dimensional data are relevant to the classification result, so dimensionality reduction is necessary to improve classifier performance. Several dimensionality reduction techniques have been proposed, including feature selection and feature extraction. Sequential forward selection and sequential backward selection are greedy feature selection approaches; heuristic approaches such as the Genetic Algorithm, PSO, and the Forest Optimization Algorithm have also been applied. PCA is the most well-known feature extraction method; others include multidimensional scaling and linear discriminant analysis. In this work, a different approach to feature selection is taken: cluster analysis based feature selection using Partitioning Around Medoids (PAM) clustering. Our experiments show that the classification accuracy obtained when the medoids of the feature vectors are used to represent the original dataset is high, above 80%.
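The idea above can be sketched as: cluster the feature columns with a PAM-style k-medoids and keep each cluster's medoid feature as the selected subset. This is a minimal self-contained sketch, not the authors' exact pipeline; the standardization, Euclidean feature distance, and naive medoid-update loop are all assumptions:

```python
import numpy as np

def pam_feature_selection(X, k, n_iter=20, seed=0):
    """Select k features by clustering the columns of X with a simple
    PAM-style k-medoids and keeping each cluster's medoid feature."""
    rng = np.random.default_rng(seed)
    F = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize each feature
    # Pairwise Euclidean distances between feature columns.
    D = np.linalg.norm(F.T[:, None, :] - F.T[None, :, :], axis=2)
    medoids = rng.choice(X.shape[1], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # nearest medoid per feature
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # Medoid = cluster member minimizing total distance to the cluster.
                new_medoids[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.sort(medoids)

# Illustrative data: 12 features built from 3 latent signals plus noise.
rng = np.random.default_rng(1)
base = rng.standard_normal((100, 3))
X = np.hstack([base + 0.05 * rng.standard_normal((100, 3)) for _ in range(4)])
selected = pam_feature_selection(X, k=3)
X_reduced = X[:, selected]  # medoid features represent the original dataset
```

The reduced matrix `X_reduced` would then be fed to a classifier; the abstract's accuracy figures are obtained on such medoid representations.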