Sparsity in Machine Learning: An Information Selecting Perspective
Today we are living in a world awash with data. Large volumes of data are acquired, analyzed, and applied to tasks through machine learning algorithms in nearly every area of science, business, and industry. For example, medical scientists analyze gene expression data from a single specimen to learn the underlying causes of a disease (e.g., cancer) and choose the best treatment; retailers learn about customers' shopping habits from retail data and adjust their business strategies to better appeal to customers; suppliers enhance supply chain success through supply chain systems built on knowledge sharing. However, it is reasonable to doubt whether all genes contribute to a disease, whether all the data obtained from existing customers apply to a new customer, or whether all shared knowledge in a supply network is useful to a specific supply scenario. It is therefore crucial to sort through the massive amount of information provided by data and keep only what we really need. This process is referred to as information selection: it keeps the information that improves the performance of the corresponding machine learning task and discards information that is useless or even harmful to that performance. Sparse learning is a powerful tool for achieving information selection. In this thesis, we apply sparse learning to two major areas of machine learning -- feature selection and transfer learning.
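As a minimal illustration of how sparsity performs information selection, the sketch below fits an L1-penalized linear model: the penalty drives coefficients of uninformative features to exactly zero, so the surviving non-zero coefficients mark the "kept" information. The dataset and the alpha value are illustrative assumptions, not from the thesis.

```python
# Sparsity as information selection: an L1 penalty (lasso) zeroes out the
# coefficients of irrelevant features; non-zero coefficients are "selected".
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                # 200 samples, 50 features
# Only features 0 and 1 actually influence the response (toy assumption).
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)        # indices of surviving features
print("selected features:", selected)         # typically [0, 1]
```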
Feature selection is a dimensionality reduction technique that selects a subset of representative features. Feature selection combined with sparse learning has recently attracted significant attention due to its outstanding performance compared with traditional feature selection methods that ignore correlation between features. However, such methods are restricted by design to linear data transformations, a potential drawback given that the underlying correlation structures of data are often non-linear. To leverage a more sophisticated embedding than the linear model assumed by sparse learning, we propose an autoencoder-based unsupervised feature selection approach that uses a single-layer autoencoder in a joint framework of feature selection and manifold learning. Additionally, we incorporate spectral graph analysis of the projected data into the learning process to preserve local data geometry from the original data space to the low-dimensional feature space.
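A hedged sketch of the core idea follows: a single-layer autoencoder reconstructs the data while an L2,1 (row-sparsity) penalty on the encoder weights pushes whole rows toward zero, so features whose rows retain large norms are selected. The layer sizes, penalty weight, and the omission of the spectral-graph term are simplifying assumptions, not the thesis's exact formulation.

```python
# Autoencoder-based unsupervised feature selection (simplified sketch):
# reconstruction loss + L2,1 row-sparsity on the encoder weight matrix.
import torch

def select_features(X, k=10, hidden=32, lam=1e-2, epochs=500, lr=1e-2):
    X = torch.as_tensor(X, dtype=torch.float32)
    d = X.shape[1]
    W1 = torch.randn(d, hidden, requires_grad=True)  # encoder: one row per feature
    W2 = torch.randn(hidden, d, requires_grad=True)  # decoder
    opt = torch.optim.Adam([W1, W2], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon = torch.tanh(X @ W1) @ W2
        l21 = W1.norm(dim=1).sum()                   # L2,1: sum of encoder row norms
        loss = ((recon - X) ** 2).mean() + lam * l21
        loss.backward()
        opt.step()
    scores = W1.detach().norm(dim=1)                 # feature importance = row norm
    return torch.topk(scores, k).indices             # indices of k selected features
```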
Transfer learning describes a set of methods that aim to transfer knowledge from related domains to alleviate the problems caused by limited or no labeled training data in machine learning tasks. Many transfer learning techniques have been proposed to deal with different application scenarios. However, due to the differences in data distribution, feature space, label space, etc., between the source domain and the target domain, it is necessary to select and transfer only relevant information from the source domain to improve the performance of the target learner. Otherwise, the target learner can be negatively impacted by weakly related knowledge from the source domain, a phenomenon referred to as negative transfer. In this thesis, we focus on two transfer learning scenarios in which limited labeled training data are available in the target domain. In the first scenario, no label information is available in the source data. In the second scenario, large amounts of labeled source data are available, but there is no overlap between the source and target label spaces. The transfer learning technique for the former case is called self-taught learning, while that for the latter case is called few-shot learning. We apply self-taught learning to visual, textual, and audio data, and few-shot learning to wearable-sensor-based human activity data. For both cases, we propose a metric for the relevance between a target sample/class and a source sample/class, and then extract information from the related samples/classes for knowledge transfer, so that negative transfer caused by weakly related source information can be alleviated. Experimental results show that transfer learning provides better performance with information selection.
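To make the relevance-based selection concrete, here is a hypothetical sketch: score each source sample by its similarity to the target data and keep only the most relevant ones, filtering out weakly related sources that could cause negative transfer. The thesis defines its own relevance metric; cosine similarity here is a stand-in assumption.

```python
# Relevance-based source selection (illustrative; cosine similarity is an
# assumed stand-in for the thesis's relevance metric).
import numpy as np

def select_relevant_sources(X_src, X_tgt, top_k=100):
    src = X_src / np.linalg.norm(X_src, axis=1, keepdims=True)
    tgt = X_tgt / np.linalg.norm(X_tgt, axis=1, keepdims=True)
    relevance = (src @ tgt.T).max(axis=1)    # best match to any target sample
    keep = np.argsort(relevance)[-top_k:]    # indices of most relevant sources
    return X_src[keep]
```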
Non-convex Regularized Self-representation for Unsupervised Feature Selection
5th International Conference, IScIDE 2015, Suzhou, China, June 14-16, 2015.
Feature selection aims to select a subset of features to decrease time complexity, reduce storage burden, and improve the generalization ability of classification or clustering. For the vast amounts of unlabeled high-dimensional data, unsupervised feature selection is effective in alleviating the curse of dimensionality and finds applications in various fields. In this paper, we propose a non-convex regularized self-representation (RSR) model in which each feature can be represented by a linear combination of other features, and we propose to impose an L2,p norm (0 < p < 1) regularization on the self-representation coefficients for unsupervised feature selection. Compared with the conventional L2,1 norm regularization, when p < 1 a much sparser solution is obtained for the self-representation coefficients, which is also more effective in selecting salient features. To solve the non-convex RSR model, we further propose an efficient iterative reweighted least squares (IRLS) algorithm with guaranteed convergence to a fixed point. Extensive experimental results on nine datasets show that our feature selection method with small p is more effective: it mostly outperforms features selected at p = 1 and other state-of-the-art unsupervised feature selection methods in terms of classification accuracy and clustering results.
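A minimal sketch of this scheme follows: minimize ||X - XW||_F^2 + lam * ||W||_{2,p}^p by alternating a reweighted ridge solve with a diagonal reweighting of the rows of W, then rank features by the row norms of W. The epsilon smoothing and stopping rule are assumptions; the paper's exact update may differ in detail.

```python
# IRLS for the L2,p-regularized self-representation (RSR) model (sketch).
import numpy as np

def rsr_irls(X, lam=1.0, p=0.5, iters=50, eps=1e-8):
    n, d = X.shape
    G = X.T @ X
    W = np.linalg.solve(G + lam * np.eye(d), G)        # ridge initialization
    for _ in range(iters):
        row_norms = np.linalg.norm(W, axis=1) + eps    # eps avoids 0^(p-2)
        D = np.diag((p / 2.0) * row_norms ** (p - 2))  # IRLS diagonal reweighting
        W = np.linalg.solve(G + lam * D, G)            # weighted ridge solve
    scores = np.linalg.norm(W, axis=1)                 # feature saliency = row norm
    return W, np.argsort(scores)[::-1]                 # features ranked by saliency
```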
Using Feature Weighting as a Tool for Clustering Applications
The weighted variant of k-Means (Wk-Means), which assigns values to features based on their relevance, is a well-known approach to address the shortcomings of k-Means on data containing noisy and irrelevant features. This research aims, first, to explore how feature weighting can be used for feature selection; second, to investigate the performance of Minkowski weighted k-Means (MWk-Means), and its intelligent variant, on datasets defined in different p-norms; and third, to address the problem of missing values with a weighted variant of k-Means. A partial distance approach is used to handle missing values in the weighted variants of k-Means.
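For orientation, here is a hedged sketch of the MWk-Means step structure: alternate (i) assignment under a weighted Minkowski distance, (ii) centroid update, and (iii) a feature-weight update driven by per-feature within-cluster dispersions. The initialization and convergence details are simplified assumptions; p > 1 is assumed.

```python
# Minkowski weighted k-Means (simplified sketch; assumes p > 1).
import numpy as np

def mwk_means(X, k, p=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = X[rng.choice(n, k, replace=False)]
    weights = np.full(d, 1.0 / d)
    for _ in range(iters):
        # Weighted Minkowski distance of every point to every centroid.
        diff = np.abs(X[:, None, :] - centroids[None, :, :])
        dist = ((weights ** p) * diff ** p).sum(axis=2)
        labels = dist.argmin(axis=1)
        centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                              else X[rng.integers(n)] for c in range(k)])
        # Per-feature within-cluster dispersion drives the weight update.
        D = np.array([
            sum((np.abs(X[labels == c, v] - centroids[c, v]) ** p).sum()
                for c in range(k))
            for v in range(d)
        ]) + 1e-12
        # Standard Wk-Means rule: w_v = 1 / sum_u (D_v / D_u)^(1/(p-1)).
        weights = 1.0 / np.array([(D[v] / D) ** (1.0 / (p - 1))
                                  for v in range(d)]).sum(axis=1)
    return labels, weights
```

Features with large within-cluster dispersion receive small weights, which is what lets noisy features be downweighted or, later, discarded.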
Anomalous clustering has been successfully used to detect natural clusters and initialize centroids in k-Means-type algorithms. Similarly, extensive work has been carried out on using feature weights to rescale features under Minkowski Lp metrics for p ≥ 1. In this thesis, aspects of both approaches enable feature weights to be detected based on the natural clusters present in the training data, without limiting the clusters to spherical shapes. Two methods, mean-FSFW and max-FSFW, are developed as further extensions of intelligent Minkowski weighted k-Means (iMWk-Means), where feature weights serve as indices for feature selection with no requirement for user-specified parameters.
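A hypothetical reading of these selection rules: once feature weights have been learned (e.g., by iMWk-Means), keep a feature if its weight exceeds the mean of all weights (mean-FSFW) or some fraction of the maximum weight (max-FSFW). The exact thresholds in the thesis may differ; this only shows the parameter-free flavor of the idea, and the fraction below is an illustrative assumption.

```python
# Weight-threshold feature selection rules (illustrative interpretation).
import numpy as np

def mean_fsfw(weights):
    weights = np.asarray(weights)
    return np.flatnonzero(weights > weights.mean())   # keep above-average features

def max_fsfw(weights, frac=0.5):                      # frac is an assumption
    weights = np.asarray(weights)
    return np.flatnonzero(weights > frac * weights.max())
```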
The proposed feature selection methods are able to significantly reduce the number of noisy features. These methods are further extended to mean-FSFWextPD and max-FSFWextPD to address missing values and are found to be better alternatives to existing imputation methods.
The effect of feature weighting on the clustering of datasets defined in varying p-norms is further explored in the thesis. An algorithm that translates a dataset into different p-norms is proposed, and the capability of MWk-Means to recover the true shapes of clusters defined in different p-norms is explored.
To address the problem of missing feature values in the weighted variants of k-Means, different missing-value imputation methods are tested. MWk-Means and its intelligent variant are further extended to incorporate the partial distance approach, specifically to address the problem of missing values.
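A minimal sketch of the partial distance idea: compute the weighted distance only over the features observed in both vectors, then rescale by the fraction of usable features so that distances remain comparable across points with different missingness. The weighting and rescaling details below are assumptions consistent with the classic partial-distance strategy.

```python
# Partial distance for vectors with missing values (NaN marks missing).
import numpy as np

def partial_distance(x, centroid, weights, p=2.0):
    mask = ~np.isnan(x)                       # observed features only
    if not mask.any():
        return np.inf                         # nothing observed: incomparable
    diff = np.abs(x[mask] - centroid[mask])
    d = ((weights[mask] ** p) * diff ** p).sum()
    return d * (x.size / mask.sum())          # rescale for the missing entries
```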
All these methods are tested on both synthetic and real-world datasets against three models of noise where applicable: added noisy features, feature blurring, and cluster-wise feature blurring. The noise is generated from Gaussian and uniform distributions at three different strengths: no noise, half noise, and full noise.
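An illustrative take on the three noise models named above follows; the distributions and strength scaling are sketched assumptions, not the thesis's exact experimental setup.

```python
# Three noise models (illustrative): appended noisy features, whole-feature
# blurring, and cluster-wise feature blurring.
import numpy as np

rng = np.random.default_rng(0)

def add_noise_features(X, n_noise):
    noise = rng.uniform(X.min(), X.max(), size=(X.shape[0], n_noise))
    return np.hstack([X, noise])                     # append uniform-noise features

def blur_feature(X, j, strength=0.5):
    X = X.copy()
    X[:, j] += rng.normal(scale=strength * X[:, j].std(), size=X.shape[0])
    return X                                         # Gaussian blur of one feature

def blur_feature_clusterwise(X, j, labels, cluster, strength=0.5):
    X = X.copy()
    idx = labels == cluster
    X[idx, j] += rng.normal(scale=strength * X[:, j].std(), size=idx.sum())
    return X                                         # blur the feature in one cluster only
```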
Overall, the results demonstrate that feature weighting can improve feature selection. The partial-distance approach, with feature weights, is effective at ignoring missing values, and cluster retrieval in various p-norm spaces is effective.