45 research outputs found

    GLSVM: Integrating Structured Feature Selection and Large Margin Classification

    Abstract—High-dimensional data challenges current feature selection methods. For many real-world problems we have prior knowledge about the relationships among features. For example, in microarray data analysis, genes from the same biological pathway are expected to relate similarly to the outcome we aim to predict. Recent regularization methods for the Support Vector Machine (SVM) have been very successful at performing feature selection and model selection simultaneously on high-dimensional data, but they neglect such relationships among features. To build interpretable SVM models, the structure information of features should be incorporated. In this paper, we propose an algorithm, GLSVM, that automatically performs model selection and feature selection in SVMs. To incorporate prior knowledge of feature relationships, we extend the standard 2-norm SVM and use a penalty function that combines an L2-norm regularization term including the normalized Laplacian of the graph with an L1 penalty. We demonstrate the effectiveness of our method and compare it to the state of the art on two real-world benchmarks.
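    The abstract names the ingredients of the penalty (an L1 term plus an L2-type term built on the normalized graph Laplacian) without giving the formula. A minimal sketch of one plausible reading follows. The exact form `lam1 * ||w||_1 + lam2 * w^T L w`, the function names, and the parameters `lam1` and `lam2` are assumptions for illustration, not the paper's definition.

```python
import numpy as np


def normalized_laplacian(adjacency):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix.

    Isolated nodes (degree 0) get a zero scaling factor, so their diagonal
    entry of L is 1 in this sketch.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    scaled = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    return np.eye(len(degree)) - scaled


def graph_l1_penalty(w, adjacency, lam1, lam2):
    """Assumed penalty: lam1 * ||w||_1 + lam2 * w^T L w (hypothetical form)."""
    w = np.asarray(w, dtype=float)
    laplacian = normalized_laplacian(adjacency)
    sparsity_term = lam1 * np.abs(w).sum()
    smoothness_term = lam2 * float(w @ laplacian @ w)
    return sparsity_term + smoothness_term


# Two features joined by one edge: equal weights incur no graph penalty,
# while opposite weights are penalized by the Laplacian term.
edge = [[0, 1], [1, 0]]
smooth = graph_l1_penalty([1.0, 1.0], edge, lam1=0.5, lam2=1.0)
rough = graph_l1_penalty([1.0, -1.0], edge, lam1=0.5, lam2=1.0)
```

    Under this reading, the Laplacian term pulls the weights of connected features (for example, genes in the same pathway) toward each other, while the L1 term drives irrelevant weights to zero.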

    Structured Feature Selection of Continuous Dynamical Systems for Aircraft Dynamics Identification

    This paper addresses the problem of identifying structured nonlinear dynamical systems, with the goal of using the learned dynamics in model-based reinforcement learning problems. In this setting, we present a new class of scalable multi-task estimators that promote sparsity while preserving the structure of the dynamics and leveraging available physical insight. We suggest an implementation that leads to consistent feature selection and thereby yields accurate models. An additional regularizer is also proposed to help recover realistic hidden representations of the dynamics. We illustrate our method by applying it to an aircraft trajectory optimization problem. Our numerical results, based on real flight data from 25 medium-haul aircraft totaling 8 million observations, show that our approach is competitive with existing methods for this type of application.

    Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

    This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections with the official theme structure used by Inria. Comment: (postprint); This version corrects a couple of errors in authors' names in the bibliography.

    Resource Constrained Structured Prediction

    We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong reductions in feature cost without degrading accuracy.

    Heterogeneous feature space based task selection machine for unsupervised transfer learning

    © 2015 IEEE. Transfer learning techniques try to transfer knowledge from previous tasks to a new target task with either fewer training data or less training than traditional machine learning techniques. Since transfer learning cares more about the relatedness between tasks and their domains, it is useful for handling massive unlabeled data and for overcoming distribution and feature space gaps, respectively. In this paper, we propose a new task selection algorithm for unsupervised transfer learning, called the Task Selection Machine (TSM). It addresses a key technical problem, namely feature mapping for heterogeneous feature spaces. An extended feature method is applied to the feature mapping algorithm. The TSM training algorithm, which is the main contribution of this paper, relies on this feature mapping. The proposed TSM meets the requirements of unsupervised transfer learning and, conversely, solves the unsupervised multi-task transfer learning issues.

    Collaborative Filtering via Group-Structured Dictionary Learning

    Structured sparse coding and the related structured dictionary learning problems are novel research areas in machine learning. In this paper we present a new application of structured dictionary learning for collaborative filtering based recommender systems. Our extensive numerical experiments demonstrate that the presented technique outperforms its state-of-the-art competitors and has several advantages over approaches that do not put structured constraints on the dictionary elements. Comment: A compressed version of the paper has been accepted for publication at the 10th International Conference on Latent Variable Analysis and Source Separation (LVA/ICA 2012).

    Conic Multi-Task Classification

    Traditionally, Multi-task Learning (MTL) models optimize the average of task-related objective functions, an intuitive approach that we refer to as Average MTL. However, a more general framework, referred to as Conic MTL, can be formulated by considering conic combinations of the objective functions instead; in this framework, Average MTL arises as a special case when all combination coefficients equal 1. Although the advantage of Conic MTL over Average MTL has been shown experimentally in previous works, no theoretical justification has been provided to date. In this paper, we derive a generalization bound for the Conic MTL method and demonstrate that the tightest bound is not necessarily achieved when all combination coefficients equal 1; hence, Average MTL may not always be the optimal choice, and it is important to consider Conic MTL. As a byproduct, the generalization bound also theoretically explains the good experimental results of previous relevant works. Finally, we propose a new Conic MTL model whose conic combination coefficients minimize the generalization bound, instead of being chosen heuristically as in previous methods. The rationale and advantage of our model are demonstrated and verified via a series of experiments comparing it with several other methods. Comment: Accepted by European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)-201
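    The relationship between the two objectives described in this abstract can be sketched in a few lines. This only illustrates the definition of a conic combination (non-negative coefficients applied to per-task objective values); the function names are hypothetical, and the paper's actual model, including how it chooses the coefficients from the generalization bound, is not reproduced here.

```python
import numpy as np


def conic_mtl_objective(task_losses, coefficients):
    """Conic combination of per-task objective values.

    All coefficients must be non-negative; that is what makes the
    combination conic.
    """
    losses = np.asarray(task_losses, dtype=float)
    coeffs = np.asarray(coefficients, dtype=float)
    if losses.shape != coeffs.shape:
        raise ValueError("one coefficient is required per task")
    if np.any(coeffs < 0):
        raise ValueError("conic combinations need non-negative coefficients")
    return float(coeffs @ losses)


def average_mtl_objective(task_losses):
    """Special case named in the abstract: every coefficient equals 1."""
    return conic_mtl_objective(task_losses, np.ones(len(task_losses)))


# Hypothetical per-task losses for three tasks.
losses = [1.0, 2.0, 4.0]
uniform = average_mtl_objective(losses)
reweighted = conic_mtl_objective(losses, [2.0, 0.0, 0.5])
```

    With all coefficients fixed at 1 the tasks contribute equally; other non-negative choices reweight the tasks, which is the degree of freedom the abstract argues can give a tighter bound.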