21,621 research outputs found
A Feature Selection Method for Multivariate Performance Measures
Feature selection with specific multivariate performance measures is the key
to the success of many applications, such as image retrieval and text
classification. The existing feature selection methods are usually designed for
classification error. In this paper, we propose a generalized sparse
regularizer. Based on the proposed regularizer, we present a unified feature
selection framework for general loss functions. In particular, we study the
novel feature selection paradigm by optimizing multivariate performance
measures. The resultant formulation is a challenging problem for
high-dimensional data. Hence, a two-layer cutting plane algorithm is proposed
to solve this problem, and the convergence is presented. In addition, we adapt
the proposed method to optimize multivariate measures for multiple instance
learning problems. The analyses by comparing with the state-of-the-art feature
selection methods show that the proposed method is superior to others.
Extensive experiments on large-scale and high-dimensional real world datasets
show that the proposed method outperforms -SVM and SVM-RFE when choosing a
small subset of features, and achieves significantly improved performances over
SVM in terms of -score
Optimizing Ranking Measures for Compact Binary Code Learning
Hashing has proven a valuable tool for large-scale information retrieval.
Despite much success, existing hashing methods optimize over simple objectives
such as the reconstruction error or graph Laplacian related loss functions,
instead of the performance evaluation criteria of interest---multivariate
performance measures such as the AUC and NDCG. Here we present a general
framework (termed StructHash) that allows one to directly optimize multivariate
performance measures. The resulting optimization problem can involve
exponentially or infinitely many variables and constraints, which is more
challenging than standard structured output learning. To solve the StructHash
optimization problem, we use a combination of column generation and
cutting-plane techniques. We demonstrate the generality of StructHash by
applying it to ranking prediction and image retrieval, and show that it
outperforms a few state-of-the-art hashing methods.Comment: Appearing in Proc. European Conference on Computer Vision 201
Performance and optimization of support vector machines in high-energy physics classification problems
In this paper we promote the use of Support Vector Machines (SVM) as a
machine learning tool for searches in high-energy physics. As an example for a
new- physics search we discuss the popular case of Supersymmetry at the Large
Hadron Collider. We demonstrate that the SVM is a valuable tool and show that
an automated discovery- significance based optimization of the SVM
hyper-parameters is a highly efficient way to prepare an SVM for such
applications. A new C++ LIBSVM interface called SVM-HINT is developed and
available on Github.Comment: 20 pages, 6 figure
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
- …