Feature Extraction in Signal Regression: A Boosting Technique for Functional Data Regression
The main objectives of feature extraction in signal regression are to improve prediction accuracy on future data and to identify the relevant parts of the signal. A feature extraction procedure is proposed that uses boosting techniques to select the relevant parts of the signal. The proposed blockwise boosting procedure simultaneously selects intervals in the signal’s domain and estimates their effect on the response. The blocks are defined to make explicit use of the underlying metric of the signal. Simulation studies and real-world data show that the proposed approach competes well with procedures such as partial least squares (PLS), P-spline signal regression, and functional data regression.
The paper is a preprint of an article published in the Journal of Computational and Graphical Statistics. Please use the journal version for citation.
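The blockwise selection idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes componentwise L2 boosting in which each candidate block is a run of `block_width` adjacent signal positions sharing one coefficient, and the names `blockwise_boost`, `block_width`, `n_steps`, and `nu` are hypothetical.

```python
import numpy as np

def blockwise_boost(X, y, block_width=5, n_steps=200, nu=0.1):
    """Illustrative blockwise L2 boosting for signal regression.

    Assumption: each base learner assigns one shared coefficient to a
    block of adjacent signal positions. X has shape (n_samples, n_positions).
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                      # center signals
    resid = y - y_mean                   # start from the mean-only fit
    coef = np.zeros(p)

    # Z[:, b] is the centered signal summed over block b (via cumulative sums).
    starts = np.arange(0, p - block_width + 1)
    csum = np.cumsum(np.hstack([np.zeros((n, 1)), Xc]), axis=1)
    Z = csum[:, starts + block_width] - csum[:, starts]
    z_norm2 = np.maximum((Z ** 2).sum(axis=0), 1e-12)

    for _ in range(n_steps):
        score = Z.T @ resid
        gain = score ** 2 / z_norm2      # drop in residual sum of squares
        b = int(np.argmax(gain))         # block that fits the residual best
        beta = score[b] / z_norm2[b]     # least-squares fit on that block
        coef[starts[b]:starts[b] + block_width] += nu * beta
        resid = resid - nu * beta * Z[:, b]

    intercept = y_mean - x_mean @ coef
    return intercept, coef
```

Calling `intercept, coef = blockwise_boost(X, y)` returns a coefficient function over the signal domain; positions never covered by a selected block keep a zero coefficient, which is what makes the selected intervals readable. The shrinkage factor `nu` and the fixed block width are simplifying assumptions of this sketch.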
Learning with Clustering Structure
We study supervised learning problems using clustering constraints to impose
structure on either features or samples, seeking to help both prediction and
interpretation. The problem of clustering features arises naturally in text
classification, for instance, where grouping words together reduces
dimensionality and identifies synonyms. The sample clustering problem, on the
other hand, applies to multiclass problems where we are allowed to make multiple
predictions and the performance of the best answer is recorded. We derive a
unified optimization formulation highlighting the common structure of these
problems and produce algorithms whose core iteration complexity amounts to a
k-means clustering step, which can be approximated efficiently. We extend these
results to combine sparsity and clustering constraints, and develop a new
projection algorithm on the set of clustered sparse vectors. We prove
convergence of our algorithms on random instances, based on a union of
subspaces interpretation of the clustering structure. Finally, we test the
robustness of our methods on artificial data sets as well as real data
extracted from movie reviews.
Comment: Completely rewritten. New convergence proofs in the clustered and
sparse clustered case. New projection algorithm on sparse clustered vector
- …
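The remark in the abstract above that the core iteration amounts to a k-means step can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a least-squares loss, projected gradient descent, and a Lloyd-style one-dimensional k-means as an approximate projection onto vectors with at most `n_clusters` distinct values. The names `project_clustered` and `clustered_least_squares` are hypothetical.

```python
import numpy as np

def project_clustered(w, n_clusters, n_iter=50):
    """Approximately project w onto vectors with at most n_clusters
    distinct values, using 1-D Lloyd k-means on the coefficients."""
    centers = np.quantile(w, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(n_iter):
        # Assign each coefficient to its nearest center.
        labels = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        # Move each non-empty center to the mean of its coefficients.
        for k in range(n_clusters):
            members = labels == k
            if members.any():
                centers[k] = w[members].mean()
    labels = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
    return centers[labels], labels

def clustered_least_squares(X, y, n_clusters, n_steps=100):
    """Projected gradient descent on 0.5 * ||X w - y||^2, with each
    iterate projected back onto the clustered vectors (an assumption
    of this sketch, not a statement of the paper's method)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    labels = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y)
        w, labels = project_clustered(w - step * grad, n_clusters)
    return w, labels
```

Here `labels` groups features that share a coefficient, which is the kind of feature clustering the abstract describes for words in text classification. Lloyd's iterations only approximate the projection; an exact one-dimensional solution would need a different routine.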