Modified Frank-Wolfe Algorithm for Enhanced Sparsity in Support Vector Machine Classifiers
This work proposes a new algorithm for training a re-weighted L2 Support
Vector Machine (SVM), inspired by the re-weighted Lasso algorithm of Candès
et al. and by the equivalence between Lasso and SVM shown recently by Jaggi. In
particular, the margin required for each training vector is set independently,
defining a new weighted SVM model. These weights are selected to be binary, and
they are automatically adapted during the training of the model, resulting in a
variation of the Frank-Wolfe optimization algorithm with essentially the same
computational complexity as the original algorithm. As shown experimentally,
this algorithm is computationally cheaper to apply since it requires fewer
iterations to converge, and it produces models that have a sparser
representation in terms of support vectors and are more stable with respect to
the selection of the regularization hyper-parameter.
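To make the setting concrete, the following is a minimal sketch of the plain (unmodified) Frank-Wolfe iteration on the kind of simplex-constrained quadratic that arises from the Lasso/SVM equivalence. It is not the re-weighted algorithm of the paper; the function name `frank_wolfe_simplex`, the kernel matrix `K`, and the stopping thresholds are assumptions chosen for illustration.

```python
import numpy as np


def frank_wolfe_simplex(K, n_iters=200):
    """Minimise f(a) = a^T K a over the probability simplex (plain Frank-Wolfe).

    K is assumed to be a symmetric positive semi-definite matrix, for example
    the Gram matrix of the label-signed training vectors.
    """
    n = K.shape[0]
    a = np.zeros(n)
    a[0] = 1.0  # start at a vertex, i.e. a single candidate support vector
    for _ in range(n_iters):
        grad = 2.0 * (K @ a)
        i = int(np.argmin(grad))  # linear minimisation over the simplex picks a vertex
        d = -a.copy()
        d[i] += 1.0  # search direction e_i - a
        curvature = d @ K @ d
        if curvature <= 1e-12:
            break
        # Exact line search for a quadratic objective, clipped to [0, 1].
        gamma = float(np.clip(-(grad @ d) / (2.0 * curvature), 0.0, 1.0))
        if gamma == 0.0:
            break
        a += gamma * d
    return a
```

Each iteration activates at most one new coordinate, so the number of non-zero entries of the result (the support vectors in the SVM reading) is bounded by the number of iterations plus one.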
Modified Frank–Wolfe algorithm for enhanced sparsity in support vector machine classifiers
This work proposes a new algorithm for training a re-weighted ℓ2 Support Vector Machine (SVM), inspired by the re-weighted Lasso algorithm of Candès et al. and by the equivalence between Lasso and SVM shown recently by Jaggi. In particular, the margin required for each training vector is set independently, defining a new weighted SVM model. These weights are selected to be binary, and they are automatically adapted during the training of the model, resulting in a variation of the Frank–Wolfe optimization algorithm with essentially the same computational complexity as the original algorithm. As shown experimentally, this algorithm is computationally cheaper to apply since it requires fewer iterations to converge, and it produces models that have a sparser representation in terms of support vectors and are more stable with respect to the selection of the regularization hyper-parameter.
The authors would like to thank the following organizations. • EU: The research leading to these results has received funding from the European Research Council under the European Ue- DATADRIVE-B (290923). This paper reflects only the authors’ views; the Union is not liable for any use that may be made of the contained information. • Research Council KUL: GOA/10/09 MaNet, CoE PFV/10/002 (OPTEC), BIL12/11
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
A Novel Frank-Wolfe Algorithm. Analysis and Applications to Large-Scale SVM Training
Recently, there has been a renewed interest in the machine learning community
in variants of a sparse greedy approximation procedure for concave
optimization known as the Frank-Wolfe (FW) method. In particular, this
procedure has been successfully applied to train large-scale instances of
non-linear Support Vector Machines (SVMs). Specializing FW to SVM training has
yielded not only efficient algorithms but also important theoretical results,
including convergence analysis of training algorithms and new characterizations
of model sparsity.
In this paper, we present and analyze a novel variant of the FW method based
on a new way to perform away steps, a classic strategy used to accelerate the
convergence of the basic FW procedure. Our formulation and analysis are focused
on a general concave maximization problem on the simplex. However, the
specialization of our algorithm to quadratic forms is strongly related to some
classic methods in computational geometry, namely the Gilbert and MDM
algorithms.
On the theoretical side, we demonstrate that the method matches the
guarantees in terms of convergence rate and number of iterations obtained by
using classic away steps. In particular, the method enjoys a linear rate of
convergence, a result that has been recently proved for MDM on quadratic forms.
On the practical side, we provide experiments on several classification
datasets, and evaluate the results using statistical tests. Experiments show
that our method is faster than the FW method with classic away steps, and works
well even in the cases in which classic away steps slow down the algorithm.
Furthermore, these improvements are obtained without sacrificing the predictive
accuracy of the obtained SVM model.
Comment: REVISED VERSION (October 2013) -- Title and abstract have been
revised. Section 5 was added. Some proofs have been summarized (full-length
proofs available in the previous version).
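For reference, the classic away-step strategy that this abstract takes as its baseline can be sketched as follows. This is the textbook away-step Frank-Wolfe on a simplex-constrained quadratic (written as a minimisation), not the novel variant proposed in the paper; the function name `away_step_fw`, the matrix `K`, and the tolerances are assumptions made for the sketch.

```python
import numpy as np


def away_step_fw(K, n_iters=500, tol=1e-10):
    """Classic away-step Frank-Wolfe for min a^T K a over the simplex.

    K is assumed symmetric positive semi-definite.
    """
    n = K.shape[0]
    a = np.zeros(n)
    a[0] = 1.0
    for _ in range(n_iters):
        grad = 2.0 * (K @ a)
        s = int(np.argmin(grad))  # vertex to move towards
        active = np.flatnonzero(a > 0)
        v = int(active[np.argmax(grad[active])])  # worst active vertex
        gap_fw = grad @ a - grad[s]
        gap_away = grad[v] - grad @ a
        if gap_fw <= tol:
            break
        if gap_fw >= gap_away or a[v] >= 1.0:
            d = -a.copy()
            d[s] += 1.0  # towards step: e_s - a
            gamma_max = 1.0
        else:
            d = a.copy()
            d[v] -= 1.0  # away step: a - e_v
            gamma_max = a[v] / (1.0 - a[v])  # largest step keeping a[v] >= 0
        curvature = d @ K @ d
        if curvature <= 0.0:
            gamma = gamma_max
        else:
            gamma = min(gamma_max, -(grad @ d) / (2.0 * curvature))
        a = a + gamma * d
        a[a < 1e-15] = 0.0  # clear round-off so dropped vertices leave the active set
    return a
```

An away step taken at its maximal length removes a vertex from the active set entirely, which is how the strategy avoids the zig-zagging of the basic procedure near the boundary.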
Quantum differentially private sparse regression learning
Differentially private (DP) learning, which aims to accurately extract
patterns from the given dataset without exposing individual information, is an
important subfield in machine learning and has been extensively explored.
However, quantum algorithms that preserve privacy while outperforming their
classical counterparts are still lacking. The difficulty arises from the
distinct priorities in DP and quantum machine learning, i.e., the former
concerns a low utility bound while the latter pursues a low runtime cost. These
differing goals require a quantum DP algorithm to achieve a runtime speedup
over the best known classical results while preserving the optimal utility
bound.
The Lasso estimator is broadly employed to tackle high-dimensional sparse
linear regression tasks. The main contribution of this paper is devising a
quantum DP Lasso estimator to achieve a runtime speedup with privacy
preservation, i.e., the runtime complexity is with
a nearly optimal utility bound , where is the sample
size and is the data dimension with . Since the optimal classical
(private) Lasso takes runtime, our proposal achieves quantum
speedups when . There are two key components in our algorithm.
First, we extend the Frank-Wolfe algorithm from the classical Lasso to the
quantum scenario, where the proposed quantum non-private Lasso achieves a
quadratic runtime speedup over the optimal classical Lasso. Second, we develop
an adaptive privacy mechanism to ensure the privacy guarantee of the
non-private Lasso. Our proposal opens an avenue to design various learning
tasks with both proven runtime speedups and privacy preservation.
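The classical starting point mentioned above, Frank-Wolfe applied to the constrained Lasso, can be sketched as below. This is only the classical, non-private, non-quantum iteration; the function name `lasso_frank_wolfe`, the radius `tau`, and the exact line search are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def lasso_frank_wolfe(X, y, tau, n_iters=200):
    """Classical Frank-Wolfe for min 0.5 * ||X w - y||^2 subject to ||w||_1 <= tau."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residual = X @ w - y
        grad = X.T @ residual
        j = int(np.argmax(np.abs(grad)))  # coordinate with the largest gradient magnitude
        s = np.zeros(d)
        s[j] = -tau * np.sign(grad[j])  # vertex of the l1 ball minimising <grad, s>
        direction = s - w
        Xd = X @ direction
        denom = Xd @ Xd
        if denom <= 1e-15:
            break
        # Exact line search for the least-squares objective, clipped to [0, 1].
        gamma = float(np.clip(-(residual @ Xd) / denom, 0.0, 1.0))
        if gamma == 0.0:
            break
        w += gamma * direction
    return w
```

The only data-dependent search in each iteration is the argmax over the gradient coordinates, and each step touches a single coordinate of the ℓ1-ball vertex, which is why the iterates stay sparse.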
Group-wise Sparse and Explainable Adversarial Attacks
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal
pixel perturbations, typically regularized by the norm. Recent efforts
have replaced this norm with a structural sparsity regularizer, such as the
nuclear group norm, to craft group-wise sparse adversarial attacks. The
resulting perturbations are thus explainable and hold significant practical
relevance, shedding light on an even greater vulnerability of DNNs than
previously anticipated. However, crafting such attacks poses an optimization
challenge, as it involves computing norms for groups of pixels within a
non-convex objective. In this paper, we tackle this challenge by presenting an
algorithm that simultaneously generates group-wise sparse attacks within
semantically meaningful areas of an image. In each iteration, the core
operation of our algorithm involves the optimization of a quasinorm adversarial
loss. This optimization is achieved by employing the -quasinorm proximal
operator for some iterations, a method tailored for nonconvex programming.
Subsequently, the algorithm transitions to a projected Nesterov's accelerated
gradient descent with -norm regularization applied to perturbation
magnitudes. We rigorously evaluate the efficacy of our novel attack in both
targeted and non-targeted attack scenarios, on CIFAR-10 and ImageNet datasets.
When compared to state-of-the-art methods, our attack consistently results in a
remarkable increase in group-wise sparsity, e.g., an increase of on
CIFAR-10 and on ImageNet (average case, targeted attack), all while
maintaining lower perturbation magnitudes. Notably, this performance is
complemented by a significantly faster computation time and a attack
success rate
Conditional Gradient Methods
The purpose of this survey is to serve both as a gentle introduction and a
coherent overview of state-of-the-art Frank–Wolfe algorithms, also called
conditional gradient algorithms, for function minimization. These algorithms
are especially useful in convex optimization when linear optimization is
cheaper than projections.
The selection of the material has been guided by the principle of
highlighting crucial ideas as well as presenting new approaches that we believe
might become important in the future, with ample citations even of old works
imperative in the development of newer methods. Yet, our selection is sometimes
biased, and need not reflect consensus of the research community, and we have
certainly missed recent important contributions. After all, the research area of
Frank–Wolfe is very active, making it a moving target. We apologize sincerely
in advance for any such distortions and we fully acknowledge: We stand on the
shoulders of giants.
Comment: 238 pages with many figures. The FrankWolfe.jl Julia package
(https://github.com/ZIB-IOL/FrankWolfe.jl) provides state-of-the-art
implementations of many Frank–Wolfe methods.
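The remark that these algorithms pay off when linear optimization is cheaper than projection can be illustrated with a generic loop that touches the feasible region only through a linear minimization oracle. The sketch below is plain Python with NumPy and is unrelated to the FrankWolfe.jl API; the names `frank_wolfe`, `simplex_lmo`, and the open-loop step size 2/(t+2) are assumptions chosen for illustration.

```python
import numpy as np


def frank_wolfe(grad, lmo, x0, n_iters=1000, tol=1e-6):
    """Generic Frank-Wolfe loop: needs a gradient and a linear minimisation oracle.

    Returns the final iterate and its Frank-Wolfe gap, which upper-bounds the
    primal suboptimality for a convex objective.
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(n_iters):
        g = grad(x)
        s = lmo(g)
        gap = float(g @ (x - s))
        if gap <= tol:
            return x, gap
        x += 2.0 / (t + 2.0) * (s - x)  # open-loop step size, no line search
    g = grad(x)
    return x, float(g @ (x - lmo(g)))


def simplex_lmo(g):
    """Linear minimisation over the probability simplex: a single argmin."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s
```

For example, calling `frank_wolfe(lambda x: x - b, simplex_lmo, x0)` approximates the Euclidean projection of `b` onto the simplex while using only an argmin per iteration, with no sorting or projection step.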
Joint optimization of manifold learning and sparse representations for face and gesture analysis
Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may disperse directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high-dimensional data can make these classifiers computationally demanding.
Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect or similar depth cameras.
Classification algorithms on the cell processor
The rapid advancement in the capacity and reliability of data storage technology has allowed for the retention of virtually limitless quantity and detail of digital information. Massive information databases are becoming more and more widespread among governmental, educational, scientific, and commercial organizations. By segregating this data into carefully defined input (e.g., images) and output (e.g., classification labels) sets, a classification algorithm can be used to develop an internal expert model of the data by employing a specialized training algorithm. A properly trained classifier is capable of predicting the output for future input data from the same input domain that it was trained on. Two popular classifiers are Neural Networks and Support Vector Machines. Both, as with most accurate classifiers, require massive computational resources to carry out the training step and can take months to complete when dealing with extremely large data sets. In most cases, using larger training sets improves the final accuracy of the trained classifier. However, access to the kinds of computational resources required to do so is expensive and out of reach of private or underfunded institutions. The Cell Broadband Engine (CBE), developed by Sony, Toshiba, and IBM, has recently been introduced into the market. The least expensive current iteration is available in the Sony Playstation 3® computer entertainment system. The CBE is a novel multi-core architecture which features many hardware enhancements designed to accelerate the processing of massive amounts of data. These characteristics and the cheap and widespread availability of this technology make the Cell a prime candidate for the task of training classifiers. In this work, the feasibility of using the Cell processor to train Neural Networks and Support Vector Machines was explored.
In the Neural Network family of classifiers, the fully connected Multilayer Perceptron and Convolution Network were implemented. In the Support Vector Machine family, a Working Set technique known as the Gradient Projection-based Decomposition Technique, as well as the Cascade SVM, were implemented.
Optimisation Method for Training Deep Neural Networks in Classification of Non-functional Requirements
Non-functional requirements (NFRs) are regarded as critical to a software system's success. The majority of NFR detection and classification solutions have relied on supervised machine learning models. These solutions are hindered by the lack of labelled data for training and necessitate a significant amount of time spent on feature engineering.
In this work we explore emerging deep learning techniques to reduce the burden of feature engineering. The goal of this study is to develop an autonomous system that can classify NFRs into multiple classes based on a labelled corpus. In the first section of the thesis, we standardise the NFRs ontology and annotations to produce a corpus based on five attributes: usability, reliability, efficiency, maintainability, and portability. In the second section, the design and implementation of four neural networks, including the artificial neural network, convolutional neural network, long short-term memory, and gated recurrent unit are examined to classify NFRs.
These models necessitate a large corpus. To overcome this limitation, we proposed a new paradigm for data augmentation. This method uses a sort-and-concatenate strategy to combine two phrases from the same class, resulting in a two-fold increase in data size while keeping the domain vocabulary intact. We compared our method to a baseline (no augmentation) and to an existing approach, Easy Data Augmentation (EDA), with pre-trained word embeddings. All training has been performed under two modifications to the data: augmentation on the entire data before the train/validation split versus augmentation on the train set only. Our findings show that, compared to EDA and the baseline, the NFR classification model improved greatly, and the CNN outperformed the other models when trained using our suggested technique in the first setting. However, we saw only a slight boost in the second experimental setup with just train set augmentation. As a result, we can determine that augmentation of the validation set is required in order to achieve acceptable results with our proposed approach. We hope that our ideas will inspire new data augmentation techniques, whether they are generic or task specific. Furthermore, it would also be useful to implement this strategy in other languages.
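The abstract does not spell out the sort-and-concatenate procedure, so the following is only one plausible reading, sketched under stated assumptions: phrases are sorted alphabetically within each class and each phrase is joined with its next neighbour, wrapping around at the end. The function name `sort_and_concatenate` and the pairing rule are hypothetical.

```python
from collections import defaultdict


def sort_and_concatenate(samples):
    """Double a labelled corpus by joining pairs of phrases from the same class.

    samples is a list of (text, label) pairs. The result contains the original
    samples followed by one synthetic sample per original. The pairing rule
    (alphabetical order, next neighbour, wrap-around) is an assumption.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append(text)

    augmented = list(samples)
    for label, texts in by_label.items():
        ordered = sorted(texts)
        for i, text in enumerate(ordered):
            partner = ordered[(i + 1) % len(ordered)]  # a class of size 1 pairs with itself
            augmented.append((text + " " + partner, label))
    return augmented
```

Because every synthetic phrase reuses words from its own class, the vocabulary is unchanged and the corpus size doubles. Calling the function on the training split alone corresponds to the second experimental setup described above, whereas calling it before the split corresponds to the first.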