Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization
Support vector machines (SVMs) are popular learning algorithms for
binary classification problems. They traditionally assume equal
misclassification costs for each class; however, real-world problems may have
an uneven class distribution. This article introduces EBCS-SVM: evolutionary
bilevel cost-sensitive SVMs. EBCS-SVM handles imbalanced classification
problems by simultaneously learning the support vectors and optimizing the SVM
hyperparameters, which comprise the kernel parameter and misclassification
costs. The resulting optimization problem is a bilevel problem, where the lower
level determines the support vectors and the upper level the hyperparameters.
This optimization problem is solved using an evolutionary algorithm (EA) at the
upper level and sequential minimal optimization (SMO) at the lower level. These
two methods work in a nested fashion; that is, the optimal support vectors help
guide the search for the hyperparameters, and the lower level is initialized
from previously successful solutions. The proposed method is assessed on
70 imbalanced classification datasets and compared with several
state-of-the-art methods. The experimental results, supported by a Bayesian
test, provide evidence of the effectiveness of EBCS-SVM when working with
highly imbalanced datasets.
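To make the nested scheme concrete, the following is a minimal sketch of the bilevel idea, assuming scikit-learn's SVC (whose libsvm backend solves the lower-level problem with SMO) and a simple truncation-selection evolution strategy over the kernel parameter and per-class misclassification costs; the function names and the evolutionary scheme are illustrative choices, not the authors' implementation.

```python
# Illustrative bilevel sketch (not the authors' code): an evolutionary
# search over SVM hyperparameters (upper level) wrapped around an
# SMO-based cost-sensitive SVM fit (lower level, via libsvm in sklearn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def lower_level_fitness(params, X, y):
    """Lower level: fit a cost-sensitive RBF SVM for one hyperparameter
    vector and score it by balanced accuracy (suited to imbalance)."""
    log_gamma, log_c0, log_c1 = params
    clf = SVC(kernel="rbf", gamma=10.0 ** log_gamma,
              class_weight={0: 10.0 ** log_c0, 1: 10.0 ** log_c1})
    return cross_val_score(clf, X, y, cv=3,
                           scoring="balanced_accuracy").mean()

def evolve_hyperparams(X, y, pop_size=20, generations=15, seed=0):
    """Upper level: truncation-selection evolution over log10-scaled
    (kernel parameter gamma, cost of class 0, cost of class 1)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-3, 3, size=(pop_size, 3))
    for _ in range(generations):
        fitness = [lower_level_fitness(p, X, y) for p in pop]
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]
        children = parents + rng.normal(0.0, 0.3, parents.shape)
        pop = np.vstack([parents, children])  # survivors + mutated copies
    fitness = [lower_level_fitness(p, X, y) for p in pop]
    return pop[int(np.argmax(fitness))]

# 9:1 imbalanced toy problem.
X, y = make_classification(n_samples=300, weights=[0.9], random_state=0)
print("best log10(gamma, C0, C1):", evolve_hyperparams(X, y))
```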
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data, as represented by Electronic Medical Records. Such data are invariably
problematic: noisy, with missing entries, and with imbalance in the classes of
interest, all leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for the
development of specialized data-preprocessing and classification techniques.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expectation-maximization (EM)
imputation method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces faster, more accurate, and more robust classification results.
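As a single-level illustration of the pipeline this abstract describes, the sketch below pairs scikit-learn's IterativeImputer (iterated regression imputation, in the spirit of the EM method mentioned above) with a cost-sensitive SVM; the multilevel coarsening framework itself is not reproduced, and the toy data are invented for the example.

```python
# Single-level illustration: iterated regression imputation followed by
# a cost-sensitive SVM. The multilevel framework is not reproduced here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = make_pipeline(
    IterativeImputer(max_iter=10, random_state=0),  # iterated regressions
    StandardScaler(),
    SVC(kernel="rbf", class_weight="balanced"),     # cost-sensitive SVM
)

# Toy data with a roughly 9:1 class imbalance and 20% entries missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.1).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan

pipeline.fit(X, y)
print("training accuracy:", pipeline.score(X, y))
```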
Reliability-based design optimization using kriging surrogates and subset simulation
The aim of the present paper is to develop a strategy for solving
reliability-based design optimization (RBDO) problems that remains applicable
when the performance models are expensive to evaluate. Starting with the
premise that simulation-based approaches are not affordable for such problems,
and that most-probable-failure-point-based approaches do not permit
quantifying the error in the estimated failure probability, an approach
based on both metamodels and advanced simulation techniques is explored. The
kriging metamodeling technique is chosen in order to surrogate the performance
functions because it allows one to genuinely quantify the surrogate error. The
surrogate error on the limit-state surfaces is propagated to the failure
probability estimates in order to provide an empirical error measure. This
error is then sequentially reduced by means of a population-based adaptive
refinement technique until the kriging surrogates are accurate enough for
reliability analysis. This original refinement strategy makes it possible to
add several observations to the design of experiments at the same time.
Reliability and reliability sensitivity analyses are performed by means of the
subset simulation technique for the sake of numerical efficiency. The adaptive
surrogate-based strategy for reliability estimation is finally embedded in a
classical gradient-based optimization algorithm in order to solve the RBDO
problem. The kriging surrogates are built in a so-called augmented reliability
space, making them reusable from one nested RBDO iteration to the next.
The strategy is compared to other approaches available in the literature on
three academic examples in the field of structural mechanics.
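The error-bracketing idea at the heart of the strategy can be sketched in a few lines, assuming scikit-learn's GaussianProcessRegressor as the kriging surrogate and, for brevity, plain Monte Carlo in place of subset simulation; the limit-state function below is a made-up example.

```python
# Sketch of the kriging error-bracketing idea: bound the failure
# probability using the surrogate mean +/- k standard deviations.
# Plain Monte Carlo stands in for subset simulation for brevity.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def g(x):
    """Made-up limit-state function: failure when g(x) <= 0."""
    return 3.0 - x[:, 0] - x[:, 1]

rng = np.random.default_rng(0)
X_doe = rng.normal(size=(30, 2))            # design of experiments
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gp.fit(X_doe, g(X_doe))

X_mc = rng.normal(size=(100_000, 2))        # Monte Carlo population
mu, sigma = gp.predict(X_mc, return_std=True)

k = 1.96                                    # confidence multiplier
pf_hat = np.mean(mu <= 0)                   # surrogate point estimate
pf_low = np.mean(mu + k * sigma <= 0)       # confidently failed samples
pf_high = np.mean(mu - k * sigma <= 0)      # possibly failed samples
print(f"pf in [{pf_low:.4f}, {pf_high:.4f}], estimate {pf_hat:.4f}")
# If the bracket is too wide, add the most ambiguous samples
# (small |mu|, large sigma) to the DoE and refit: the adaptive
# refinement loop described in the abstract.
```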
Training Support Vector Machines Using Frank-Wolfe Optimization Methods
Training a Support Vector Machine (SVM) requires the solution of a quadratic
programming problem (QP) whose computational cost becomes prohibitive for
large-scale datasets. Traditional optimization methods cannot be
directly applied in these cases, mainly due to memory restrictions.
By adopting a slightly different objective function and under mild conditions
on the kernel used within the model, efficient algorithms to train SVMs have
been devised under the name of Core Vector Machines (CVMs). This framework
exploits the equivalence of the resulting learning problem with the task of
solving a Minimal Enclosing Ball (MEB) problem in a feature space, where the
data are implicitly embedded by a kernel function.
In this paper, we improve on the CVM approach by proposing two novel methods
to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast
method to approximate the solution of an MEB problem. In contrast to CVMs, our
algorithms do not require computing the solutions of a sequence of
increasingly complex QPs and are defined using only analytic optimization
steps. Experiments on a large collection of datasets show that our methods
scale better than CVMs in most cases, sometimes at the price of a slightly
lower accuracy. Like CVMs, the proposed methods can be easily extended to
machine learning problems other than binary classification. Moreover, effective
classifiers are also obtained with kernels that do not satisfy the condition
required by CVMs, so the proposed methods apply to a wider set of problems.
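The core primitive behind these methods, a Frank-Wolfe iteration for the kernel MEB problem, can be sketched as follows; the classic step-size rule and the RBF kernel are standard illustrative choices, not necessarily the exact variants proposed in the paper.

```python
# Minimal Frank-Wolfe sketch for the kernel Minimal Enclosing Ball,
# the primitive behind the SVM training methods described above.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def meb_frank_wolfe(K, n_iter=500):
    """Maximize the MEB dual over the simplex with analytic FW steps.

    The ball center is the implicit point c = sum_i alpha_i phi(x_i);
    squared distances to it need only kernel evaluations.
    """
    n = K.shape[0]
    alpha = np.full(n, 1.0 / n)            # start at the barycenter
    for t in range(n_iter):
        Ka = K @ alpha
        ctc = alpha @ Ka                   # ||c||^2 in feature space
        dist2 = np.diag(K) - 2 * Ka + ctc  # ||phi(x_i) - c||^2
        j = int(np.argmax(dist2))          # farthest point: FW vertex
        eta = 1.0 / (t + 2)                # classic diminishing step
        alpha *= 1 - eta                   # analytic update, no QP solve
        alpha[j] += eta
    return alpha                           # sparse weights = core set

X = np.random.default_rng(0).normal(size=(100, 2))
alpha = meb_frank_wolfe(rbf_kernel(X))
print("support size:", int(np.sum(alpha > 1e-6)))
```

Each iteration touches only one kernel row beyond the running products, which is why no sequence of increasingly large QPs is needed.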
The Gremlin Graph Traversal Machine and Language
Gremlin is a graph traversal machine and language designed, developed, and
distributed by the Apache TinkerPop project. Gremlin, as a graph traversal
machine, is composed of three interacting components: a graph, a traversal,
and a set of traversers. The traversers move about the graph
according to the instructions specified in the traversal, where the result of
the computation is the ultimate locations of all halted traversers. A Gremlin
machine can be executed over any supporting graph computing system such as an
OLTP graph database and/or an OLAP graph processor. Gremlin, as a graph
traversal language, is a functional language implemented in the user's native
programming language and is used to define the traversal of a Gremlin machine.
This article provides a mathematical description of Gremlin and details its
automaton and functional properties. These properties enable Gremlin to
naturally support imperative and declarative querying, host language
agnosticism, user-defined domain specific languages, an extensible
compiler/optimizer, single- and multi-machine execution models, hybrid depth-
and breadth-first evaluation, as well as the existence of a Universal Gremlin
Machine and its respective entailments.
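As a small illustration of the machine/language split, the following uses the gremlinpython language variant to build a traversal that a remote Gremlin machine then evaluates; the server address and the TinkerPop "modern" toy graph are assumptions for the example.

```python
# The host-language code below only *builds* a traversal; the Gremlin
# machine on the server moves traversers over the graph and returns
# their halted locations. Assumes a Gremlin Server at
# ws://localhost:8182 loaded with TinkerPop's "modern" toy graph
# (both are assumptions, not givens).
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import \
    DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# "Who are the people that marko knows?" -- traversers walk 'knows'
# edges and halt at the neighbors' name property.
names = (g.V().has("person", "name", "marko")
          .out("knows")
          .values("name")
          .toList())
print(names)  # e.g. ['vadas', 'josh'] on the modern graph
conn.close()
```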