"Virus hunting" using radial distance weighted discrimination
Motivated by the challenge of using DNA-seq data to identify viruses in human
blood samples, we propose a novel classification algorithm called "Radial
Distance Weighted Discrimination" (or Radial DWD). This classifier is designed
for binary classification, assuming one class is surrounded by the other class
in very diverse radial directions, which is seen to be typical for our virus
detection data. This separation of the two classes in multiple radial directions
naturally motivates the development of Radial DWD. While classical machine
learning methods such as the Support Vector Machine and linear Distance
Weighted Discrimination can sometimes give reasonable answers for a given data
set, their generalizability is severely compromised because of the linear
separating boundary. Radial DWD addresses this challenge by using a more
appropriate (in this particular case) spherical separating boundary.
Simulations show that for appropriate radial contexts, this gives much better
generalizability than linear methods, and also much better than conventional
kernel-based (nonlinear) Support Vector Machines, because the latter methods
essentially use much of the information in the data for determining the shape
of the separating boundary. The effectiveness of Radial DWD is demonstrated for
real virus detection.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS869 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
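To make the idea of a spherical separating boundary concrete, here is a minimal sketch in Python. It is not the Radial DWD optimization from the paper: the center is simply taken as the mean of the inner class and the radius as the midpoint between the mean inner and mean outer distances. The names `fit_sphere` and `predict`, and the toy data, are assumptions made for illustration only.

```python
import math

def fit_sphere(inner, outer):
    """Toy spherical boundary (illustrative heuristic, not Radial DWD).

    center: coordinate-wise mean of the inner class.
    radius: midpoint between the mean inner and mean outer distances.
    """
    dim = len(inner[0])
    center = [sum(p[i] for p in inner) / len(inner) for i in range(dim)]
    d_in = [math.dist(p, center) for p in inner]
    d_out = [math.dist(p, center) for p in outer]
    radius = (sum(d_in) / len(d_in) + sum(d_out) / len(d_out)) / 2
    return center, radius

def predict(point, center, radius):
    """Label a point by which side of the sphere it falls on."""
    return "inner" if math.dist(point, center) <= radius else "outer"

# Hypothetical data: one class clustered at the origin, the other
# surrounding it in several radial directions.
inner = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.0, 0.0)]
outer = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]
center, radius = fit_sphere(inner, outer)
```

A single hyperplane cannot separate these two toy classes, because the outer points lie on every side of the inner cluster, whereas a sphere around the inner cluster can.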
GenSVM: a generalized multiclass support vector machine
Traditional extensions of the binary support vector machine (SVM) to multiclass problems are either heuristics or require solving a large dual optimization problem. Here, a generalized multiclass SVM called GenSVM is proposed. In this method, classification boundaries for a K-class problem are constructed in a (K - 1)-dimensional space using a simplex encoding. Additionally, several different weightings of the misclassification errors are incorporated in the loss function, such that it generalizes three existing multiclass SVMs through a single optimization problem. An iterative majorization algorithm is derived that solves the optimization problem without the need for a dual formulation. This algorithm has the advantage that it can use warm starts during cross validation and during a grid search, which significantly speeds up the training phase. Rigorous numerical experiments compare linear GenSVM with seven existing multiclass SVMs on both small and large data sets. These comparisons show that the proposed method is competitive with existing methods in both predictive accuracy and training time, and that it significantly outperforms several existing methods on these criteria.
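As a rough illustration of a simplex encoding, the sketch below builds K unit-length, equidistant vertices in (K - 1) dimensions, one per class. This is one standard construction of a regular simplex and is not necessarily the exact encoding matrix used by GenSVM. The nearest-vertex decision rule shown here is an assumption added for illustration and is not stated in the abstract; `simplex_vertices` and `nearest_vertex` are hypothetical names.

```python
import math

def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) with pairwise dot product -1/(k-1)."""
    n = k - 1
    verts = [[0.0] * n for _ in range(k)]
    for j in range(n):
        # Choose coordinate j of vertex j so that the vertex has unit norm.
        used = sum(verts[j][i] ** 2 for i in range(j))
        verts[j][j] = math.sqrt(1.0 - used)
        # Choose coordinate j of every later vertex so that its dot
        # product with vertex j equals -1/n.
        for m in range(j + 1, k):
            dot = sum(verts[j][i] * verts[m][i] for i in range(j))
            verts[m][j] = (-1.0 / n - dot) / verts[j][j]
    return verts

def nearest_vertex(point, verts):
    """Illustrative decision rule (an assumption): index of the closest vertex."""
    return min(range(len(verts)), key=lambda c: math.dist(point, verts[c]))

verts = simplex_vertices(4)  # 4 classes encoded in 3 dimensions
```

For K = 3 this gives the corners of an equilateral triangle in the plane, so no class is treated as closer to one class than to another.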
On Reject and Refine Options in Multicategory Classification
In many real applications of statistical learning, a decision made from
misclassification can be too costly to afford; in this case, a reject option,
which defers the decision until further investigation is conducted, is often
preferred. In recent years, there has been much development for binary
classification with a reject option. Yet, little progress has been made for the
multicategory case. In this article, we propose margin-based multicategory
classification methods with a reject option. In addition, and more importantly,
we introduce a new and unique refine option for the multicategory problem,
where the class of an observation is predicted to be from a set of class
labels, whose cardinality is not necessarily one. The main advantage of both
options lies in their capacity to identify error-prone observations.
Moreover, the refine option can provide more constructive information for
classification by effectively ruling out implausible classes. Efficient
implementations have been developed for the proposed methods. On the
theoretical side, we offer a novel statistical learning theory and show a fast
convergence rate of the excess $\ell$-risk of our methods with emphasis on
diverging dimensionality and number of classes. The results can be further
improved under a low noise assumption. A set of comprehensive simulation and
real data studies has shown the usefulness of the new learning tools compared
to regular multicategory classifiers. Detailed proofs of theorems and extended
numerical results are included in the supplemental materials available online.
Comment: A revised version of this paper was accepted for publication in the
Journal of the American Statistical Association Theory and Methods Section.
52 pages, 6 figures
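The difference between a reject option and a refine option can be sketched with a simple thresholding rule. The paper's methods are margin-based. The version below is a stand-in that thresholds hypothetical class probabilities instead, and the function name and both threshold parameters are assumptions made for illustration.

```python
def predict_with_options(probs, reject_below=0.5, keep_above=0.2):
    """Return a set of labels for one observation.

    probs: dict mapping class label -> estimated probability (hypothetical input).
    One label        : confident prediction.
    Several labels   : refine option, implausible classes ruled out.
    All labels       : reject option, decision deferred.
    """
    best = max(probs, key=probs.get)
    if probs[best] >= reject_below:
        return {best}
    plausible = {label for label, p in probs.items() if p >= keep_above}
    if 0 < len(plausible) < len(probs):
        return plausible
    return set(probs)

confident = predict_with_options({"a": 0.80, "b": 0.15, "c": 0.05})
refined = predict_with_options({"a": 0.45, "b": 0.45, "c": 0.10})
rejected = predict_with_options({"a": 0.34, "b": 0.33, "c": 0.33})
```

In the second call the rule cannot choose between "a" and "b" but can still rule out "c", which is the kind of partial information the refine option is meant to provide.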
An Ensemble Framework Approach to Crop Type Prediction Using Feature Selection and Multiclass Classification
Crop type classification plays a crucial role in modern agriculture, aiding in yield prediction, resource management, and land-use planning. This paper presents a comprehensive framework for crop type classification that combines feature selection techniques, a robust classification algorithm, and a Support Vector Machine (SVM)-based multiclass classification approach. The proposed framework begins with a novel feature selection process that identifies the most relevant attributes from the agricultural and rainfall data. This feature selection step is essential for reducing data dimensionality, enhancing classification accuracy, and improving model interpretability. Following feature selection, a state-of-the-art multiclass classification strategy based on Support Vector Machines is employed. SVMs are known for their capability to handle high-dimensional data and have demonstrated superior performance in various classification tasks. In this framework, SVMs are adapted to handle multiclass crop type classification efficiently. The model is trained on the selected features and optimized using hyperparameter tuning techniques to ensure robust performance.
Modelling, Monitoring, Control and Optimization for Complex Industrial Processes
This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. This reprint aims to promote the research field and benefit readers from both academic communities and industrial sectors.
Gibbs Max-margin Topic Models with Data Augmentation
Max-margin learning is a powerful approach to building classifiers and
structured output predictors. Recent work on max-margin supervised topic models
has successfully integrated it with Bayesian topic models to discover
discriminative latent semantic structures and make accurate predictions for
unseen testing data. However, the resulting learning problems are usually hard
to solve because of the non-smoothness of the margin loss. Existing approaches
to building max-margin supervised topic models rely on an iterative procedure
to solve multiple latent SVM subproblems with additional mean-field assumptions
on the desired posterior distributions. This paper presents an alternative
approach by defining a new max-margin loss. Namely, we present Gibbs max-margin
supervised topic models, a latent variable Gibbs classifier to discover hidden
topic representations for various tasks, including classification, regression
and multi-task learning. Gibbs max-margin supervised topic models minimize an
expected margin loss, which is an upper bound of the existing margin loss
derived from an expected prediction rule. By introducing augmented variables
and integrating out the Dirichlet variables analytically by conjugacy, we
develop simple Gibbs sampling algorithms with no restricting assumptions and no
need to solve SVM subproblems. Furthermore, each step of the
"augment-and-collapse" Gibbs sampling algorithms has an analytical conditional
distribution, from which samples can be easily drawn. Experimental results
demonstrate significant improvements in time efficiency. The classification
performance is also significantly improved over competitors on binary,
multi-class and multi-label classification tasks.
Comment: 35 pages
Semi-automated image analysis for the identification of bivalve larvae from a Cape Cod estuary
Author Posting. © Association for the Sciences of Limnology and Oceanography, 2012. This article is posted here by permission of Association for the Sciences of Limnology and Oceanography for personal use, not for redistribution. The definitive version was published in Limnology and Oceanography: Methods 10 (2012): 538-554, doi:10.4319/lom.2012.10.538.
Machine-learning methods for identifying planktonic organisms are becoming well-established. Although similar morphologies among species make traditional image identification methods difficult for larval bivalves, species-specific shell birefringence patterns under polarized light permit identification by color and texture-based features. This approach uses cross-polarized images of bivalve larvae, extracts Gabor and color angle features from each image, and classifies images using a Support Vector Machine. We adapted this method, which was established on hatchery-reared larvae, to identify bivalve larvae from a series of field samples from a Cape Cod estuary in 2009. This method had 98% identification accuracy for four hatchery-reared species. We used a multiplex polymerase chain reaction (PCR) method to confirm field identifications and to compare accuracies to the software classifications. Image classification of larvae collected in the field had lower accuracies than both the classification of hatchery species and PCR-based identification due to error in visually classifying unknown larvae and variability in larval images from the field. A six-species field training set had the best correspondence to our visual classifications with 75% overall agreement and individual species agreements from 63% to 88%. Larval abundance estimates for a time-series of field samples showed good correspondence with visual methods after correction.
Overall, this approach represents a cost- and time-saving alternative to molecular-based identifications and can produce sufficient results to address long-term abundance and transport-based questions on a species-specific level, a rarity in studies of bivalve larvae.
This project was supported by an award to S. Gallager
and C. Mingione Thompson from the Estuarine Reserves Division, Office
of Ocean and Coastal Resource Management, National Ocean Service,
National Oceanic and Atmospheric Administration and a grant from
Woods Hole Oceanographic Institution’s Coastal Ocean Institute.