Popular Ensemble Methods: An Empirical Study
An ensemble consists of a set of individually trained classifiers (such as
neural networks or decision trees) whose predictions are combined when
classifying novel instances. Previous research has shown that an ensemble is
often more accurate than any of the single classifiers in the ensemble. Bagging
(Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two
relatively new but popular methods for producing ensembles. In this paper we
evaluate these methods on 23 data sets using both neural networks and decision
trees as our classification algorithms. Our results support several
conclusions. First, while Bagging is almost always more accurate than a single
classifier, it is sometimes much less accurate than Boosting. On the other
hand, Boosting can create ensembles that are less accurate than a single
classifier, especially when using neural networks. Analysis indicates that
the performance of the Boosting methods depends on the characteristics of
the data set being examined. In fact, further results show that Boosting
ensembles may overfit noisy data sets, thus decreasing their performance.
Finally, consistent with previous studies, our work suggests that most of the
gain in an ensemble's performance comes in the first few classifiers combined;
however, relatively large gains can be seen up to 25 classifiers when Boosting
decision trees.
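The abstract compares two ways of building an ensemble. The sketch below assumes 1-D inputs, labels in {-1, +1}, and decision stumps as the base classifier. The paper itself uses neural networks and full decision trees.

- Bagging trains each member on a bootstrap resample and combines the members by majority vote.
- The Boosting variant shown is discrete AdaBoost. It reweights misclassified examples between rounds and combines members by an alpha-weighted vote.

All function names are hypothetical, and the code is not the authors' implementation.

```python
import math
import random
from collections import Counter


def fit_stump(xs, ys, weights=None):
    """Fit a 1-D decision stump (threshold classifier) with labels in {-1, +1}.

    Returns (threshold, polarity) minimizing the weighted training error.
    """
    if weights is None:
        weights = [1.0 / len(xs)] * len(xs)
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x >= t else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best[1], best[2]


def predict_stump(stump, x):
    t, polarity = stump
    return polarity if x >= t else -polarity


def bagging(xs, ys, n_models=25, seed=0):
    """Bagging: train each member on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the members by unweighted majority vote."""
    return Counter(predict_stump(m, x) for m in models).most_common(1)[0][0]


def adaboost(xs, ys, n_rounds=25):
    """Discrete AdaBoost: shift training weight toward misclassified examples."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, w)
        preds = [predict_stump(stump, x) for x in xs]
        err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
        if err >= 0.5:  # base learner no better than chance: stop
            break
        err = max(err, 1e-10)  # avoid division by zero on a perfect fit
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples (y * p = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The key design difference is in how members are built:

- Bagging members are built independently, so they could be trained in parallel.
- Boosting members are built sequentially, because each round depends on the weights left by the previous one. This dependence also lets Boosting chase noisy labels, which matches the overfitting on noisy data sets that the abstract reports.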
Feature and Region Selection for Visual Learning
Visual learning problems such as object classification and action recognition
are typically approached using extensions of the popular bag-of-words (BoW)
model. Despite its great success, it is unclear what visual features the BoW
model is learning: Which regions in the image or video are used to discriminate
among classes? Which are the most discriminative visual words? Answering these
questions is fundamental for understanding existing BoW models and inspiring
better models for visual recognition.
To answer these questions, this paper presents a method for feature selection
and region selection in the visual BoW model. This allows for an intermediate
visualization of the features and regions that are important for visual
learning. The main idea is to assign latent weights to the features or regions,
and jointly optimize these latent variables with the parameters of a classifier
(e.g., a support vector machine). There are four main benefits of our approach:
(1) our approach accommodates non-linear additive kernels such as the popular
χ² and intersection kernels; (2) our approach is able to handle both
regions in images and spatio-temporal regions in videos in a unified way; (3)
the feature selection problem is convex, and both problems can be solved using
a scalable reduced gradient method; (4) we point out strong connections with
multiple kernel learning and multiple instance learning approaches.
Experimental results on PASCAL VOC 2007, MSR Action Dataset II, and YouTube
illustrate the benefits of our approach.
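To make "latent weights on features combined with an additive kernel" concrete, here is a minimal sketch. It assumes that each visual word i of a BoW histogram gets a nonnegative weight d_i that scales that word's term in an additive kernel. The abstract does not give the exact formulation. The function names and the weighting scheme are illustrative assumptions, and the paper's joint optimization by a reduced gradient method is not shown.

```python
def weighted_intersection_kernel(x, y, d):
    """Histogram intersection kernel with per-word weights: sum_i d_i * min(x_i, y_i)."""
    return sum(di * min(xi, yi) for xi, yi, di in zip(x, y, d))


def weighted_chi2_kernel(x, y, d):
    """Additive chi-squared kernel with per-word weights: sum_i d_i * 2 x_i y_i / (x_i + y_i)."""
    total = 0.0
    for xi, yi, di in zip(x, y, d):
        if xi + yi > 0:  # bins empty in both histograms contribute 0
            total += di * 2.0 * xi * yi / (xi + yi)
    return total


def gram_matrix(histograms, kernel, d):
    """Kernel matrix over a set of BoW histograms, e.g. as input to a kernel SVM."""
    return [[kernel(a, b, d) for b in histograms] for a in histograms]
```

Setting d_i = 0 removes visual word i from the similarity, and larger d_i emphasizes it. Each weighted term is a valid kernel on its own, and nonnegative sums of kernels remain valid. This is one way to see the connection to multiple kernel learning that the abstract mentions, with one base kernel per visual word.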
Transonic Flutter Suppression Control Law Design, Analysis and Wind Tunnel Results
The benchmark active controls technology and wind tunnel test program at NASA Langley Research Center was started with the objective of investigating the nonlinear, unsteady aerodynamics and active flutter suppression of wings in transonic flow. This paper presents the flutter suppression control law design process, numerical nonlinear simulation, and wind tunnel test results for the NACA 0012 benchmark active control wing model. The flutter suppression control law design processes using (1) classical, (2) linear quadratic Gaussian (LQG), and (3) minimax techniques are described. A unified general formulation and solution for the LQG and minimax approaches, based on steady-state differential game theory, is presented. Design considerations for improving control law robustness and digital implementation are outlined. It is shown that simple control laws, when properly designed based on physical principles, can suppress flutter with limited control power even in the presence of transonic shocks and flow separation. In wind tunnel tests in air and in a heavy gas medium, the closed-loop flutter dynamic pressure was increased to the tunnel upper limit of 200 psf. The control law robustness and performance predictions were verified under highly nonlinear flow conditions, gain and phase perturbations, and spoiler deployment. A non-design plunge instability condition was also successfully suppressed.
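The LQG design mentioned above combines a linear quadratic regulator with a Kalman filter. The sketch below shows only the regulator half. It computes a discrete-time LQR gain for a single control input by iterating the Riccati equation P = Q + A'PA - A'PB (R + B'PB)^(-1) B'PA, then uses the feedback u = -K x.

The plant is a hypothetical, lightly unstable 2-state oscillation standing in for a flutter mode. It is not the NACA 0012 benchmark wing model. The minimax design and the paper's digital-implementation details are not shown.

```python
def mat_mul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def mat_add(X, Y, scale=1.0):
    """Return X + scale * Y."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lqr_gain(A, B, P, R):
    """K = (R + B'PB)^(-1) B'PA for a single control input (R is a positive scalar)."""
    BtP = mat_mul(transpose(B), P)                # 1 x n row
    s = R + mat_mul(BtP, B)[0][0]                 # scalar R + B'PB
    return [[v / s for v in mat_mul(BtP, A)[0]]]  # 1 x n gain row


def dlqr_single_input(A, B, Q, R, iters=10000, tol=1e-12):
    """Iterate the discrete algebraic Riccati equation from P = Q until it settles.

    Returns (K, P); the state-feedback law is u = -K x.
    """
    P = [row[:] for row in Q]
    At = transpose(A)
    for _ in range(iters):
        K = lqr_gain(A, B, P, R)
        AtP = mat_mul(At, P)
        # P_next = Q + A'PA - (A'PB) K, with K = (R + B'PB)^(-1) B'PA
        P_next = mat_add(mat_add(Q, mat_mul(AtP, A)), mat_mul(mat_mul(AtP, B), K), scale=-1.0)
        diff = max(abs(a - b) for ra, rb in zip(P_next, P) for a, b in zip(ra, rb))
        P = P_next
        if diff < tol:
            break
    return lqr_gain(A, B, P, R), P


def is_schur_stable_2x2(M):
    """Jury test: both eigenvalues of a real 2 x 2 matrix lie inside the unit circle."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return abs(det) < 1 and abs(tr) < 1 + det


# Hypothetical toy plant: a growing oscillation, not the benchmark wing dynamics.
A_TOY = [[1.0, 0.1], [-0.1, 1.05]]
B_TOY = [[0.0], [0.1]]
Q_TOY = [[1.0, 0.0], [0.0, 1.0]]
R_TOY = 1.0
```

Usage: compute `K, P = dlqr_single_input(A_TOY, B_TOY, Q_TOY, R_TOY)`. Then check that `mat_add(A_TOY, mat_mul(B_TOY, K), scale=-1.0)`, the closed-loop matrix A - BK, passes `is_schur_stable_2x2`, while `A_TOY` alone does not. Weighting the control term more heavily (larger R) is the knob that trades control power for damping, the kind of trade-off the abstract mentions when it notes that flutter was suppressed with limited control power.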
Bayesian Modeling for Mental Health Surveys
Sample surveys are often used to collect data for estimating finite population quantities, such as disease prevalence. However, non-response and sampling frame under-coverage can cause the survey sample to differ from the target population in important ways. To reduce the bias in survey estimates that can arise from these differences, auxiliary information about the target population from sources such as administrative files or census data can be used. Survey weighting is one approach commonly used to reduce bias. Although weighted estimates are relatively easy to obtain, they can be inefficient in the presence of highly dispersed weights. Model-based estimation in survey research offers improved efficiency in the presence of sparse data and highly variable weights. However, these models can be subject to model misspecification. In this dissertation, we propose Bayesian penalized spline regression models for survey inference about proportions in the entire population as well as in sub-populations. The proposed methods incorporate survey weights as covariates using a penalized spline to protect against model misspecification. We show by simulation that the proposed methods perform well: they yield efficient estimates of population proportions for binary survey data in the presence of highly dispersed weights and are robust to misspecification of the model for survey outcomes. We illustrate the use of the proposed methods by estimating the prevalence of lifetime temper dysregulation disorder among National Guard service members, overall and in sub-populations defined by gender and race, using the Ohio Army National Guard Mental Health Initiative 2008-2009 survey data.
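The efficiency concern with highly dispersed weights can be made concrete with Kish's approximate effective sample size, n_eff = (Σw)² / Σw². The ratio n / n_eff equals 1 + CV², where CV is the coefficient of variation of the weights computed with divisor n.

The sketch below pairs that rule of thumb with a plain weighted estimate of a proportion. Both are standard survey tools, not the dissertation's penalized spline method, and the function names are hypothetical.

```python
def weighted_proportion(y, w):
    """Weighted (Hajek-type) estimate of a population proportion from 0/1 outcomes."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def kish_effective_n(w):
    """Kish's approximate effective sample size: (sum w)^2 / sum w^2."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)


def kish_design_effect(w):
    """Variance inflation from unequal weighting: n / n_eff = 1 + CV^2 of the weights."""
    return len(w) / kish_effective_n(w)


# Four respondents; one weight is nine times the others.
weights = [1.0, 1.0, 1.0, 9.0]
outcomes = [1, 0, 0, 1]
estimate = weighted_proportion(outcomes, weights)  # 10 / 12
n_eff = kish_effective_n(weights)                  # 144 / 84, about 1.71
```

With equal weights, four respondents count as four. With one dominant weight, they carry roughly the information of fewer than two respondents. This is the inefficiency that model-based estimation aims to reduce.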
We further extend the proposed framework to the setting where individual auxiliary data for the population are not available, and use a Bayesian bootstrap approach to complete model-based estimation of current and undiagnosed depression among Hispanics/Latinos of different national backgrounds from the 2015 Washington Heights Community Survey.
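For readers unfamiliar with the Bayesian bootstrap (Rubin's version), the sketch below shows the basic mechanism. Instead of resampling units, each posterior draw gives the observed units Dirichlet(1, ..., 1) weights, generated here as normalized Exp(1) variables, and recomputes the statistic.

This is only the generic resampling step for a proportion. It does not reproduce how the dissertation combines the bootstrap with its spline model, and the function name is hypothetical.

```python
import random


def bayesian_bootstrap_proportion(y, n_draws=1000, seed=0):
    """Posterior draws of a proportion under Rubin's Bayesian bootstrap.

    y: list of 0/1 outcomes. Each draw reweights the units with Dirichlet(1, ..., 1)
    weights, obtained by normalizing independent Exp(1) variables.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        g = [rng.expovariate(1.0) for _ in y]
        total = sum(g)
        draws.append(sum(gi * yi for gi, yi in zip(g, y)) / total)
    return draws
```

The spread of the returned draws serves as a measure of posterior uncertainty, and their average sits close to the sample proportion.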
Adaptive Optimal Control: The Thinking Man's GPC
Exploring connections between adaptive control theory and practice, this book treats the techniques of linear quadratic optimal control and estimation (Kalman filtering), recursive identification, linear systems theory, and robust arguments.
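As an illustration of recursive identification, here is a minimal recursive least-squares (RLS) sketch. Each new sample updates the parameter estimate and its covariance-like matrix P, using an optional forgetting factor lam.

The first-order ARX model y_k = a*y_{k-1} + b*u_{k-1}, the initial P = 1000·I, and all function names are assumptions for illustration. They are not taken from the book.

```python
import random


def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step.

    theta: current estimate, P: n x n matrix (list of rows), phi: regressor,
    y: new measurement, lam: forgetting factor in (0, 1].
    """
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    gain = [v / denom for v in Pphi]
    err = y - sum(phi[i] * theta[i] for i in range(n))  # one-step prediction error
    theta = [theta[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * phi' P) / lam
    phiP = [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]
    P = [[(P[i][j] - gain[i] * phiP[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


def simulate_arx1(a, b, n, seed=0):
    """Noise-free data from y_k = a*y_{k-1} + b*u_{k-1} with random inputs in [-1, 1]."""
    rng = random.Random(seed)
    us = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [0.0]
    for k in range(1, n):
        ys.append(a * ys[-1] + b * us[k - 1])
    return us, ys


def identify_arx1(us, ys, lam=1.0):
    """Estimate [a, b] recursively, starting from a vague prior P = 1000 * I."""
    theta = [0.0, 0.0]
    P = [[1000.0, 0.0], [0.0, 1000.0]]
    for k in range(1, len(ys)):
        theta, P = rls_update(theta, P, [ys[k - 1], us[k - 1]], ys[k], lam)
    return theta
```

The choice of lam sets how fast the estimator adapts:

- lam = 1 weights all past data equally.
- Values slightly below 1 discount old data, which is what lets an adaptive controller track slowly changing plant parameters.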
From SMOTE to Mixup for Deep Imbalanced Classification
Given imbalanced data, it is hard to train a good classifier using deep
learning because of the poor generalization of minority classes. Traditionally,
the well-known synthetic minority oversampling technique (SMOTE), a data
augmentation approach from data mining for imbalanced learning, has been used to
improve this generalization. However, it is unclear whether SMOTE also benefits
deep learning. In this work, we study why the original SMOTE is insufficient
for deep learning, and enhance SMOTE using soft labels. Connecting the
resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to
a unified framework that puts traditional and modern data augmentation
techniques under the same umbrella. A careful study within this framework shows
that Mixup improves generalization by implicitly achieving uneven margins
between majority and minority classes. We then propose a novel margin-aware
Mixup technique that more explicitly achieves uneven margins. Extensive
experimental results demonstrate that our proposed technique yields
state-of-the-art performance on deep imbalanced classification while achieving
superior performance on extremely imbalanced data. The code is open-sourced in
our developed package https://github.com/ntucllab/imbalanced-DL to foster
future research in this direction.
Comment: 25 pages, 3 figures. The paper is accepted by TAAI 202
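The abstract contrasts SMOTE with Mixup. The minimal sketch below shows the two classic augmentations side by side:

- SMOTE interpolates between a minority sample and one of its minority-class nearest neighbors, and the synthetic point keeps the hard minority label.
- Mixup interpolates any two samples and mixes their one-hot labels as well, with the mixing coefficient drawn from Beta(alpha, alpha).

It does not implement the authors' soft SMOTE or margin-aware Mixup, and it does not use their package. The function names, the default k = 5, and alpha = 0.2 are illustrative assumptions.

```python
import random


def nearest_minority_neighbors(x, minority, k):
    """k nearest neighbors of x among the other minority samples (squared Euclidean)."""
    others = [m for m in minority if m is not x]
    others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
    return others[:k]


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE: move a minority sample part of the way toward a minority neighbor.

    Every synthetic point keeps the hard minority label. Needs >= 2 minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbor = rng.choice(nearest_minority_neighbors(x, minority, k))
        lam = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)])
    return synthetic


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: mix two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]  # soft label
    return x, y, lam
```

The key difference is the label:

- SMOTE's synthetic points are always labeled as fully minority.
- Mixup's labels are soft mixtures that track how far the input was interpolated.

This is the gap the abstract's soft-label extension of SMOTE addresses.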