2,260 research outputs found
Semi-supervised multi-layered clustering model for intrusion detection
A Machine Learning (ML) -based Intrusion Detection and Prevention System (IDPS) requires a large amount of labeled up-to-date training data, to effectively detect intrusions and generalize well to novel attacks. However, labeling of data is costly and becomes infeasible when dealing with big data, such as those generated by IoT (Internet of Things) -based applications. To this effect, building a ML model that learns from non- or partially-labeled data is of critical importance. This paper proposes a novel Semi-supervised Multi-Layered Clustering Model (SMLC) for network intrusion detection and prevention tasks. The SMLC has the capability to learn from partially labeled data while achieving a comparable detection performance to supervised ML-based IDPS. The performance of the SMLC is compared with well-known supervised ensemble ML models, namely, RandomForest, Bagging, and AdaboostM1 and a semi-supervised model (i.e., tri-training) on a benchmark network intrusion dataset, the Kyoto 2006+. Experimental results show that the SMLC outperforms all other models and can achieve better detection accuracy using only 20% labeled instances of the training data
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
A number of studies have found that today's Visual Question Answering (VQA)
models are heavily driven by superficial correlations in the training data and
lack sufficient image grounding. To encourage development of models geared
towards the latter, we propose a new setting for VQA where for every question
type, train and test sets have different prior distributions of answers.
Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we
call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2
respectively). First, we evaluate several existing VQA models under this new
setting and show that their performance degrades significantly compared to the
original VQA setting. Second, we propose a novel Grounded Visual Question
Answering model (GVQA) that contains inductive biases and restrictions in the
architecture specifically designed to prevent the model from 'cheating' by
primarily relying on priors in the training data. Specifically, GVQA explicitly
disentangles the recognition of visual concepts present in the image from the
identification of plausible answer space for a given question, enabling the
model to more robustly generalize across different distributions of answers.
GVQA is built off an existing VQA model -- Stacked Attention Networks (SAN).
Our experiments demonstrate that GVQA significantly outperforms SAN on both
VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more
powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in
several cases. GVQA offers strengths complementary to SAN when trained and
evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more
transparent and interpretable than existing VQA models.Comment: 15 pages, 10 figures. To appear in IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 201
Random Feature Maps via a Layered Random Projection (LaRP) Framework for Object Classification
The approximation of nonlinear kernels via linear feature maps has recently
gained interest due to their applications in reducing the training and testing
time of kernel-based learning algorithms. Current random projection methods
avoid the curse of dimensionality by embedding the nonlinear feature space into
a low dimensional Euclidean space to create nonlinear kernels. We introduce a
Layered Random Projection (LaRP) framework, where we model the linear kernels
and nonlinearity separately for increased training efficiency. The proposed
LaRP framework was assessed using the MNIST hand-written digits database and
the COIL-100 object database, and showed notable improvement in object
classification performance relative to other state-of-the-art random projection
methods.Comment: 5 page
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Evolving Ensemble Fuzzy Classifier
The concept of ensemble learning offers a promising avenue in learning from
data streams under complex environments because it addresses the bias and
variance dilemma better than its single model counterpart and features a
reconfigurable structure, which is well suited to the given context. While
various extensions of ensemble learning for mining non-stationary data streams
can be found in the literature, most of them are crafted under a static base
classifier and revisits preceding samples in the sliding window for a
retraining step. This feature causes computationally prohibitive complexity and
is not flexible enough to cope with rapidly changing environments. Their
complexities are often demanding because it involves a large collection of
offline classifiers due to the absence of structural complexities reduction
mechanisms and lack of an online feature selection mechanism. A novel evolving
ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in
this paper. pENsemble differs from existing architectures in the fact that it
is built upon an evolving classifier from data streams, termed Parsimonious
Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism,
which estimates a localized generalization error of a base classifier. A
dynamic online feature selection scenario is integrated into the pENsemble.
This method allows for dynamic selection and deselection of input features on
the fly. pENsemble adopts a dynamic ensemble structure to output a final
classification decision where it features a novel drift detection scenario to
grow the ensemble structure. The efficacy of the pENsemble has been numerically
demonstrated through rigorous numerical studies with dynamic and evolving data
streams where it delivers the most encouraging performance in attaining a
tradeoff between accuracy and complexity.Comment: this paper has been published by IEEE Transactions on Fuzzy System
Cluster-based ensemble learning for wind power modeling with meteorological wind data
Optimal implementation and monitoring of wind energy generation hinge on
reliable power modeling that is vital for understanding turbine control, farm
operational optimization, and grid load balance. Based on the idea of similar
wind condition leads to similar wind power; this paper constructs a modeling
scheme that orderly integrates three types of ensemble learning algorithms,
bagging, boosting, and stacking, and clustering approaches to achieve optimal
power modeling. It also investigates applications of different clustering
algorithms and methodology for determining cluster numbers in wind power
modeling. The results reveal that all ensemble models with clustering exploit
the intrinsic information of wind data and thus outperform models without it by
approximately 15% on average. The model with the best farthest first clustering
is computationally rapid and performs exceptionally well with an improvement of
around 30%. The modeling is further boosted by about 5% by introducing stacking
that fuses ensembles with varying clusters. The proposed modeling framework
thus demonstrates promise by delivering efficient and robust modeling
performance.Comment: UNDER REVIEW Renewable & Sustainable Energy Review
- …