Multilayer Feedforward Neural Network for Internet Traffic Classification
Efficient internet traffic classification has recently gained attention as a way to improve service quality in IP networks. Existing solutions, however, struggle with imbalanced datasets, in which flows are distributed very unevenly across the classes. In this paper, we propose a multilayer feedforward neural network architecture to handle highly imbalanced datasets. The proposed model is a variation of the multilayer perceptron with 4 hidden layers (called mountain mirror networks) that performs the feature transformation effectively. To check the efficacy of the proposed model, we used the Cambridge dataset, which consists of 248 features spread across 10 classes. Experiments were carried out on two variants of the same dataset: the standard one and a derived subset. The proposed model achieved an accuracy of 99.08% on the highly imbalanced (standard) dataset
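As a rough illustration of the kind of model this abstract describes, the following is a minimal sketch of a forward pass through a feedforward network with four hidden layers. Only the 248 input features and the 10 output classes come from the abstract; the hidden-layer widths, the ReLU and softmax choices, the random weights and the synthetic input are all assumptions made for illustration, and this is not the authors' mountain mirror network.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    # One row of n_in small random weights per output unit, plus zero biases.
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    # Affine transform: weights @ x + biases.
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    # Subtract the maximum for numerical stability.
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    # ReLU after every hidden layer, softmax after the output layer.
    for layer in layers[:-1]:
        x = relu(dense(x, layer))
    return softmax(dense(x, layers[-1]))

rng = random.Random(0)
# 248 inputs and 10 classes are from the abstract;
# the four hidden widths are hypothetical placeholders.
sizes = [248, 128, 64, 64, 128, 10]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
flow = [rng.random() for _ in range(248)]  # synthetic feature vector
probs = forward(flow, layers)
predicted_class = probs.index(max(probs))
```

The weights here are untrained, so the predicted class is meaningless; the sketch only shows how 248 features are transformed layer by layer into 10 class probabilities.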
Adaptive Preferential Attached kNN Graph With Distribution-Awareness
Graph-based kNN algorithms have garnered widespread popularity for machine
learning tasks, due to their simplicity and effectiveness. However, the
conventional kNN graph's reliance on a fixed value of k can hinder its
performance, especially in scenarios involving complex data distributions.
Moreover, like other classification models, the presence of ambiguous samples
along decision boundaries often presents a challenge, as they are more prone to
incorrect classification. To address these issues, we propose the Preferential
Attached k-Nearest Neighbors Graph (paNNG), which combines adaptive kNN with
distribution-based graph construction. By incorporating distribution
information, paNNG can significantly improve performance for ambiguous samples
by "pulling" them towards their original classes and hence enable enhanced
overall accuracy and generalization capability. Through rigorous evaluations on
diverse benchmark datasets, paNNG outperforms state-of-the-art algorithms,
showcasing its adaptability and efficacy across various real-world scenarios
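The sensitivity to a fixed k that this abstract mentions can be seen with a plain kNN majority vote. The sketch below is the conventional fixed-k baseline, not paNNG; the toy points, the labels "A" and "B", and the query location are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # train is a list of (point, label) pairs; vote among the k nearest points.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-class data with a query near the decision boundary.
train = [
    ((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((0.0, 1.0), "A"),
    ((2.0, 2.0), "B"), ((3.0, 2.0), "B"), ((2.0, 3.0), "B"),
    ((3.0, 3.0), "B"), ((3.0, 1.0), "B"),
]
query = (1.2, 1.2)

predictions = {k: knn_predict(train, query, k) for k in (1, 3, 7)}
```

For this ambiguous query the predicted label flips between the two classes as k changes, which is the behaviour an adaptive choice of k is meant to mitigate.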
A review on classification of imbalanced data for wireless sensor networks
© The Author(s) 2020. Classification of imbalanced data has been widely explored over the last and present decades, and it remains important because data are essential today and the problem becomes crucial when data are distributed into several classes. The term imbalance refers to an uneven distribution of data across classes that severely affects the performance of traditional classifiers; that is, classifiers become biased toward the class with the larger amount of data. Data generated by wireless sensor networks exhibit several kinds of imbalance. This review article analyzes the imbalance issue for wireless sensor networks and other application domains, which will help the community understand the WHAT, WHY, and WHEN of imbalance in data and its remedies
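The bias toward the larger class described in this abstract can be made concrete with a small sketch. It assumes a hypothetical 95/5 split between two invented labels, "normal" and "fault", and a degenerate classifier that always predicts the majority class.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    # For each class, the fraction of its samples that were predicted correctly.
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical imbalanced labels: 95 majority samples, 5 minority samples.
y_true = ["normal"] * 95 + ["fault"] * 5

# A classifier that ignores its input and always predicts the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Overall accuracy is 0.95 even though recall on the minority class is 0.0, which is why per-class metrics are usually reported alongside accuracy for imbalanced data.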
Application of advanced machine learning techniques to early network traffic classification
The fast-paced evolution of the Internet is creating a complex context that
imposes demanding requirements for assuring end-to-end Quality of Service. The
development of advanced intelligent approaches in networking envisions
features that include autonomous resource allocation, fast reaction to
unexpected network events and so on. Internet Network Traffic Classification
constitutes a crucial source of information for Network Management and is decisive
in assisting the emerging network control paradigms. Monitoring the traffic flowing
through network devices supports tasks such as network orchestration, traffic
prioritization, network arbitration and cyberthreat detection, amongst others.
Traditional traffic classifiers have become obsolete owing to the rapid evolution of the
Internet. Port-based classifiers suffer from significant accuracy losses due to port
masking, while Deep Packet Inspection approaches have severe user-privacy
limitations. The advent of Machine Learning has propelled the application of
advanced algorithms in diverse research areas, and some learning approaches have
proved to be an interesting alternative to the classic traffic classification approaches.
Addressing Network Traffic Classification from a Machine Learning perspective
implies numerous challenges demanding research efforts to achieve feasible
classifiers. In this dissertation, we endeavor to formulate and solve important
research questions in Machine-Learning-based Network Traffic Classification. As a
result of numerous experiments, the knowledge provided in this research constitutes
an engaging case study in which network traffic data from two different
environments are successfully collected, processed and modeled.
Firstly, we approached the Feature Extraction and Selection processes providing our
own contributions. A Feature Extractor was designed to create Machine-Learning
ready datasets from real traffic data, and a Feature Selection Filter based on fast
correlation is proposed and tested in several classification datasets. Then, the
original Network Traffic Classification datasets are reduced using our Selection
Filter to provide efficient classification models. Many classification models based on
CART Decision Trees were analyzed exhibiting excellent outcomes in identifying
various Internet applications. The experiments presented in this research comprise
a comparison amongst ensemble learning schemes, an exploratory study on Class
Imbalance and its solutions, and an analysis of IP-header predictors for early traffic
classification. This thesis is presented in the form of a compendium of JCR-indexed
scientific manuscripts and, furthermore, one conference paper is included.
In the present work we study a wide range of learning approaches employing the
most advanced methodology in Machine Learning. As a result, we identify the
strengths and weaknesses of these algorithms, providing our own solutions to
overcome the observed limitations. In short, this thesis shows that Machine
Learning offers interesting advanced techniques that open promising prospects in
Internet Network Traffic Classification.
Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática
Doctorado en Tecnologías de la Información y las Telecomunicaciones
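The thesis abstract above mentions a feature selection filter based on fast correlation but does not describe it. The following is a generic, simplified sketch of a correlation-based filter, not the thesis's filter: it uses absolute Pearson correlation as a stand-in measure, keeps features that are sufficiently correlated with the class, and drops a feature when it correlates with an already kept feature at least as strongly as with the class. The feature names, the data and the `min_relevance` threshold are hypothetical.

```python
import math

def pearson(a, b):
    # Pearson correlation coefficient; 0.0 if either input is constant.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def correlation_filter(columns, target, min_relevance):
    # Relevance: absolute correlation of each feature with the class label.
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    ranked = sorted(
        (name for name in columns if relevance[name] >= min_relevance),
        key=relevance.get,
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Redundant if some kept feature explains this one at least as well
        # as this one explains the class.
        redundant = any(
            abs(pearson(columns[name], columns[kept])) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected

# Hypothetical flow features and a binary class label.
target = [0, 0, 0, 1, 0, 1, 1, 1]
columns = {
    "bytes": [1, 2, 3, 4, 5, 6, 7, 8],
    "bytes_noisy": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9],
    "jitter": [3, 1, 4, 1, 5, 9, 2, 6],
}
selected = correlation_filter(columns, target, min_relevance=0.5)
```

On this toy data only `bytes` survives: `jitter` falls below the relevance threshold and `bytes_noisy` is discarded as redundant with `bytes`.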
SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is
considered the "de facto" standard in the framework of learning from imbalanced data. This
is due to the simplicity of its design, as well as its robustness when applied
to different types of problems. Since its publication in 2002, SMOTE has proven
successful in a variety of applications from several different domains. SMOTE has also inspired
several approaches to counter the issue of class imbalance, and has also significantly
contributed to new supervised learning paradigms, including multilabel classification, incremental
learning, semi-supervised learning, and multi-instance learning, among others. It is a
standard benchmark for learning from imbalanced data. It is also featured in a number of
different software packages, from open source to commercial. In this paper, marking the
fifteen year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current
state of affairs with SMOTE, its applications, and also identify the next set of challenges
to extend SMOTE for Big Data problems.
This work has been partially supported by the Spanish Ministry of Science and Technology
under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project
887 BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016;
and the National Science Foundation (NSF) Grant IIS-1447795
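The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration rather than the reference algorithm: it picks the base sample at random instead of iterating over every minority sample, and the function name `smote`, its parameters and the toy points are assumptions.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    # minority: list of numeric feature vectors from the minority class.
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # New point at a random position on the segment from base to other.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Hypothetical two-dimensional minority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.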