Search CORE

230 research outputs found

Multilayer Feedforward Neural Network for Internet Traffic Classification

Author: Harish B S
Manju N
Nagadarshan N
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 21/03/2022
Field of study

Recently, the efficient internet traffic classification has gained attention in order to improve service quality in IP networks. But the problem with the existing solutions is to handle the imbalanced dataset which has high uneven distribution of flows between the classes. In this paper, we propose a multilayer feedforward neural network architecture to handle the high imbalanced dataset. In the proposed model, we used a variation of multilayer perceptron with 4 hidden layers (called as mountain mirror networks) which does the feature transformation effectively. To check the efficacy of the proposed model, we used Cambridge dataset which consists of 248 features spread across 10 classes. Experimentation is carried out for two variants of the same dataset which is a standard one and a derived subset. The proposed model achieved an accuracy of 99.08% for highly imbalanced dataset (standard)

Re-UNIR

Data Preprocessing Strategies in Cancer Stage Prediction

Author: Ana Maria Mendes Moreira
Publication venue
Publication date: 14/09/2022
Field of study

Repositório Aberto da Universidade do Porto

Adaptive Preferential Attached kNN Graph With Distribution-Awareness

Author: Liu Ji
Min Shaojie
Publication venue
Publication date: 04/08/2023
Field of study

Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks, due to their simplicity and effectiveness. However, the conventional kNN graph's reliance on a fixed value of k can hinder its performance, especially in scenarios involving complex data distributions. Moreover, like other classification models, the presence of ambiguous samples along decision boundaries often presents a challenge, as they are more prone to incorrect classification. To address these issues, we propose the Preferential Attached k-Nearest Neighbors Graph (paNNG), which combines adaptive kNN with distribution-based graph construction. By incorporating distribution information, paNNG can significantly improve performance for ambiguous samples by "pulling" them towards their original classes and hence enable enhanced overall accuracy and generalization capability. Through rigorous evaluations on diverse benchmark datasets, paNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios

arXiv.org e-Print Archive

A review on classification of imbalanced data for wireless sensor networks

Author: Iwendi C
Jo O
Kashif Bashir A
Patel H
Singh Rajput D
Thippa Reddy G
Publication venue: 'SAGE Publications'
Publication date: 01/04/2020
Field of study

© The Author(s) 2020. Classification of imbalanced data is a vastly explored issue of the last and present decade and still keeps the same importance because data are an essential term today and it becomes crucial when data are distributed into several classes. The term imbalance refers to uneven distribution of data into classes that severely affects the performance of traditional classifiers, that is, classifiers become biased toward the class having larger amount of data. The data generated from wireless sensor networks will have several imbalances. This review article is a decent analysis of imbalance issue for wireless sensor networks and other application domains, which will help the community to understand WHAT, WHY, and WHEN of imbalance in data and its remedies

E-space: Manchester Metropolitan University's Research Repository

Application of advanced machine learning techniques to early network traffic classification

Author: Egea Gómez Santiago
Publication venue: 'Universidad de Valladolid'
Publication date: 01/01/2020
Field of study

The fast-paced evolution of the Internet is drawing a complex context which imposes demanding requirements to assure end-to-end Quality of Service. The development of advanced intelligent approaches in networking is envisioning features that include autonomous resource allocation, fast reaction against unexpected network events and so on. Internet Network Traffic Classification constitutes a crucial source of information for Network Management, being decisive in assisting the emerging network control paradigms. Monitoring traffic flowing through network devices support tasks such as: network orchestration, traffic prioritization, network arbitration and cyberthreats detection, amongst others. The traditional traffic classifiers became obsolete owing to the rapid Internet evolution. Port-based classifiers suffer from significant accuracy losses due to port masking, meanwhile Deep Packet Inspection approaches have severe user-privacy limitations. The advent of Machine Learning has propelled the application of advanced algorithms in diverse research areas, and some learning approaches have proved as an interesting alternative to the classic traffic classification approaches. Addressing Network Traffic Classification from a Machine Learning perspective implies numerous challenges demanding research efforts to achieve feasible classifiers. In this dissertation, we endeavor to formulate and solve important research questions in Machine-Learning-based Network Traffic Classification. As a result of numerous experiments, the knowledge provided in this research constitutes an engaging case of study in which network traffic data from two different environments are successfully collected, processed and modeled. Firstly, we approached the Feature Extraction and Selection processes providing our own contributions. A Feature Extractor was designed to create Machine-Learning ready datasets from real traffic data, and a Feature Selection Filter based on fast correlation is proposed and tested in several classification datasets. Then, the original Network Traffic Classification datasets are reduced using our Selection Filter to provide efficient classification models. Many classification models based on CART Decision Trees were analyzed exhibiting excellent outcomes in identifying various Internet applications. The experiments presented in this research comprise a comparison amongst ensemble learning schemes, an exploratory study on Class Imbalance and solutions; and an analysis of IP-header predictors for early traffic classification. This thesis is presented in the form of compendium of JCR-indexed scientific manuscripts and, furthermore, one conference paper is included. In the present work we study a wide number of learning approaches employing the most advance methodology in Machine Learning. As a result, we identify the strengths and weaknesses of these algorithms, providing our own solutions to overcome the observed limitations. Shortly, this thesis proves that Machine Learning offers interesting advanced techniques that open prominent prospects in Internet Network Traffic Classification.Departamento de Teoría de la Señal y Comunicaciones e Ingeniería TelemáticaDoctorado en Tecnologías de la Información y las Telecomunicacione

Repositorio Documental de la Universidad de Valladolid

SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

Author: Chawla Nitesh V.
Fernández Hilario Alberto Luis
García López Salvador
Herrera Triguero Francisco
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2018
Field of study

The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered \de facto" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to di erent type of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several di erent domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has also signi cantly contributed to new supervised learning paradigms, including multilabel classi cation, incremental learning, semi-supervised learning, multi-instance learning, among others. It is standard benchmark for learning from imbalanced data. It is also featured in a number of di erent software packages | from open source to commercial. In this paper, marking the fteen year anniversary of SMOTE, we re ect on the SMOTE journey, discuss the current state of a airs with SMOTE, its applications, and also identify the next set of challenges to extend SMOTE for Big Data problems.This work have been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project 887 BigDaP-TOOLS - Ayudas Fundaci on BBVA a Equipos de Investigaci on Cient ca 2016; and the National Science Foundation (NSF) Grant IIS-1447795

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada