
    Identifying Mislabeling in Machine Learning Based Intrusion Detection System

    Machine learning has shown strong potential for improving the performance of Intrusion Detection Systems (IDSs). In a machine learning based IDS, the problem is commonly formulated as supervised classification, in which training datasets are used to teach a selected model how various network features relate to different types of network traffic (i.e., benign traffic or a type of network attack). Each training dataset usually includes a large number of data samples, and each sample contains many network features together with its associated traffic type, called the label. Most recent studies focus on developing better machine learning models to achieve higher IDS performance. Very little research has examined the quality of training datasets, especially how mislabeling affects the performance of a machine learning based IDS. In this thesis, we focus on the mislabeling issue in a machine learning based IDS. We first show the impact of mislabeling on the performance of such an IDS. Then, we propose a new algorithm called Heuristic Mislabel Identification (HMI), based on Data Shapley [6], to identify mislabeled samples in training datasets. Guided by different mislabeling scenarios, HMI heuristically and iteratively divides a training dataset into multiple groups to narrow down the location or range of mislabels. We have evaluated our method on a widely adopted IDS training dataset (CICIDS2017). The evaluation results show that HMI can identify 84% of random mislabels and 78% of mislabels originating from a single data source. The precision in both experiments is 100%, meaning that every suspect group identified does contain mislabeled samples.
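    To make the group-division idea concrete, here is a minimal sketch. It uses leave-group-out validation accuracy as a cheap stand-in for the Data Shapley values HMI relies on, and a logistic regression classifier purely for illustration; the function names and the contiguous-mislabel demo are hypothetical, not the thesis's implementation.

```python
# Sketch: narrow down mislabeled samples by recursive group division.
# Leave-group-out validation accuracy is a cheap stand-in for Data Shapley.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def value_without(X, y, X_val, y_val, drop_idx):
    """Validation accuracy of a model trained with the given group removed."""
    keep = np.setdiff1d(np.arange(len(y)), drop_idx)
    clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return accuracy_score(y_val, clf.predict(X_val))

def find_suspect_group(X, y, X_val, y_val, idx=None, min_size=32):
    """Recursively halve the data, descending into the half whose removal
    improves validation accuracy the most -- it likely holds the mislabels."""
    if idx is None:
        idx = np.arange(len(y))
    if len(idx) <= min_size:
        return idx                                  # small enough to inspect by hand
    halves = [idx[: len(idx) // 2], idx[len(idx) // 2:]]
    scores = [value_without(X, y, X_val, y_val, h) for h in halves]
    return find_suspect_group(X, y, X_val, y_val,
                              idx=halves[int(np.argmax(scores))],
                              min_size=min_size)

# Tiny demo: flip labels in one contiguous range (the "single data source"
# scenario) and try to localize them.
X, y = make_classification(n_samples=2048, n_features=20, random_state=0)
X_val, y_val = X[:512], y[:512]
X_tr, y_tr = X[512:].copy(), y[512:].copy()
y_tr[100:150] = 1 - y_tr[100:150]                   # inject 50 mislabels
print(find_suspect_group(X_tr, y_tr, X_val, y_val))
```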

    Exploring variability in medical imaging

    Although recent successes of deep learning and novel machine learning techniques have improved the performance of classification and (anomaly) detection in computer vision, applying these methods in medical imaging pipelines remains very challenging. One of the main reasons is the amount of variability encountered and encapsulated in human anatomy and subsequently reflected in medical images. This fundamental factor impacts most stages of modern medical imaging processing pipelines. The variability of human anatomy makes it virtually impossible to build large labelled and annotated datasets for each disease, as fully supervised machine learning would require. An efficient way to cope with this is to learn only from normal samples, since such data are much easier to collect. A case study of such an automatic anomaly detection system based on normative learning is presented in this work: a framework for detecting fetal cardiac anomalies during ultrasound screening using generative models trained only on normal/healthy subjects. However, despite significant improvements in automatic abnormality detection systems, clinical routine continues to rely exclusively on overburdened medical experts to diagnose and localise abnormalities. Integrating human expert knowledge into the medical imaging processing pipeline entails uncertainty, which is mainly correlated with inter-observer variability. From the perspective of building an automated medical imaging system, it is still an open issue to what extent this kind of variability, and the resulting uncertainty, is introduced during the training of a model and how it affects the final task performance. Consequently, it is very important to explore the effect of inter-observer variability both on the reliable estimation of a model's uncertainty and on the model's performance in a specific machine learning task. A thorough investigation of this issue is presented in this work by leveraging automated estimates of machine learning model uncertainty, inter-observer variability, and segmentation task performance on lung CT scans. Finally, an overview of existing anomaly detection methods in medical imaging is presented. This state-of-the-art survey includes both conventional pattern recognition methods and deep learning based methods, and it is one of the first literature surveys in this specific research area.
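    As an illustration of normative learning, the following is a minimal sketch of anomaly detection by reconstruction error. A PCA reconstruction stands in for the generative model used in the work (which targets fetal ultrasound); the synthetic data, dimensions, and threshold are assumptions made for the example.

```python
# Sketch: normative anomaly detection -- fit only on normal samples, then
# flag test samples whose reconstruction error exceeds a calibrated threshold.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 64))                     # stand-in "healthy" data
test = np.vstack([rng.normal(size=(5, 64)),             # unseen normal samples
                  rng.normal(3.0, 1.0, size=(5, 64))])  # shifted, anomalous samples

pca = PCA(n_components=16).fit(normal)                  # learn the normal manifold

def recon_error(x):
    """Per-sample reconstruction error under the normative model."""
    return np.linalg.norm(x - pca.inverse_transform(pca.transform(x)), axis=1)

threshold = np.percentile(recon_error(normal), 99)      # calibrate on normals only
print(recon_error(test) > threshold)                    # True -> flagged anomalous
```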

    Deep Learning based Densenet Convolution Neural Network for Community Detection in Online Social Networks

    Online Social Networks (OSNs) have become increasingly popular, with hundreds of millions of users in recent years. A community in a social network is a virtual group whose members share interests and activities and want to communicate. The growth of OSNs and their user bases has also increased the need for communities. Community structure is an important topological property of OSNs and plays an essential role in various dynamic processes, including the diffusion of information within the network. All networks have some community structure, and finding communities is one of the most continually addressed research issues. However, traditional techniques do not adequately discover communities from user interests and consequently cannot detect active communities. To tackle this issue, this paper presents a Densenet Convolution Neural Network (DnetCNN) approach for community detection. Initially, we gather a dataset from the Kaggle repository and preprocess it to remove inconsistent and missing values. A User Behavior Impact Rate (UBIR) technique is then applied to identify user URL accesses, key terms, and page accesses. After that, a Web Crawling Prone Factor Rate (WCPFR) technique is used to find malicious activity using random forest and decision-tree methods. Furthermore, a Spider Web Cluster Community based Feature Selection (SWC2FS) algorithm is used to choose the finest attributes in the dataset. Based on these attributes, community groups are found using the DnetCNN approach. Experimental results show better performance than other methods.
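    The sketch below shows only the shape of such a pipeline: preprocess, select attributes, then classify users into community groups. UBIR, WCPFR, SWC2FS, and DnetCNN are the paper's bespoke components and are not reproduced here; SelectKBest and a small MLP are generic stand-ins, and the synthetic dataset is an assumption.

```python
# Sketch: feature selection followed by a neural classifier for community
# labels. Generic stand-ins only; not the paper's DnetCNN implementation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the user-behaviour dataset (4 community groups).
X, y = make_classification(n_samples=1000, n_features=40, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = make_pipeline(
    SelectKBest(f_classif, k=20),                   # attribute-selection step
    MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=1),
)
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```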

    Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

    The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportation, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT has an influence at various scales, ranging from a small scientific observatory to a midsized harbor and on to global oceanic trade. The network architecture of IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments, which creates major challenges for underwater communications that must rely on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in the IoUT are enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features, facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also explores the nexus of BMD with ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed. (Comment: 54 pages, 11 figures, 19 tables; IEEE Communications Surveys & Tutorials, peer-reviewed academic journal.)

    Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence

    The increasing capacities of large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, augmenting and automating qualitative analytic tasks previously typically allocated to human labor. This contribution proposes a systematic mixed methods framework to harness qualitative analytic expertise, machine scalability, and rigorous quantification, with attention to transparency and replicability. Sixteen machine-assisted case studies are showcased as proof of concept. Tasks include linguistic and discourse analysis, lexical semantic change detection, interview analysis, historical event cause inference and text mining, detection of political stance, text and idea reuse, genre composition in literature and film, social network inference, automated lexicography, missing metadata augmentation, and multimodal visual cultural analytics. In contrast to the focus on English in the emerging LLM applicability literature, many examples here deal with scenarios involving smaller languages and historical texts prone to digitization distortions. In all but the most difficult tasks requiring expert knowledge, generative LLMs can demonstrably serve as viable research instruments. LLM (and human) annotations may contain errors and variation, but the agreement rate can and should be accounted for in subsequent statistical modeling; a bootstrapping approach is discussed. The replications among the case studies illustrate how tasks previously requiring potentially months of team effort and complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, this approach is not intended to replace, but to augment, researcher knowledge and skills. With these opportunities in sight, qualitative expertise and the ability to pose insightful questions have arguably never been more critical.
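    As a hedged sketch of how an annotation agreement rate might be propagated into a downstream estimate via bootstrapping (the paper's exact procedure is not reproduced here), one can resample the annotated texts and simulate label error at the measured disagreement rate on each replicate. The labels, the 0.9 agreement value, and the estimated quantity below are illustrative.

```python
# Sketch: bootstrap that folds LLM annotation noise into a confidence interval.
import numpy as np

rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=500)    # binary LLM annotations (e.g., stance)
agreement = 0.9                          # agreement with human coders on a gold subset

estimates = []
for _ in range(2000):
    sample = rng.choice(labels, size=len(labels), replace=True)  # resample texts
    flip = rng.random(len(sample)) > agreement                   # simulate label error
    sample = np.where(flip, 1 - sample, sample)
    estimates.append(sample.mean())

lo, hi = np.percentile(estimates, [2.5, 97.5])
print(f"share of positive labels: {labels.mean():.3f} (95% CI {lo:.3f}-{hi:.3f})")
```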

    Improving Security and Reliability of Physical Unclonable Functions Using Machine Learning

    Physical Unclonable Functions (PUFs) are promising security primitives for device authentication and key generation. Owing to the influence of noise, reliability is an important performance metric of PUF-based authentication. In the literature, many efforts have been devoted to enhancing PUF reliability using error correction methods such as error-correcting codes and fuzzy extractors. However, one property that most of these prior works overlooked is the non-uniform distribution of PUF responses across different bits. This work proposes a two-step methodology to improve the reliability of PUFs under noisy conditions. The first step acquires the parameters of PUF models using machine learning algorithms. The second step then utilizes these parameters to improve reliability by selectively choosing challenge-response pairs (CRPs) for authentication. Two distinct algorithms for improving the reliability of the multiplexer (MUX) PUF, namely total delay difference thresholding and sensitive bits grouping, are presented. It is important to note that the methodology can easily be applied to other types of PUFs as well. Our experimental results show that the reliability of PUF-based authentication can be significantly improved by the proposed approaches. For example, in one experimental setting, the reliability of a MUX PUF is improved from 89.75% to 94.07% using total delay difference thresholding, while 89.30% of generated challenges are stored. As opposed to total delay difference thresholding, sensitive bits grouping is more efficient, as it can produce reliable CRPs directly. Our experimental results show that the reliability can be improved to 96.91% under the same setting when we group 12 bits in the challenge vector of a 128-stage MUX PUF. Moreover, because the actual noise varies greatly across conditions, it is hard to predict the error of each individual PUF response bit. This work therefore proposes a novel methodology to improve the efficiency of PUF response error correction based on error-rates. The proposed method first obtains the PUF model using machine learning techniques and then uses the model to predict per-bit error-rates. Intuitively, we are inclined to tolerate errors in PUF response bits with relatively higher error-rates. Thus, we propose to treat different PUF response bits with different degrees of error tolerance according to their estimated error-rates. Specifically, optimized weights (i.e., 0, 1, 2, 3, and infinity) are assigned to PUF response bits: a small portion of high-error-rate responses are truncated, the remaining responses are duplicated to a limited number of bits according to their error-rates before error correction, and a portion of low-error-rate responses bypass error correction and serve as direct key bits. The hardware cost of error correction can also be reduced by these methods. Response weighting is capable of reducing the false negative and false positive rates simultaneously, and the entropy can also be controlled. Our experimental results show that the response weighting algorithm reduces not only the false negative rate from 20.60% to 1.71%, but also the false positive rate from 1.26 × 10⁻²¹ to 5.38 × 10⁻²², for PUF-based authentication with a 127-bit response and 13-bit error correction. In addition, three case studies on applications of the proposed algorithm are discussed.
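    A minimal sketch of total delay difference thresholding, assuming the standard linear additive-delay model of an arbiter/MUX PUF: a challenge is kept only if its modeled delay difference lies far from zero, so noise is unlikely to flip the response. The weights and the cut-off below are illustrative stand-ins for the learned model and tuning used in the experiments.

```python
# Sketch: keep only CRPs whose modeled total delay difference has a large
# margin, making the response bit robust to noise.
import numpy as np

rng = np.random.default_rng(7)
n_stages = 128
w = rng.normal(size=n_stages + 1)                     # stand-in for ML-learned model

def features(challenges):
    """Parity feature vectors Phi of the additive delay model."""
    signed = 1 - 2 * challenges                       # map {0,1} -> {+1,-1}
    phi = np.cumprod(signed[:, ::-1], axis=1)[:, ::-1]  # suffix products
    return np.hstack([phi, np.ones((len(challenges), 1))])

challenges = rng.integers(0, 2, size=(10_000, n_stages))
delay_diff = features(challenges) @ w                 # modeled delay differences
threshold = np.percentile(np.abs(delay_diff), 20)     # illustrative cut-off
reliable = np.abs(delay_diff) > threshold             # CRPs kept for authentication
print(f"kept {reliable.mean():.0%} of challenges")
```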
Along with the rapid development of hardware security techniques, the rapid growth of countermeasures and attacking methods developed by intelligent and adaptive adversaries has significantly complicated the creation of secure hardware systems. Thus, there is a critical need to (re)evaluate existing and new hardware security techniques against these state-of-the-art attacking methods. With this in mind, this work presents a novel framework for incorporating active learning techniques into the hardware security field. We demonstrate that active learning can significantly improve the learning efficiency of PUF modeling attacks by sampling the least confident and most informative challenge-response pair (CRP) for training in each iteration. For example, our experimental results show that to obtain a prediction error below 4%, 2790 CRPs are required in passive learning, while only 811 CRPs are required in active learning. The sampling strategies and detailed applications of PUF modeling attacks under various environmental conditions are also discussed. When the environment is very noisy, active learning may sample a large number of mislabeled CRPs and hence result in high prediction error; we present two methods to mitigate this tension between informative and noisy CRPs. Finally, it is critical to design secure PUFs that can resist the countermeasures and modeling attacks of intelligent and adaptive adversaries. Previous researchers sought to hide PUF information by pre- or post-processing of PUF challenges and responses; however, these methods remain subject to side-channel-analysis-based hybrid attacks. Methods for increasing the non-linearity of the PUF structure, such as feedforward PUFs, cascade PUFs, and subthreshold current PUFs, have also been proposed, but they significantly degrade reliability. Building on the previous work, this work proposes a novel concept, the noisy PUF, which achieves modeling attack resistance while maintaining a high degree of reliability for selected CRPs. A possible design of a noisy PUF along with corresponding experimental results is also presented.
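A minimal sketch of the uncertainty-sampling idea behind the active learning attack, against a simulated noiseless device: in each iteration the attacker queries the challenge its current model is least confident about. The pool size, seed-set size, and iteration count are illustrative assumptions, not the parameters of the reported experiments.

```python
# Sketch: active learning modeling attack via uncertainty sampling. A hidden
# weight vector plays the role of the physical device.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_stages, pool_size = 64, 5000
w_true = rng.normal(size=n_stages + 1)      # the "device", unknown to the attacker

def features(challenges):
    signed = 1 - 2 * challenges                          # map {0,1} -> {+1,-1}
    phi = np.cumprod(signed[:, ::-1], axis=1)[:, ::-1]   # suffix products
    return np.hstack([phi, np.ones((len(challenges), 1))])

pool = features(rng.integers(0, 2, size=(pool_size, n_stages)))
responses = (pool @ w_true > 0).astype(int)              # noiseless oracle responses

queried = list(rng.choice(pool_size, 50, replace=False))  # small random seed set
for _ in range(300):
    clf = LogisticRegression(max_iter=1000).fit(pool[queried], responses[queried])
    margin = np.abs(clf.decision_function(pool))  # distance from decision boundary
    margin[queried] = np.inf                      # never re-query a known CRP
    queried.append(int(np.argmin(margin)))        # least-confident challenge next

print("prediction error:", 1 - clf.score(pool, responses))
```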

    Advances in Intelligent Vehicle Control

    This book is a printed edition of the Special Issue Advances in Intelligent Vehicle Control that was published in the journal Sensors. It presents a collection of eleven papers covering a range of topics, such as the development of intelligent control algorithms for active safety systems, smart sensors, and intelligent and efficient driving. The contributions presented in these papers can serve as useful tools for researchers interested in new vehicle technology and in the improvement of vehicle control systems.