
161 research outputs found

Exploring Feature-Level Duplications on Imbalanced Data Using Stochastic Diffusion Search

Authors: al-Rifaie Mohammad Majid; Alhakbani Haya Abdullah
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

One of the computer algorithms inspired by swarm intelligence is stochastic diffusion search (SDS). SDS uses some of the processes and techniques found in swarms to solve search and optimisation problems. In this paper, a hybrid approach is proposed to deal with real-world imbalanced data. The proposed model involves oversampling the minority class, undersampling the majority class, and optimising the parameters of the classifier, a Support Vector Machine (SVM). The proposed model uses the Synthetic Minority Over-sampling Technique (SMOTE) to perform the oversampling and the agents of a swarm intelligence technique, SDS, to perform an 'informed' undersampling on the majority classes. In addition to comparing the agent-led undersampling with random undersampling, the results are contrasted against other best-known techniques on nine real-world datasets. Moreover, the behaviour of SDS agents in this context is also analysed.

Goldsmiths Research Online

Crossref

Handling Class Imbalance Using Swarm Intelligence Techniques, Hybrid Data and Algorithmic Level Solutions

Author: Alhakbani Haya

This research focuses mainly on the binary class imbalance problem in data mining. It investigates the use of combined approaches of data and algorithmic level solutions. Moreover, it examines the use of swarm intelligence and population-based techniques to combat the class imbalance problem at all levels, including the data, algorithmic, and feature levels. It also introduces various solutions to the class imbalance problem, in which swarm intelligence techniques like Stochastic Diffusion Search (SDS) and Dispersive Flies Optimisation (DFO) are used. The algorithms were evaluated using experiments on imbalanced datasets, in which the Support Vector Machine (SVM) was used as a classifier. SDS was used to perform informed undersampling of the majority class to balance the dataset. The results indicate that this algorithm improves the classifier performance and can be used on imbalanced datasets. Moreover, SDS was extended further to perform feature selection on high-dimensional datasets. Experimental results show that SDS can be used to perform feature selection and improve the classifier performance on imbalanced datasets. Further experiments evaluated DFO as an algorithmic level solution to optimise the SVM kernel parameters when learning from imbalanced datasets. Based on the promising results of DFO in these experiments, the novel approach was extended further to provide a hybrid algorithm that simultaneously optimises the kernel parameters and performs feature selection.

Goldsmiths Research Online

Deciphering the evolutionary impact of gen(om)e duplications through mechanistic modeling of the genotype-phenotype map

Author: Gutiérrez Betancur Jayson Arley
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2014

Ghent University Academic Bibliography

Integrating Deep Learning with Correlation-based Multimedia Semantic Concept Detection

Author: Ha Hsin-Yu
Publication venue: FIU Digital Commons
Publication date: 01/01/2015

Rapid advances in technology have made the explosive growth of multimedia data possible and available to the public. Multimedia data can be defined as a data collection composed of various data types and different representations. Because multimedia data carries useful information, it has been widely adopted in different areas, such as surveillance event detection, medical abnormality detection, and many others. To fulfil various requirements for different applications, it is important to effectively classify multimedia data into semantic concepts across multiple domains. In this dissertation, a correlation-based multimedia semantic concept detection framework is seamlessly integrated with the deep learning technique. The framework aims to explore implicit and explicit correlations among features and concepts while adopting different Convolutional Neural Network (CNN) architectures accordingly. First, the Feature Correlation Maximum Spanning Tree (FC-MST) is proposed to remove the redundant and irrelevant features based on the correlations between the features and positive concepts. FC-MST identifies the effective features and decides the initial layer's dimension in CNNs. Second, the Negative-based Sampling method is proposed to alleviate the data imbalance issue by keeping only the representative negative instances in the training process. To adjust to different sizes of training data, the number of iterations for the CNN is determined adaptively and automatically. Finally, an Indirect Association Rule Mining (IARM) approach and a correlation-based re-ranking method are proposed to reveal the implicit relationships from the correlations among concepts, which are further utilized together with the classification scores to enhance the re-ranking process. The framework is evaluated using two benchmark multimedia data sets, TRECVID and NUS-WIDE, which contain large amounts of multimedia data and various semantic concepts.

DigitalCommons@Florida International University

Classification of Resting-State fMRI using Evolutionary Algorithms: Towards a Brain Imaging Biomarker for Parkinson’s Disease

Author: Dehsarvi Amir
Publication venue: University of York
Publication date: 30/01/2018

It is commonly accepted that accurate early diagnosis and monitoring of neurodegenerative conditions is essential for effective disease management and delivery of medication and treatment. This research develops automatic methods for detecting brain imaging preclinical biomarkers for Parkinson’s disease (PD) by considering the novel application of evolutionary algorithms. An additional novel element of this work is the use of evolutionary algorithms to both map and predict the functional connectivity in patients using rs-fMRI data. Specifically, Cartesian Genetic Programming was used to classify dynamic causal modelling data as well as timeseries data. The findings were validated using two other commonly used classification methods (Artificial Neural Networks and Support Vector Machines) and by employing k-fold cross-validation. Across dynamic causal modelling and timeseries analyses, findings revealed maximum accuracies of 75.21% for early-stage (prodromal) PD patients, who show no motor symptoms, versus healthy controls, 85.87% for PD patients versus prodromal PD patients, and 92.09% for PD patients versus healthy controls. Prodromal PD patients were classified from healthy controls with high accuracy – this is notable and represents the key finding since current methods of diagnosing prodromal PD have low reliability and low accuracy. Furthermore, Cartesian Genetic Programming provided comparable performance accuracy relative to Artificial Neural Networks and Support Vector Machines. Nevertheless, evolutionary algorithms make it easier to decode the classifier, in terms of understanding which data inputs are used, than Artificial Neural Networks and Support Vector Machines do.
Hence, these findings underscore the relevance of both dynamic causal modelling analyses for classification and Cartesian Genetic Programming as a novel classification tool for brain imaging data with medical implications for disease diagnosis, particularly in early stages 5-20 years prior to motor symptoms.

White Rose E-theses Online

A deep generative model framework for creating high quality synthetic transaction sequences

Author: Nickerson Kyle
Publication venue: Memorial University of Newfoundland
Publication date: 01/08/2023

Synthetic data are artificially generated data that closely model real-world measurements, and can be a valuable substitute for real data in domains where it is costly to obtain real data, or privacy concerns exist. Synthetic data have traditionally been generated using computational simulations, but deep generative models (DGMs) are increasingly used to generate high-quality synthetic data. In this thesis, we create a framework which employs DGMs for generating high-quality synthetic transaction sequences. Transaction sequences, such as we may see in an online banking platform or credit card statement, are an important type of financial data for gaining insight into financial systems. However, research involving this type of data is typically limited to large financial institutions, as privacy concerns often prevent academic researchers from accessing this kind of data. Our work represents a step towards creating shareable synthetic transaction sequence datasets, containing data not connected to any actual humans. To achieve this goal, we begin by developing Banksformer, a DGM based on the transformer architecture, which is able to generate high-quality synthetic transaction sequences. Throughout the remainder of the thesis, we develop extensions to Banksformer that further improve the quality of the data we generate. Additionally, we perform an extensive examination of the quality of the synthetic data produced by our method, both with qualitative visualizations and quantitative metrics.

Memorial University Research Repository

Document-level sentiment analysis of email data

Author: Liu Sisi
Publication date: 01/01/2020

Sisi Liu investigated machine learning methods for email document sentiment analysis. She developed a systematic framework that has been qualitatively and quantitatively shown to be effective and efficient in identifying sentiment from massive amounts of email data. Analytical results obtained from the document-level email sentiment analysis framework are beneficial for better decision making in various business settings.

ResearchOnline@JCU

ResearchOnline at James Cook University

Spatial reaction systems on parallel supercomputers

Author: Smith Mark
Publication venue: The University of Edinburgh
Publication date: 01/01/1994

Edinburgh Research Archive

A rapid review exploring the effectiveness of artificial intelligence for cancer diagnosis

Authors: Ayres Toby; Cooper Alison; Davies Jacob; Edwards Adrian; Edwards Rhiannon Tudor; Lewis Ruth; Okolie Chukwudi; Shaw Hannah; Wale Alesha
Publication date: 09/11/2023

Bangor University Research Portal

Bioinformatics Applications Based On Machine Learning

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and have considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created from the adaptation and application of existing techniques to the creation of new methodologies to solve existing problems.

Directory of Open Access Books (DOAB)