
517 research outputs found

Computational models and approaches for lung cancer diagnosis

Author: Azzawi Hasseeb
Publication venue: Deakin University, Faculty of Science, Engineering and Built Environment, School of Information Technology
Publication date: 01/10/2019

The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results.

Deakin Research Online

Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis

Author: Al-Rajab Murad
Lu Joan
Qiang Xu
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017

Background and Objectives: This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for cancer tissues to be expertly identified and classified in a rapid and timely manner, both to assure fast detection of the disease and to expedite the drug discovery process. Methods: In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. Results: It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection method (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). Conclusions: It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society.
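As a rough illustration of the PSO-based feature selection mentioned in the abstract above, the following minimal Python sketch runs a standard binary PSO (sigmoid transfer function) over feature masks. It is not the authors' implementation: the function names, the parameter values, and the toy fitness function, which stands in for the cross-validated accuracy of a classifier such as an SVM, are all assumptions made for illustration.

```python
import math
import random

INFORMATIVE = {0, 3}  # hypothetical indices of the truly useful features


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward informative features,
    lightly penalise the number of selected features."""
    hits = sum(1 for i in INFORMATIVE if mask[i] == 1)
    return hits - 0.1 * sum(mask)


def binary_pso(fitness, n_features, n_particles=10, iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `fitness` over binary feature masks with a basic binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp the velocity
                # The sigmoid turns the velocity into P(bit == 1).
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(pos[i]), f
                if f > gbest_fit:
                    gbest, gbest_fit = list(pos[i]), f
    return gbest, gbest_fit


if __name__ == "__main__":
    mask, score = binary_pso(toy_fitness, n_features=6)
    print("selected mask:", mask, "fitness:", score)
```

In a real pipeline the fitness call would train and validate a classifier on the columns selected by the mask, which is where most of the running time goes.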

University of Huddersfield Repository

Huddersfield Research Portal

VGG19+CNN: Deep Learning-Based Lung Cancer Classification with Meta-Heuristic Feature Selection Methodology

Author: Devarakonda Nagaraju
Nandipati Bhagya Lakshmi
Publication venue: IAES Indonesia Section
Publication date: 25/03/2023

Lung diseases harm the respiratory system, and lung cancer is one of the major causes of death worldwide. Early diagnosis could improve patient survival, and it is feasible to automate or support the radiologist's work in cancer prognosis. PET and CT images can be used for lung cancer detection; overall, the CT scan is the more important and serves as a comprehensive procedure for early cancer prognosis. To overcome specific faults in feature selection and to optimise classification, this study employs a new algorithm called the Accelerated Wrapper-based Binary Artificial Bee Colony algorithm (AWBABCA) for effective feature selection and VGG19+CNN for classifying cancer phases. Morphological features are extracted from the pre-processed image; next, the lung features or nodules that have a significant impact on cancer are chosen using AWBABCA. The chosen features are then used for cancer classification, giving a high level of robustness and precision. An experimental evaluation on the lung dataset shows that the proposed classifier achieved the best accuracy, precision, recall, and F1-score.

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)

Gene selection and classification in autism gene expression data

Author: Al-Jaf Shilan Sameen Hameed
Publication date: 01/01/2017

Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been attributed to the disorder, no single gene is applicable to more than 1–2% of the general ASD population. Despite extensive efforts, definitive genes that contribute to autism susceptibility have yet to be identified. The major problems in dealing with the gene expression dataset of autism include the limited number of samples and the large noise due to errors of experimental measurement and natural variation. In this study, a systematic combination of three important filters, namely t-test (TT), Wilcoxon Rank Sum (WRS) and Feature Correlation (COR), is applied along with an efficient wrapper algorithm based on geometric binary particle swarm optimization-support vector machine (GBPSO-SVM), aiming at selecting and classifying the genes most attributed to autism. A new approach based on the criteria of median ratio, mean ratio and variance deviations is also applied to reduce the initial dataset prior to its involvement. Results showed that the most discriminative genes identified in the first and last selection steps included a recurring gene (CAPS2), which was designated the highest-risk ASD gene. The fused gene subsets selected by the GBPSO-SVM algorithm increased the classification accuracy to about 92.10%, which is higher than the accuracies reported in the literature for the same autism dataset. Notably, the application of an ensemble using random forest (RF) showed better performance than that of previous studies. However, the ensemble approach that employs SVM as an integrator of the fused genes from the output branches of GBPSO-SVM outperformed the RF integrator.
The overall improvement was ascribed to the selection strategies used to reduce the dataset and to the use of the efficient wrapper-based GBPSO-SVM algorithm.
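To make the filter stage described in the abstract above more concrete, here is a minimal Python sketch of ranking genes by a two-sample t statistic. It is an illustration only: the abstract does not say which t-test variant was used, so Welch's unequal-variance form is an assumption, and the function names, gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a)
                                      + variance(b) / len(b))


def rank_genes(expr, labels, top_k):
    """Return the top_k gene names ordered by |t| between the two classes.

    expr   maps gene name -> list of expression values, one per sample
    labels holds 1 for case samples and 0 for control samples
    """
    scores = {}
    for gene, values in expr.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(case, ctrl))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: three genes, three case and three control samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # strongly differential
    "g2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference in means
    "g3": [3.0, 3.5, 2.5, 2.0, 2.6, 1.9],  # moderately differential
}

if __name__ == "__main__":
    print(rank_genes(expr, labels, top_k=2))
```

A filter like this is cheap and is typically used to shrink the gene list before a more expensive wrapper search is run on the survivors.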

Universiti Teknologi Malaysia Institutional Repository

Gene selection for cancer classification with the help of bees

Author: Johra Muhammad Moosa
Mohammad Kaykobad
Mohammad Sohel Rahman
Rameen Shakur
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Memetic micro-genetic algorithms for cancer data classification

Author: Carballido Jessica Andrea
Olivera Ana Carolina
Rojas Matias Gabriel
Vidal Pablo Javier
Publication venue: Elsevier
Publication date: 01/01/2023

Fast and precise medical diagnosis of human cancer is crucial for treatment decisions. Gene selection consists of identifying a set of informative genes from microarray data to allow high predictive accuracy in human cancer classification. This task is a combinatorial search problem, and optimisation methods can be applied for its resolution. In this paper, two memetic micro-genetic algorithms (MμV1 and MμV2) with different hybridisation approaches are proposed for feature selection of cancer microarray data. Seven gene expression datasets are used for experimentation. The comparison with stochastic state-of-the-art optimisation techniques concludes that problem-dependent local search methods combined with micro-genetic algorithms improve feature selection of cancer microarray data.

Affiliation: Rojas, Matias Gabriel. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
Affiliation: Olivera, Ana Carolina. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
Affiliation: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Vidal, Pablo Javier. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentin

CONICET Digital

Directory of Open Access Journals

Improved Reptile Search Optimization Algorithm using Chaotic map and Simulated Annealing for Feature Selection in Medical Field

Author: Alomari Osama Ahmad
Elgamal Zenab
Makhadmeh Sharif Naser
Sabri Aznul Qalid Md
Tbaishat Dina
Tubishat Mohammad
Publication venue: ZU Scholars
Publication date: 01/01/2022

The increased volume of medical datasets has produced high-dimensional features, negatively affecting machine learning (ML) classifiers. In ML, the feature selection process is fundamental for selecting the most relevant features and reducing redundant and irrelevant ones. Optimization algorithms have demonstrated their capability to solve feature selection problems. Reptile Search Algorithm (RSA) is a new nature-inspired optimization algorithm that simulates crocodiles' encircling and hunting behavior. The unique search of the RSA algorithm obtains promising results compared to other optimization algorithms. However, when applied to high-dimensional feature selection problems, RSA suffers from population diversity and local optima limitations. An improved metaheuristic optimizer, namely the Improved Reptile Search Algorithm (IRSA), is proposed to overcome these limitations and adapt the RSA to solve the feature selection problem. Two main improvements add value to the standard RSA: the first is to apply chaos theory at the initialization phase of RSA to enhance its exploration capabilities in the search space, and the second is to combine the Simulated Annealing (SA) algorithm with the exploitation search to avoid the local optima problem. The IRSA performance was evaluated over 20 medical benchmark datasets from the UCI machine learning repository. IRSA is also compared with the standard RSA and state-of-the-art optimization algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization algorithm (GOA) and Slime Mould Optimization (SMO). The evaluation metrics include the number of selected features, classification accuracy, fitness value, Wilcoxon statistical test (p-value), and convergence curve. Based on the results obtained, IRSA confirmed its superiority over the original RSA algorithm and other optimization algorithms on the majority of the medical datasets.
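The two improvements named in the abstract above, chaotic initialization and simulated annealing, can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the IRSA itself: the abstract does not say which chaotic map is used, so the logistic map is an assumption, the RSA update rules are omitted entirely, and all function names, parameters and the toy cost function are hypothetical.

```python
import math
import random


def logistic_map_population(n_agents, n_features, x0=0.7, r=4.0):
    """Build binary feature masks from a logistic-map sequence x <- r*x*(1-x)
    (assumed map) instead of drawing them from a uniform random generator."""
    x = x0
    population = []
    for _ in range(n_agents):
        agent = []
        for _ in range(n_features):
            x = r * x * (1.0 - x)
            agent.append(1 if x > 0.5 else 0)
        population.append(agent)
    return population


def sa_accept(current_cost, candidate_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, and accept a worse
    candidate with probability exp(-(increase in cost) / temperature)."""
    if candidate_cost <= current_cost:
        return True
    return rng.random() < math.exp((current_cost - candidate_cost) / temperature)


def sa_refine(mask, cost, steps=200, t0=1.0, cooling=0.95, seed=0):
    """Refine one mask by single-bit flips under a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_cost = list(mask), cost(mask)
    best, best_cost = list(current), current_cost
    t = t0
    for _ in range(steps):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]  # flip one feature in or out
        c = cost(candidate)
        if sa_accept(current_cost, c, t, rng):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = list(candidate), c
        t *= cooling
    return best, best_cost


TARGET = [1, 0, 1, 0, 0, 1]  # hypothetical "ideal" mask for the toy cost


def toy_cost(mask):
    """Hamming distance to TARGET, standing in for classification error."""
    return sum(1 for a, b in zip(mask, TARGET) if a != b)


if __name__ == "__main__":
    population = logistic_map_population(n_agents=4, n_features=6)
    print(sa_refine(population[0], toy_cost))
```

Accepting occasional worse moves early, while the temperature is high, is what lets the search leave a local optimum; as the temperature falls the procedure behaves like plain hill climbing.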

ZU Scholars (Zayed University)

Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction

Author: Sundararajan Saraswathi
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computed structures and identify the functions of these sequences. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, for a better basic understanding of aberrant states of stress and disease, including drug discovery and discovery of biomarkers. Several aspects of secondary structure predictions and other protein structure-related predictions are investigated using different types of information such as data obtained from knowledge-based potentials derived from amino acids in protein sequences, physicochemical properties of amino acids and propensities of amino acids to appear at the ends of secondary structures. Investigating the performance of these secondary structure predictions by type of amino acid highlights some interesting aspects relating to the influences of the individual amino acid types on formation of secondary structures and points toward ways to make further gains. Other research areas include Relative Solvent Accessibility (RSA) predictions and predictions of phosphorylation sites, which is one of the Post-Translational Modification (PTM) sites in proteins. Protein secondary structures and other features of proteins are predicted efficiently, reliably, less expensively and more accurately. 
A novel method called Fast Learning Optimized PREDiction (FLOPRED) Methodology is proposed for predicting protein secondary structures and other features, using knowledge-based potentials, a Neural Network based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that yield better and faster convergence to produce more accurate results. These techniques yield superior classification of secondary structures, with a training accuracy of 93.33% and a testing accuracy of 92.24% with a standard deviation of 0.48% obtained for a small group of 84 proteins. We have a Matthew's correlation coefficient ranging between 80.58% and 84.30% for these secondary structures. Accuracies for individual amino acids range between 83% and 92% with an average standard deviation between 0.3% and 2.9% for the 20 amino acids. On a larger set of 415 proteins, we obtain a testing accuracy of 86.5% with a standard deviation of 1.38%. These results are significantly higher than those found in the literature. Prediction of protein secondary structure based on amino acid sequence is a common technique used to predict its 3-D structure. Additional information such as the biophysical properties of the amino acids can help improve the results of secondary structure prediction. A database of protein physicochemical properties is used as features to encode protein sequences and this data is used for secondary structure prediction using FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results. Some amino acids appear more often at the ends of secondary structures than others. A preliminary study has indicated that secondary structure accuracy can be improved as much as 6% by including these effects for those residues present at the ends of alpha-helix, beta-strand and coil.
A study on RSA prediction using ELM shows large gains in processing speed compared to using support vector machines for classification. This indicates that ELM yields a distinct advantage in terms of processing speed and performance for RSA. Additional gains in accuracies are possible when the more advanced FLOPRED algorithm and PSO optimization are implemented. Phosphorylation is a post-translational modification on proteins that often controls and regulates their activities. It is an important mechanism for regulation. Phosphorylated sites are known to be present often in intrinsically disordered regions of proteins lacking unique tertiary structures, and thus less information is available about the structures of phosphorylated sites. It is important to be able to computationally predict phosphorylation sites in protein sequences obtained from mass-scale sequencing of genomes. Phosphorylation sites may aid in determining the functions of a protein and in better understanding the mechanisms of protein functions in healthy and diseased states. FLOPRED is used to model and predict experimentally determined phosphorylation sites in protein sequences. Our new PSO optimization included in FLOPRED enables the prediction of phosphorylation sites with higher accuracy and with better generalization. Our preliminary studies on 984 sequences demonstrate that this model can predict phosphorylation sites with a training accuracy of 92.53%, a testing accuracy of 91.42% and a Matthew's correlation coefficient of 83.9%. In summary, secondary structure prediction, Relative Solvent Accessibility and phosphorylation site prediction have been carried out on multiple sets of data, encoded with a variety of information drawn from proteins and the physicochemical properties of their constituent amino acids.
Improved and efficient algorithms called S-ELM and FLOPRED, which are based on Neural Networks and Particle Swarm Optimization, are used for classifying and predicting protein sequences. Analysis of the results of these studies provides new and interesting insights into the influence of amino acids on secondary structure prediction. S-ELM and FLOPRED have also proven to be robust and efficient for predicting relative solvent accessibility of proteins and phosphorylation sites. These studies show that our method is robust and resilient and can be applied for a variety of purposes. It can be expected to yield higher classification accuracy and better generalization performance compared to previous methods.
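The abstract above reports a correlation coefficient alongside accuracy. For readers unfamiliar with the metric, the sketch below computes the usual binary-classification form from confusion-matrix counts. How the thesis applies it to the three secondary-structure classes is not stated in the abstract, so a one-class-versus-rest reading is an assumption, and the function name and the example counts are hypothetical.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned by convention when any
    marginal total is zero and the coefficient is undefined.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, not taken from the thesis.
    mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
    print(f"MCC = {mcc:.3f} ({100 * mcc:.1f}% when quoted as a percentage)")
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, so it stays informative when one class is much rarer than the other.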

Digital Repository @ Iowa State University (ISU)

Feature Selection with the MS-EPSO Algorithm to Predict Cardiac Pathology in Children and Teenagers

Author: Mário Tasso Ribeiro Serra Neto
Publication date: 28/07/2020

Repositório Aberto da Universidade do Porto