15 research outputs found

A new genetic algorithm for multi-label correlation-based feature selection.

Author: Freitas Alex A.
Jungjit Suwimol
Publication venue: ESANN
Publication date: 01/04/2015

This paper proposes a new Genetic Algorithm for Multi-Label Correlation-Based Feature Selection (GA-ML-CFS). The GA performs a global search in the space of candidate feature subsets in order to select a high-quality feature subset to be used by a multi-label classification algorithm (in this work, the Multi-Label k-NN algorithm). We compare the results of GA-ML-CFS with those of the previously proposed Hill-Climbing for Multi-Label Correlation-Based Feature Selection (HC-ML-CFS) across 10 multi-label datasets.

Kent Academic Repository
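
The abstract above does not state the fitness function the GA optimises. As a rough illustration only, the sketch below computes the standard single-label CFS merit (Hall's formulation), which rewards features correlated with the label and penalises features correlated with each other. The function name `cfs_merit` and its list-based inputs are assumptions made for this example, and the paper's multi-label variant may differ.

```python
from math import sqrt

def cfs_merit(feature_label_corr, feature_feature_corr):
    """Standard CFS merit of a feature subset (assumed form, not the paper's).

    feature_label_corr: one absolute correlation per selected feature,
        measured against the class label.
    feature_feature_corr: absolute correlation for each pair of selected
        features (empty when only one feature is selected).
    """
    k = len(feature_label_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_label_corr) / k  # mean feature-label correlation
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)  # mean feature-feature correlation
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Two equally relevant features: uncorrelated ones score higher than redundant ones.
independent = cfs_merit([0.6, 0.6], [0.0])
redundant = cfs_merit([0.6, 0.6], [1.0])
```

A search procedure such as a GA or hill climbing would evaluate a merit of this kind on each candidate subset and keep the subsets that score highest.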

Microarray gene expression ranking with Z-score for Cancer Classification

Author: M. Yasodha, Dr P. Ponmuthuramalingam
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/08/2014

Over the past few decades there has been an explosion in the amount of genomic data available to biomedical engineers due to advances in biotechnology. For example, using microarrays it is possible to obtain a person's gene expression profile across more than 30,000 genes. Among the resulting gene selection problems, one of the most important is gene ranking. Here we describe Z-score ranking for microarray gene expression selection. The technique chooses the genes, applies Z-score ranking, divides the genes into subsets with Successive Feature Selection, and finally applies LDA to obtain the result. This Z-score ranking technique yields accurate results with less effort. Genes from the Lymphoma and Leukemia datasets are used. The proposed technique shows good classification accuracy across all test datasets.

International Journal on Recent and Innovation Trends in Computing and Communication
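
The abstract above does not define its Z-score. The sketch below assumes a common two-class form, the absolute difference of class means divided by its standard error, and ranks genes by that value. The function names and the toy expression values are made up for illustration, and the later Successive Feature Selection and LDA steps are not shown.

```python
from math import sqrt
from statistics import mean, variance

def z_score(class_a, class_b):
    """Two-sample z-like score for one gene (assumed definition)."""
    se = sqrt(variance(class_a) / len(class_a) + variance(class_b) / len(class_b))
    return abs(mean(class_a) - mean(class_b)) / se if se > 0 else 0.0

def rank_genes(expr_a, expr_b):
    """Rank genes from most to least discriminative.

    expr_a and expr_b map a gene name to its expression values
    in the samples of class A and class B respectively.
    """
    scores = {gene: z_score(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)

# Toy, made-up data: only g2 differs clearly between the two classes.
expr_a = {"g1": [1.0, 1.2, 0.9], "g2": [5.0, 5.1, 4.9], "g3": [2.0, 2.5, 1.5]}
expr_b = {"g1": [1.1, 0.8, 1.0], "g2": [8.0, 8.2, 7.9], "g3": [2.2, 1.8, 2.1]}
ranking = rank_genes(expr_a, expr_b)
```

The top of such a ranking would then be passed on to the subset-selection and classification stages.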

Feature Extraction of Chest X-ray Images and Analysis using PCA and kPCA

Author: H Roopa
T Asha
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2018

Tuberculosis (TB) is an infectious disease caused by Mycobacterium, which can be diagnosed from symptoms such as fever and cough. TB can also be identified from the patient's chest x-ray as interpreted by an expert physician. The chest x-ray image contains many features that cannot be used directly by a computer system for analyzing the disease. These features must be understood and extracted so that they can be converted into a form that can be fed to a computer system for disease analysis. This paper presents feature extraction from chest x-ray images, producing input for any data mining algorithm for TB analysis. Texture and shape based features are extracted from the x-ray image using image processing concepts. The extracted features are analyzed using principal component analysis (PCA) and kernel principal component analysis (kPCA). Filter and wrapper feature selection methods using a linear regression model were applied to these techniques. The accuracy of PCA using the wrapper approach is 96.07%, compared with 62.50% for kPCA, so PCA outperforms kPCA.

Crossref

ZENODO

Institute of Advanced Engineering and Science

Comparing Prediction Accuracy for Machine Learning and Other Classical Approaches in Gene Expression Data

Author: Kar Setu Chandra
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/10/2014

Microarray based gene expression profiling has emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. The classification of different tumor types is of great significance in cancer diagnosis and drug innovation. Using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Feature selection techniques can be used to extract the marker genes that influence classification accuracy by eliminating unwanted noisy and redundant genes. Quite a number of methods have been proposed in recent years with promising results, but many issues still need to be addressed and understood. Diagonal discriminant analysis, regularized discriminant analysis, support vector machines and k-nearest neighbor have been suggested as among the best methods for small sample size situations. In this paper, we compare the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods are applied to datasets from four recently published cancer gene expression studies. The performance of each classification technique is evaluated for varying numbers of selected features in terms of misclassification rate using hold-out cross validation. Our study shows that KNN, RDA and SVM with a linear kernel have lower misclassification rates than the other algorithms. Keywords: microarray, gene expression, KNN, DLDA, RDA, SV

International Institute for Science, Technology and Education (IISTE): E-Journals
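
To make the evaluation measure in the abstract above concrete, the following minimal sketch computes a hold-out misclassification rate for a k-nearest-neighbor classifier. The helper names, the choice of k = 3, Euclidean distance, and the toy two-dimensional points are all assumptions for illustration, not the study's setup or data.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def holdout_error(train_x, train_y, test_x, test_y, k=3):
    """Fraction of held-out samples whose predicted class is wrong."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y
                for x, y in zip(test_x, test_y))
    return wrong / len(test_x)

# Toy, made-up samples: class "a" near the origin, class "b" near (5, 5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_x = [(0.5, 0.5), (5.5, 5.5), (1, 1), (4, 4)]
test_y = ["a", "b", "a", "a"]  # the last point sits closer to class "b"
error = holdout_error(train_x, train_y, test_x, test_y)
```

With real expression data, the same rate would be recomputed for each number of selected features to compare classifiers.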

Comparing Prediction Accuracy for Supervised Techniques in Gene Expression Data

Author: Kar Setu Chandra
Publication venue: Mathematical Theory and Modeling
Publication date: 30/05/2014

Classification is one of the most important tasks in applications such as text categorization, tone recognition, image classification, microarray gene expression analysis, protein structure prediction, and data classification. Microarray based gene expression profiling has emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. The classification of different tumor types is of great significance in cancer diagnosis and drug innovation. One challenging area in the study of gene expression data is the classification of different types of tumors into the correct classes. Diagonal discriminant analysis, regularized discriminant analysis, support vector machines and k-nearest neighbor have been suggested as among the best methods for small sample size situations. The methods are applied to datasets from four recently published cancer gene expression studies. The four publicly available microarray datasets are Leukemia, Lymphoma, SRBCT and Prostate. The performance of each classification technique is evaluated according to the percentage of misclassification through hold-out cross validation.

International Institute for Science, Technology and Education (IISTE): E-Journals

Effect of Feature Selection on Gene Expression Datasets Classification Accuracy

Author: Lazaar Mohamed
Omara Hicham
Tabii Youness
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2018

Feature selection attracts researchers who deal with machine learning and data mining. It consists of selecting the variables that have the greatest impact on the dataset classification and discarding the rest. This dimensionality reduction allows classifiers to be faster and more accurate. This paper examines the effect of feature selection on the accuracy of classifiers widely used in the literature. These classifiers are compared on three real datasets that are pre-processed with feature selection methods. An improvement of more than 9% in classification accuracy is observed, and k-means appears to be the classifier most sensitive to feature selection.

IAES journal

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

EGFAFS: A Novel Feature Selection Algorithm Based on Explosion Gravitation Field Algorithm

Author: Fu Yuan
Hu Xuemei
Huang Lan
Wang Yan
Publication date: 01/06/2022

Feature selection (FS) is a vital step in data mining and machine learning, especially for analyzing data in a high-dimensional feature space. Gene expression data usually consist of a few samples characterized by a high-dimensional feature space. As a result, they are not suitable for processing by simple methods, such as filter-based methods. In this study, we propose a novel feature selection algorithm based on the Explosion Gravitation Field Algorithm, called EGFAFS. To reduce the feature space to an acceptable number of dimensions, we constructed a recommended feature pool using a series of Random Forests based on the Gini index. Furthermore, by paying more attention to the features in the recommended feature pool, we can find the best subset more efficiently. To verify the performance of EGFAFS for FS, we tested EGFAFS on eight gene expression datasets and compared it with four heuristic-based FS methods (GA, PSO, SA, and DE) and four other FS methods (Boruta, HSICLasso, DNN-FS, and EGSG). The results show that, in terms of the evaluation metrics, EGFAFS performs better at FS on gene expression data than the other eight FS algorithms. The genes selected by EGFAFS play an essential role in the differential co-expression network and in some biological functions, which further demonstrates the success of EGFAFS in solving FS problems on gene expression data.

Aberystwyth Research Portal

Directory of Open Access Journals

PubMed Central

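Abordagens multivariadas para seleção de variáveis com vistas à classificação e predição de propriedades de amostras [Multivariate approaches to variable selection for the classification and prediction of sample properties]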

Author: Yamashita Gabrielli Harumi
Publication date: 01/01/2021

Variable selection is an important step in data analysis, since it identifies the most informative subsets of variables for building accurate classification and prediction models. In addition, variable selection eases the interpretation and analysis of the resulting models, potentially reducing the computational time needed to build the models and the cost and time of obtaining samples. In this context, this thesis presents innovative approaches to variable selection for classifying and predicting the properties of samples of various products. These approaches are presented in three papers, with the aim of improving the accuracy of classification and prediction models in different areas. In the first paper, variable importance indices are integrated with hierarchical classification schemes to categorize sparkling wine samples by country of origin. In the second paper, to select the most informative variables for sample prediction via PLS, a variable importance index based on the Lambert-Beer law is proposed and combined with an iterative forward selection process. Finally, the third paper uses clustering of spectral variables together with an importance index to select the variables that produce more consistent prediction models. In all papers of this thesis, the proposed methods obtained better results than other traditional methods from the literature for identifying the most informative variables.

Lume 5.8

Cell cycle and aging, morphogenesis, and response to stimuli genes are individualized biomarkers of glioblastoma progression and survival

Author: A Ganguly
A Martin
A Takeno
B Gyorffy
B Kwabi-Addo
B Salhia
BC Christensen
Bruce R Southey
C Brennan
C Chen
C Dai
C Houillier
C Prapinjumrune
C Welch
Cancer Genome Atlas Research Network
CE Pelloski
CI Dumur
CL Nutt
D Cigognini
D Krex
D Maucort-Boulch
D Michael
D Wang
DF Schaeffer
DN Martin
DR Cox
E Blaveri
E Razis
EU Sim
F Al-Shahrour
F Gao
FV Jacinto
G Minniti
G Sala
G Thomas
G Wang
H Ohgaki
HP Li
HS Phillips
I Nindl
IP Trougakos
J Madoz-Gurpide
J Novakova
J Rohozinski
J Soulier
J van den Boom
J Zhang
JA Doherty
JD Carpten
JG Hodgson
JH Kim
JM Campbell
JM Dreyfuss
JM Nigro
JN Rich
Jonathan E Beever
K Graham
KC Wei
KH Vousden
KK Lagerstedt
KL Gorringe
KR Delfino
Kristin R Delfino
L Frederick
L Wang
LP Fernandez
LY Chuang
M Ashburner
M Bredel
M Ferletta
M Grade
M Kanehisa
M Lae
M Ocejo-Garcia
M Schraders
M Shirahata
M Tessema
M Weller
M Wrensch
ME Halatsch
ME Mullendore
MJ McGirt
MW Smith
N Butowski
N Ikenaga
NF Marko
Nicola VL Serão
P Bhatti
P Shannon
PA Lachenbruch
PS Mischel
R Baskar
R Garcia-Munoz
R Lymbouridou
RA Calogero
RG Verhaak
RK Nibbe
RL Alterman
S Chevillard
S Comincini
S Dong
S Fre
S Hasegawa
S Kesari
S Madhavan
S Mittal
S Pavlides
S Rorive
Sandra L Rodriguez-Zas
SP Reddy
T John
T Onda
T Suzuki
T Watanabe
TA Chan
TJ MacDonald
TK Jenssen
U Petrausch
W Cheng
W Sun
X Castells
Y Fu
Y Liu
Y Zeng
YF Lau
Z Chen
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Glioblastoma is a complex multifactorial disorder that has swift and devastating consequences. Few genes have been consistently identified as prognostic biomarkers of glioblastoma survival. The goal of this study was to identify general and clinical-dependent biomarker genes and biological processes of three complementary events: lifetime, overall and progression-free glioblastoma survival. Methods: A novel analytical strategy was developed to identify general associations between the biomarkers and glioblastoma, and associations that depend on cohort groups, such as race, gender, and therapy. Gene network inference, cross-validation and functional analyses further supported the identified biomarkers. Results: A total of 61, 47 and 60 gene expression profiles were significantly associated with lifetime, overall, and progression-free survival, respectively. The vast majority of these genes have been previously reported to be associated with glioblastoma (35, 24, and 35 genes, respectively) or with other cancers (10, 19, and 15 genes, respectively) and the rest (16, 4, and 10 genes, respectively) are novel associations. Pik3r1, E2f3, Akr1c3, Csf1, Jag2, Plcg1, Rpl37a, Sod2, Topors, Hras, Mdm2, Camk2g, Fstl1, Il13ra1, Mtap and Tp53 were associated with multiple survival events. Most genes (from 90 to 96%) were associated with survival in a general or cohort-independent manner and thus the same trend is observed across all clinical levels studied. The most extreme associations between profiles and survival were observed for Syne1, Pdcd4, Ighg1, Tgfa, Pla2g7, and Paics. Several genes were found to have a cohort-dependent association with survival and these associations are the basis for individualized prognostic and gene-based therapies. C2, Egfr, Prkcb, Igf2bp3, and Gdf10 had gender-dependent associations; Sox10, Rps20, Rab31, and Vav3 had race-dependent associations; Chi3l1, Prkcb, Polr2d, and Apool had therapy-dependent associations. Biological processes associated with glioblastoma survival included morphogenesis, cell cycle, aging, response to stimuli, and programmed cell death. Conclusions: Known biomarkers of glioblastoma survival were confirmed, and new general and clinical-dependent gene profiles were uncovered. The comparison of biomarkers across glioblastoma phases and functional analyses offered insights into the role of genes. These findings support the development of more accurate and personalized prognostic tools and gene-based therapies that improve the survival and quality of life of individuals afflicted by glioblastoma multiforme.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Gene selection for cancer classification with the help of bees

Author: A Balmain
A Banharnsakun
A Bhattacharjee
A Brazma
A Choudhary
A Dussutour
A Farji-Brener
A Statnikov
AG Karegowda
AI Su
AV Tinker
B Wu
BJ Norton
BK Verma
C Giallourakis
C Lazar
C Xu
CA Markowski
CC Chang
CJ Tu
CL Nutt
CM Bishop
D Chen
D Karaboga
D Singh
D Teodorovic
DM Gordon
DV Nguyen
EL Lehmann
ER Dougherty
F Ahmade
F Emmert-Streib
F Kang
F Roces
F Wilcoxon
FJ Rodriguez
G George
G Li
G Stephanopoulos
G Xu
G Yan
G Zhu
GEP Box
H Drias
H Hu
H Liu
H Shah
H Sharma
H Torres-Contreras
H Yu
H Zhang
HF Wedde
I Eksin
I Guyon
I Inza
J Hamidi
J Ji
J Kennedy
J Khan
J Kiefer
J Wang
J Xu
J-Q Li
JC Bansal
JC Chang
JD Gibbons
JE Staunton
JG Zhang
JH Cho
JJ Howard
JJ Liu
JL Deneubourg
Johra Muhammad Moosa
JW Lee
L Breiman
L Deng
L Lan
L Li
L Wang
LW Jacobs
LY Chuang
M Bollazzi
M Dorigo
M Hollander
M Kefayat
M Mohamad
M Pirooznia
M Schena
MA Shipp
MA Tahir
MH Kashan
MJ Greene
Mohammad Kaykobad
Mohammad Sohel Rahman
MS Mohamad
N Todorovic
OK Erol
P Mukherjee
PA Devijver
PE Lønning
PW TSai
PY Kumbhar
Q Shen
Q Zhou
QK Pan
R Akbari
R Cai
R Debnath
R Díaz-Uriarte
R Hooke
R Kohavi
R Mallika
R Murugan
R Ruiz
Rameen Shakur
RJ Schafer
RN Khushaba
S Bicciato
S Bitam
S Dudoit
S Guo
S Knudsen
S Kumar
S Li
S Omkar
S Pavlidis
S Ramaswamy
S Siegel
S Sundar
S Wang
S Yang
SA Armstrong
SL Pomeroy
SL Wang
SP Fodor
SS Jadon
SS Jeffrey
T Davidović
T Li
T Stützle
TK Sharma
TM Cover
TR Golub
TS Furey
V Saravanan
V Tereshko
VN Vapnik
W Li
W Szeto
W-F Gao
WH Au
WH Kruskal
WH Press
X Wang
X Yan
X Yu
X Zhou
Y Leung
Y Lu
Y Saeys
Y Tan
Y Wang
Y Xu
Y Zhang
Z Liu
Z Zhang
Publication venue: Springer Science and Business Media LLC

Crossref