Search CORE

283 research outputs found

Identification of disease-causing genes using microarray data mining and gene ontology

Author: A Mohammadi
A Zhang
AA Alizadeh
Azadeh Mohammadi
B Duval
BF Souza
C Ambroise
C Ding
C Tago
D Lin
D Singh
E Martinez
FM Couto
I Guyon
I Inza
J Jaeger
JJ Jiang
L Li
L Yu
L Ziaei
Mansoor Salehi
Mohammad H Saraee
N Cristianini
P Pavlidis
P Resnik
PA Mundra
PA Mundra
PJ Park
R Genuer
RF Weaver
S Li
S Li
TM Huang
TR Golub
TS Furey
U Alon
W Xu
Y Ding
Y Saeys
Y Wang
YL Chin
Z Xie
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with a high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers

University of Salford Institutional Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integration and comparison of different genomic data for outcome prediction in cancer

Author: Antonio Martínez-Torteya
Emmanuel Martínez-Ledesma
Hugo Gómez-Rueda
Rebeca Palacios-Corona
Victor Trevino
Publication venue: Springer Nature
Publication date: 01/01/2015
Field of study

Clinical data per cancer type. (XLSX 9 kb

Springer - Publisher Connector

FigShare

Integration and comparison of different genomic data for outcome prediction in cancer

Author: AB Ashraf
Antonio Martínez-Torteya
B Efron
BL Andersen
Cancer Genome Atlas Reasearch Network
Cancer Genome Atlas Reasearch Network
Cancer Genome Atlas Reasearch Network
Cancer Genome Atlas Reasearch Network
CH Mermel
D Kim
E Martinez
E Martinez
EA Mathé
Emmanuel Martínez-Ledesma
H Uno
H Zou
Hugo Gómez-Rueda
J Ferlay
J Ferlay
J Friedman
J Shendure
JN Weinstein
JT Ptacek
KM Leung
L Rahib
LA Garraway
MR Abern
NQ Liu
PN Butow
Q Zhao
RD Riley
Rebeca Palacios-Corona
RG Hagerty
TJ Hudson
V Bewick
Victor Trevino
W Schroth
WF Baile
Y Yuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

An integrated approach of particle swarm optimization and support vector machine for gene signature selection and cancer prediction

Author: Chan Kit Yan
Yeung C.
Publication venue: IEEE Press
Publication date: 01/01/2009
Field of study

To improve cancer diagnosis and drug development, the classification of tumor types based on genomic information is important. As DNA micro array studies produce a large amount of data, expression data are highly redundant and noisy, and most genes are believed to be uninformative with respect to the studied classes. Only a fraction of genes may present distinct profiles for different classes of samples. Classification tools to deal with these issues are thus important. These tools should learn to robustly identify a subset of informative genes embedded in a large dataset that is contaminated with high dimensional noises. In this paper, an integrated approach of support vector machine (SVM) and particle swarm optimization (PSO) is proposed for this purpose. The proposed approach can simultaneously optimize the selection of feature subset and the classifier through a common solution coding mechanism. As an illustration, the proposed approach is applied to search the combinational gene signatures for predicting histologic response to chemotherapy of osteosarcoma patients. Cross validation results show that the proposed approach outperforms other existing methods in terms of classification accuracy. Further validation using an independent dataset shows misclassification of only one out of fourteen patient samples, suggesting that the selected gene signatures can reflect the chemoresistance in osteosarcoma

espace@Curtin

Under-Updated Particle Swarm Optimization for Small Feature Selection Subsets from Large-Scale Datasets

Author: Emmanuel Martinez
Victor Trevino
Publication venue: 'IntechOpen'
Publication date: 16/03/2012
Field of study

IntechOpen

En-PaFlower: An Ensemble Approach using PSO and Flower Pollination Algorithm for Cancer Diagnosis

Author: Mahapatra Satyasundara
Senapati Sudhir Kumar
Shrivastava Manish
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023
Field of study

Machine learning now is used across many sectors and provides consistently precise predictions. The machine learning system is able to learn effectively because the training dataset contains examples of previously completed tasks. After learning how to process the necessary data, researchers have proven that machine learning algorithms can carry out the whole work autonomously. In recent years, cancer has become a major cause of the worldwide increase in mortality. Therefore, early detection of cancer improves the chance of a complete recovery, and Machine Learning (ML) plays a significant role in this perspective. Cancer diagnostic and prognosis microarray dataset is available with the biopsy dataset. Because of its importance in making diagnoses and classifying cancer diseases, the microarray data represents a massive amount. It may be challenging to do an analysis on a large number of datasets, though. As a result, feature selection is crucial, and machine learning provides classification techniques. These algorithms choose the relevant features that help build a more precise categorization model. Accurately classifying diseases is facilitated as a result, which aids in disease prevention. This work aims to synthesize existing knowledge on cancer diagnosis using machine learning techniques into a compact report.  Current research work aims to propose an ensemble-based machine learning model En-PaFlower using Particle Swarm Optimization (PSO) as the feature selection algorithm, Flower Pollination algorithm (FPA) as the optimization algorithm with the majority voting algorithm. Finally, the performance of the proposed algorithm is evaluated over three different types of cancer disease datasets with accuracy, precision, recall, specificity, and F-1 Score etc as the evaluation parameters. The empirical analysis shows that the proposed methodology shows highest accuracy as 95.65%

International Journal on Recent and Innovation Trends in Computing and Communication

Evolutionary Computation and QSAR Research

Author: Aguiar-Pulido Vanessa
Cruz-Monteagudo Maykel
Dorado Julián
Gestal M.
Munteanu Cristian-Robert
Rabuñal Juan R.
Publication venue: 'Bentham Science Publishers Ltd.'
Publication date: 01/01/2013
Field of study

[Abstract] The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. The virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: molecular structure codification into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search of the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basic of the genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR and the future trend on the joint or multi-task feature selection methods.Instituto de Salud Carlos III, PIO52048Instituto de Salud Carlos III, RD07/0067/0005Ministerio de Industria, Comercio y Turismo; TSI-020110-2009-53)Galicia. Consellería de Economía e Industria; 10SIN105004P

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

Author: Bazlur Rashid A. N. M.
Choudhury Tonmoy
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2019
Field of study

The term big data characterizes the massive amounts of data generation by the advanced technologies in different domains using 4Vs volume, velocity, variety, and veracity-to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-Time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-Time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features, and many domains, including financial, lack the al analytic tools to mine the data for knowledge discovery because of the high-dimensionality. Feature selection is an optimization problem to find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-And-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-To-use distributed, scalable, and fault-Tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-The-Art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions

Research Online @ ECU

Computational models and approaches for lung cancer diagnosis

Author: Azzawi Hasseeb
Publication venue: Deakin University, Faculty of Science, Engineering and Built Environment, School of Information Technology
Publication date: 01/10/2019
Field of study

The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to developed novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results

Deakin Research Online