
Determining appropriate approaches for using data in feature selection

Author: A Kalousis
C Ambroise
DW Aha
F Wilcoxon
G Chandrashekar
H Liu
J Reunanen
JC Platt
JR Quinlan
L Yu
M Lecocke
MA Hall
P Somol
V Bolón-Canedo
Y Han
Y Saeys
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/12/2015

Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the dataset is small but starts to lose its advantage as the dataset size increases.
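
Both reliability measures named above rest on the Tanimoto (Jaccard) similarity between selected feature subsets. The Python sketch below assumes the common definition of the Average Tanimoto Index as the mean pairwise similarity over all subsets produced by repeated runs of one method; the function names are hypothetical, and the inter-method variant (averaging over all cross-method pairs) is an assumption rather than the paper's exact formula.

```python
from itertools import combinations, product
from typing import Iterable, List, Set


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B| of two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty subsets count as identical
    return len(a & b) / len(union)


def average_tanimoto_index(subsets: List[Set[int]]) -> float:
    """Mean pairwise Tanimoto similarity over subsets from repeated runs of ONE method."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def inter_method_ati(first: List[Set[int]], second: List[Set[int]]) -> float:
    """Assumed inter-method variant: mean similarity over all cross-method pairs."""
    pairs = list(product(first, second))
    if not pairs:
        raise ValueError("both methods need at least one subset")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected in three resampling runs of one method.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(round(average_tanimoto_index(runs), 4))  # 0.6667
```

A value of 1 means every run picked exactly the same features; values near 0 indicate highly unstable selection.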

Crossref

Springer - Publisher Connector

University of East Anglia digital repository

Machine learning for automatic prediction of the quality of electrophysiological recordings

Author: AB Wiltschko
BT Priest
C Mathes
CG Galizia
Dominique Martinez
F Franke
H Lei
Jean-Pierre Rospars
Johannes Reisert
M Asmild
MS Lewicki
R Friedrich
R Kohavi
S Panzeri
S Takahashi
SB Wilson
Shereen Elbanna
Sylvia Anton
T Nowotny
Thomas Nowotny
Y Saeys
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

The quality of electrophysiological recordings varies considerably due to technical and biological variability, and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing human decisions with a machine learning approach. We define 16 features, such as spike height and width, select the most informative ones using a wrapper method, and train a classifier to reproduce the judgement of one of our expert electrophysiologists. Generalisation performance is then assessed on unseen data, classified by the same or by another expert. We observe that the learning machine can be as consistent in its judgements as individual experts are with one another, if not more so. Best performance is achieved with a limited number of informative features, and the optimal feature set differs from one data set to another. With 80–90% correct judgements, the system's performance is very promising within each expert's data sets, but its judgements are less reliable when it is used across sets of recordings from different experts. We conclude that the proposed approach is relevant to the selection of electrophysiological recordings, provided its parameters are adjusted to different types of experiments and to individual experimenters.
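
The abstract does not say which wrapper method was used. As a rough illustration only, a greedy forward-selection wrapper (one common choice) adds at each step the feature whose inclusion most improves a classifier-based score and stops when no addition improves it. In this sketch, `score` stands in for something like the cross-validated accuracy of the expert-mimicking classifier; the toy scoring function and the feature names other than spike height and width are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedy forward wrapper selection driven by a user-supplied score."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate the wrapper score of every one-feature extension.
        cand_score, cand = max(
            ((score(selected + [f]), f) for f in remaining), key=lambda t: t[0]
        )
        if cand_score <= best_score:
            break  # no extension improves the score, so stop
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


# Hypothetical stand-in for cross-validated accuracy: two useful features,
# plus a small penalty per feature to mimic overfitting with larger sets.
USEFUL = {"spike_height": 0.3, "spike_width": 0.2}


def toy_score(subset: List[str]) -> float:
    return 0.5 + sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen, acc = forward_selection(
    ["spike_height", "spike_width", "noise_level", "baseline_drift"], toy_score
)
print(chosen, round(acc, 2))  # ['spike_height', 'spike_width'] 0.98
```

Because the wrapper re-evaluates the classifier for every candidate, its cost grows with the number of features times the number of steps, which is manageable for a set as small as 16 features.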

Public Library of Science (PLOS)

Crossref

INRIA a CCSD electronic archive server

Directory of Open Access Journals

Sussex Research Online

FigShare

Asynchronous processing for latent fingerprint identification on heterogeneous CPU-GPU systems

Author: Herrera Francisco
Medina-Perez Miguel Angel
Peralta Cámara Daniel
Romero Luis F.
Saeys Yvan
Sanchez-Fernandez Andres J.
Tabik Siham
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Latent fingerprint identification is one of the most essential identification procedures in criminal investigations. Addressing this task is challenging because (i) it requires analyzing massive databases in reasonable time and (ii) it is commonly solved by combining different methods with very complex data dependencies, which make fully exploiting heterogeneous CPU-GPU systems very difficult. Most efforts in this context focus on improving the accuracy of the approaches and neglect reducing the processing time; indeed, the most accurate approach was designed for a single thread. This work introduces Asynchronous processing for Latent Fingerprint Identification (ALFI), the fastest methodology for latent fingerprint identification that maintains high accuracy. ALFI fully exploits all the resources of CPU-GPU systems, using asynchronous processing and fine-coarse parallelism to analyze massive databases. Our approach reduces idle times in processing and exploits the inherent parallelism of comparing latent fingerprints to fingerprint impressions. We analyzed the performance of ALFI on the Linux and Windows operating systems using the well-known NIST/FVC databases. Experimental results reveal that ALFI is on average 22x faster than the state-of-the-art algorithm, reaching 44.7x in the best case studied.

Ghent University Academic Bibliography

Computational flow cytometry as a diagnostic tool in suspected-myelodysplastic syndromes

Author: Bachas Costa
Duetz Carolien
Saeys Yvan
van de Loosdrecht Arjan A.
Van Gassen Sofie
van Spronsen Margot F.
Westers Theresia M.
Publication venue: 'Wiley'
Publication date: 01/01/2021

The diagnostic work-up of patients suspected of having myelodysplastic syndromes (MDS) is challenging and relies mainly on bone marrow morphology and cytogenetics. In this study, we developed and prospectively validated a fully computational tool for flow cytometry (FC) diagnostics in suspected MDS. The computational diagnostic workflow consists of methods for pre-processing flow cytometry data, followed by a cell population detection method (FlowSOM) and a machine learning classifier (Random Forest). Based on a six-tube FC panel, the workflow obtained 90% sensitivity and 93% specificity in an independent validation cohort. For practical advantages (e.g., reduced processing time and costs), a second computational diagnostic workflow was trained solely on the best-performing single tube of the training cohort. This workflow obtained 97% sensitivity and 95% specificity in the prospective validation cohort. Both workflows outperformed the conventional, expert-analyzed flow cytometry scores for diagnosis with respect to accuracy, objectivity and time investment (less than 2 min per patient).
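
Sensitivity and specificity, as reported above, are the standard confusion-matrix ratios TP/(TP+FN) and TN/(TN+FP). The short Python sketch below computes them from paired labels; the label encoding (1 = MDS, 0 = non-MDS) and the example labels are hypothetical, and this is not the study's FlowSOM/Random Forest pipeline.

```python
from typing import Hashable, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for the given positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # diseased and flagged
            else:
                fn += 1  # diseased but missed
        elif p == positive:
            fp += 1  # healthy but flagged
        else:
            tn += 1  # healthy and cleared
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical cohort: 1 = MDS, 0 = non-MDS.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(truth, calls))  # (0.75, 0.75)
```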

Crossref

Ghent University Academic Bibliography

PubMed Central

Classification of motor imagery tasks for BCI with multiresolution analysis and multiobjective feature selection

Author: Andrés Ortiz
D Kimovski
F Lotte
I Daubechies
J Asensio-Cubero
J Cohen
J Handl
J Wang
Javier Asensio-Cubero
John Q. Gan
Julio Ortega
LS Oliveira
M Aharon
P Pudil
SJ Raudys
TT Cai
Y Kim
Y Saeys
Y Zang
Y Zhang
Z Zao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Background: Brain-computer interfacing (BCI) applications based on the classification of electroencephalographic (EEG) signals require solving high-dimensional pattern classification problems with so few training patterns that curse-of-dimensionality problems usually arise. Multiresolution analysis (MRA) has useful properties for signal analysis in both the temporal and spectral domains, and has been broadly used in the BCI field. However, MRA usually increases the dimensionality of the input data. Therefore, feature selection or dimensionality reduction approaches should be considered for improving the performance of MRA-based BCI. Methods: This paper investigates feature selection in MRA-based frameworks for BCI. Several wrapper approaches to evolutionary multiobjective feature selection are proposed with different classifier structures. They are evaluated by comparison with baseline methods that use a sparse representation of features or no feature selection. Results and conclusion: Statistical analysis, applying the Kolmogorov-Smirnov and Kruskal-Wallis tests to the mean Kappa values obtained on the test patterns for each approach, demonstrated some advantages of the proposed approaches. Compared with the baseline MRA approach used in previous studies, the proposed evolutionary multiobjective feature selection approaches provide similar or even better classification performance, with a significant reduction in the number of features that need to be computed.
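
The Kappa values used to compare approaches are presumably Cohen's kappa, a chance-corrected agreement measure widely used to evaluate BCI classifiers: κ = (p_o − p_e)/(1 − p_e), where p_o is the observed accuracy and p_e the agreement expected by chance from the label frequencies. The sketch below assumes that definition; the class names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between true and predicted labels, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement (plain accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one label ever occurs
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical two-class motor imagery trials.
truth = ["left", "left", "right", "right"]
pred = ["left", "right", "right", "right"]
print(cohen_kappa(truth, pred))  # 0.5
```

Here accuracy is 0.75, but because chance agreement is 0.5 the kappa is only 0.5, which is why kappa is preferred over raw accuracy when class predictions are unbalanced.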

University of Essex Research Repository

Crossref

Springer - Publisher Connector

PubMed Central

Minimum redundancy maximum relevance feature selection approach for temporal gene expression data

Author: A Argyriou
AK Zaas
B Chen
C Ding
F Nie
F Petitjean
I Guyon
J G
J Peyman
K Kira
L Fan
L Yu
M Chen
MF Ghalwash
Milos Radovic
Mohamed Ghalwash
N Hoque
Nenad Filipovic
Q Lou
R Kohavi
S Salvador
T Elena
T Rakthanmanon
TN Lal
V Bolón-Canedo
Y Saeys
ZM Hira
Zoran Obradovic
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Regularized logistic regression and multi-objective variable selection for classifying MEG data

Author: A Mendiburu
B Obermaier
C Coello
Concha Bielza
DE Goldberg
EA Leicht
F Di Grazia
F Lotte
FP de Lange
G McLachlan
H Zou
I Inza
I Iturrate
J Carmena
J Friedman
J Rieger
J Wolpaw
JH Holland
L Bianchi
M Besserve
M Lebedev
M Nicolelis
M Pelikan
M van-Gerven
PA Valdés-Sosa
Pedro Larrañaga
R Armañanzas
R Guimera
R Milo
R Santana
Roberto Santana
S Haufe
S Kelly
U Hoffmann
V Vapnik
W Wang
Y Saeys
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This paper addresses the question of maximizing classifier accuracy when classifying task-related mental activity from magnetoencephalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal improves classification accuracy compared with approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.
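
Multiobjective channel selection keeps the candidate subsets that no other subset beats on every objective. The sketch below is a generic Pareto (non-dominance) filter, not the probabilistic-modeling optimizer described in the abstract; it assumes two objectives, classification accuracy (maximized) and number of MEG channels (minimized), and the candidate values are made up.

```python
from typing import Dict, Tuple

Objectives = Tuple[float, int]  # (accuracy, number of channels)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> Dict[str, Objectives]:
    """Keep only the channel subsets not dominated by any other candidate."""
    return {
        name: obj
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for other in candidates.values())
    }


# Hypothetical channel subsets with (accuracy, channel count).
subsets = {"A": (0.80, 10), "B": (0.78, 4), "C": (0.75, 6), "D": (0.80, 12)}
print(sorted(pareto_front(subsets)))  # ['A', 'B']
```

Comparing a candidate with itself is harmless, since `dominates` requires a strict improvement on at least one objective.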

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

The role of chloroplast movement in C4 photosynthesis: a theoretical analysis using a three-dimensional reaction-diffusion model for maize

Author: Berghuijs Herman N.C.
Cano F. J.
Ghannoum Oula
Ho Quang Tri
Nicolaï Bart M.
Retta Moges A.
Saeys Wouter
Struik Paul C.
Verboven Pieter
Watté Rodrigo
Yin Xinyou
Publication venue: 'Oxford University Press'
Publication date: 01/01/2023

18 pp.

Chloroplast movement within mesophyll cells in C4 plants is hypothesized to enhance the CO2 concentrating mechanism, but this is difficult to verify experimentally. A three-dimensional (3D) leaf model can help analyse how chloroplast movement influences the operation of the CO2 concentrating mechanism. The first volumetric reaction-diffusion model of C4 photosynthesis was developed, incorporating detailed 3D leaf anatomy, light propagation, ATP and NADPH production, and CO2, O2 and bicarbonate concentrations driven by diffusion and assimilation/emission processes. It was implemented for maize leaves to simulate chloroplast movement scenarios within mesophyll cells: movement of all mesophyll chloroplasts towards bundle sheath cells (aggregative movement) and movement of only the chloroplasts of interveinal mesophyll cells towards bundle sheath cells (avoidance movement). Light absorbed by bundle sheath chloroplasts relative to mesophyll chloroplasts increased in both cases. Avoidance movement considerably decreased light absorption by mesophyll chloroplasts. Consequently, at high light intensities, total ATP and NADPH production and net photosynthetic rate increased for aggregative movement and decreased for avoidance movement compared with the default case of no chloroplast movement. Leakiness increased in both chloroplast movement scenarios due to the imbalance between energy production and demand in mesophyll and bundle sheath cells. These results suggest the need to design strategies for coordinated increases in electron transport and Rubisco activities to achieve an efficient CO2 concentrating mechanism at very high light intensities.

The work is supported by the Research Council of KU Leuven (project C1/16/002) and the Research Fund Flanders (project G.0645.13). Wageningen-based authors contributed to this work within the BioSolar Cells program. FJC was funded through the Spanish Ramon y Cajal fellowship (RYC2021-035064-I). Peer reviewed.

Brage IMR

Digital.CSIC

Western Sydney ResearchDirect

Feature selection in the reconstruction of complex network representations of spectral data

Author: AJ Link
BL Adam
D Powers
DF Specht
E Bullmore
EF Petricoin
EF Petricoin III
Ernestina Menasalvas
Eshel Ben-Jacob
F Fleuret
H Peng
HJ Issaq
I Guyon
J Griffin
K Dettmer
LdF Costa
M Zanin
MA Moseley III
Massimiliano Zanin
ME Newman
MH Zweig
O Beckonert
Pedro Sousa
PW Anderson
R Milo
S Boccaletti
S Fortunato
S Havlin
Stefano Boccaletti
TR Covey
Y Saeys
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Complex networks have been used extensively in the last decade to characterize and analyze complex systems, and they have recently been proposed as a novel instrument for analyzing spectra extracted from biological samples. Yet the high number of measurements composing spectra, and the consequent high computational cost, make a direct network analysis unfeasible. Here we present a comparative analysis of three customary feature selection algorithms, including the binning of spectral data and the use of information theory metrics. The algorithms are compared by the score obtained in a classification task in which healthy subjects must be discriminated from people suffering from different types of cancer. Results indicate that a feature selection strategy based on Mutual Information outperforms the more classical data binning, while reducing the dimensionality of the data set by two orders of magnitude.
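
The Mutual Information criterion can be illustrated with a plug-in estimate for discrete variables, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))], followed by ranking features by how much information they carry about the class label. This is a simplified sketch, not the paper's exact algorithm: it assumes the spectral intensities have already been discretized, and the feature names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def top_k_by_mi(
    features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable], k: int
) -> List[str]:
    """Rank discretized features by MI with the class label and keep the top k."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binned intensities for two spectral bins; labels 0 = healthy, 1 = cancer.
labels = [0, 0, 1, 1]
features = {"bin_a": [0, 0, 1, 1], "bin_b": [0, 1, 0, 1]}
print(mutual_information(features["bin_a"], labels))  # 1.0
print(top_k_by_mi(features, labels, k=1))  # ['bin_a']
```

`bin_a` perfectly separates the two classes (1 bit), while `bin_b` is independent of the label (0 bits), so only `bin_a` survives the top-1 cut.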

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Archivo Digital UPM

GC content of early metazoan genes and its impact on gene expression levels in mammalian cell lines

Author: Beyaert R.
De Keuckelaere E.
Deroo T.
Driege Y.
Gul I.S.
Hulpiau P.
Kamm K.
Saeys Y.
Sanders E.
Schierwater B.
Staal J.
Staes K.
Technau U.
van Roy F.
Publication date: 01/01/2018

With genomes available for many animal clades, including the early-branching metazoans, one can readily study the functional conservation of genes across a diversity of animal lineages. Ectopic expression of an animal protein in, for instance, a mammalian cell line is a commonly used strategy in structure–function analysis. However, this can be problematic for distantly related species. Here we analyzed the GC content of the coding sequences of basal animals, showed its impact on gene expression levels in human cell lines and, importantly, showed how this expression efficiency can be improved. Optimization of the GC3 content in the coding sequences of cadherin, alpha-catenin, and paracaspase of Trichoplax adhaerens dramatically increased the expression of these basal animal genes in human cell lines.
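
GC3 is the fraction of codon third positions occupied by G or C. A minimal Python sketch follows; it assumes an in-frame coding sequence and, as a simplification, counts every codon, whereas some GC3 definitions exclude stop codons and single-codon amino acids (Met, Trp). The example sequence is hypothetical.

```python
def gc3_content(cds: str) -> float:
    """Fraction of codon third positions that are G or C in an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    third_positions = seq[2::3]  # third base of every codon
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Codons ATG, GCC, AAA: third positions G, C, A.
print(round(gc3_content("ATGGCCAAA"), 4))  # 0.6667
```

Codon optimization of the kind described raises this value by choosing synonymous codons that end in G or C, leaving the encoded protein unchanged.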

Ghent University Academic Bibliography

Open Marine Archive