
38 research outputs found

Inferring meta-covariates in classification

Author: B. Hanczar
C. Fraley
C.M. Bishop
K. Bae
K.E. Lee
M.Y. Park
S. Dudoit
T.R. Golub
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper develops an alternative method for gene selection that combines model based clustering and binary classification. By averaging the covariates within the clusters obtained from model based clustering, we define “meta-covariates” and use them to build a probit regression model, thereby selecting clusters of similarly behaving genes, aiding interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function which rewards good performance at both classification and clustering. We explore the performance of our methodology on a well known leukaemia dataset and use the Gene Ontology to interpret our results.
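A minimal sketch of the meta-covariate idea, assuming the gene cluster labels are already given. In the paper the clustering and the probit classifier are learned jointly by EM; this sketch does not attempt that and only shows the averaging step and a probit link. The helper names `meta_covariates` and `probit_probability` and the coefficients are hypothetical illustrations, not the authors' code.

```python
import numpy as np
from math import erf, sqrt

def meta_covariates(X, labels):
    """Average the gene columns of X (samples x genes) within each cluster.

    labels[j] is the cluster id of gene j, assumed to come from a prior
    clustering step. Returns an (n_samples, n_clusters) array whose k-th
    column is the mean expression of the genes in the k-th cluster
    (clusters ordered by sorted id).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)  # sorted, unique cluster ids
    return np.column_stack([X[:, labels == k].mean(axis=1) for k in cluster_ids])

def probit_probability(M, intercept, beta):
    """P(y = 1 | meta-covariates) under a probit link: Phi(intercept + M @ beta)."""
    z = intercept + np.asarray(M, dtype=float) @ np.asarray(beta, dtype=float)
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])

# Toy data: 2 samples, 3 genes; genes 0 and 1 form cluster 0, gene 2 is cluster 1.
X = np.array([[1.0, 3.0, 10.0],
              [2.0, 4.0, 20.0]])
M = meta_covariates(X, [0, 0, 1])            # shape (2, 2)
p = probit_probability(M, -1.0, [0.5, 0.0])  # hypothetical coefficients
```

Each column of `M` summarizes one cluster, so a nonzero coefficient selects a whole group of similarly behaving genes rather than a single gene.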

Crossref

UCL Discovery

Enlighten

Predicting students' emotions using machine learning techniques

Author: B Hanczar
F Tian
J Kaur
T Danisman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Detecting students' real-time emotions has numerous benefits, such as helping lecturers understand their students' learning behaviour and address problems, such as confusion and boredom, that undermine students' engagement. One way to detect students' emotions is through their feedback about a lecture. Detecting students' emotions from their feedback, however, is both demanding and time-consuming. To address this, we examined several models for detecting emotions from students' feedback, training seven different machine learning techniques on real students' feedback. Models with a single emotion performed better than those with multiple emotions. Overall, the three best models were obtained with the CNB classifier for three emotions: amusement, boredom and excitement.

Crossref

University of Brighton Research Portal

Portsmouth University Research Portal (Pure)

On optimal Bayesian classification and risk estimation under multiple classes

Author: A Zollanvari
B Efron
B Hanczar
BE Boser
C Cortes
C-C Chang
CM Bishop
ER Dougherty
H Xu
JM Knight
L Devroye
LA Dalton
Lori A. Dalton
MJ van de Vijver
Mohammadmahdi R. Yousefi
MR Yousefi
MS Esfahani
NL Johnson
S Kotz
UM Braga-Neto
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Microarray profiling of human white adipose tissue after exogenous leptin injection

Author: Cancello R.
Clement K.
Evelo C.T.A.
Hanczar B.
Henegar C.
Hukshorn C.J.
Langin D.
Pelloux V.
Saris W.H.M.
Taleb S.
van Haaften R.I.M.
Viguerie N.
Zucker J.D.
Publication venue: 'Wiley'
Publication date: 01/01/2006

Maastricht University Research Portal

SlimPLS: A Method for Feature Selection in Gene Expression-Based Disease Classification

Author: A Berchuck
A Hodges
A Webb
B Ding
B Hanczar
BS Everitt
D Chowdary
D Singh
DG Beer
DV Nguyen
E Tian
F Borovecki
G Fort
Gideon Dror
H Martens
H Wold
K Chin
K-AL Cao
L Breiman
L Song
LJ van 't Veer
M Barker
M Gutkin
M Momma
M West
MA Zapala
Magnus Rattray
ME Burczynski
Michael Gutkin
N Iizuka
R Rosipal
RA Fisher
Ron Shamir
RW Hamming
S Gruvberger
S Wold
SJ Russell
SM Dhanasekaran
T Hastie
T Okada
TM Mitchell
TR Golub
U Alon
V Vapnik
WN Venables
X Huang
Y Saeys
Publication venue: Public Library of Science
Publication date: 01/01/2009

A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve diagnosis and treatment selection for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations, where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and identify the preferable combinations of classifier and feature selection technique.
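The abstract does not spell out how SlimPLS ranks features, so the following is only a generic sketch of the Partial Least Squares ingredient it builds on: for a single response (PLS1), the first weight vector is proportional to X^T y after mean-centering, and features can be ranked by the magnitude of their weights. The function names and toy data are hypothetical.

```python
import numpy as np

def pls1_first_weights(X, y):
    """First PLS weight vector for a single response (PLS1).

    With X and y mean-centered, w is proportional to X^T y, so each feature's
    weight is proportional to its sample covariance with the response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)  # unit-length weight vector

def top_k_features(X, y, k):
    """Indices of the k features with the largest absolute first-component weights."""
    w = pls1_first_weights(X, y)
    return np.argsort(-np.abs(w))[:k]

# Toy case-control data: 4 samples x 4 features, labels coded 0/1.
X = np.array([[0.0,  1.0, 0.0, 3.0],
              [0.0, -1.0, 1.0, 2.0],
              [5.0,  1.0, 0.0, 1.0],
              [5.0, -1.0, 1.0, 0.0]])
y = np.array([0, 0, 1, 1])
selected = top_k_features(X, y, 2)
```

Feature 0 tracks the labels and feature 3 is anti-correlated with them, so both receive large |w|; features 1 and 2 are uncorrelated with the labels and receive zero weight.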

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

Author: A Carmena
A Carmena
A Carmena
A Dastjerdi
A Erives
A Ivan
A Nose
A Paululat
A Siepel
A Subramanian
A Visel
A Woolfe
AA Philippakis
AC Groth
AG Nazina
AK Holloway
Alan M. Michelson
AM Michelson
B Estrada
B Hanczar
BL Black
BP Berman
Brian W. Busser
BW Busser
C Bourgouin
C Chang
C Jiang
C Klämbt
CA Berkes
CI Swanson
CT Ong
DN Arnosti
DT Odom
E Davidson
EE Hare
EN Olson
FC Wardle
G Hon
G Junion
G Leung
G Ranganayakulu
GE Crawford
GG Loots
H Brohmann
H Rouault
HP Shih
I Abnizova
I Costello
I Guyon
I Ovcharenko
I Reim
Ivan Ovcharenko
J Bischof
J Crocker
J Enriquez
J Ernst
J Shawe-Taylor
J Zeitlinger
JA Pederson
James W. Posakony
JD Pederson
JM Claycomb
JS Jakobsen
JW Mahaffey
K Jagla
K Robasky
K Senger
L Dubois
L Li
L Narlikar
Leila Taher
M Capovilla
M Frasch
M Ludwig
M Markstein
M Porsch
M Ruiz-Gomez
M Schwaiger
MA Beer
MB Noyes
MD Biggin
MF Berger
MI Arnone
MJ Blow
MK Baylies
MK Gross
Molly J. Bloom
MR Kantorovitz
MS Halfon
MV Taylor
N Negre
N Reeves
OL Griffith
P Tomancak
PJ Clyne
R Bodmer
R Galant
RG Ramsay
RJ Bryson-Richardson
RP Zinzen
S Barolo
S Knirr
S MacArthur
S Mahony
SA Ness
SB Carroll
SD Weatherbee
SJ Raudys
SM Gallo
SY Kim
T Jagla
T Sandmann
Terese Tansey
TL Bailey
U Grossniklaus
V Matys
V Tixier
Y Benjamini
YH Liu
Yongsok Kim
Z Han
Publication venue: Public Library of Science
Publication date: 08/03/2012

Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.
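As a rough illustration of working "based on the presence or absence" of binding sites, an enhancer sequence can be encoded as a binary vector over a motif list, checking both DNA strands. This sketch assumes exact-match consensus strings; the motifs below are arbitrary placeholders, not the TFBSs from the study, and practical motif scanning usually relies on position weight matrices instead. The resulting vectors could then be fed to any binary classifier.

```python
# Hypothetical placeholder motifs (NOT the study's TFBSs), written 5'->3'.
MOTIFS = ["GGATCC", "TTAGGC"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def presence_vector(seq, motifs):
    """1 if a motif occurs (exact match) on either strand of seq, else 0."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [int(m in seq or m in rc) for m in motifs]

enhancers = ["AAGGATCCAA", "CCGCCTAACC"]
features = [presence_vector(s, MOTIFS) for s in enhancers]
```

The second sequence contains `GCCTAA`, the reverse complement of the second motif, so it is flagged only because the opposite strand is also scanned.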

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

pROC: an open-source package for R and S+ to analyze and compare ROC curves

Author: A Moise
AI Bandos
Alexandre Hainard
B Hanczar
C Stephan
CE Metz
DK McClish
DL Streiner
ER DeLong
ES Venkatraman
Frédérique Lisacek
G Campbell
J Carpenter
JA Hanley
JA Swets
Jean-Charles Sanchez
KH Zou
M Pepe
Markus Müller
MS Pepe
N Turck
Natacha Turck
Natalia Tiberti
P Sonego
R Development Core Team
T Fawcett
T Sing
TM Braun
WJ Ewens
WN Venables
X Robin
Xavier Robin
Y Jiang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Analysis of feature selection stability on high dimension and small sample data

Author: Dernoncourt D.
Hanczar B.
Zucker Jean-Daniel
Publication date: 01/01/2014

Feature selection is an important step when building a classifier on high-dimensional data. When the number of observations is small, feature selection tends to be unstable: two feature subsets obtained from different datasets that address the same classification problem commonly overlap very little. Although this is a crucial problem, little work has been done on selection stability. The behavior of feature selection is analyzed under various conditions, with a focus (though not an exclusive one) on t-score-based feature selection approaches and small-sample data. The analysis proceeds in three steps: the first is theoretical, using a simple mathematical model; the second is empirical, based on artificial data; and the last is based on real data. All three analyses lead to the same results and give a better understanding of the feature selection problem in high-dimensional data.
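A minimal simulation of the instability described above, under stated assumptions: features are ranked by a Welch-style two-sample t-score, the top k are kept, and stability is measured as the Jaccard overlap between subsets selected on two independent datasets drawn from the same distribution. The abstract names neither the exact t-score variant nor the stability measure, so both choices, and the `simulate` data generator, are hypothetical.

```python
import numpy as np

def t_scores(X, y):
    """Two-sample (Welch-style) t statistic for each column of X; y holds 0/1 labels.

    Assumes every feature has nonzero variance in at least one class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / se

def select_top_k(X, y, k):
    """Indices of the k features with the largest |t|."""
    return set(np.argsort(-np.abs(t_scores(X, y)))[:k].tolist())

def jaccard(s1, s2):
    """Overlap of two feature subsets: |A & B| / |A | B|."""
    return len(s1 & s2) / len(s1 | s2)

def simulate(rng, n_per_class=10, p=1000, n_informative=10, shift=1.0):
    """Draw one small dataset in which only the first n_informative features differ by class."""
    y = np.repeat([0, 1], n_per_class)
    X = rng.standard_normal((2 * n_per_class, p))
    X[y == 1, :n_informative] += shift
    return X, y

rng = np.random.default_rng(0)
X1, y1 = simulate(rng)
X2, y2 = simulate(rng)  # a second dataset for the same classification problem
stability = jaccard(select_top_k(X1, y1, 10), select_top_k(X2, y2, 10))
```

Rerunning with a larger `n_per_class` or `shift` is a quick way to see how the overlap between the two selected subsets depends on sample size and signal strength.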

Horizon / Pleins textes

Feature construction from synergic pairs to improve microarray-based classification

Author: HANCZAR B
HENEGAR C
SAITTA L
ZUCKER J.D
Publication date: 01/01/2007

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Feature construction from synergic pairs to improve microarray-based classification

Author: Hanczar B.
Henegar C.
Saitta L.
Zucker Jean-Daniel
Publication date: 01/01/2007

Motivation: Microarray experiments, which allow simultaneous expression profiling of thousands of genes in various conditions (tissues, cells or time), generate data whose analysis raises difficult problems. In particular, there is a vast disproportion between the number of attributes (tens of thousands) and the number of examples (several tens). Dimension reduction is therefore a key step before applying classification approaches. Many methods have been proposed for this purpose, but only a few of them consider a direct quantification of transcriptional interactions. We describe and experimentally validate a new dimension reduction and feature construction method, which assesses interactions between expression profiles to improve microarray-based classification accuracy.

Results: Our approach relies on a mutual information measure that exposes some elementary constituents of the information contained in a pair of gene expression profiles. We show that their analysis involves a term that represents the information of the interaction between the two genes. The principle of our method, called FeatKNN, is to exploit the information provided by highly synergic gene pairs to improve classification accuracy. First, a heuristic search selects the most informative gene pairs. Then, for each selected pair, a new feature is constructed that represents the classification margin of a KNN classifier in the space of that gene pair. We show experimentally that the interactional information has a degree of significance comparable to that of the gene expression profiles considered separately. Our method has been tested with different classifiers and yielded significant improvements in accuracy on several public microarray databases. Moreover, a synthetic assessment of the biological significance of the concept of synergic gene pairs suggested its ability to uncover relevant mechanisms underlying interactions among various cellular processes.
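The interaction term mentioned above can be illustrated with the standard interaction-information decomposition, synergy = I(X1,X2;Y) - I(X1;Y) - I(X2;Y), computed here with plug-in entropy estimates on already-discretized expression values. This is a sketch under that assumption only: the abstract does not give FeatKNN's exact estimator or discretization, and the KNN-margin feature construction step is not shown.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Plug-in Shannon entropy (bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def synergy(x1, x2, y):
    """Interaction information I(X1,X2;Y) - I(X1;Y) - I(X2;Y).

    Positive values mean the pair says more about y jointly than the two
    genes do separately; negative values indicate redundancy.
    """
    pair = list(zip(x1, x2))
    return mutual_information(pair, y) - mutual_information(x1, y) - mutual_information(x2, y)

# XOR-like toy example with discretized expression levels (0 = low, 1 = high):
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]
s_xor = synergy(x1, x2, y)                                 # purely synergic pair
s_red = synergy([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1])  # fully redundant pair
```

In the XOR case neither gene alone carries any information about the label, yet the pair determines it completely, which is the kind of "highly synergic" pair the method aims to exploit.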

Horizon / Pleins textes