Search CORE

261 research outputs found

Simultaneous Matrix Diagonalization for Structural Brain Networks Classification

Author: A Kurkumov
A Rohde
A Yeredor
B Fischl
C Cortes
CM Florkowski
CM Tax
F Pedregosa
H Zou
J Rudie
JF Cardoso
L Breiman
P Tichavsky
RS Desikan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/10/2017

This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer's disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.

arXiv.org e-Print Archive

Crossref

Using Bayesian Networks and Machine Learning to Predict Computer Science Success

Author: A Gupta
AM Shahiri
C Romero
E Osmanbegović
J Friedman
J Heaton
JR Quinlan
KB Korb
L Breiman
M Hall
MS Andrade
R Asif
RS Baker
S Boughorbel
W Xing
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2019

Bayesian Networks and Machine Learning techniques were evaluated and compared for predicting academic performance of Computer Science students at the University of Cape Town. Bayesian Networks performed similarly to other classification models. The causal links inherent in Bayesian Networks allow for understanding of the contributing factors for academic success in this field. The most effective indicators of success in first-year ‘core’ courses in Computer Science included the student’s scores for Mathematics and Physics as well as their aptitude for learning and their work ethos. It was found that unsuccessful students could be identified with ≈91% accuracy. This could help to increase throughput as well as student wellbeing at university.

Crossref

UCT Computer Science Research Document Archive

A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage

Author: CJ Burges
DF Williamson
DG Altman
DP Silveira da
HB Newcombe
IP Fellegi
JH Friedman
L Breiman
LE Raileanu
LR Dice
M Tromp
P Christen
RS Michalski
SJ Press
VI Levenshtein
X Meng
Y Siegert
Publication venue: 19th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK)
Publication date: 03/08/2017

Record linkage (RL) is the process of identifying and linking data that relates to the same physical entity across multiple heterogeneous data sources. Deterministic linkage methods rely on the presence of common uniquely identifying attributes across all sources, while probabilistic approaches use non-unique attributes and calculate similarity indexes for pairwise comparisons. A key component of record linkage is accuracy assessment: the process of manually verifying and validating matched pairs to further refine linkage parameters and increase its overall effectiveness. This process, however, is time-consuming and impractical when applied to large administrative data sources where millions of records must be linked. Additionally, it is potentially biased, as the gold standard used is often the reviewer’s intuition. In this paper, we present an approach for assessing and refining the accuracy of probabilistic linkage based on different supervised machine learning methods (decision trees, naïve Bayes, logistic regression, random forest, linear support vector machines and gradient boosted trees). We used data sets extracted from huge Brazilian socioeconomic and public health care data sources. These models were evaluated using receiver operating characteristic plots, sensitivity, specificity and positive predictive values collected from a 10-fold cross-validation method. Results show that logistic regression outperforms other classifiers and enables the creation of a generalized, very accurate model to validate linkage results.

Crossref

UCL Discovery

Investigating the Effect of Emoji in Opinion Classification of Uzbek Movie Review Comments

Author: A Koumpouri
AAL Cunha
AG Vural
B Yergesh
D Aha
E Kuriyozov
J Sido
K Wegrzyn-Wolska
L Besacier
L Breiman
NS Sakenovich
PS Dandannavar
R Dehkharghani
RS Jagdale
S Poria
S Rezaeinia
S Sun
SS Keerthi
V Karthik
V Simaki
Publication date: 01/01/2020

Opinion mining on social media posts has become more and more popular. Users often express their opinion on a topic not only with words but also with image symbols such as emoticons and emoji. In this paper, we investigate the effect of emoji-based features in opinion classification of Uzbek texts, and more specifically movie review comments from YouTube. Several classification algorithms are tested, and feature ranking is performed to evaluate the discriminative ability of the emoji-based features. Comment: 10 pages, 1 figure, 3 tables.

arXiv.org e-Print Archive

Lund University Publications

Crossref

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Author: A Kikuchi
A Statnikov
A Ultsch
Andrew Harrison
Aris Perperoglou
Asma Gul
B Lausen
Berthold Lausen
C Cortes
C Ding
C Ma
C Müssel
C Zou
D Apiletti
DA Notterman
DeAndresSA Díaz‐Uriarte R
DG Altman
E Baralis
GJ Gordon
H Peng
H‐C Liu
J Fan
J Lu
K‐H Chen
L Breiman
L Lausser
M Dramiński
M Marczyk
Metodi V Metodiev
N De Jay
Osama Mahmoud
P Alhopuro
P Laiho
RN Jorissen
RS Croner
S Chiaretti
S Michiels
T Cover
T Jirapech‐Umpai
TR Golub
VG Tusher
W Talloen
Y Saeys
Y Su
Zardad Khan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.

University of Essex Research Repository

Crossref

Springer - Publisher Connector

PubMed Central

Explore Bristol Research

Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients

Author: AA Freitas
AA Freitas
AA Freitas
AK Madan
Alex A Freitas
AR Katritzky
B Louis
C Xu
D Newby
DE Clark
EM Amo del
F Lombardo
FY Bois
G Berellini
G Holmes
H Graham
H Witten
HT Yudate
I Mahmood
J Clausen
J Gasteiger
JC Duffy
JH Lin
JR Quinlan
K Limbu
KA Min
Kriti Limbu
L Breiman
L Di
LM Berry
M Hall
N Zheng
O Demir-Kavuk
P Paixão
P Poulin
RD Jones
RS Obach
SA Wildman
SE Rosenbaum
SS Buck De
T Ghafourian
T Peyret
Taravat Ghafourian
TJ Maguire
Y Gong
Y Kato
Z Zhivkova
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Background: Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Results: Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Conclusions: Decision tree-based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. © 2015 Freitas et al.; licensee Springer

Crossref

Springer - Publisher Connector

PubMed Central

Kent Academic Repository

Sussex Research Online

Human Communication Dynamics in Digital Footsteps: A Study of the Agreement between Self-Reported Ties and Email Networks

Author: A Clauset
A-L Barabási
B Uzzi
Brian Uzzi
D Lazer
DJ Watts
DR Malmgren
E Gilbert
ER Black
G Kossinets
J Onnela
JM Podolny
JR Tyler
L Adamic
L Breiman
LC Freeman
M Granovetter
MEJ Newman
MT Hansen
N Eagle
NA Christakis
Petter Holme
PV Marsden
RE Reagans
RS Burt
S Wuchty
SP Borgatti
Stefan Wuchty
Y Benjamini
Y Sun
Publication venue: Public Library of Science
Publication date: 01/01/2011

Digital communication data has created opportunities to advance the knowledge of human dynamics in many areas, including national security, behavioral health, and consumerism. While digital data uniquely captures the totality of a person's communication, past research consistently shows that a subset of contacts makes up a person's “social network” of unique resource providers. To address this gap, we analyzed the correspondence between self-reported social network data and email communication data with the objective of identifying the dynamics in e-communication that correlate with a person's perception of a significant network tie. First, we examined the predictive utility of three popular methods to derive social network data from email data based on volume and reciprocity of bilateral email exchanges. Second, we observed differences in the response dynamics along self-reported ties, allowing us to introduce and test a new method that incorporates time-resolved exchange data. Using a range of robustness checks for measurement and misreporting errors in self-report and email data, we find that the methods have similar predictive utility. Although e-communication has lowered communication costs with large numbers of persons, and potentially extended our number of, and reach to, contacts, our case results suggest that underlying behavioral patterns indicative of friendship or professional contacts continue to operate in a classical fashion in email interactions.

Crossref

Directory of Open Access Journals

PubMed Central

University of Miami: Scholarship Miami

Pruning of Error Correcting Output Codes by optimization of accuracy–diversity trade off

Author: EJ Candes
EM Kleinberg
G Martinez-Mungoz
J Nocedal
J Weston
L Breiman
L Mason
L Yu
LI Kuncheva
P Ekman
R Smith
Raymond Smith
RS Smith
S Escalera
Süreyya Özöğür-Akyüz
T Dietterich
T Windeatt
Terry Windeatt
VN Vapnik
X Yin
Y Zhang
YI Tian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/12/2014

Ensemble learning is a method of combining learners to obtain more reliable and accurate predictions in supervised and unsupervised learning. However, ensemble sizes are sometimes unnecessarily large, which leads to additional memory usage, computational overhead and decreased effectiveness. To overcome such side effects, pruning algorithms have been developed; since this is a combinatorial problem, finding the exact subset of ensembles is computationally infeasible. Different types of heuristic algorithms have been developed to obtain an approximate solution, but they lack a theoretical guarantee. Error Correcting Output Code (ECOC) is one of the well-known ensemble techniques for multiclass classification, which combines the outputs of binary base learners to predict the classes for multiclass data. In this paper, we propose a novel approach for pruning the ECOC matrix by utilizing accuracy and diversity information simultaneously. All existing pruning methods need the size of the ensemble as a parameter, so the performance of the pruning methods depends on the size of the ensemble. Our unparametrized pruning method is novel in that it is independent of the size of the ensemble. Experimental results show that our pruning method is mostly better than other existing approaches.

Crossref

University of Surrey

Surrey Research Insight

Identification of early changes in specific symptoms that predict longer-term response to atypical antipsychotics in the treatment of patients with schizophrenia

Author: A Breier
BJ Kinon
Bruce J Kinon
CU Correll
DW Heinrichs
G Morken
H Ascher-Svanum
Haya Ascher-Svanum
JE Overall
JM Kane
John Kane
L Breiman
Lei Chen
MJ Berry
O Agid
PV Tran
Robert R Conley
RS Keefe
S Kapur
S Leucht
Sara Kollack-Walker
SC Lemon
SR Kay
Stephen J Ruberg
Virginia Stauffer
VL Stauffer
W Guy
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: To identify a simple decision tree using early symptom change to predict response to atypical antipsychotic therapy in patients with (Diagnostic and Statistical Manual, Fourth Edition, Text Revised) chronic schizophrenia. Methods: Data were pooled from moderately to severely ill patients (n = 1494) from 6 randomized, double-blind trials (N = 2543). Response was defined as a ≥30% reduction in Positive and Negative Syndrome Scale (PANSS) Total score by Week 8 of treatment. Analyzed predictors were change in individual PANSS items at Weeks 1 and 2. A decision tree was constructed using classification and regression tree (CART) analysis to identify predictors that most effectively differentiated responders from non-responders. Results: A 2-branch, 6-item decision tree was created, producing 3 distinct groups. First branch criterion was a 2-point score decrease in at least 2 of 5 PANSS positive items (Week 2). Second branch criterion was a 2-point score decrease in the PANSS excitement item (Week 2). "Likely responders" met the first branch criteria; "likely non-responders" did not meet first or second criterion; "not predictable" patients did not meet the first but did meet the second criterion. Using this approach, response to treatment could be predicted in most patients (92%) with high positive predictive value (79%) and high negative predictive value (75%). Predictive findings were confirmed through analysis of data from 2 independent trials. Conclusions: Using a data-driven approach, we identified decision rules using early change in the scores of selected PANSS items to accurately predict longer-term treatment response or non-response to atypical antipsychotic therapy. This could lead to development of a simple quantitative evaluation tool to help guide early treatment decisions.
Trial Registration: This is a retrospective, non-intervention study in which pooled results from 6 previously published reports were analyzed; thus, clinical trial registration is not required.
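
The two-branch rule described in the Results can be written out as a short function. This is only an illustrative sketch, not the authors' implementation: the function name, the argument names, and the reading of "a 2-point score decrease" as "a decrease of at least 2 points" are assumptions, and the abstract does not say which five PANSS positive items were used.

```python
from typing import Sequence


def classify_early_response(
    positive_item_decreases: Sequence[float],
    excitement_decrease: float,
    threshold: float = 2.0,
) -> str:
    """Assign one of the three groups named in the abstract.

    positive_item_decreases: Week 2 score decreases (baseline minus Week 2)
        for the five PANSS positive items; which items is not stated in the
        abstract, so the caller supplies them (hypothetical input).
    excitement_decrease: Week 2 score decrease for the PANSS excitement item.
    threshold: assumed to mean "decrease of at least 2 points".
    """
    if len(positive_item_decreases) != 5:
        raise ValueError("expected decreases for exactly 5 PANSS positive items")

    # First branch: at least 2 of the 5 positive items improved by the threshold.
    improved = sum(1 for d in positive_item_decreases if d >= threshold)
    if improved >= 2:
        return "likely responder"

    # Second branch: first criterion not met, excitement item improved.
    if excitement_decrease >= threshold:
        return "not predictable"

    # Neither criterion met.
    return "likely non-responder"
```

For example, a patient whose positive items decreased by 2, 3, 0, 1 and 0 points falls in the first group whatever the excitement score is, because the first branch is checked before the second.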

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Voice analysis as an objective state marker in bipolar disorder

Author: A Grünerbl
AC Powell
AWG Buijink
B Martínez-Pérez
C Sobin
CR Cooke
DJ Kupfer
E Moore
E Renfordt
F Eyben
H Kuhs
I Singh
JC Mundt
JF Greden
JK Wing
L Breiman
L Srivastava
LM Weinstock
LV Kessing
M Alpert
M Faurholt-Jepsen
M Hamilton
MA Frye
N Vanello
P Musiat
P Partila
RC Young
RS McIntyre
S Kapur
S Monteith
S Newman
SJ Wenze
T Donker
T Glenn
TB Murdoch
V Osmani
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Crossref

Copenhagen University Research Information System

The IT University of Copenhagen's Repository

Online Research Database In Technology