128 research outputs found

Learning to recognize webpage genres

Author: Efstathios Stamatatos
Finn
Forman
Ioannis Kanaris
Lim
Meyer zu Eissen
Robnik-Sikonja
Santini
Sebastiani
Swales
Publication venue: Elsevier BV

Spatially Uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions

Author: A Motsinger
AL Tyler
B McKinney
BA McKinney
Casey S Greene
CS Greene
I Kononenko
J Hardy
J Jakobsdottir
Jason H Moore
Jeff Kiralis
JH Moore
JN Hirschhorn
K Kira
L Beretta
M Robnik-Sikonja
MI McCarthy
MM Iles
Nadia M Penrod
P Kraft
RR Sokal
U Finckh
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however, and methods that exhaustively analyze sets of these variants for interactions are combinatorial in nature, making them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF). Results: SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. It is important to note that this increase in success rate does not require an increase in algorithmic complexity and is achieved even with the removal of a nuisance parameter from the algorithm. Conclusion: Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Physiological Indicators for User Trust in Machine Learning with Influence Enhanced Fact-Checking

Author: A Bechara
A Calero Valdez
A Ilyas
D Fisher
J Zhou
JD Lee
LR Ye
M Brahimi
M Nilsson
M Robnik-Sikonja
PB Brandtzaeg
Z Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

© IFIP International Federation for Information Processing 2019. Trustworthy Machine Learning (ML) is one of the significant challenges of “black-box” ML because of its wide impact on practical applications. This paper investigates whether presenting the influence of training data points on machine learning predictions boosts user trust. A fact-checking framework for boosting user trust is proposed in a predictive decision making scenario; it allows users to interactively check the training data points with different influences on the prediction by using a parallel coordinates based visualization. This work also investigates the feasibility of physiological signals such as Galvanic Skin Response (GSR) and Blood Volume Pulse (BVP) as indicators of user trust in predictive decision making. A user study found that the presentation of influences of training data points significantly increases user trust in predictions, but only for training data points with higher influence values under the high model performance condition, where users can justify their actions with facts more similar to the testing data point. The physiological signal analysis showed that GSR and BVP features correlate with user trust under different influence and model performance conditions. These findings suggest that physiological indicators can be integrated into the user interface of AI applications to automatically communicate user trust variations in predictive decision making.

Crossref

OPUS - University of Technology Sydney

Feature Selection for MAUC-Oriented Classification Systems

Author: Aha
Bluma
Cortes
Dong
Edwards
Fawcett
Forman
Fürnkranz
Guyon
Hand
Hanley
Hastie
He
Hong
Huang
Hull
Hunt
Ke Tang
Kohavi
Landgrebe
Liang
Liu
Neher
Peng
Press
Provost
Pruzansky
Quinlan
Robnik-Sikonja
Rui Wang
Tang
Witten
Yu
Yukinawa
Zhao
Zhu
Publication venue: Elsevier BV
Publication date: 15/05/2011

Feature selection is an important pre-processing step for many pattern classification tasks. Traditionally, feature selection methods are designed to obtain a feature subset that can lead to high classification accuracy. However, classification accuracy has recently been shown to be an inappropriate performance metric of classification systems in many cases. Instead, the Area Under the receiver operating characteristic Curve (AUC) and its multi-class extension, MAUC, have been shown to be better alternatives. Hence, the target of classification system design is gradually shifting from seeking a system with the maximum classification accuracy to obtaining a system with the maximum AUC/MAUC. Previous investigations have shown that traditional feature selection methods need to be modified to cope with this new objective. These modified methods are most often restricted to binary classification problems. In this study, a filter feature selection method, namely MAUC Decomposition based Feature Selection (MDFS), is proposed for multi-class classification problems. To the best of our knowledge, MDFS is the first method specifically designed to select features for building classification systems with maximum MAUC. Extensive empirical results demonstrate the advantage of MDFS over several compared feature selection methods.

arXiv.org e-Print Archive

Crossref

Spatiotemporal patterns of population in mainland China, 1990 to 2010

Author: A Liaw
A Murakami
A Schneider
A Sorichetta
A. E. Gaughan
AE Gaughan
AJ Tatem
B Lehner
C Linard
CC Fan
D Azar
D Balk
D Lopez-Carr
E Doxsey-Whitfield
F. R. Stevens
FR Stevens
H Bagan
H Buhaug
J Mennis
JE Dobson
L Breiman
L Wang
M Gilbert
M Haklay
M Robnik-Sikonja
NN Patel
U Deichmann
W Xiaogang
WFirst Lavely
XM Bai
Y Cai
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2016

According to UN forecasts, global population will increase to over 8 billion by 2025, with much of this anticipated population growth expected in urban areas. In China, the scale of urbanization has been, and continues to be, unprecedented in terms of magnitude and rate of change. Since the late 1970s, the percentage of Chinese living in urban areas has increased from ~18% to over 50%. To quantify these patterns spatially we use time-invariant or temporally-explicit data, including census data for 1990, 2000, and 2010, in an ensemble prediction model. The resulting multi-temporal, gridded population datasets are unique in terms of granularity and extent, providing fine-scale (~100 m) patterns of population distribution for mainland China. For consistency purposes, the Tibet Autonomous Region, Taiwan, and the islands in the South China Sea were excluded. The statistical model and considerations for temporally comparable maps are described, along with the resulting datasets. The final mainland China population maps for 1990, 2000, and 2010 are freely available as products from the WorldPop Project website and the WorldPop Dataverse Repository.

Southampton (e-Prints Soton)

Crossref

PubMed Central

DI-fusion

Repository of the University of Namur

Improving feature selection performance using pairwise pre-evaluation

Author: AG Karegowda
BO Adegoke
C Ding
F Thabtah
J Liang
JR Vergara
L Ladha
LA Kurgan
M Robnik-Sikonja
P Romanski
RK Singh
S Jungjit
Sejong Oh
Songlu Li
V Bolón-Canedo
V Kumar
W Snedecor
Publication venue: Springer Science and Business Media LLC

Crossref

A Practical Platform for Blood Biomarker Study by Using Global Gene Expression Profiling of Peripheral Whole Blood

Author: Aimee K. Zaas
BM Bolstad
Bonnie Berger
C Wright
DC Thach
E Wu
Erxi Wu
F Borovecki
Hui Yao
I Kononenko
I Osman
IH Witten
Isaac S. Kohane
J Liu
K Kira
K Kuhn
KJ Martin
L Rainen
LA Field
LX Qin
M Robnik-Sikonja
Michal Galdzicki
MW Pfaffl
Nathan Palmer
Patrick Schmid
RC Gentleman
RJ Feezor
S Debey
V Chai
Y Benjamini
Z Tian
Ze Tian
Publication venue: Public Library of Science
Publication date: 17/04/2009

Background: Although microarray technology has become the most common method for studying global gene expression, a plethora of technical factors across the experiment contribute to the variability of genome-wide gene expression profiling using peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data to meet clinical requirements for biomarker study. Methods and Findings: We applied globin reduction to peripheral whole blood samples and performed genome-wide transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of array data and elucidate the mode in which hemoglobin interferes with gene expression profiling. We demonstrated that, when applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant increase in the quality of beadarray data. When compared to their pre-globin reduction counterparts, post-globin reduction samples show improved detection statistics, lowered variance and increased sensitivity. More importantly, gender gene separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests that the poor data obtained from pre-globin reduction samples is the result of the high concentration of hemoglobin derived from red blood cells either interfering with target mRNA binding or producing a pseudo-binding background signal. Conclusion: We therefore recommend the combination of performing globin mRNA reduction in peripheral whole blood samples and hybridizing on Illumina BeadChips as the practical approach for biomarker study.

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Improving activity recognition using a wearable barometric pressure sensor in mobility-impaired stroke patients

Author: A Godfrey
A Moncada-Torres
A Paraschiv-Ionescu
A Salarian
AC Novak
Andreas R. Luft
Anisoara Paraschiv-Ionescu
Arash Arami
B Najafi
C-C Yang
EH Mamdani
F Cheong
F Massé
F Schasfoort
Fabien Massé
G Raju
GD Fulk
IC Gyllensten
J Demšar
J Favre
J Lester
JA Steeves
Kamiar Aminian
L Ada
L Blum
L Breiman
L Lam
LS Williams
M Friedman
M Hall
M Robnik-Sikonja
MA Alzahrani
MJ Powell
MN Berberan-Santos
MS Orendurff
PM Grant
PW Duncan
R Ganea
Roman R. Gonzenbach
SFM Chastin
U Lindemann
VL Feigin
VT Hees van
WGM Janssen
Publication venue: Springer Science and Business Media LLC

Crossref

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

Author: A Anand
A Andreeva
A Elofsson
A Krogh
A Paiardini
A Reinhardt
AG Murzin
AY Istomin
B Niu
B Rost
C Chen
C Orengo
C Zheng
CA Floudas
D Aha
D Jones
D Przybylski
EP Carpenter
F Gu
G John
G von Heijne
GP Zhou
H Bigelow
H He
H Kim
H Liu
H Zhang
HM Berman
I Majumdar
I Witten
IB Kuznetsov
J Ruan
J Song
JM Bujnicki
JY Yang
K Bryson
K Chen
K Ginalski
K Kedarisetti
K Tomii
KC Chou
KY Feng
L Carlacci
L Dong
L Homaeian
L Jin
LA Kurgan
LT Huang
Lukasz Kurgan
M Punta
M Robnik-Sikonja
MA Hall
Marcin J Mizianty
MM Gromiha
O Gotoh
OV Galzitskaya
P Baldi
P Langley
P Raman
QS Du
R Apweiler
R Gupta
R Kohavi
RL Dunbrack
RL Marsden
S Brenner
S Cessie
S Costantini
S Jahandideh
S Keerthi
S Lee
S Wu
SF Altschul
SR Amirova
T Liu
TF Smith
TL Zhang
W Chen
X Xiao
X Zheng
Y Cai
Y Cao
Y Zhang
YD Cai
YK Yu
YS Ding
Z Xiang
Z Zhang
ZC Li
ZX Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor provides predictions at 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central