Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling
protocols, or within a meta-analysis comparison, it is often the case that
sets of alternative feature lists (possibly of different lengths) are
obtained instead of a single list. Here we introduce a method, based on the
algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset.
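To make the notion of "list stability" concrete, the sketch below embeds each partial ranked list into a full rank vector over the complete feature set (features absent from a list share a tied bottom rank) and averages a pairwise rank distance. The Canberra distance and the tied-rank convention are illustrative assumptions, not the authors' exact symmetric-group construction; function names are invented for the example.

```python
from itertools import combinations

def to_rank_vector(partial, universe):
    """Embed a partial ranked list into a full rank vector over `universe`.
    Features absent from the list share a tied bottom rank (the average of
    the remaining ranks k+1..N)."""
    k = len(partial)
    tied = (k + 1 + len(universe)) / 2.0
    ranks = {f: i + 1 for i, f in enumerate(partial)}
    return [ranks.get(f, tied) for f in universe]

def canberra(x, y):
    """Canberra distance between two rank vectors (all ranks are positive)."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y))

def stability(lists, universe):
    """Mean pairwise distance between the embedded rank vectors.
    Lower values mean more stable (more mutually consistent) lists."""
    vecs = [to_rank_vector(lst, universe) for lst in lists]
    pairs = list(combinations(vecs, 2))
    return sum(canberra(x, y) for x, y in pairs) / len(pairs)
```

Identical lists score 0, and the score grows as the lists disagree, regardless of whether they have equal lengths.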
A family of measures for best top-n class-selective decision rules
When classes strongly overlap in the feature space, or when some classes are not known in advance, the performance of a classifier decreases heavily. To overcome this problem, the reject option has been introduced: it consists in withholding the decision and letting another classifier, or an expert, decide whenever an exclusive classification is not reliable enough. The classification problem then becomes a matter of class selection, from none to all classes. In this paper, we propose a family of measures suitable for defining such decision rules. It is based on a new family of operators that detect blocks of similar values within a set of numbers in the unit interval (the soft labels of an incoming pattern to be classified) using a single threshold. Experiments on synthetic and real datasets available in the public domain show the efficiency of our approach.
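The block-detection idea can be sketched with a minimal rule: sort the soft labels in decreasing order and keep classes down to the first gap larger than a single threshold. This is an illustrative assumption about how such an operator might work, not the paper's actual family of operators, and it omits the full-reject case (selecting no class would need an extra rule).

```python
def select_classes(soft_labels, t):
    """Select the top block of classes: sort soft labels in decreasing
    order and keep classes until the first consecutive gap exceeds t.
    Returns a list of class indices, from one class up to all of them."""
    order = sorted(range(len(soft_labels)), key=lambda i: -soft_labels[i])
    selected = [order[0]]
    for prev, cur in zip(order, order[1:]):
        if soft_labels[prev] - soft_labels[cur] > t:
            break  # the block of similar values ends here
        selected.append(cur)
    return selected
```

With a small threshold the rule selects only the clear winner; with a large one it selects every class whose soft label sits in the top block, which is exactly the "from one to all classes" behaviour described above.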
The Metabolomic Profile in Amyotrophic Lateral Sclerosis Changes According to the Progression of the Disease: An Exploratory Study
Amyotrophic lateral sclerosis (ALS) is a multifactorial neurodegenerative pathology of the upper or lower motor neuron. Evaluation of ALS progression is based on clinical outcomes considering the impairment of body sites. The pathogenetic mechanisms and clinical profile of ALS have been extensively investigated; however, no molecular biomarkers are used as diagnostic criteria to establish the ALS pathological staging. Using the source-reconstructed magnetoencephalography (MEG) approach, we demonstrated that global brain hyperconnectivity is associated with early and advanced clinical ALS stages. Using nuclear magnetic resonance (1H-NMR) and high-resolution mass spectrometry (HRMS) spectroscopy, here we studied the metabolomic profile of sera from ALS patients at different stages of disease progression, namely early and advanced. Multivariate statistical analysis of the data, integrated with network analysis, indicates that metabolites related to energy deficit, abnormal concentrations of neurotoxic metabolites, and metabolites related to neurotransmitter production are pathognomonic of ALS in the advanced stage. Furthermore, analysis of the lipidomic profile indicates that advanced ALS patients show significant alterations of phosphocholine (PCs), lysophosphatidylcholine (LPCs), and sphingomyelin (SMs) metabolism, consistent with the exigency of lipid remodeling to repair advanced neuronal degeneration and inflammation.
Information-Theoretic Measures for Objective Evaluation of Classifications
This work presents a systematic study of objective evaluations of abstaining
classifications using Information-Theoretic Measures (ITMs). First, we define
objective measures that do not depend on any free parameters. This definition
makes it straightforward to examine the "objectivity" or "subjectivity" of
classification evaluations directly. Second, we propose twenty-four
normalized ITMs, derived from mutual information, divergence, or
cross-entropy, for investigation. Contrary to conventional
performance measures that apply empirical formulas based on users' intuitions
or preferences, the ITMs are theoretically more sound for realizing objective
evaluations of classifications. We apply them to distinguish "error types" and
"reject types" in binary classifications without the need for input data of
cost terms. Third, to better understand and select the ITMs, we suggest three
desirable features for classification assessment measures, which appear more
crucial and appealing from the viewpoint of classification applications. Using
these features as "meta-measures", we can reveal the advantages and limitations
of ITMs from a higher level of evaluation knowledge. Numerical examples are
given to corroborate our claims and compare the differences among the proposed
measures. The best measure is selected in terms of the meta-measures, and its
specific properties regarding error types and reject types are analytically
derived.
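One instance of such a measure can be sketched from a confusion matrix whose columns include abstention as an extra prediction outcome. The normalization chosen here, mutual information divided by the entropy of the true labels, is only one of many possibilities (the paper proposes twenty-four variants), and the function name is invented for the example.

```python
from math import log

def normalized_mi(confusion):
    """Normalized mutual information I(T;Y) / H(T) from a confusion matrix.
    Rows are true classes, columns are predicted labels; a 'reject' outcome
    is represented simply as an extra column, so no cost terms are needed."""
    n = sum(sum(row) for row in confusion)
    p_true = [sum(row) / n for row in confusion]
    p_pred = [sum(confusion[i][j] for i in range(len(confusion))) / n
              for j in range(len(confusion[0]))]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:
                p = count / n
                mi += p * log(p / (p_true[i] * p_pred[j]))
    h_true = -sum(p * log(p) for p in p_true if p)
    return mi / h_true
```

A perfect classifier scores 1, a prediction independent of the truth scores 0, and different ways of confusing or rejecting classes move the score differently, which is what lets such measures separate "error types" from "reject types".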
A hybrid computational intelligence approach to groundwater spring potential mapping
This study proposes a hybrid computational intelligence model that combines an alternating decision tree (ADTree) classifier with the AdaBoost (AB) ensemble, namely "AB-ADTree", for groundwater spring potential mapping (GSPM) at the Chilgazi watershed in the Kurdistan province, Iran. Although ADTree and its ensembles have been widely used for environmental and ecological modeling, they have rarely been applied to GSPM. To that end, a groundwater spring inventory map and thirteen conditioning factors tested by the chi-square attribute evaluation (CSAE) technique were used to generate training and testing datasets for constructing and validating the proposed model. The performance of the proposed model was evaluated using statistical-index-based measures, such as positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, accuracy, root mean square error (RMSE), and the area under the receiver operating characteristic (ROC) curve (AUROC). The proposed hybrid model was also compared with six state-of-the-art benchmark soft computing models: single ADTree, support vector machine (SVM), stochastic gradient descent (SGD), logistic model tree (LMT), logistic regression (LR), and random forest (RF). Results indicate that the proposed hybrid model significantly improved the predictive capability of the ADTree-based classifier (AUROC = 0.789). In addition, it was found that the hybrid model, AB-ADTree (AUROC = 0.815), had the highest goodness-of-fit and prediction accuracy, followed by the LMT (AUROC = 0.803), RF (AUROC = 0.803), SGD, and SVM (AUROC = 0.790) models. Indeed, this model is a powerful and robust technique for mapping groundwater spring potential in the study area. Therefore, the proposed model is a promising tool to help planners, decision makers, managers, and governments in the management and planning of groundwater resources.
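The statistical-index measures listed above are simple functions of the binary confusion counts. The sketch below computes them; the function name and argument order are my own, not from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard statistical-index measures from binary confusion counts:
    tp/fp/tn/fn = true/false positives and true/false negatives."""
    return {
        "PPV": tp / (tp + fp),            # positive predictive value
        "NPV": tn / (tn + fn),            # negative predictive value
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For example, 8 true positives, 2 false positives, 9 true negatives and 1 false negative give a PPV of 0.8, an NPV of 0.9 and an accuracy of 0.85.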
Landslide susceptibility mapping using remote sensing data and geographic information system-based algorithms
Whether they occur due to natural triggers or human activities, landslides lead to loss of life and damage to property, impacting infrastructure, road networks and buildings. A Landslide Susceptibility Map (LSM) provides policy and decision makers with valuable information. This study aims to detect landslide locations using Sentinel-1 data, the only freely available online radar imagery, and to map areas prone to landslides using a novel AB-ADTree algorithm in Cameron Highlands, Pahang, Malaysia. A total of 152 landslide locations were detected by integrating the Interferometric Synthetic Aperture Radar (InSAR) technique, Google Earth (GE) images and extensive field surveys. Of these, 80% of the data were employed for training the machine learning algorithms and the remaining 20% for validation purposes. Seventeen triggering and conditioning factors, namely slope, aspect, elevation, distance to road, distance to river, proximity to fault, road density, river density, Normalized Difference Vegetation Index (NDVI), rainfall, land cover, lithology, soil type, curvature, profile curvature, Stream Power Index (SPI) and Topographic Wetness Index (TWI), were extracted from satellite imagery, a digital elevation model (DEM), and geological and soil maps. These factors were used to generate landslide susceptibility maps with a Logistic Regression (LR) model, Logistic Model Tree (LMT), Random Forest (RF), Alternating Decision Tree (ADTree), Adaptive Boosting (AdaBoost), and a novel hybrid of the ADTree and AdaBoost models, namely the AB-ADTree model. The validation was based on the area under the ROC curve (AUC) and the statistical measures of Positive Predictive Value (PPV), Negative Predictive Value (NPV), sensitivity, specificity, accuracy and Root Mean Square Error (RMSE). The results showed that the AUC was 90%, 92%, 88%, 59%, 96% and 94% for the LR, LMT, RF, ADTree, AdaBoost and AB-ADTree algorithms, respectively.
Non-parametric Friedman and Wilcoxon tests were also applied to assess the models' performance; the findings revealed that ADTree is inferior to the other models used in this study. Using a handheld Global Positioning System (GPS), field study and validation were performed for almost 20% (30 locations) of the detected landslide locations, and the results confirmed that these locations were correctly detected. In conclusion, this study is applicable for hazard mitigation purposes and regional planning.
Kernel-Based Ranking: Methods for Learning and Performance Estimation
Machine learning provides tools for the automated construction of predictive
models in data-intensive areas of engineering and science. The family of
regularized kernel methods has in recent years become one of the mainstream
approaches to machine learning, due to a number of advantages the methods
share. The approach provides theoretically well-founded solutions to the
problems of under- and overfitting, allows learning from structured data,
and has been empirically demonstrated to yield high predictive performance
on a wide range of application domains. Historically, the problems of
classification and regression have gained the majority of attention in the
field. In this thesis we focus on another type of learning problem: learning
to rank.
In learning to rank, the aim is to learn, from a set of past observations,
a ranking function that can order new objects according to how well they
match some underlying criterion of goodness. As an important special case
of the setting, we can recover the bipartite ranking problem, corresponding
to maximizing the area under the ROC curve (AUC) in binary classification.
Ranking applications appear in a large variety of settings; examples
encountered in this thesis include document retrieval in web search,
recommender systems, information extraction, and automated parsing of
natural language. We consider the pairwise approach to learning to rank, where
ranking models are learned by minimizing the expected probability of ranking
any two randomly drawn test examples incorrectly. The development
of computationally efficient kernel methods, based on this approach, has in
the past proven to be challenging. Moreover, it is not clear what techniques
for estimating the predictive performance of learned models are the most
reliable in the ranking setting, and how the techniques can be implemented
efficiently.
The contributions of this thesis are as follows. First, we develop RankRLS,
a computationally efficient kernel method for learning to rank that is based
on minimizing a regularized pairwise least-squares loss. In
addition to training methods, we introduce a variety of algorithms for tasks
such as model selection, multi-output learning, and cross-validation, based
on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm,
which is one of the most well established methods for learning to
rank. Third, we study the combination of the empirical kernel map and reduced
set approximation, which allows the large-scale training of kernel machines
using linear solvers, and propose computationally efficient solutions
to cross-validation when using the approach. Next, we explore the problem
of reliable cross-validation when using AUC as a performance criterion,
through an extensive simulation study. We demonstrate that the proposed
leave-pair-out cross-validation approach leads to more reliable performance
estimation than commonly used alternative approaches. Finally, we present
a case study on applying machine learning to information extraction from
biomedical literature, which combines several of the approaches considered
in the thesis. The thesis is divided into two parts. Part I provides the
background for the research work and summarizes the most central results;
Part II consists of the five original research articles that are the main
contribution of this thesis.