eScholarship - University of California

Multi-Target Prediction: A Unifying View on Problems and Methods

Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Bayesian aggregation versus majority vote in the characterization of non-specific arm pain based on quantitative needle electromyography

Author: A Hamilton-Wright
Andrew Hamilton-Wright
B Larsson
C Nadeau
Daniel W Stashuk
DV Budescu
DW Stashuk
E Stålberg
G Hagg
G Pfeiffer
GJ Macfarlane
I Kononenko
J Greening
J Lipscomb
JM Harrington
K Calder
K Walker-Bone
Kristina M Calder
Linda McLean
M Urwin
M West
Q McNemar
R Kohavi
RO Duda
RT Clemen
S Podner
SE Larsson
T Rosqvist
VL Durkalski
WF Brown
X Dennett
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2010

From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

Author: A Mccallum
AC McHardy
B Fei
B Mirkin
B Slabbinck
Bernard De Baets
BM Hansen
Bram Slabbinck
C Kunitsky
C Vens
D Hutsebaut
D Koller
DE Stead
E Pruesse
E Stackebrandt
F Wu
HP Kriegel
I Letunic
IH Witten
J Felsenstein
J Rousu
JP Euzéby
JP Parker
JS Lee
KT Konstantinidis
L Breiman
LG Wayne
M Heyndrickx
M Vancanneyt
N Cesa-Bianchi
N Diaz
N Saitou
P Dawyndt
P Kämpfer
Paul De Vos
Peter Dawyndt
R Craig
R Kohavi
RO Duda
RR Sokal
S Cheong
S Dumais
SP Lapage
T Hastie
T Hofmann
Willem Waegeman
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data: 16S rRNA gene sequence data is used for phylogenetic tree inference, and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Random Forest models are trained on the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning lets us situate and evaluate FAME-based bacterial species classification in a more informative context.
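The idea in the abstract above, learning one binary classifier per split of a fixed tree and classifying by descending from the root, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the nested-tuple tree encoding, the toy two-dimensional features, the species names, and the functions `leaves`, `centroid`, `train` and `predict` are hypothetical, and a nearest-centroid rule stands in for the Random Forest models the abstract mentions so that the example needs only the standard library.

```python
import math

def leaves(tree):
    """Return the set of species names under a (possibly nested) tree node."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)

def centroid(rows):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def train(tree, X, y):
    """Fit one nearest-centroid split per internal node of the binary tree.

    Assumes every leaf species has at least one training row.
    """
    if isinstance(tree, str):
        return tree
    left, right = tree
    l_rows = [x for x, s in zip(X, y) if s in leaves(left)]
    r_rows = [x for x, s in zip(X, y) if s in leaves(right)]
    return (centroid(l_rows), train(left, X, y),
            centroid(r_rows), train(right, X, y))

def predict(model, x):
    """Descend from the root, following the closer centroid at each split."""
    while not isinstance(model, str):
        l_c, l_sub, r_c, r_sub = model
        model = l_sub if math.dist(x, l_c) <= math.dist(x, r_c) else r_sub
    return model

# Hypothetical tree: sp_a and sp_b are siblings, sp_c is the outgroup.
TREE = (("sp_a", "sp_b"), "sp_c")
X = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
y = ["sp_a", "sp_a", "sp_b", "sp_b", "sp_c", "sp_c"]

MODEL = train(TREE, X, y)
print(predict(MODEL, (0.9, 0.1)))
print(predict(MODEL, (5.0, 4.8)))
```

Because each split is evaluated separately, the per-node accuracy can be inspected to see at which level of the tree the features stop discriminating, which is the kind of diagnostic use the abstract describes.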

Ghent University Academic Bibliography

Effects of supplemented isoenergetic diets varying in cereal fiber and protein content on the bile acid metabolic signature and relation to insulin resistance

Author: A Damms-Machado
AM Styer
B Barutcuoglu
B Cariou
B Liaset
BM Owen
C Steiner
CB Ferrebee
CJ Martoni
F Tremblay
FG Schaap
GI Smith
GS Gerhard
GV Vahouny
HEK Virtanen
HJ Leidy
I Kyrou
I Sluijs
JG Hattersley
JH Cummings
JM Gallego-Escuredo
JS Munter de
KE Boyle
KG Alberti
KW Horst Ter
L Martinez de la Escalera
L Vitek
LC Hillman
M Bortolotti
M Krebs
M Möhlig
M Nielen van
M Scherer
MA Eastwood
MB Schulze
MD Jensen
MJ Potthoff
ML Jones
MO Weickert
P Lefebvre
Puneet Puri
R Kohli
RA Haeusler
SA Jones
SA Kliewer
SH Um
T Harach
T Inagaki
T Linn
U Ericson
Vanessa Legry
W Sun
WG Hardison
WR Russell
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Bile acids (BA) are potent metabolic regulators influenced by diet. We studied the effects of isoenergetic increases in dietary protein and cereal-fiber content on circulating BA and insulin resistance (IR) in overweight and obese adults. We conducted a randomized controlled nutritional intervention (18 weeks) in 72 non-diabetic participants (overweight/obese: 29/43) with at least one further metabolic risk factor. Participants were group-matched and allocated to four isoenergetic supplemented diets: control; high cereal fiber (HCF); high-protein (HP); or moderately increased cereal fiber and protein (MIX). Whole-body IR and insulin-mediated suppression of hepatic endogenous glucose production were measured using euglycaemic–hyperinsulinemic clamps with [6,6-²H₂]glucose infusion. Circulating BA, metabolic biomarkers, and IR were measured at 0, 6, and 18 weeks. Under isoenergetic conditions, HP-intake worsened IR in obese participants after 6 weeks (M-value: 3.77 ± 0.58 vs. 3.07 ± 0.44 mg/kg/min, p = 0.038), with partial improvement back to baseline levels after 18 weeks (3.25 ± 0.45 mg/kg/min, p = 0.089). No deleterious effects of HP-intake on IR were observed in overweight participants. HCF-diet improved IR in overweight participants after 6 weeks (M-value 4.25 ± 0.35 vs. 4.81 ± 0.31 mg/kg/min, p = 0.016), but did not influence IR in obese participants. Control and MIX diets did not influence IR. HP-induced, but not HCF-induced, changes in IR strongly correlated with changes in BA profiles. MIX-diet significantly increased most BA at 18 weeks in obese, but not in overweight participants. BA remained unchanged in controls. Pooled BA concentrations correlated with fasting fibroblast growth factor-19 (FGF-19) plasma levels (r = 0.37; p = 0.003). Higher milk protein intake was the only significant dietary predictor for raised total and primary BA in regression analyses (total BA, p = 0.017; primary BA, p = 0.011). Combined increased intake of dietary protein and cereal fibers markedly increased serum BA concentrations in obese, but not in overweight participants. Possible mechanisms explaining this effect may include compensatory increases of the BA pool in the insulin-resistant, obese state, or defective BA transport.

University of Regensburg Publication Server

Aston Publications Explorer

Incorporating functional inter-relationships into protein function prediction algorithms

Author: A Herscovics
A Mateos
A Ruepp
AJ Parodi
ASN Seshasayee
B Shahbaba
C Stark
C Wang
Chad L Myers
CL Myers
D Lin
E Nabieva
F Azuaje
F Reggiori
G Pandey
G Tsoumakas
Gaurav Pandey
H Yu
J Geng
J Helenius
JE Shea
JJ Jiang
JL Sevilla
K Tarassov
M Ashburner
M Kuramochi
M Schuldiner
MP Brown
NJ Krogan
P D'haeseleer
P Resnik
PN Lipke
PN Tan
PW Lord
S Carroll
S Mnaimneh
S Siegel
S Vincenti
SW Stevens
T Gabaldon
T Xu
TR Hughes
Vipin Kumar
Y Tao
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 07/01/2008

Background: Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but it also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder-to-learn distinction between similar GO terms, for standard classification-based approaches. Results: We propose a method to enhance the performance of classification-based protein function prediction algorithms by exploiting these inter-relationships between the functional classes that make up functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and also that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy for improving the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1. Conclusion: We implemented and evaluated a methodology for incorporating inter-relationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/functionalsimilarity/.
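As a rough illustration of how class inter-relationships might enter a k-nearest neighbor classifier, the sketch below lets each neighbor contribute partial evidence for a target class in proportion to how similar its own labels are to that class. The abstract does not give the scoring rule, so this is only one plausible formulation under stated assumptions: the function `knn_class_score`, the similarity table `RELATED`, the function `toy_sim`, the class names and the toy data are all hypothetical, and a real semantic similarity measure over the ontology would replace `toy_sim`.

```python
import math

# Hypothetical pairwise similarities between distinct classes (assumed values).
RELATED = {frozenset(("A", "A1")): 0.5}

def toy_sim(a, b):
    """Stand-in for a semantic similarity measure between two classes."""
    if a == b:
        return 1.0
    return RELATED.get(frozenset((a, b)), 0.0)

def knn_class_score(query, train_x, train_labels, target, sim, k=3):
    """Score `target` for `query` from its k nearest training points.

    Each neighbor contributes the highest similarity between `target` and
    any of its own labels, so related classes add partial evidence. With an
    identity similarity this reduces to plain k-NN vote counting.
    """
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(query, train_x[i]))
    total = 0.0
    for i in order[:k]:
        total += max((sim(target, lab) for lab in train_labels[i]),
                     default=0.0)
    return total / k

train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0)]
train_labels = [{"A"}, {"A1"}, {"B"}, {"B"}]

score = knn_class_score((0.0, 0.0), train_x, train_labels, "A", toy_sim)
print(score)  # 0.5: one exact match, one related neighbor, one unrelated
```

In this toy setting the neighbor labeled with the related class "A1" raises the score for "A" above what plain voting would give, which mirrors the abstract's point that small classes can borrow evidence from related ones.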

University of Minnesota Digital Conservancy

Building multiclass classifiers for remote homology detection and fold recognition

Author: A Heger
A Krogh
A Sun
AG Murzin
B Taskar
C Leslie
CA Orengo
CD Huang
CH Ding
D Mittelman
E le
E Lindahl
EL Allwein
F Aiolli
F Rosenblatt
George Karypis
H Rangwala
H Saigo
Huzefa Rangwala
I Tsochantaridis
J Rousu
J Shi
J Weston
K Crammer
L Holm
L Liao
M Collins
M Marti-Renom
P Baldi
R Kuang
R Rifkin
S Altschul
SB Needleman
SE Brenner
T Jaakkola
T Joachims
TF Smith
TG Dietterich
V Vapnik
W Pearson
Y Guermeur
Y Hou
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems, and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems. RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes. CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. Schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.

CiteSeerX

University of Minnesota Digital Conservancy

Predicting gene function using hierarchical multi-label decision tree ensembles

Author: A Clare
B Hayete
C Vens
Celine Vens
D Kocev
Dragi Kocev
E Zdobnov
F Provost
F Wilcoxon
G Obozinski
GR Lanckriet
H Blockeel
H Chua
H Drucker
H Lee
H Mewes
Hendrik Blockeel
J Davis
J Gough
J Quinlan
J Rousu
J Struyf
Jan Struyf
L Breiman
L Pena-Castillo
Leander Schietgat
M Ashburner
M Deng
M Ouali
N Cesa-Bianchi
O Troyanskaya
R Caruana
S Altschul
S Mostafavi
Sašo Džeroski
T Hughes
T Joachims
U Karaoz
W Kim
W Tian
Y Chen
Y Guan
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

Leiden University Scholarly Publications

Mammalian MicroRNA Prediction through a Support Vector Machine Model of Sequence and Structure

BACKGROUND: MicroRNAs (miRNAs) are endogenous small noncoding RNA gene products, on average 22 nt long, found in a wide variety of organisms. They play important regulatory roles by targeting mRNAs for degradation or translational repression. There are 377 known mouse miRNAs and 475 known human miRNAs in the May 2007 release of the miRBase database, the majority of which are conserved between the two species. A number of recent reports suggest that many mammalian miRNAs remain to be discovered. The possibility that more of them are expressed at lower levels or in more specialized expression contexts calls for the exploitation of genome sequence information to accelerate their discovery. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe a computational method, mirCoS, that uses three support vector machine models sequentially to discover new miRNA candidates in mammalian genomes based on sequence, secondary structure, and conservation. mirCoS can efficiently detect the majority of known miRNAs and predicts an extensive set of hairpin structures based on human-mouse comparisons. In total, 3476 mouse candidates and 3441 human candidates were found. These hairpins are more similar to known miRNAs than to negative controls in several aspects not considered by the prediction algorithm. A significant fraction of predictions is supported by existing expression evidence. CONCLUSIONS/SIGNIFICANCE: Using a novel approach, mirCoS performs comparably to or better than existing miRNA prediction methods, and contributes a significant number of new candidate miRNAs for experimental verification.

Computational algorithms to predict Gene Ontology annotations

Author: A Canakoglu
A Hamosh
A Lazaric
A Nuzzo
AJ Perez
B Done
D Chicco
D Croft
D Korobkin
Davide Chicco
DM Blei
E Lavezzo
F Pessina
G Pandey
G Yu
KG Becker
L Wang
M Ashburner
M Kanehisa
M Masseroli
M Zitnik
Marco Masseroli
MM Kordmahalleh
OD King
P Khatri
P Pinoli
Pietro Pinoli
S Raychaudhuri
S Vembu
ST Dumais
T Fawcett
T Hofmann
X Robin
Y Tao
Z Barutcuoglu
Publication venue: Springer Science and Business Media LLC
Publication date