
    ILP Experiments in Detecting Traffic Problems

    The paper describes experiments in the automated acquisition of knowledge for traffic problem detection. Preliminary results show that ILP can be used to successfully learn to detect traffic problems.

    Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction

    BACKGROUND: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of the learned classifiers. RESULTS: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
CONCLUSIONS: Our newly developed method for HMC takes into account network information in the learning phase: when used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks: the best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
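To make "network autocorrelation of functional annotations" concrete, here is a minimal sketch of a Moran's-I-style statistic over a toy PPI graph, where a gene's value is 1 if it carries a given annotation and 0 otherwise. The graph, genes, and annotation values are illustrative assumptions, not the paper's data or the exact statistic used by NHMC.

```python
def morans_i(adj, x):
    """Moran's-I-style autocorrelation of node values over a graph.
    adj: dict node -> set of neighbours (undirected, each edge listed twice);
    x: dict node -> numeric value (here, 1 if the gene has the annotation)."""
    nodes = list(x)
    n = len(nodes)
    mean = sum(x.values()) / n
    w_total = sum(len(adj[u]) for u in nodes)  # total edge weight
    num = sum((x[u] - mean) * (x[v] - mean) for u in nodes for v in adj[u])
    den = sum((x[u] - mean) ** 2 for u in nodes)
    return (n / w_total) * (num / den)

# Toy PPI network: two interacting pairs; the annotation follows the edges
# exactly, so the autocorrelation is maximal (1.0).
adj = {"g1": {"g2"}, "g2": {"g1"}, "g3": {"g4"}, "g4": {"g3"}}
x = {"g1": 1, "g2": 1, "g3": 0, "g4": 0}
autocorr = morans_i(adj, x)
```

Values near 1 indicate that interacting genes tend to share the annotation, which is exactly the regularity NHMC is designed to exploit.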

    Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics

    This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.
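The depth-weighted distance below is one plausible instance of a distance metric for hierarchical multi-classification: each class contributes with a weight that decays with its depth in the hierarchy, so disagreements near the root count more than disagreements on very specific classes. The FunCat-style class identifiers, the weight w0, and the example label sets are assumptions for illustration, not necessarily the paper's exact metric.

```python
import math

def depth(cls):
    # Depth = number of '/' separators in a FunCat-style class id, e.g. "1/2/3".
    return cls.count("/")

def hmc_distance(labels_a, labels_b, all_classes, w0=0.75):
    """Weighted Euclidean distance between two binary label vectors,
    where a class at depth d contributes weight w0 ** d."""
    total = 0.0
    for cls in all_classes:
        a = 1.0 if cls in labels_a else 0.0
        b = 1.0 if cls in labels_b else 0.0
        total += (w0 ** depth(cls)) * (a - b) ** 2
    return math.sqrt(total)

classes = ["1", "1/2", "1/2/3", "4"]
# The two label sets disagree only on "1/2" (depth 1), so the
# squared distance is w0 ** 1 = 0.75.
d = hmc_distance({"1", "1/2"}, {"1"}, classes)
```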

    Data-Driven Structuring of the Output Space Improves the Performance of Multi-Target Regressors

    The task of multi-target regression (MTR) is concerned with learning predictive models capable of predicting multiple target variables simultaneously. MTR has attracted increasing attention within the research community in recent years, yielding a variety of methods. These methods can be divided into two main groups: problem transformation and problem adaptation. The former transform an MTR problem into simpler (typically single-target) problems and apply known approaches, while the latter adapt the learning methods to directly handle the multiple target variables and learn better models that simultaneously predict all of the targets. Studies have identified the latter group of methods as having a competitive advantage over the former, probably due to the fact that they exploit the interrelations of the multiple targets. In the related task of multi-label classification, it has recently been shown that organizing the multiple labels into a hierarchical structure can improve predictive performance. In this paper, we investigate whether organizing the targets into a hierarchical structure can improve the performance on MTR problems. More precisely, we propose to structure the multiple target variables into a hierarchy of variables, thus translating the task of MTR into a task of hierarchical multi-target regression (HMTR). We use four data-driven methods for devising the hierarchical structure that cluster the real values of the targets or the feature importance scores with respect to the targets. The evaluation of the proposed methodology on 16 benchmark MTR datasets reveals that structuring the multiple target variables into a hierarchy improves the predictive performance of the corresponding MTR models. The results also show that data-driven methods produce hierarchies that can improve the predictive performance even more than expert-constructed hierarchies.
Finally, the improvement in predictive performance is more pronounced for datasets with very large numbers (more than a hundred) of targets.
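One of the data-driven strategies described above, clustering the real values of the targets, can be sketched as a simple agglomerative procedure that repeatedly merges the two most correlated target clusters into a nested hierarchy. The toy data, the centroid-style merging, and the use of absolute Pearson correlation as the similarity are assumptions for illustration, not the paper's exact clustering method.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def build_target_hierarchy(targets):
    """targets: dict name -> list of observed target values.
    Agglomeratively merges the pair of clusters with the highest
    |correlation| of their mean profiles; returns a nested tuple."""
    clusters = [(name, list(vals)) for name, vals in targets.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                r = abs(pearson(clusters[i][1], clusters[j][1]))
                if best is None or r > best[0]:
                    best = (r, i, j)
        _, i, j = best
        (ti, vi), (tj, vj) = clusters[i], clusters[j]
        merged = ((ti, tj), [(a + b) / 2 for a, b in zip(vi, vj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

targets = {
    "t1": [1.0, 2.0, 3.0, 4.0],
    "t2": [1.1, 2.1, 2.9, 4.2],  # strongly correlated with t1
    "t3": [4.0, 1.0, 3.5, 0.5],
}
hierarchy = build_target_hierarchy(targets)
```

The strongly correlated pair t1 and t2 is merged first, yielding a hierarchy in which they are siblings; an HMTR learner would then treat that internal node as an aggregate target.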

    Relevance Grounding for Planning in Relational Domains

    Probabilistic relational models are an efficient way to learn and represent the dynamics of realistic environments consisting of many objects. Autonomous intelligent agents that ground this representation for all objects need to plan in exponentially large state spaces with large sets of stochastic actions. A key insight for computational efficiency is that successful planning typically involves only a small subset of relevant objects. In this paper, we introduce a probabilistic model to represent planning with subsets of objects and provide a definition of object relevance. Our definition is sufficient to prove consistency between repeated planning in partially grounded models restricted to relevant objects and planning in the fully grounded model. We propose an algorithm that exploits object relevance to plan efficiently in complex domains. Empirical results in a simulated 3D blocksworld with an articulated manipulator and realistic physics demonstrate the effectiveness of our approach.

    Error curves for evaluating the quality of feature rankings

    In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features when predicting the target variable, and the proposed method evaluates the correctness of these importance values by computing the error measures of two chains of predictive models. The models in the first chain are built on nested sets of top-ranked features, while the models in the other chain are built on nested sets of bottom-ranked features. We investigate which predictive models are appropriate for building these chains, showing empirically that the proposed method gives meaningful results and can detect differences in feature ranking quality. This is first demonstrated on synthetic data, and then on several real-world classification benchmark problems.
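The two chains of models can be sketched as follows. Here `evaluate` is a stand-in for training and testing a real predictive model on a given feature subset, and the toy ranking and set of truly relevant features are assumptions for illustration only.

```python
def error_curves(ranking, evaluate):
    """ranking: feature names ordered best-to-worst.
    Builds two chains of nested feature subsets (growing from the top of
    the ranking and from the bottom) and records an error for each size."""
    n = len(ranking)
    top_curve = [evaluate(ranking[:k]) for k in range(1, n + 1)]
    bottom_curve = [evaluate(ranking[-k:]) for k in range(1, n + 1)]
    return top_curve, bottom_curve

# Toy stand-in for model training/testing: the error shrinks with the
# number of truly relevant features included in the subset.
relevant = {"f1", "f2"}
evaluate = lambda feats: 1.0 - len(relevant & set(feats)) / len(relevant)

top, bottom = error_curves(["f1", "f2", "f3", "f4"], evaluate)
```

For a good ranking, the top curve drops quickly while the bottom curve stays high until the relevant features are finally included, and that gap is what the proposed evaluation measures.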

    Random Prism: a noise-tolerant alternative to Random Forests

    Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. A frequently used family of base classifiers for ensemble learning is decision trees. However, alternative approaches can potentially be used, such as the Prism family of algorithms, which induces modular classification rules that cannot necessarily be represented in the form of a decision tree. Prism algorithms produce a classification accuracy similar to that of decision trees, and in some cases, for example when there is noise in the training and test data, they can outperform decision trees by achieving a higher classification accuracy. Nevertheless, Prism still tends to overfit on noisy data; hence, ensemble learning has been adopted in this work to reduce the overfitting. This paper describes the development of an ensemble learner using a member of the Prism family as the base classifier to reduce the overfitting of Prism algorithms on noisy datasets. The developed ensemble classifier is compared with a stand-alone Prism classifier in terms of classification accuracy and resistance to noise.
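The general ensemble recipe described above, train base classifiers on bootstrap samples of the training data and combine their predictions by majority vote, can be sketched as below. The trivial majority-class base learner is only a stand-in for a real Prism rule inducer, which this example does not implement; the dataset, estimator count, and seed are likewise illustrative assumptions.

```python
import random
from collections import Counter

def train_base(sample):
    # Stand-in base classifier: always predicts the most common label
    # in its bootstrap sample (a real ensemble would induce Prism rules here).
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda x: label

def bagging(data, n_estimators=11, seed=0):
    """Train n_estimators base classifiers on bootstrap samples of `data`
    (a list of (features, label) pairs) and return a majority-vote predictor."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(data) for _ in data]  # bootstrap: sample with replacement
        models.append(train_base(sample))
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

# Toy training set heavily skewed towards label "a".
data = [((0,), "a")] * 19 + [((1,), "b")]
predict = bagging(data)
```

Because each base learner only sees a resampled view of the data, label noise affecting a few training examples is unlikely to sway the majority vote, which is the intuition behind using an ensemble to curb Prism's overfitting.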