ILP Experiments in Detecting Traffic Problems
The paper describes experiments in the automated acquisition of knowledge for traffic problem detection. Preliminary results show that ILP can be used to successfully learn to detect traffic problems.
Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction
BACKGROUND: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. RESULTS: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
CONCLUSIONS: Our newly developed method for HMC takes into account network information in the learning phase. When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks. The best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
Data-Driven Structuring of the Output Space Improves the Performance of Multi-Target Regressors
Peer-reviewed. The task of multi-target regression (MTR) is concerned with learning predictive models
capable of predicting multiple target variables simultaneously. MTR has attracted increasing attention
within the research community in recent years, yielding a variety of methods. The methods can be divided
into two main groups: problem transformation and problem adaptation. The former transform an MTR
problem into simpler (typically single target) problems and apply known approaches, while the latter
adapt the learning methods to directly handle the multiple target variables and learn better models which
simultaneously predict all of the targets. Studies have identified the latter group of methods as having a
competitive advantage over the former, probably because it exploits the interrelations of the
multiple targets. In the related task of multi-label classification, it has been recently shown that organizing
the multiple labels into a hierarchical structure can improve predictive performance.
In this paper, we investigate whether organizing the targets into a hierarchical structure can improve the
performance for MTR problems. More precisely, we propose to structure the multiple target variables into
a hierarchy of variables, thus translating the task of MTR into a task of hierarchical multi-target regression
(HMTR). We use four data-driven methods for devising the hierarchical structure that cluster the real values
of the targets or the feature importance scores with respect to the targets. The evaluation of the proposed
methodology on 16 benchmark MTR datasets reveals that structuring the multiple target variables into a
hierarchy improves the predictive performance of the corresponding MTR models. The results also show
that data-driven methods produce hierarchies that can improve the predictive performance even more than
expert constructed hierarchies. Finally, the improvement in predictive performance is more pronounced for
the datasets with very large numbers (more than a hundred) of targets. European Commissio
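As a rough illustration of clustering the real values of the targets into a hierarchy, the sketch below merges targets agglomeratively. The abstract does not name its four data-driven methods, so everything here is an assumption made for illustration: the correlation-based distance, the single-linkage rule, the function names `pearson` and `target_hierarchy`, and the toy target values.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long value lists (non-constant inputs assumed)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def target_hierarchy(targets):
    """Build a binary hierarchy (nested tuples) over target names.

    targets maps a target name to its list of observed values.
    Assumed distance between two targets: 1 - |Pearson correlation|.
    Assumed cluster distance: single linkage (closest pair of members).
    """
    # Each cluster is a pair (subtree, names of the targets it contains).
    clusters = [(name, [name]) for name in targets]

    def cluster_distance(c1, c2):
        return min(
            1 - abs(pearson(targets[a], targets[b]))
            for a in c1[1]
            for b in c2[1]
        )

    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Toy data: t0 and t1 are perfectly correlated, t2 is uncorrelated with both.
toy_targets = {
    "t0": [1, 2, 3, 4, 5],
    "t1": [2, 4, 6, 8, 10],
    "t2": [2, 1, 2, 1, 2],
}
hierarchy = target_hierarchy(toy_targets)
print(hierarchy)
```

In this toy case the two correlated targets end up under a common internal node, and the uncorrelated target joins only at the root. A hierarchical multi-target regressor would then be trained over such a structure; that step is outside this sketch.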
Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics
This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.
Error curves for evaluating the quality of feature rankings
Peer-reviewed. In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features when predicting the target variable, and the proposed method evaluates the correctness of these importance values by computing the error measures of two chains of predictive models. The models in the first chain are built on nested sets of top-ranked features, while the models in the other chain are built on nested sets of bottom-ranked features. We investigate which predictive models are appropriate for building these chains, showing empirically that the proposed method gives meaningful results and can detect differences in feature ranking quality. This is first demonstrated on synthetic data, and then on several real-world classification benchmark problems.
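The two chains of models can be sketched as follows. This is a minimal illustration under stated assumptions: the predictive model is a 1-nearest-neighbour classifier, the error measure is leave-one-out misclassification rate, and the data are a tiny hand-built synthetic set. The abstract specifies none of these, and the function names are hypothetical.

```python
def loo_1nn_error(rows, labels, features):
    """Leave-one-out error of a 1-nearest-neighbour classifier on a feature subset."""
    mistakes = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if dist < best_dist:
                best_j, best_dist = j, dist
        mistakes += labels[best_j] != labels[i]
    return mistakes / len(rows)


def error_curves(rows, labels, ranking):
    """Return (forward, reverse) error curves for a ranking (best feature first).

    forward[k-1] uses the k top-ranked features,
    reverse[k-1] uses the k bottom-ranked features.
    """
    sizes = range(1, len(ranking) + 1)
    forward = [loo_1nn_error(rows, labels, ranking[:k]) for k in sizes]
    reverse = [loo_1nn_error(rows, labels, ranking[-k:]) for k in sizes]
    return forward, reverse


# Synthetic data: feature 0 separates the classes strongly, feature 1 weakly,
# and feature 2 is just the sample index, which misleads a nearest-neighbour model.
labels = [i % 2 for i in range(20)]
rows = [[10 * y + 0.01 * i, y + 0.001 * i, float(i)] for i, y in enumerate(labels)]

ranking = [0, 1, 2]  # assumed output of some feature ranking algorithm
forward, reverse = error_curves(rows, labels, ranking)
print("forward:", forward)
print("reverse:", reverse)
```

For a good ranking, the forward curve should be low from its first point, while the reverse curve should stay high until the top-ranked features finally enter the model. A poor ranking would blur or invert that gap.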
Detecting Traffic Problems with ILP
Expert systems for decision support have recently been successfully introduced in road transport management. In this paper, we apply three state-of-the-art ILP systems to learn how to detect traffic problems.
Correlative Fluorescence and Raman Microscopy to Define Mitotic Stages at the Single-Cell Level: Opportunities and Limitations in the AI Era
Nowadays, morphological and molecular analyses at the single-cell level have a fundamental role in understanding biology better. These methods are utilized for cell phenotyping and in-depth studies of cellular processes, such as mitosis. Fluorescence microscopy and optical spectroscopy techniques, including Raman micro-spectroscopy, allow researchers to examine biological samples at the single-cell level in a non-destructive manner. Fluorescence microscopy can give detailed morphological information about the localization of stained molecules, while Raman microscopy can produce label-free images at the subcellular level; thus, it can reveal the spatial distribution of molecular fingerprints, even in live samples. Accordingly, the combination of correlative fluorescence and Raman microscopy (CFRM) offers a unique approach for studying cellular stages at the single-cell level. However, subcellular spectral maps are complex and challenging to interpret. Artificial intelligence (AI) may serve as a valuable solution to characterize the molecular backgrounds of phenotypes and biological processes by finding the characteristic patterns in spectral maps. The major contributions of the manuscript are: (I) it gives a comprehensive review of the literature focusing on AI techniques in Raman-based cellular phenotyping; (II) via the presentation of a case study, a new neural network-based approach is described, and the opportunities and limitations of AI, specifically deep learning, are discussed regarding the analysis of Raman spectroscopy data to classify mitotic cellular stages based on their spectral maps.