
76 research outputs found

Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs

Author: Blockeel H.
Dehaspe L.
Demoen B.
Janssens G.
Ramon J.
Vandecasteele H.
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2002

Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.

arXiv.org e-Print Archive

Lirias

Crossref

Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors

Author: A Clare
A McCallum
AS Weigend
B Rost
B Schoikowski
B Shahbaba
Babak Shahbaba
BE Engelhardt
D Koller
EM Marcotte
FR Blattner
H Blockeel
I Tsochantaridis
IUBMB
J DeRisi
J Fox
J Goodman
J Struyf
J Zhang
JA Eisen
JR Guest
K Sjölander
L Cai
L Dehaspe
M Brown
M Deng
M Eisen
M Riley
N Cesa-Bianchi
O Dekel
P Pavlidis
R Caruana
R Eisner
Radford M Neal
RD King
RM Neal
S Rison
S Sattath
S Spiro
SF Altschul
ST Dumais
WR Pearson
Z Barutcuoglu
Publication date: 01/01/2006

We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences including phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining these sources of information, our approach results in a higher accuracy rate when compared to models that use each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

arXiv.org e-Print Archive

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Towards semantic web mining

Author: A. Hotho
A. Maedche
B. Berendt
B. Ganter
B. Mobasher
D. Hand
E.H. Chi
G. Chang
J. Hobbs
J. M. Kleinberg
J. Srivastava
L. Dehaspe
M. Craven
M. Fernández
M. Kifer
M. Spiliopoulou
R. Cooley
S. Chakrabarti
W. Lin
Publication venue: Springer
Publication date: 01/01/2002

Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is to improve, on the one hand, the results of Web Mining by exploiting the new semantic structures in the Web; and to make use of Web Mining, on the other hand, for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches how a closer integration could be profitable.

CiteSeerX

Crossref

DSpace an der Universität Kassel

A Sliding Window Algorithm for Relational Frequent Patterns Mining from Data Streams

Author: B. Mozafari
C. Silvestri
F.A. Lisi
G.D. Plotkin
H. Blockeel
H. Mannila
J. Ren
L. Dehaspe
R. Rymon
S. Ceri
S. Džeroski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Crossref

Spatial associative classification: propositional vs structural approach

Author: A. Appice
Annalisa Appice
D. Malerba
E. Baralis
F. A. Lisi
G. Dong
H. Mannila
J. A. Robinson
J. Fitzpatrick
J. Knobbe
J. R. Quinlan
K. Koperski
L. Dehaspe
M. C. Ludl
M. Ceci
M. Krogel
M. Modrzejewski
M. Pazzani
Michelangelo Ceci
N. Lavrač
P. Domingos
P. Flach
R. E. Bellman
S. Džeroski
S. Kramer
S. Muggleton
S. Shekhar
T. Mitchell
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Generating Explainable and Effective Data Descriptors Using Relational Learning: Application to Cancer Biology

Author: A Cherkasov
A Clare
A Gaulton
A Koleti
A Srinivasan
AE Hoerl
DS Wishart
EP Barracchia
I Olier
J Verma
JW Lloyd
L Breiman
L Dehaspe
M Ceci
M Zitnik
MP Menden
NP Tatonetti
R Tibshirani
RD King
S Fröhler
S Muggleton
S Sonnenburg
SJ Russell
T Dash
T Takeda
W Jeon
Y Chen
Y LeCun
Y Park
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

The key to success in machine learning is the use of effective data representations. The success of deep neural networks (DNNs) is based on their ability to utilize multiple neural network layers, and big data, to learn how to convert simple input representations into richer internal representations that are effective for learning. However, these internal representations are sub-symbolic and difficult to explain. In many scientific problems explainable models are required, and the input data is semantically complex and unsuitable for DNNs. This is true in the fundamental problem of understanding the mechanism of cancer drugs, which requires complex background knowledge about the functions of genes/proteins, their cells, and the molecular structure of the drugs. This background knowledge cannot be compactly expressed propositionally, and requires at least the expressive power of Datalog. Here we demonstrate the use of relational learning to generate new data descriptors in such semantically complex background knowledge. These new descriptors are effective: adding them to standard propositional learning methods significantly improves prediction accuracy. They are also explainable, and add to our understanding of cancer. Our approach can readily be expanded to include other complex forms of background knowledge, and combines the generality of relational learning with the efficiency of standard propositional learning.

Crossref

Chalmers Research

A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

Author: A Andreeva
A Ben-Hur
A Karwath
A Shah
Alessandra Carbone
B Liu
B Qian
B Webb-Robertson
C Ferreira
C Leslie
D Higgins
F Wilcoxon
G Yona
Gerson Zaverucha
H Rangwala
H Saigo
J Bernardes
J Davis
J Gough
J Quinlan
J Soeding
J Weston
Juliana S Bernardes
L De Raedt
L Dehaspe
L Liao
N Shan-Hwei
Q Dong
Q Su
R Agrawal
R Hughey
R King
R Kuang
R Sadreyev
S Altschul
S Brenner
S Eddy
S Kawashima
S Lee
T Handstad
T Jaakkola
T Lingner
U Syed
V Alexandrov
V Atalay
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone", we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM).
Results: We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some state-of-the-art methods and comparably to others. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function.
Conclusions: The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

HAL-Inserm

PubMed Central