
11 research outputs found

Sequence Pattern Mining with Variables

Author: Grimaila Michael R
Mills Robert F
Okolica James S.
Peterson Gilbert L
Publication venue: AFIT Scholar
Publication date: 19/11/2018

Sequence pattern mining (SPM) seeks to find multiple items that commonly occur together in a specific order. One common assumption is that all of the relevant differences between items are captured by creating distinct items, e.g., if color matters, then the same item in two different colors would be represented by two items, one for each color. In some domains, that is unrealistic. This paper makes two contributions. The first extends SPM algorithms to allow item differentiation through attribute variables for domains with large numbers of items, e.g., by having one item with a color attribute variable rather than a distinct item for each color. It demonstrates this by incorporating variables into Discontinuous Varied Order Sequence Mining (DVSM). The second contribution is the creation of Sequence Mining of Temporal Clusters (SMTC), a new SPM algorithm that addresses the interleaving issue common to SPM algorithms. Most SPM algorithms address interleaving by using a distance measure to separate co-occurring sequences. SMTC addresses interleaving by clustering all subsets of temporally close items and deferring the sequencing of mined patterns until the entire dataset is examined. Evaluation of the SPM algorithms on a digital forensics media analysis task results in a 96% reduction in terms to review, 100% detection of true positives, and no false positives.
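The distinction the abstract draws between distinct items per color and a single item with a color attribute can be illustrated with a minimal Python sketch. It is not DVSM or SMTC; the function names `contains_in_order` and `support` and the toy data are assumptions made for illustration only. It counts how often an ordered pattern occurs when each (item, color) pair is its own item, and again when color is treated as an attribute that is ignored during matching.

```python
from typing import Hashable, Sequence


def contains_in_order(sequence: Sequence[Hashable], pattern: Sequence[Hashable]) -> bool:
    """Return True if every item of `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so each later item must appear after the earlier ones.
    return all(item in it for item in pattern)


def support(database: Sequence[Sequence[Hashable]], pattern: Sequence[Hashable]) -> float:
    """Fraction of sequences in `database` that contain `pattern` in order."""
    if not database:
        return 0.0
    hits = sum(contains_in_order(seq, pattern) for seq in database)
    return hits / len(database)


# Hypothetical toy data: each event is an (item, color) pair.
database = [
    [("shirt", "red"), ("tie", "red")],
    [("shirt", "blue"), ("tie", "blue")],
    [("tie", "red"), ("shirt", "red")],
]

# Distinct-item view: a red shirt and a blue shirt are unrelated items.
distinct_support = support(database, [("shirt", "red"), ("tie", "red")])

# Attribute view: keep one item per name and treat color as a separate attribute.
names_only = [[name for name, _color in seq] for seq in database]
attribute_support = support(names_only, ["shirt", "tie"])
```

In this toy data the pattern "shirt then tie" is found in two of three sequences once color is treated as an attribute, but in only one of three when each color is a separate item, which is the kind of fragmentation the abstract describes.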

AFIT Scholar (Air Force Institute of Technology)

Constraining the Search Space in Temporal Pattern Mining

Author: Herzog Otthein
Lattner Andreas D.
Publication date: 28/04/2011

Agents in dynamic environments have to deal with complex situations including various temporal interrelations of actions and events. Discovering frequent patterns in such scenes can be useful for creating prediction rules that predict future activities or situations. We present the algorithm MiTemP, which learns frequent patterns based on a time interval-based relational representation. Additionally, the problem has been transferred to a pure relational association rule mining task that can be handled by WARMR. The two approaches are compared in a number of experiments. The experiments show the advantage of avoiding the creation of impossible or redundant patterns with MiTemP. While fewer patterns have to be explored on average with MiTemP, more frequent patterns are found at an earlier refinement level.

University of Hildesheim

Learning Probabilistic Temporal Safety Properties from Examples in Relational Domains

Author: De Raedt Luc
Raskin Jean-François
Rens Gavin
Yang Wen-Chi
Publication date: 07/11/2022

We propose a framework for learning a fragment of probabilistic computation tree logic (pCTL) formulae from a set of states that are labeled as safe or unsafe. We work in a relational setting and combine ideas from relational Markov Decision Processes with pCTL model-checking. More specifically, we assume that there is an unknown relational pCTL target formula that is satisfied by only safe states, and has a horizon of maximum $k$ steps and a threshold probability $\alpha$. The task then consists of learning this unknown formula from states that are labeled as safe or unsafe by a domain expert. We apply principles of relational learning to induce a pCTL formula that is satisfied by all safe states and none of the unsafe ones. This formula can then be used as a safety specification for this domain, so that the system can avoid getting into dangerous situations in future. Following relational learning principles, we introduce a candidate formula generation process, as well as a method for deciding which candidate formula is a satisfactory specification for the given labeled states. The cases where the expert knows and does not know the system policy are both treated; however, much of the learning process is the same for both cases. We evaluate our approach on a synthetic relational domain. Comment: 25 pages, 3 figures, 5 tables, 2 algorithms, preprint

arXiv.org e-Print Archive

Logic-based machine learning using a bounded hypothesis space: the lattice structure, refinement operators and a genetic algorithm approach

Author: Tamaddoni Nezhad Alireza
Publication venue: Computing, Imperial College London
Publication date: 01/03/2014

Rich representation inherited from computational logic makes logic-based machine learning a competent method for application domains involving relational background knowledge and structured data. There is, however, a trade-off between the expressive power of the representation and the computational costs. Inductive Logic Programming (ILP) systems employ different kinds of biases and heuristics to cope with the complexity of the search, which otherwise is intractable. Searching the hypothesis space bounded below by a bottom clause is the basis of several state-of-the-art ILP systems (e.g. Progol and Aleph). However, the structure of the search space and the properties of the refinement operators for these systems have not been previously characterised. The contributions of this thesis can be summarised as follows: (i) characterising the properties, structure and morphisms of the bounded subsumption lattice, (ii) analysis of bounded refinement operators and stochastic refinement, and (iii) implementation and empirical evaluation of stochastic search algorithms and in particular a Genetic Algorithm (GA) approach for bounded subsumption. In this thesis we introduce the concept of bounded subsumption and study the lattice and cover structure of bounded subsumption. We show the morphisms between the lattice of bounded subsumption, an atomic lattice and the lattice of partitions. We also show that ideal refinement operators exist for bounded subsumption and that, by contrast with general subsumption, efficient least and minimal generalisation operators can be designed for bounded subsumption. In this thesis we also show how refinement operators can be adapted for a stochastic search and give an analysis of refinement operators within the framework of stochastic refinement search. We also discuss genetic search for learning first-order clauses and describe a framework for genetic and stochastic refinement search for bounded subsumption. Finally, ILP algorithms and implementations which are based on this framework are described and evaluated. Open Access

Spiral - Imperial College Digital Repository

A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

Author: A Andreeva
A Ben-Hur
A Karwath
A Shah
Alessandra Carbone
B Liu
B Qian
B Webb-Robertson
C Ferreira
C Leslie
D Higgins
F Wilcoxon
G Yona
Gerson Zaverucha
H Rangwala
H Saigo
J Bernardes
J Davis
J Gough
J Quinlan
J Soeding
J Weston
Juliana S Bernardes
L De Raedt
L Dehaspe
L Liao
N Shan-Hwei
Q Dong
Q Su
R Agrawal
R Hughey
R King
R Kuang
R Sadreyev
S Altschul
S Brenner
S Eddy
S Kawashima
S Lee
T Handstad
T Jaakkola
T Lingner
U Syed
V Alexandrov
V Atalay
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract

Background: Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM).

Results: We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some of the state-of-the-art methods and is comparable to others. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function.

Conclusions: The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

HAL-Inserm

PubMed Central

The lattice structure and refinement operators for the hypothesis space bounded by a bottom clause

Author: A. L. Duboc
A. Srinivasan
A. Tamaddoni-Nezhad
Alireza Tamaddoni-Nezhad
B.A. Davey
C. Rouveirol
J. C. Reynolds
M. R. Garey
P. R. J. van der Laag
S. H. Muggleton
S. Muggleton
S.-H. Nienhuys-Cheng
Stephen Muggleton
Publication venue: Springer Science and Business Media LLC

Crossref

Workplace values in the Japanese public sector: a constraining factor in the drive for continuous improvement

Author: Jobmann Christian
McCullen Peter
Publication venue: Centre for Concurrent Enterprise, Nottingham University Business School
Publication date: 06/07/2008

University of Brighton Research Portal

Constraint based mining of first order sequences in SeqLog

Author: C. Masson
H. Hirsh
H. Mannila
K. Wang
L. De Raedt
M.N. Garofalakis
N. Jacobs
R. Agrawal
R. Srikant
S.-H. Nienhuys-Cheng
T. Mitchell
Publication venue: Springer
Publication date: 01/01/2004

Abstract. A logical language, SeqLog, for mining and querying sequential data and databases is presented. In SeqLog, data takes the form of a sequence of logical atoms, background knowledge can be specified using DataLog-style clauses, and sequential queries or patterns correspond to subsequences of logical atoms. SeqLog is then used as the representation language for the inductive database mining system MineSeqLog. Inductive queries in MineSeqLog take the form of a conjunction of a monotonic and an anti-monotonic constraint on sequential patterns. Given such an inductive query, MineSeqLog will efficiently compute the borders of the solution space. MineSeqLog uses variants of the famous level-wise algorithm together with ideas from version spaces to realize this. Finally, we report on a number of experiments in the domain of user modeling that validate the approach.

1 Introduction

Data mining has been a hot research topic in recent years, and the mining of knowledge from data of various models has been studied. One popular data model that has attracted a lot of attention concerns sequential data [2, 20, 13, 6, 21, 22]. Many of these approaches are extensions of the classical level-wise itemset discovery algorithm "Apriori" [1]. However, the data models that have been used so far for modeling sequential patterns are not very expressive and often based on some form of propositional logic. The need for more expressive kinds of patterns arises e.g. when modeling Unix users [9]. E.g. the command sequence 1. ls 2. vi paper.tex 3. latex paper.tex 4. dvips paper.dvi 5. lpr paper.p
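The idea of a pattern as a subsequence of logical atoms can be sketched in Python as follows. This is an illustrative assumption-laden sketch, not SeqLog or MineSeqLog: the helper names (`is_variable`, `unify`, `matches`), the tuple encoding of atoms, the Prolog-like convention that variables start with an uppercase letter, and the choice to allow arbitrary gaps between matched atoms are all assumptions introduced here.

```python
from typing import Dict, Optional, Sequence, Tuple

Atom = Tuple[str, ...]  # (predicate, arg1, arg2, ...)


def is_variable(term: str) -> bool:
    # Assumed convention (Prolog-like): variables start with an uppercase letter.
    return term[:1].isupper()


def unify(pattern: Atom, ground: Atom, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match one pattern atom against one ground atom, extending a copy of `bindings`."""
    if len(pattern) != len(ground) or pattern[0] != ground[0]:
        return None
    new = dict(bindings)
    for p, g in zip(pattern[1:], ground[1:]):
        if is_variable(p):
            # A variable must be bound to the same value everywhere it occurs.
            if new.setdefault(p, g) != g:
                return None
        elif p != g:
            return None
    return new


def matches(
    pattern: Sequence[Atom],
    sequence: Sequence[Atom],
    bindings: Optional[Dict[str, str]] = None,
    start: int = 0,
) -> Optional[Dict[str, str]]:
    """Return variable bindings if `pattern` occurs as an ordered subsequence, else None."""
    bindings = {} if bindings is None else bindings
    if not pattern:
        return bindings
    for i in range(start, len(sequence)):
        extended = unify(pattern[0], sequence[i], bindings)
        if extended is not None:
            # Backtrack over later positions if the rest of the pattern fails here.
            result = matches(pattern[1:], sequence, extended, i + 1)
            if result is not None:
                return result
    return None


# Hypothetical session loosely modeled on the Unix command example above.
session = [("ls",), ("vi", "paper.tex"), ("latex", "paper.tex")]
pattern = [("vi", "File"), ("latex", "File")]
found = matches(pattern, session)
```

With this encoding, a single pattern such as "edit a file, then run latex on the same file" covers every file name, which a purely propositional item representation would have to enumerate one command-and-file combination at a time.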

Lirias

CiteSeerX

Crossref