Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classification of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
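The branching idea described in this abstract can be illustrated with a minimal sketch. It assumes a toy two-rule temporal grammar (`G(attr > c)` for "always" and `F(attr < c)` for "eventually") and picks a split by exhaustive enumeration scored with Gini impurity. The grammar, the attribute name `sep`, the data, and the exhaustive search are illustrative assumptions, not the paper's learning algorithm.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A trace maps each attribute name to its samples over time (assumed shape).
Trace = Dict[str, Sequence[float]]
Predicate = Callable[[Trace], bool]


def always_gt(attr: str, c: float) -> Predicate:
    """G(attr > c): the attribute exceeds c at every time step."""
    return lambda trace: all(v > c for v in trace[attr])


def eventually_lt(attr: str, c: float) -> Predicate:
    """F(attr < c): the attribute drops below c at some time step."""
    return lambda trace: any(v < c for v in trace[attr])


def candidate_expressions(attrs: List[str], consts: List[float]) -> Dict[str, Predicate]:
    """Enumerate the toy grammar: expr ::= G(attr > c) | F(attr < c)."""
    exprs: Dict[str, Predicate] = {}
    for a in attrs:
        for c in consts:
            exprs[f"G({a} > {c})"] = always_gt(a, c)
            exprs[f"F({a} < {c})"] = eventually_lt(a, c)
    return exprs


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(traces: List[Trace], labels: List[str],
               exprs: Dict[str, Predicate]) -> Tuple[str, float]:
    """Return the expression with the lowest weighted Gini impurity."""
    best_name, best_score = "", float("inf")
    for name, pred in exprs.items():
        left = [y for x, y in zip(traces, labels) if pred(x)]
        right = [y for x, y in zip(traces, labels) if not pred(x)]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical data: aircraft separation over time, labelled by outcome.
traces = [
    {"sep": [500.0, 400.0, 300.0]},
    {"sep": [600.0, 450.0, 350.0]},
    {"sep": [500.0, 90.0, 300.0]},
    {"sep": [400.0, 50.0, 200.0]},
]
labels = ["safe", "safe", "nmac", "nmac"]
name, score = best_split(traces, labels, candidate_expressions(["sep"], [100.0]))
print(name, score)
```

Because each node is labelled with a readable expression rather than a bare threshold, the resulting tree stays interpretable even though every input is a whole time series.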
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to
the problem of misuse detection and misuse localisation within telecommunications
environments. A broad survey of techniques is provided, that covers inter alia
rule based systems, model-based systems, case based reasoning, pattern matching,
clustering and feature extraction, artificial neural networks, genetic algorithms,
artificial immune systems, agent based systems, data mining and a variety of hybrid
approaches. The report then considers the central issue of event correlation, that
is at the heart of many misuse detection and localisation systems. The notion of
being able to infer misuse by the correlation of individual temporally distributed
events within a multiple data stream environment is explored, along with a range of techniques
covering model based approaches, `programmed' AI and machine learning
paradigms. It is found that, in general, correlation is best achieved via rule based approaches,
but that these suffer from a number of drawbacks, such as the difficulty of
developing and maintaining an appropriate knowledge base, and the lack of ability
to generalise from known misuses to new unseen misuses. Two distinct approaches
are evident. One attempts to encode knowledge of known misuses, typically within
rules, and use this to screen events. This approach cannot generally detect misuses
for which it has not been programmed, i.e. it is prone to issuing false negatives.
The other attempts to `learn' the features of event patterns that constitute normal
behaviour, and, by observing patterns that do not match expected behaviour, detect
when a misuse has occurred. This approach is prone to issuing false positives,
i.e. inferring misuse from innocent patterns of behaviour that the system was not
trained to recognise. Contemporary approaches are seen to favour hybridisation,
often combining detection or localisation mechanisms for both abnormal and normal
behaviour, the former to capture known cases of misuse, the latter to capture
unknown cases. In some systems, these mechanisms even work together to update
each other to increase detection rates and lower false positive rates. It is concluded
that hybridisation offers the most promising future direction, but that a rule or state
based component is likely to remain, being the most natural approach to the correlation
of complex events. The challenge, then, is to mitigate the weaknesses of
canonical programmed systems such that learning, generalisation and adaptation
are more readily facilitated.
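The two approaches contrasted in this abstract, and their hybridisation, can be sketched as follows. This is a minimal illustration under stated assumptions: events are hypothetical `(source, action)` pairs, known misuses are an explicit signature set, and "normal behaviour" is simply the set of events seen often enough in training. None of these names or structures come from the report itself.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Event = Tuple[str, str]  # hypothetical (source, action) pair


class HybridDetector:
    """Signature screening plus a learned profile of normal events."""

    def __init__(self, misuse_signatures: Set[Event], min_support: int = 1):
        self.signatures = set(misuse_signatures)  # programmed knowledge
        self.normal: Counter = Counter()          # learned normal profile
        self.min_support = min_support

    def train_normal(self, events: Iterable[Event]) -> None:
        """Record events observed during known-good operation."""
        self.normal.update(events)

    def classify(self, event: Event) -> str:
        # Rule-based path: catches only misuses it was programmed for,
        # so unseen misuses slip through (false negatives).
        if event in self.signatures:
            return "known-misuse"
        # Learned path: anything outside the normal profile is flagged,
        # including innocent but unseen behaviour (false positives).
        if self.normal[event] < self.min_support:
            return "anomalous"
        return "normal"


detector = HybridDetector({("user42", "clone_sim")})
detector.train_normal([("user1", "call"), ("user1", "sms"), ("user2", "call")])

print(detector.classify(("user42", "clone_sim")))  # matches a signature
print(detector.classify(("user1", "call")))        # seen during training
print(detector.classify(("user3", "bulk_dial")))   # outside the profile
```

Keeping the two paths separate makes the trade-off visible: the signature set governs false negatives, while the support threshold governs false positives.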
From examples to knowledge in model-driven engineering : a holistic and pragmatic approach
Model-Driven Engineering (MDE) is a software development approach that proposes to
raise the level of abstraction of languages in order to shift the design and understanding
effort from the programmer's point of view to that of decision makers. However, the
manipulation of these abstract representations, or models, has become so complex that
traditional techniques are not enough to automate its inherent tasks.
For its part, Search-Based Software Engineering (SBSE) proposes to reformulate
the automation of MDE tasks as optimization problems. Once reformulated, the problem will
be solved by metaheuristic algorithms. With a plethora of studies on the subject, the
automation power of SBSE has been well established.
Based on this observation, the Example-Based MDE community (EB-MDE) started
using application examples to feed the reformulation into SBSE of the MDE task learning
problem. In this context, the concordance of the output of the solutions with the examples
becomes an effective barometer for evaluating the ability of a solution to solve a task. This
measure has proved to be a semantic goal of choice to guide the metaheuristic search for
solutions.
However, while it is commonly accepted that the representativeness of the examples
has an impact on the generalizability of the solutions, the study of this impact suffers from a
flagrant lack of consideration. In this thesis, we propose a thorough formulation of the
learning process in an MDE context including a complete methodology to characterize and
evaluate the relation that exists between two important properties of the examples, their size
and coverage, and the generalizability of the solutions.
We perform an empirical analysis of these two properties and propose a detailed plan for
further investigation of the concept of representativeness, or of other representativities.
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.
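The pattern-finding step mentioned above can be pictured with a minimal sketch. It assumes each proof is a flat list of proof-step names and counts contiguous n-step subsequences across the corpus. The step names, the n-gram formulation, and the thresholds are illustrative assumptions, and the subsequent genetic programming stage is not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def frequent_patterns(proofs: Sequence[Sequence[str]], n: int = 2,
                      min_count: int = 2) -> List[Tuple[Tuple[str, ...], int]]:
    """Count contiguous n-step patterns and keep the frequent ones."""
    counts: Counter = Counter()
    for steps in proofs:
        for i in range(len(steps) - n + 1):
            counts[tuple(steps[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


# Hypothetical corpus: three proofs, each a sequence of step names.
corpus = [
    ["intro", "simp", "auto"],
    ["intro", "simp", "rewrite"],
    ["induct", "simp", "auto"],
]
patterns = frequent_patterns(corpus, n=2, min_count=2)
print(patterns)
```

Patterns that survive the frequency threshold are natural seeds for candidate tactics, which a later search stage could then refine.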
Neural network controller against environment: A coevolutive approach to generalize robot navigation behavior
In this paper, a new coevolutive method, called Uniform Coevolution, is introduced to learn the weights of a neural network controller in autonomous robots. An evolutionary strategy is used to learn high-performance reactive behavior for navigation and collision avoidance. Introducing coevolution over evolutionary strategies allows the environment to evolve as well, so that the learned behavior is general enough to solve the problem in different environments. A traditional evolutionary strategy without coevolution, by contrast, yields a specialized behavior. All the behaviors obtained, with and without coevolution, have been tested in a set of environments, and the generalization capability of each learned behavior is shown. A simulator based on the Khepera mini-robot has been used to learn each behavior. The results show that Uniform Coevolution obtains better generalized solutions to example-based problems.
Using rule extraction to improve the comprehensibility of predictive models.
Whereas newer machine learning techniques, like artificial neural networks and support vector machines, have shown superior performance in various benchmarking studies, the application of these techniques remains largely restricted to research environments. A more widespread adoption of these techniques is foiled by their lack of explanation capability, which is required in some application areas, like medical diagnosis or credit scoring. To overcome this restriction, various algorithms have been proposed to extract a meaningful description of the underlying `black box' models. These algorithms have a dual goal: to mimic the behavior of the black box as closely as possible while ensuring that the extracted description is maximally comprehensible. In this research report, we first develop a formal definition of rule extraction and comment on the inherent trade-off between accuracy and comprehensibility. Afterwards, we develop a taxonomy by which rule extraction algorithms can be classified and discuss some criteria by which these algorithms can be evaluated. Finally, an in-depth review of the most important algorithms is given. This report is concluded by pointing out some general shortcomings of existing techniques and opportunities for future research. Keywords: Models; Model; Algorithms; Criteria; Opportunities; Research; Learning; Neural networks; Networks; Performance; Benchmarking; Studies; Area; Credit; Credit scoring; Behavior; Time.
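The goal of mimicking the black box "as closely as possible" is commonly quantified as fidelity: the fraction of inputs on which the extracted rules and the black box agree. The sketch below assumes a hypothetical linear scoring function standing in for the black box and a single hand-written rule standing in for the extracted description. Neither is taken from the report.

```python
from typing import Callable, Sequence, Tuple

Applicant = Tuple[float, float]  # hypothetical (income, debt) pair


def black_box(x: Applicant) -> int:
    """Stand-in for an opaque model: approve when a weighted score is high."""
    income, debt = x
    return 1 if 0.8 * income - 0.5 * debt > 10 else 0


def extracted_rule(x: Applicant) -> int:
    """Stand-in for an extracted description: IF income > 20 THEN approve."""
    income, _ = x
    return 1 if income > 20 else 0


def fidelity(model: Callable[[Applicant], int],
             rules: Callable[[Applicant], int],
             inputs: Sequence[Applicant]) -> float:
    """Fraction of inputs on which the rules reproduce the model's output."""
    agree = sum(1 for x in inputs if model(x) == rules(x))
    return agree / len(inputs)


applicants = [(30.0, 10.0), (15.0, 0.0), (10.0, 5.0), (25.0, 30.0)]
print(fidelity(black_box, extracted_rule, applicants))
```

A one-condition rule is about as comprehensible as a description can be, yet here it matches the black box on only half of the inputs, which illustrates the trade-off between accuracy and comprehensibility discussed above.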