717 research outputs found

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    Get PDF
    This report considers the application of Articial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, that covers inter alia rule based systems, model-based systems, case based reasoning, pattern matching, clustering and feature extraction, articial neural networks, genetic algorithms, arti cial immune systems, agent based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, that is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, and a range of techniques, covering model based approaches, `programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule based approaches, but that these suffer from a number of drawbacks, such as the difculty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule or state based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated

    Design of Evolutionary Methods Applied to the Learning of Bayesian Network Structures

    Get PDF
    Bayesian Network, Ahmed Rebai (Ed.), ISBN: 978-953-307-124-4, pp. 13-38

    Deep Neuroevolution: Smart City Applications

    Get PDF
    Particularmente, la contribución de esta tesis se centra en cuatro aspectos: Primero, proponemos la técnica Mean Absolute Error Random Sampling (MRS) para estimar el rendimiento de una RNN, la cual se basa en la distribución del error observado en un muestreo aleatorio. Nuestros resultados muestran que MRS es una estimación fiable y de bajo coste computacional para predecir el rendimiento de una RNN. Segundo, diseñamos un algoritmo evolutivo (RESN) que explota MRS para optimizar la arquitectura de una RNN. RESN muestra resultados competitivos a la vez que reduce significativamente el tiempo. Tercero, en el contexto de la aplicación, proponemos soluciones para problemas de movilidad, electricidad y gestión de residuos inteligente, y hemos revisado el estado del arte de la ciudad inteligente y su relación con la informática. Cuarto, hemos desarrollado la biblioteca de software Deep Learning OPTimization (DLOPT), la cual está disponible bajo la licencia GNU GPL v3. Ésta contiene la mayor parte del trabajo realizado en esta tesis.El interés por desarrollar redes neuronales artificiales ha resurgido de la mano del Aprendizaje Profundo. En términos simples, el aprendizaje profundo consiste en diseñar y entrenar una red neuronal de gran complejidad y tamaño con una inmensa cantidad de datos. Esta creciente complejidad propone nuevos desafíos, siendo de especial relevancia la optimización del diseño dado un problema. Tradicionalmente, este problema ha sido resuelto en una combinación de conocimiento experto (humano) con prueba y error. Sin embargo, conforme la complejidad aumenta, este acercamiento se vuelve ineficiente (e impracticable). Esta tesis doctoral aborda el diseño de redes neuronales recurrentes (RNN), un tipo de red neuronal profunda, desde la neuroevolución. Concretamente, se combinan técnicas de aprendizaje automático con metaheurísticas avanzadas, con el fin de proveer una solución eficaz y eficiente. Por otra parte, se aplican las técnicas desarrolladas a problemas de la ciudad inteligente

    Efficient Knowledge Extraction from Structured Data

    Get PDF
    Knowledge extraction from structured data aims for identifying valid, novel, potentially useful, and ultimately understandable patterns in the data. The core step of this process is the application of a data mining algorithm in order to produce an enumeration of particular patterns and relationships in large databases. Clustering is one of the major data mining tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized, and the similarity of objects from different clusters is minimized. In this thesis, we advance the state-of-the-art data mining algorithms for analyzing structured data types. We describe the development of innovative solutions for hierarchical data mining. The EM-based hierarchical clustering method ITCH (Information-Theoretic Cluster Hierarchies) is designed to propose solid solutions for four different challenges. (1) to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters. (2) to represent each cluster content in the hierarchy by an intuitive description with e.g. a probability density function. (3) to consistently handle outliers. (4) to avoid difficult parameter settings. ITCH is built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL). Interpreting the hierarchical cluster structure as a statistical model of the dataset, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four above mentioned goals. The genetic-based hierarchical clustering algorithm GACH (Genetic Algorithm for finding Cluster Hierarchies) overcomes the problem of getting stuck in a local optimum by a beneficial combination of genetic algorithms, information theory and model-based clustering. Besides hierarchical data mining, we also made contributions to more complex data structures, namely objects that consist of mixed type attributes and skyline objects. The algorithm INTEGRATE performs integrative mining of heterogeneous data, which is one of the major challenges in the next decade, by a unified view on numerical and categorical information in clustering. Once more, supported by the MDL principle, INTEGRATE guarantees the usability on real world data. For skyline objects we developed SkyDist, a similarity measure for comparing different skyline objects, which is therefore a first step towards performing data mining on this kind of data structure. Applied in a recommender system, for example SkyDist can be used for pointing the user to alternative car types, exhibiting a similar price/mileage behavior like in his original query. For mining graph-structured data, we developed different approaches that have the ability to detect patterns in static as well as in dynamic networks. We confirmed the practical feasibility of our novel approaches on large real-world case studies ranging from medical brain data to biological yeast networks. In the second part of this thesis, we focused on boosting the knowledge extraction process. We achieved this objective by an intelligent adoption of Graphics Processing Units (GPUs). The GPUs have evolved from simple devices for the display signal preparation into powerful coprocessors that do not only support typical computer graphics tasks but can also be used for general numeric and symbolic computations. As major advantage, GPUs provide extreme parallelism combined with a high bandwidth in memory transfer at low cost. In this thesis, we propose algorithms for computationally expensive data mining tasks like similarity search and different clustering paradigms which are designed for the highly parallel environment of a GPU, called CUDA-DClust and CUDA-k-means. We define a multi-dimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU. We demonstrate the superiority of our algorithms running on GPU over their conventional counterparts on CPU in terms of efficiency

    Evolutionary Computation

    Get PDF
    This book presents several recent advances on Evolutionary Computation, specially evolution-based optimization methods and hybrid algorithms for several applications, from optimization and learning to pattern recognition and bioinformatics. This book also presents new algorithms based on several analogies and metafores, where one of them is based on philosophy, specifically on the philosophy of praxis and dialectics. In this book it is also presented interesting applications on bioinformatics, specially the use of particle swarms to discover gene expression patterns in DNA microarrays. Therefore, this book features representative work on the field of evolutionary computation and applied sciences. The intended audience is graduate, undergraduate, researchers, and anyone who wishes to become familiar with the latest research work on this field

    Prescriptive formalism for constructing domain-specific evolutionary algorithms

    Get PDF
    It has been widely recognised in the computational intelligence and machine learning communities that the key to understanding the behaviour of learning algorithms is to understand what representation is employed to capture and manipulate knowledge acquired during the learning process. However, traditional evolutionary algorithms have tended to employ a fixed representation space (binary strings), in order to allow the use of standardised genetic operators. This approach leads to complications for many problem domains, as it forces a somewhat artificial mapping between the problem variables and the canonical binary representation, especially when there are dependencies between problem variables (e.g. problems naturally defined over permutations). This often obscures the relationship between genetic structure and problem features, making it difficult to understand the actions of the standard genetic operators with reference to problem-specific structures. This thesis instead advocates m..

    Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites : Novel Methods and Computational Analysis

    Get PDF
    In this thesis, protein binding sites are considered. To enable the extraction of information from the space of protein binding sites, these binding sites must be mapped onto a mathematical space. This can be done by mapping binding sites onto vectors, graphs or point clouds. To finally enable a structure on the mathematical space, a distance measure is required, which is introduced in this thesis. This distance measure eventually can be used to extract information by means of data mining techniques
    • …
    corecore