13 research outputs found

    Characterizing approximate-matching dependencies in formal concept analysis with pattern structures

    Functional dependencies (FDs) provide valuable knowledge on the relations between attributes of a data table. A functional dependency holds when the values of one attribute can be determined by those of another. It has been shown that FDs can be expressed in terms of partitions of tuples that are in agreement w.r.t. the values taken by some subsets of attributes. To extend the use of FDs, several generalizations have been proposed. In this work, we study approximate-matching dependencies, which generalize FDs by relaxing the constraints on the attributes, i.e. agreement is based on a similarity relation rather than on equality. Such dependencies are attracting attention in the database field since they allow uncrisping the basic notion of FDs, extending its application to many different fields, such as data quality, data mining, behavior analysis, data cleaning or data partition, among others. We show that these dependencies can be formalized in the framework of Formal Concept Analysis (FCA) using a previous formalization introduced for standard FDs. Our new results state that, starting from the conceptual structure of a pattern structure, and generalizing the notion of relation between tuples, approximate-matching dependencies can be characterized as implications in a pattern concept lattice. We finally show how to use basic FCA algorithms to construct a pattern concept lattice that entails these dependencies after a slight and tractable binarization of the original data. Postprint (author's final draft)
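
    The difference between exact agreement and similarity-based agreement can be illustrated with a minimal sketch. It is not the pattern-structure formalization of the paper; it is only a direct check of the two definitions. The function names, the sample table, and the per-attribute numeric thresholds are all assumptions made for illustration. Because a similarity relation need not be transitive, the relaxed check compares tuples pairwise instead of grouping them into partitions.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """Exact FD check: rows that agree on lhs must also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first row with a given lhs value fixes the expected rhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


def holds_similarity_dependency(rows, lhs, rhs, thresholds):
    """Relaxed check: 'agreement' means |x - y| <= threshold per attribute.

    The thresholds mapping is a hypothetical way to encode the similarity
    relation; other similarity relations could be plugged in instead.
    """
    def similar(r, s, attrs):
        return all(abs(r[a] - s[a]) <= thresholds[a] for a in attrs)

    return all(
        similar(r, s, rhs)
        for r, s in combinations(rows, 2)
        if similar(r, s, lhs)
    )


# Hypothetical sample data: two readings for month 1 differ slightly.
rows = [
    {"month": 1, "temp": 10.0},
    {"month": 1, "temp": 10.5},
    {"month": 6, "temp": 25.0},
]
exact = holds_fd(rows, ["month"], ["temp"])
relaxed = holds_similarity_dependency(
    rows, ["month"], ["temp"], {"month": 0, "temp": 1.0}
)
```

    In this toy table the exact dependency from month to temp fails, because the two readings for month 1 differ, while the relaxed dependency holds once temperatures within one degree are treated as similar.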

    Towards the Deployment of Machine Learning Solutions in Network Traffic Classification: A Systematic Survey

    Traffic analysis is a compound of strategies intended to find relationships, patterns, anomalies, and misconfigurations, among other things, in Internet traffic. In particular, traffic classification is a subgroup of strategies in this field that aims at identifying the application's name or type of Internet traffic. Nowadays, traffic classification has become a challenging task due to the rise of new technologies, such as traffic encryption and encapsulation, which decrease the performance of classical traffic classification strategies. Machine Learning is gaining interest as a new direction in this field, showing signs of future success, such as knowledge extraction from encrypted traffic and more accurate Quality of Service management. Machine Learning is fast becoming a key tool to build traffic classification solutions in real network traffic scenarios; in this sense, the purpose of this investigation is to explore the elements that allow this technique to work in the traffic classification field. Therefore, a systematic review is introduced based on the steps to achieve traffic classification by using Machine Learning techniques. The main aim is to understand and to identify the procedures followed by the existing works to achieve their goals. As a result, this survey paper finds a set of trends derived from the analysis performed on this domain; in this manner, the authors expect to outline future directions for Machine Learning based traffic classification.

    Nature-inspired algorithms for solving some hard numerical problems

    Optimisation is a branch of mathematics that was developed to find the optimal solutions, among all the possible ones, for a given problem. Applications of optimisation techniques are currently employed in engineering, computing, and industrial problems. Optimisation is therefore a very active research area, leading to the publication of a large number of methods that solve specific problems to optimality. This dissertation focuses on the adaptation of two nature-inspired algorithms that, based on optimisation techniques, are able to compute approximations for zeros of polynomials and roots of non-linear equations and systems of non-linear equations. Although many iterative methods for finding all the roots of a given function already exist, they usually require: (a) repeated deflations, which can lead to very inaccurate results due to the accumulation of rounding errors; (b) good initial approximations to the roots for the algorithm to converge; or (c) the computation of first or second order derivatives, which, besides being computationally intensive, is not always possible. These drawbacks served as motivation for the use of Particle Swarm Optimisation (PSO) and Artificial Neural Networks (ANNs) for root-finding, since they are known, respectively, for their ability to explore high-dimensional spaces (not requiring good initial approximations) and for their capability to model complex problems. Moreover, neither method needs repeated deflations or derivative information. The algorithms are described throughout this document and tested using a test suite of hard numerical problems in science and engineering. The results were compared with several results available in the literature and with the well-known Durand–Kerner method, and suggest that both algorithms are effective in solving the numerical problems considered.
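
    For readers unfamiliar with the Durand–Kerner method used as the baseline above, the following is a minimal sketch of the classical iteration, which refines all roots of a polynomial simultaneously without deflation or derivatives. It is not code from the dissertation. The function name, the starting points (powers of 0.4 + 0.9j, a common textbook choice), the iteration cap, and the tolerance are assumptions.

```python
def durand_kerner(coeffs, iterations=200, tol=1e-12):
    """Approximate all roots of a polynomial with the Durand-Kerner iteration.

    coeffs lists the coefficients from the highest degree down to the
    constant term. The starting points and stopping rule are illustrative
    choices, not prescribed by the method.
    """
    degree = len(coeffs) - 1
    lead = coeffs[0]
    monic = [c / lead for c in coeffs]  # the iteration assumes a monic polynomial

    def evaluate(x):
        # Horner's scheme for the monic polynomial.
        result = 0
        for c in monic:
            result = result * x + c
        return result

    # Distinct complex starting points, none of them a root of unity.
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]

    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            # Product of differences to every other current approximation.
            denom = 1
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - evaluate(r) / denom)
        shift = max(abs(a - b) for a, b in zip(updated, roots))
        roots = updated
        if shift < tol:
            break
    return roots


# Example: x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
approximations = durand_kerner([1, -6, 11, -6])
```

    The sketch updates every approximation from the previous sweep's values; a variant that reuses already-updated values within a sweep is also common.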

    The germinal centre artificial immune system

    This thesis deals with the development and evaluation of the germinal centre artificial immune system (GC-AIS), a novel artificial immune system based on advancements in the understanding of the germinal centre reaction of the immune system. The key research questions addressed in this thesis are: Can an artificial immune system (AIS) be designed by taking inspiration from recent developments in immunology to tackle multi-objective optimisation problems? How can we incorporate desirable features of the immune system, like diversity, parallelism and memory, into this proposed AIS? How does the proposed AIS compare with other state-of-the-art techniques in the field of multi-objective optimisation? How can we incorporate the learning component of the immune system into the algorithm and investigate the usefulness of memory in dynamic scenarios? The main contributions of the thesis are:
    • Understanding the behaviour and performance of the proposed GC-AIS on multi-objective optimisation problems and explaining its benefits and drawbacks, by comparing it with simple baseline and state-of-the-art algorithms.
    • Improving the performance of GC-AIS by incorporating a popular technique from multi-objective optimisation. With its weaknesses overcome, the capability of the improved variant to compete with the state-of-the-art algorithms is evaluated.
    • Answering key questions on the usefulness of incorporating memory in GC-AIS in a dynamic scenario.