15 research outputs found

    Graph-Based Multi-Label Classification for WiFi Network Traffic Analysis

    Network traffic analysis, and specifically anomaly and attack detection, calls for sophisticated tools relying on a large number of features. Mathematical modeling is extremely difficult, given the ample variety of traffic patterns and the subtle and varied ways in which malicious activity can be carried out in a network. We address this problem by exploiting data-driven modeling and computational intelligence techniques. Sequences of packets captured on the communication medium are considered, along with multi-label metadata. A graph-based modeling of the data is introduced, resorting to the powerful GRALG approach based on feature information granulation, identification of a representative alphabet, embedding and genetic optimization. The obtained classifier is evaluated in terms of both accuracy and complexity on two different supervised problems and compared with state-of-the-art algorithms. We show that the proposed preprocessing strategy is able to describe higher-level relations between data instances in the input domain, thus allowing the algorithms to suitably reconstruct the structure of the input domain itself. Furthermore, the considered Granular Computing approach is able to extract knowledge on multiple semantic levels, effectively describing anomalies as subgraph-based symbols of the whole network graph in a specific time interval. Interesting performance can thus be achieved in identifying network traffic patterns, in spite of the complexity of the considered traffic classes.
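    The embedding step mentioned in this abstract can be pictured as a symbolic histogram: each graph becomes a vector indicating which alphabet symbols (information granules) it contains. The sketch below is a deliberately minimal illustration of that idea with hypothetical edge-pattern granules and toy traffic graphs; it is not the GRALG implementation.

```python
# Minimal sketch of a symbolic-histogram graph embedding in the spirit of
# granular approaches such as GRALG (names and data are hypothetical).
from collections import Counter
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]          # a labelled edge, e.g. ("client", "ap")
Graph = FrozenSet[Edge]         # a traffic graph observed in one time window

def build_alphabet(train_graphs: List[Graph], top_k: int = 4) -> List[Edge]:
    """Pick the top_k most recurrent edge patterns as information granules."""
    counts = Counter(edge for g in train_graphs for edge in g)
    return [edge for edge, _ in counts.most_common(top_k)]

def embed(graph: Graph, alphabet: List[Edge]) -> List[int]:
    """One entry per granule: 1 if the pattern occurs in the graph, else 0."""
    return [int(symbol in graph) for symbol in alphabet]

if __name__ == "__main__":
    train = [
        frozenset({("client", "ap"), ("ap", "gateway")}),
        frozenset({("client", "ap"), ("client", "client")}),  # anomalous peer link
    ]
    alphabet = build_alphabet(train)
    for g in train:
        print(embed(g, alphabet))   # vectors usable by any standard classifier
```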

    Complexity vs. performance in granular embedding spaces for graph classification

    The most distinctive trait of structural pattern recognition in the graph domain is the ability to deal with the organization of, and relations between, the constituent entities of a pattern. Even if this can be convenient and/or necessary in many contexts, most state-of-the-art classification techniques cannot be deployed directly in the graph domain without first embedding graph patterns into a metric space. Granular Computing is a powerful information processing paradigm that can be employed to drive the synthesis of automatic embedding spaces from structured domains. In this paper we investigate several classification techniques starting from Granular Computing-based embedding procedures and provide a thorough overview in terms of model complexity, embedding space complexity and performance on several open-access datasets for graph classification. We observe that certain classification techniques, such as non-linear SVMs, perform poorly both in terms of complexity and learning performance, suggesting that the high dimensionality of the synthesized embedding space can negatively affect the effectiveness of these approaches. On the other hand, linear support vector machines, neuro-fuzzy networks and nearest neighbour classifiers have comparable performance in terms of accuracy, with the second being the most competitive in terms of structural complexity and the last being the most competitive in terms of embedding space dimensionality.
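    As a rough illustration of the kind of comparison described above, the sketch below trains a linear SVM, a non-linear (RBF) SVM and a k-NN classifier on a synthetic high-dimensional matrix standing in for a granular embedding space. The data, parameters and scikit-learn models are illustrative assumptions, not the paper's experimental protocol.

```python
# Illustrative comparison of classifiers on a synthetic embedding space
# (a stand-in for a granular graph embedding; not the paper's setup).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
from sklearn.neighbors import KNeighborsClassifier

# High-dimensional vectors playing the role of embedded graphs.
X, y = make_classification(n_samples=400, n_features=200, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "linear SVM": LinearSVC(max_iter=5000),
    "non-linear SVM (RBF)": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:>22}: accuracy = {model.score(X_te, y_te):.3f}")
```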

    Towards a general Boolean function benchmark suite

    Algorithms and the Foundations of Software Technology

    General Boolean function benchmark suite

    Algorithms and the Foundations of Software Technology

    On Information Granulation via Data Filtering for Granular Computing-Based Pattern Recognition: A Graph Embedding Case Study

    Granular Computing is a powerful information processing paradigm, particularly useful for the synthesis of pattern recognition systems in structured domains (e.g., graphs or sequences). According to this paradigm, granules of information play the pivotal role of describing the underlying (possibly complex) process, starting from the available data. From a pattern recognition viewpoint, granules of information can be exploited for the synthesis of semantically sound embedding spaces, where common supervised or unsupervised problems can be solved via standard machine learning algorithms. In this companion paper, we follow up on our previous work (Martino et al. in Algorithms 15(5):148, 2022) on comparing different strategies for the automatic synthesis of information granules in the context of graph classification. These strategies mainly differ in the specific topology adopted for the subgraphs considered as candidate information granules and in the possibility of using or neglecting the ground-truth class labels during the granulation process. In contrast to our previous work, we employ a filtering-based approach for the synthesis of information granules instead of a clustering-based one. Computational results on 6 open-access data sets corroborate the robustness of our filtering-based approach with respect to data stratification when compared to a clustering-based granulation stage.
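    A filtering-based granulation stage, as opposed to a clustering-based one, can be pictured as keeping only the candidate subgraphs that pass a score threshold. The toy sketch below filters candidates by minimum support; the threshold, scoring and granule names are hypothetical and far simpler than the paper's actual procedure.

```python
# Toy illustration of a filtering-based granulation step: candidate granules
# are retained if they are frequent enough, instead of being clustered.
from collections import Counter

def filter_granules(candidates, min_support=0.3):
    """Keep candidate granules appearing in at least min_support of the samples."""
    n = len(candidates)                      # one bag of candidates per sample
    support = Counter(g for bag in candidates for g in set(bag))
    return {g for g, c in support.items() if c / n >= min_support}

bags = [["triangle", "path3"], ["triangle"], ["star3", "path3"], ["triangle"]]
# 'triangle' and 'path3' pass the support threshold; 'star3' is filtered out.
print(filter_granules(bags))
```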

    Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

    Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of the kernel weights and of the selection of representatives in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the folded protein. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that is likewise able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
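    The combination of kernels over dissimilarity spaces can be sketched as a weighted sum of kernel matrices, each derived from a different dissimilarity representation of the same objects. In the toy example below the weights are fixed by hand and the dissimilarities are random placeholders; in the paper they are optimised jointly with the selection of representatives, which is omitted here.

```python
# Minimal sketch of combining kernels built over dissimilarity representations.
import numpy as np

def rbf_kernel_from_dissimilarity(D, gamma=1.0):
    """Turn a pairwise dissimilarity matrix into an RBF-style kernel matrix."""
    return np.exp(-gamma * D ** 2)

rng = np.random.default_rng(0)
n = 5
# Two hypothetical dissimilarity representations of the same n proteins.
D1 = np.abs(rng.normal(size=(n, n))); D1 = (D1 + D1.T) / 2; np.fill_diagonal(D1, 0)
D2 = np.abs(rng.normal(size=(n, n))); D2 = (D2 + D2.T) / 2; np.fill_diagonal(D2, 0)

weights = [0.7, 0.3]                       # kernel weights (normally learned)
K = weights[0] * rbf_kernel_from_dissimilarity(D1) \
  + weights[1] * rbf_kernel_from_dissimilarity(D2)
print(K.shape, K[0, 0])                    # combined kernel, usable by an SVM
```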

    A multi-objective optimization approach for the synthesis of granular computing-based classification systems in the graph domain

    The synthesis of a pattern recognition system usually aims at the optimization of a given performance index. However, in many real-world scenarios, there exist other desired facets to take into account. In this regard, multi-objective optimization acts as the main tool for the optimization of different (and possibly conflicting) objective functions in order to seek potential trade-offs among them. In this paper, we propose a three-objective optimization problem for the synthesis of a granular computing-based pattern recognition system in the graph domain. The core pattern recognition engine searches for suitable information granules (i.e., recurrent and/or meaningful subgraphs from the training data), on top of which the graph embedding procedure towards the Euclidean space is performed; in the resulting embedding space, any classification system can be employed. The optimization problem aims at jointly optimizing the performance of the classifier, the number of information granules and the structural complexity of the classification model. Furthermore, we address the problem of selecting a suitable number of solutions from the resulting Pareto fronts in order to compose an ensemble of classifiers to be tested on previously unseen data. To perform such a selection, we employ a multi-criteria decision-making routine, analyzing different case studies that differ in how much each objective function weighs in the ranking process. Results on five open-access datasets of fully labeled graphs show that exploiting the ensemble is effective (especially when the structural complexity of the model plays a minor role in the decision-making process) when compared against the baseline solution that solely aims at maximizing performance.
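    The two ingredients named in this abstract, Pareto dominance over three objectives and a weighted decision-making step over the front, can be illustrated in a few lines. The objective values below are made up and assumed to be already normalised (classification error, fraction of granules kept, model complexity); the actual optimiser and ranking criteria in the paper are considerably more elaborate.

```python
# Toy sketch: Pareto dominance over three objectives and a weighted-sum
# ranking used to pick solutions from the front (all values are hypothetical).
def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

def rank(front, weights=(0.6, 0.2, 0.2)):
    """Weighted-sum decision making (objectives assumed already normalised)."""
    return sorted(front, key=lambda s: sum(w * v for w, v in zip(weights, s)))

candidates = [(0.10, 0.30, 0.8), (0.12, 0.10, 0.5), (0.10, 0.40, 0.9), (0.25, 0.05, 0.2)]
front = pareto_front(candidates)
print(front)           # the non-dominated triples
print(rank(front)[0])  # preferred solution under the chosen weighting
```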

    A Performance Study of Multiobjective Particle Swarm Optimization Algorithms for Market Timing

    Market timing is the issue of deciding when to buy or sell a given asset on a financial market. As market timing is one of the core issues of algorithmic trading systems, designers of such systems have turned to computational intelligence methods to aid them in this task. In our previous work, we introduced a number of Particle Swarm Optimization (PSO) algorithms to compose strategies for market timing, using a novel training and testing methodology that reduced the likelihood of overfitting and tackled market timing as a multiobjective optimization problem. In this paper, we provide a detailed analysis of these multiobjective PSO algorithms and address two limitations of the results presented previously. The first limitation is that the PSO algorithms had not been compared to well-known algorithms or market timing techniques. This is addressed by comparing the results obtained against NSGA-II and MACD, a technique commonly used in market timing strategies. The second limitation is that we had no insight regarding the diversity of the Pareto sets returned by the algorithms. We address this by using RadViz to visualize the Pareto sets returned by all the algorithms, including NSGA-II and MACD. The results show that the multiobjective PSO algorithms return statistically significantly better results than NSGA-II and MACD. We also observe that the multiobjective PSOSP algorithm consistently displayed the best spread in its returned Pareto sets despite not having any explicit diversity-promoting measures.
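    MACD, used above as a market-timing baseline, is the standard indicator built from two exponential moving averages of the price plus a signal line. The sketch below implements the textbook formulation (12/26/9 periods) on a toy price series and flags crossovers; it is not the trading system evaluated in the paper.

```python
# Textbook MACD computation on made-up prices (illustrative baseline only).
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26, signal=9):
    macd_line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line

prices = [100 + 0.5 * i + (-1) ** i for i in range(60)]   # toy price series
macd_line, signal_line = macd(prices)
# A simple rule: signal a buy when the MACD line crosses above its signal line.
buys = [i for i in range(1, len(prices))
        if macd_line[i - 1] <= signal_line[i - 1] and macd_line[i] > signal_line[i]]
print(buys[:5])
```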

    Reconsideration and extension of Cartesian genetic programming

    This dissertation aims at analyzing fundamental concepts and dogmas of a graph-based genetic programming approach called Cartesian Genetic Programming (CGP) and introduces advanced genetic operators for CGP. The results of the experiments presented in this thesis lead to more knowledge about the algorithmic use of CGP and its underlying working mechanisms. CGP has mostly been used with a parametrization pattern which has been prematurely generalized as the most efficient pattern for standard CGP and its variants. Several parametrization patterns are evaluated with more detailed and comprehensive experiments by means of meta-optimization. This thesis also presents a first runtime analysis of CGP: the time complexity of a simple (1+1)-CGP algorithm is analyzed on a simple mathematical problem and a simple Boolean function problem. In the subfield of genetic operators for CGP, new recombination and mutation techniques that work on the phenotypic level are presented, and their effectiveness is demonstrated on a widespread set of popular benchmark problems. The role of recombination in particular can be seen as a major open question in the field of CGP, since the lack of an effective recombination operator limits CGP to mutation-only use. Phenotypic exploration analysis is used to analyze the effects caused by the presented operators; this type of analysis also leads to new insights into the search behavior of CGP in continuous and discrete fitness spaces. Overall, the outcome of this thesis leads to a reconsideration of how CGP is effectively used and extends its adaptation of Darwin's and Lamarck's theories of biological evolution.
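    A (1+1)-CGP scheme of the kind analysed in the runtime analysis can be sketched as mutation-only hill climbing with neutral drift over a linear genotype of function nodes. The toy Python below evolves a 2-input XOR with a heavily simplified node encoding; the representation and mutation details are assumptions for illustration, not the thesis's exact implementation.

```python
# Very small (1+1)-CGP-style sketch: mutation-only evolution of a Boolean circuit.
import random

FUNCS = [lambda a, b: a & b, lambda a, b: a | b,
         lambda a, b: a ^ b, lambda a, b: 1 - (a & b)]   # AND, OR, XOR, NAND
N_INPUTS, N_NODES = 2, 6

def random_gene(pos):
    # A node at position pos may read from the inputs or from any earlier node.
    return [random.randrange(len(FUNCS)),
            random.randrange(N_INPUTS + pos),
            random.randrange(N_INPUTS + pos)]

def random_genotype():
    return [random_gene(i) for i in range(N_NODES)] + [random.randrange(N_INPUTS + N_NODES)]

def evaluate(genotype, inputs):
    values = list(inputs)
    for f, a, b in genotype[:-1]:
        values.append(FUNCS[f](values[a], values[b]))
    return values[genotype[-1]]              # last gene selects the output node

def fitness(genotype):
    cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return sum(evaluate(genotype, c) == (c[0] ^ c[1]) for c in cases)

def mutate(genotype):
    child = [g[:] for g in genotype[:-1]] + [genotype[-1]]
    i = random.randrange(N_NODES + 1)         # point-mutate one gene
    if i < N_NODES:
        child[i] = random_gene(i)
    else:
        child[-1] = random.randrange(N_INPUTS + N_NODES)
    return child

random.seed(0)
parent = random_genotype()
for _ in range(2000):                         # (1+1) scheme with neutral drift
    child = mutate(parent)
    if fitness(child) >= fitness(parent):
        parent = child
print("best fitness:", fitness(parent), "out of 4")
```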