    Intrusion detection in Wi-Fi networks by modular and optimized ensemble of classifiers

    With the spread of pervasive advanced networking infrastructures and paradigms such as 5G and IoT, cybersecurity has become an active and crucial field in recent years. Furthermore, machine learning techniques are attracting growing attention as prospective tools for mining (possibly malicious) packet traces and for the automatic synthesis of network intrusion detection systems. In this work, we propose a modular ensemble of classifiers for spotting malicious attacks on Wi-Fi networks. Each classifier in the ensemble is tailored to characterize a given attack class and is individually optimized by means of a genetic algorithm wrapper with the dual goal of tuning hyper-parameters and retaining only the features relevant to that attack class. Our approach also includes a novel false alarm management procedure based on a suitable reliability measure. The proposed system has been tested on the well-known AWID dataset, showing performance comparable with other state-of-the-art works in terms of both accuracy and knowledge discovery capabilities. Its modular classification model also allows new attack classes to be included efficiently.
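
The genetic algorithm wrapper described above can be illustrated with a minimal sketch. It assumes a chromosome made of a boolean feature mask plus one hyper-parameter chosen from a list; the names `evolve`, `toy_fitness`, `RELEVANT` and `c_choices` are hypothetical, and the toy fitness merely stands in for the cross-validated performance of a real per-attack classifier. It is not the authors' implementation.

```python
import random

def evolve(fitness, n_features, c_choices, pop_size=20, generations=30,
           p_mut=0.1, seed=0):
    """Tiny elitist GA. A chromosome is (feature mask, index into c_choices)."""
    rng = random.Random(seed)

    def random_individual():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        return mask, rng.randrange(len(c_choices))

    def crossover(a, b):
        # One-point crossover on the mask; the hyper-parameter comes from either parent.
        cut = rng.randrange(1, n_features)
        return a[0][:cut] + b[0][cut:], rng.choice([a[1], b[1]])

    def mutate(ind):
        mask, c_idx = ind
        mask = [(not bit) if rng.random() < p_mut else bit for bit in mask]
        if rng.random() < p_mut:
            c_idx = rng.randrange(len(c_choices))
        return mask, c_idx

    def score(ind):
        return fitness(ind[0], c_choices[ind[1]])

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    best = max(population, key=score)
    return best[0], c_choices[best[1]], history

# Assumption: features 0 and 2 are the only informative ones for this attack class.
RELEVANT = {0, 2}

def toy_fitness(mask, c):
    """Stand-in for classifier accuracy: reward relevant features, penalize the rest."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    extra = sum(1 for i, bit in enumerate(mask) if bit and i not in RELEVANT)
    return hits - 0.25 * extra - abs(c - 1.0)

mask, c, history = evolve(toy_fitness, n_features=6, c_choices=[0.1, 1.0, 10.0])
print("selected features:", [i for i, bit in enumerate(mask) if bit], "hyper-parameter:", c)
```

In a modular ensemble of this kind, one such search would be run per attack class, so each classifier ends up with its own feature subset and hyper-parameters.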

    Gene function finding through cross-organism ensemble learning

    Background: Structured biological information about genes and proteins is a valuable resource to improve discovery and understanding of complex biological processes via machine learning algorithms. Gene Ontology (GO) controlled annotations describe, in a structured form, features and functions of genes and proteins of many organisms. However, such valuable annotations are not always reliable and are sometimes incomplete, especially for rarely studied organisms. Here, we present GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO annotations of a target organism from the GO annotations of an evolutionarily related and better studied source organism. Results: Using a supervised method, GeFF predicts unknown annotations from random perturbations of existing annotations. The perturbation consists of randomly deleting a fraction of the known annotations in order to produce a reduced annotation set. The key idea is to train a supervised machine learning algorithm on the reduced annotation set to predict, that is to rebuild, the original annotations. The resulting prediction model, in addition to accurately rebuilding the original known annotations for an organism from their perturbed version, also effectively predicts new unknown annotations for that organism. Moreover, the prediction model is able to discover new unknown annotations in different target organisms without retraining. We combined our novel method with different ensemble learning approaches and compared them to each other and to an equivalent single-model technique. We tested the method with five different organisms using their GO annotations: Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum. The outcomes demonstrate the effectiveness of the cross-organism ensemble approach, which can be customized with a trade-off between the desired number of predicted new annotations and their precision. A Web application to browse both the input annotations used and the predicted ones, choosing the ensemble prediction method to use, is publicly available at http://tiny.cc/geff/. Conclusions: Our novel cross-organism ensemble learning method provides reliable predicted novel gene annotations, i.e., functions, ranked according to an associated likelihood value. They are very valuable both to speed up annotation curation, by focusing it on the prioritized new predicted annotations, and to complement the known annotations available.
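
The perturbation step described in the Results can be sketched as follows. This is a minimal illustration, assuming annotations are stored as a mapping from gene to a set of term identifiers; the gene names, the term labels `T1`..`T4`, and the helpers `perturb` and `training_rows` are hypothetical and are not taken from GeFF itself.

```python
import random

def perturb(annotations, fraction, seed=0):
    """Randomly delete `fraction` of the known (gene, term) pairs.

    Returns the reduced annotation set and the set of deleted pairs.
    """
    rng = random.Random(seed)
    pairs = sorted((g, t) for g, terms in annotations.items() for t in terms)
    dropped = set(rng.sample(pairs, int(len(pairs) * fraction)))
    reduced = {g: {t for t in terms if (g, t) not in dropped}
               for g, terms in annotations.items()}
    return reduced, dropped

def training_rows(annotations, reduced, terms):
    """One row per gene: input vector from the reduced set, target from the original."""
    rows = []
    for gene in sorted(annotations):
        x = [int(t in reduced[gene]) for t in terms]      # perturbed input
        y = [int(t in annotations[gene]) for t in terms]  # annotations to rebuild
        rows.append((gene, x, y))
    return rows

# Hypothetical toy data: three genes, four terms, eight known annotations.
annotations = {
    "geneA": {"T1", "T2", "T3"},
    "geneB": {"T2", "T4"},
    "geneC": {"T1", "T3", "T4"},
}
terms = ["T1", "T2", "T3", "T4"]

reduced, dropped = perturb(annotations, fraction=0.25)
for gene, x, y in training_rows(annotations, reduced, terms):
    print(gene, "input:", x, "target:", y)
```

A supervised model trained on such rows learns to restore deleted annotations; positions it predicts as present although they are absent even from the original set are the candidates for new annotations.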

    Graph-Based Multi-Label Classification for WiFi Network Traffic Analysis

    Network traffic analysis, and specifically anomaly and attack detection, calls for sophisticated tools relying on a large number of features. Mathematical modeling is extremely difficult, given the ample variety of traffic patterns and the subtle and varied ways in which malicious activity can be carried out in a network. We address this problem by exploiting data-driven modeling and computational intelligence techniques. Sequences of packets captured on the communication medium are considered, along with multi-label metadata. Graph-based modeling of the data is introduced, resorting to the powerful GRALG approach based on feature information granulation, identification of a representative alphabet, embedding and genetic optimization. The obtained classifier is evaluated in terms of both accuracy and complexity on two different supervised problems and compared with state-of-the-art algorithms. We show that the proposed preprocessing strategy is able to describe higher-level relations between data instances in the input domain, thus allowing the algorithms to suitably reconstruct the structure of the input domain itself. Furthermore, the considered Granular Computing approach is able to extract knowledge on multiple semantic levels, effectively describing anomalies as subgraph-based symbols of the whole network graph in a specific time interval. Interesting performance can thus be achieved in identifying network traffic patterns, in spite of the complexity of the considered traffic classes.

    Critical review of text mining and sentiment analysis for stock market prediction

    This paper presents a critical review of the literature on text mining and sentiment analysis for stock market prediction, with particular regard to the latest findings of research articles strictly focused on stock markets represented by stock indices or individual stocks. This requires examining and critically analyzing the methods used to analyze sentiment from textual data, with special regard to the generalizability and transferability of research results. For this reason, an analytical approach is used in working with the literature and a critical approach in organizing it, especially with respect to completeness, coherence, and consistency. Based on the selected criteria, 260 articles corresponding to the subject area are selected from the Web of Science and Scopus databases. These studies are graphically captured through bibliometric analysis. The selection is then narrowed to 49 articles. The outputs are synthesized, and the main findings and limits of the current state of research are highlighted, together with possible directions for future research.

    Hands-on science. Rethinking STEAM education in times of uncertainty

    After more than two years of major constraints imposed by the COVID pandemic, the education world is still trying to find ways to adapt in order to keep providing, in an effective way, the crucial contribution to the world's development that our societies need and expect.

    Knowledge discovery techniques for transactional data model

    In this work we give solutions to two key knowledge discovery problems for the Transactional Data model: cluster analysis and itemset mining. By knowledge discovery in the context of these two problems, we specifically mean novel and useful ways of extracting clusters and itemsets from transactional data. The Transactional Data model is widely used in a variety of applications. In cluster analysis, the goal is to find clusters of similar transactions in the data, with the collective properties of each cluster being unique. We propose the first clustering algorithm for transactional data that uses the latest model definition. None of the previously proposed algorithms used the important utility information in the data; our novel technique effectively solves this problem. We also propose two new cluster validation metrics based on the criterion of high utility patterns. Compared with competing algorithms, our technique misses far fewer important high utility patterns. Itemset mining is the problem of searching for repeating patterns of high importance in the data. We show that the current model for itemset mining leads to information loss because it ignores the presence of clusters in the data. We propose a new itemset mining model which incorporates the cluster structure information. This allows the model to make predictions for future itemsets. We show that our model makes accurate predictions, discovering 30-40% of future itemsets in most experiments on two benchmark datasets with negligible inaccuracies. No other itemset prediction models currently exist, which makes accurate prediction a contribution of this work. We provide further theoretical improvements to our model by making it capable of giving predictions for specific future windows using time series forecasting. We also perform a detailed analysis of various clustering algorithms and study the effect of the Big Data phenomenon on them. This inspired us to further refine our model based on a classification problem design. This addition allows the mining of itemsets based on maximizing a customizable objective function made of different prediction metrics. The final framework design we propose is the first of its kind to make itemset predictions by using the cluster structure. It is capable of adapting the predictions to a specific future window and customizes the mining process to any specified prediction criterion. We implement the framework on a Web analytics data set and observe that it makes optimal prediction configuration choices with a high accuracy of 0.895.
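
The notion of utility that the abstract relies on can be made concrete with a small sketch. It assumes the usual high-utility itemset mining definition (utility of an itemset in a transaction = sum of purchased quantity times unit profit over its items), which the abstract itself does not spell out; the function names and the toy data are hypothetical.

```python
def itemset_support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for tx in transactions if all(item in tx for item in itemset))

def itemset_utility(itemset, transactions, unit_profit):
    """Total utility of `itemset`: quantity * unit profit, summed over
    the transactions that contain every item of the itemset."""
    total = 0
    for tx in transactions:  # tx maps item -> purchased quantity
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total

# Hypothetical toy data: three transactions over items a, b, c.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
unit_profit = {"a": 5, "b": 2, "c": 1}

print(itemset_support({"a", "b"}, transactions))               # 2
print(itemset_utility({"a", "b"}, transactions, unit_profit))  # 21
```

Frequency alone would rank `{"a"}` (support 3) above `{"a", "b"}` (support 2), whereas utility ranks `{"a", "b"}` (21) above `{"a"}` (20); this is the kind of information a purely frequency-based clustering or mining model does not see.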

    From industry to artworks by Lourdes Castro and Ângelo de Sousa: conservation studies on cast acrylic sheet

    Acrylic sheet, a plastic based on poly(methyl methacrylate) (PMMA), became popular among artists during the 1960s, when it was also used by two major Portuguese artists, Lourdes Castro (b. 1930) and Ângelo de Sousa (1938–2011). Taking their work with this material as a starting point, this thesis seeks to expand knowledge about the use of PMMA in art, its stability, and its preservation. To this end, a survey of artworks containing PMMA in Portuguese collections and research on the history of production of this material in Portugal were conducted in parallel with a material study of acrylic sheets used by those artists and of sheets produced by two Portuguese companies, whose production processes were investigated and compared. This study included an artificial ageing experiment conducted in a solarbox (λ > 300 nm) for 8000 h, as well as the characterization of the samples by optical microscopy, colorimetry, gravimetry, micro-indentation, Raman, infrared and ultraviolet-visible spectroscopies, size exclusion chromatography, thermogravimetry, and thermo-desorption-gas chromatography/mass spectrometry. For the survey, 137 artworks from 8 Portuguese art collections were considered, providing an overview of the use of this material by artists and of its condition. The artworks surveyed included paintings, sculptures, objects/reliefs, photography, installations and artist books. Of the 69 authors, 48 were Portuguese and have used PMMA from the 1960s to the present day. Most of the artworks were in good or fair condition, and the main problems observed were dust and dirt deposits, abrasion, and scratches. This research showed that PMMA sheet was produced in Portugal between 1955 and 2009. At least four companies operated during the 1960s, and all except one produced nacreous PMMA sheets from recovered monomer obtained by depolymerization of acrylic residues. The material used by Ângelo de Sousa falls within this category. 
Concerns about the quality of these sheets led to a comparative study of PMMA samples of different typologies and origins. Results revealed a connection between particular aspects of the production technique (polymerization conditions, organic additives, and origin of monomer) and the properties (molecular weight, hardness, thermal stability) and long-term behaviour of acrylic sheets. The pigment responsible for the nacreous effect in Ângelo de Sousa's acrylic sheets showed signs of instability during the photodegradation experiment, and was identified as plumbonacrite, Pb5(CO3)3O(OH)2, by Raman microspectroscopy. Cleaning and polishing treatments (such as the ones used by Lourdes Castro for finishing her artworks) were also investigated in terms of their immediate and long-term effects on the samples that had shown the highest and lowest photostability in the artificial ageing experiment. The impact of the treatments seems to depend on the particularities of the acrylic sheet under testing. This research highlights that not all acrylic sheets present the same stability, which may be relevant for establishing new monitoring plans and preventive conservation measures, as well as for testing interventive treatments on this material.