166 research outputs found

    An empirical study on anomaly detection algorithms for extremely imbalanced datasets

    Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data related to this domain is usually imbalanced. In this paper, we compare diverse state-of-the-art preprocessing and Machine Learning (ML) algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted using seven extremely imbalanced public domain datasets. Overall, the unsupervised IF and AE methods obtained competitive anomaly detection results and have the added advantage of not requiring labeled data. This work has been supported by the European Regional Development Fund (FEDER) through a grant of the Operational Programme for Competitivity and Internationalization of Portugal 2020 Partnership Agreement (PRODUTECH4S&C, POCI-01-0247-FEDER-046102).

    Predicting yarn breaks in textile fabrics: a machine learning approach

    In this paper, we propose a Machine Learning (ML) approach to predict faults that may occur during the production of fabrics and that often cause production downtime delays. We worked with a textile company that produces fabrics under the Industry 4.0 concept. In particular, we deal with a client customization requirement that impacts production planning and scheduling, where there is a crucial need to limit machine stoppage. Thus, the prediction of machine stops enables the manufacturer to react to such situations. If a specific loom is expected to have more breaks, several measures can be taken: slower loom speed, special attention by the operator, a change in the yarn used, a stronger sizing recipe, etc. The goal is to model three regression tasks related to the number of weft breaks, warp breaks, and yarn bursts. To reduce the modeling effort, we adopt several Automated Machine Learning (AutoML) tools (H2O, AutoGluon, AutoKeras), allowing us to compare distinct ML approaches: single-target regression (one model per task) versus Multi-Target Regression (MTR), and the direct output target versus a logarithm-transformed one. Several experiments were held by considering Internet of Things (IoT) historical data from a Portuguese textile company. Overall, the best results for the three tasks were obtained by the single-target approach with the H2O tool using logarithm-transformed data, achieving an R² of 0.73 for weft breaks. Furthermore, a Sensitivity Analysis eXplainable Artificial Intelligence (SA XAI) approach was executed over the selected H2OAutoML model, showing its potential value to extract useful explanatory knowledge for the analyzed textile domain. This work is supported by the European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project PPC4.0 - Production Planning Control 4.0; Funding Reference: POCI-01-0247-FEDER-069803].
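
The comparison between a direct and a logarithm-transformed target in the abstract above can be illustrated with a minimal sketch. It does not reproduce the paper's AutoML pipeline: a one-feature least-squares line stands in for the model, the `log1p`/`expm1` pair is an assumed choice of transform, and the `speed` and `breaks` values are made-up toy numbers, not data from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def r2_score(y_true, y_pred):
    """Coefficient of determination, computed on the original target scale."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def predict_direct(xs, ys, xs_new):
    """Fit on the raw target and predict it directly."""
    a, b = fit_line(xs, ys)
    return [a + b * x for x in xs_new]

def predict_log_target(xs, ys, xs_new):
    """Fit on log1p(y), then map predictions back with expm1."""
    a, b = fit_line(xs, [math.log1p(y) for y in ys])
    # Counts cannot be negative, so clip after the inverse transform.
    return [max(0.0, math.expm1(a + b * x)) for x in xs_new]

# Hypothetical toy data: a skewed count-like target that grows
# exponentially with one made-up input feature.
speed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
breaks = [math.expm1(0.2 + 0.5 * s) for s in speed]

pred_direct = predict_direct(speed, breaks, speed)
pred_log = predict_log_target(speed, breaks, speed)
r2_direct = r2_score(breaks, pred_direct)
r2_log = r2_score(breaks, pred_log)
print(f"R2 direct target: {r2_direct:.3f}")
print(f"R2 log target:    {r2_log:.3f}")
```

Both scores are computed on the original scale so that the two variants are comparable. On this toy data the log-target variant fits better only because the data was constructed to be linear in log space.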

    SMS-I: Intelligent Security for Cyber–Physical Systems

    Critical infrastructures are an attractive target for attackers, mainly due to the catastrophic impact of these attacks on society. In addition, the cyber–physical nature of these infrastructures makes them more vulnerable to cyber–physical threats and makes the detection, investigation, and remediation of security attacks more difficult. Therefore, improving cyber–physical correlations, forensics investigations, and Incident response tasks is of paramount importance. This work describes the SMS-I tool that allows the improvement of these security aspects in critical infrastructures. Data from heterogeneous systems, over different time frames, are received and correlated. Both physical and logical security are unified and additional security details are analysed to find attack evidence. Different Artificial Intelligence (AI) methodologies are used to process and analyse the multi-dimensional data exploring the temporal correlation between cyber and physical Alerts and going beyond traditional techniques to detect unusual Events, and then find evidence of attacks. SMS-I’s Intelligent Dashboard supports decision makers in a deep analysis of how the breaches and the assets were explored and compromised. It assists and facilitates the security analysts using graphical dashboards and Alert classification suggestions. Therefore, they can more easily identify anomalous situations that can be related to possible Incident occurrences. Users can also explore information, with different levels of detail, including logical information and technical specifications. SMS-I also integrates with a scalable and open Security Incident Response Platform (TheHive) that enables the sharing of information about security Incidents and helps different organizations better understand threats and proactively defend their systems and networks. This research was funded by the Horizon 2020 Framework Programme under grant agreement No 832969.
This output reflects the views only of the author(s), and the European Union cannot be held responsible for any use which may be made of the information contained therein. For more information on the project see: http://satie-h2020.eu/.

    Coupling traffic originated urban air pollution estimation with an atmospheric chemistry model

    Due to growing air pollution problems in urban areas, continuous research is being conducted to upgrade models that can predict the distribution of pollutants and thus enable timely interventions to mitigate their negative effects. To support these efforts, traffic data from an integrated transport model was used to drive the COPERT traffic emission model and the WRF-Chem atmospheric chemistry model. With reliable macroscopic traffic data from the Budapest region, traffic state estimations were calculated for every fifteen minutes of the day using dynamic assignment with predefined and time-varying static demand matrices. Then the COPERT average-speed vehicular emission model was applied to provide the emission factors, so that the macroscopic emissions for the traffic network could be calculated. As a next step, the WRF-Chem online coupled weather and atmospheric chemistry model was adapted to estimate the atmospheric dispersion of pollutants (CO, NOx, O3). The coupled models are presented in a 2-day case study with a qualitative comparison of the obtained results with measurements. The results suggest that combining macroscopic road traffic modeling with atmospheric models can enhance the estimation efficiency of urban air pollution.

    Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

    Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of the kernel weights and of the selection of representatives in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based protein folded description. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that can likewise exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
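
As a rough illustration of "a linear combination of multiple kernels defined over multiple dissimilarity spaces", the sketch below embeds objects by their dissimilarities to a few representatives and mixes one kernel per embedding with fixed non-negative weights. The RBF kernel, the `gamma` value, the toy objects, and the two dissimilarity measures are all assumptions made for illustration. The joint optimisation of weights and representatives described in the abstract is not implemented here.

```python
import math

def dissimilarity_embedding(objects, representatives, dissim):
    """Represent each object by its dissimilarities to the representatives."""
    return [[dissim(o, r) for r in representatives] for o in objects]

def rbf_kernel_matrix(vectors, gamma=1.0):
    """RBF kernel between embedded vectors (an assumed kernel choice)."""
    n = len(vectors)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

def combine_kernels(kernels, weights):
    """Convex combination sum_k w_k * K_k with weights normalised to sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one kernel weight must be positive")
    norm = [w / total for w in weights]
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(norm, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy objects with two made-up dissimilarity measures,
# each giving rise to its own dissimilarity space.
objects = [(0.0, 1.0), (1.0, 1.5), (4.0, 0.0)]
representatives = objects[:2]  # fixed here; selected during training in the paper

def dissim_first(a, b):
    return abs(a[0] - b[0])

def dissim_second(a, b):
    return abs(a[1] - b[1])

K1 = rbf_kernel_matrix(dissimilarity_embedding(objects, representatives, dissim_first), gamma=0.5)
K2 = rbf_kernel_matrix(dissimilarity_embedding(objects, representatives, dissim_second), gamma=0.5)
K_combined = combine_kernels([K1, K2], weights=[0.7, 0.3])

for row in K_combined:
    print(["%.3f" % v for v in row])
```

In this picture, a larger learned weight for one kernel would indicate that the corresponding representation is more useful for the classification task, which is the first of the two knowledge discovery outputs mentioned above.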

    Gene function finding through cross-organism ensemble learning

    Background: Structured biological information about genes and proteins is a valuable resource to improve discovery and understanding of complex biological processes via machine learning algorithms. Gene Ontology (GO) controlled annotations describe, in a structured form, features and functions of genes and proteins of many organisms. However, such valuable annotations are not always reliable and sometimes are incomplete, especially for rarely studied organisms. Here, we present GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO annotations of a target organism from the GO annotations of another, evolutionarily related and better studied, source organism. Results: Using a supervised method, GeFF predicts unknown annotations from random perturbations of existing annotations. The perturbation consists in randomly deleting a fraction of known annotations in order to produce a reduced annotation set. The key idea is to train a supervised machine learning algorithm with the reduced annotation set to predict, namely to rebuild, the original annotations. The resulting prediction model, in addition to accurately rebuilding the original known annotations for an organism from their perturbed version, also effectively predicts new unknown annotations for the organism. Moreover, the prediction model is also able to discover new unknown annotations in different target organisms without retraining. We combined our novel method with different ensemble learning approaches and compared them to each other and to an equivalent single model technique. We tested the method with five different organisms using their GO annotations: Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum.
The outcomes demonstrate the effectiveness of the cross-organism ensemble approach, which can be customized with a trade-off between the desired number of predicted new annotations and their precision. A Web application to browse both the input annotations used and the predicted ones, choosing the ensemble prediction method to use, is publicly available at http://tiny.cc/geff/. Conclusions: Our novel cross-organism ensemble learning method provides reliable predicted novel gene annotations, i.e., functions, ranked according to an associated likelihood value. They are very valuable both to speed up annotation curation, focusing it on the prioritized new annotations predicted, and to complement the known annotations available.
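
The perturbation step described in the GeFF abstract (randomly deleting a fraction of known annotations, then training a model to rebuild the original set) can be sketched as follows. This is an illustrative reading of the abstract only: the function names, the gene and term labels, and the binary-matrix encoding are hypothetical, and no learning algorithm or ensemble is included.

```python
import random

def perturb_annotations(annotations, fraction, rng):
    """Delete a fraction of the known (gene, term) pairs at random.

    `annotations` maps each gene to its set of annotation terms.
    Returns the reduced mapping and the set of deleted pairs.
    """
    pairs = sorted((g, t) for g, terms in annotations.items() for t in terms)
    n_delete = int(round(fraction * len(pairs)))
    deleted = set(rng.sample(pairs, n_delete))
    reduced = {
        g: {t for t in terms if (g, t) not in deleted}
        for g, terms in annotations.items()
    }
    return reduced, deleted

def to_matrix(annotations, genes, terms):
    """Binary gene-by-term matrix: 1 where the gene carries the term."""
    return [[1 if t in annotations[g] else 0 for t in terms] for g in genes]

# Hypothetical annotation set; the labels are placeholders, not real GO terms.
original = {
    "gene1": {"term_a", "term_b", "term_c"},
    "gene2": {"term_a", "term_d"},
    "gene3": {"term_b", "term_c", "term_d"},
}
genes = sorted(original)
terms = sorted({t for ts in original.values() for t in ts})

rng = random.Random(0)  # fixed seed so the sketch is repeatable
reduced, deleted = perturb_annotations(original, fraction=0.25, rng=rng)

# Supervised setup: inputs are the reduced annotations,
# targets are the original annotations to be rebuilt.
X = to_matrix(reduced, genes, terms)
Y = to_matrix(original, genes, terms)
print("deleted pairs:", sorted(deleted))
```

A classifier trained to map `X` to `Y` learns to restore deleted annotations. Pairs it predicts that are absent from `Y` are the candidates for new annotations, which is the use the abstract describes.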