
    An empirical study on anomaly detection algorithms for extremely imbalanced datasets

    Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data related to this domain is usually imbalanced. In this paper, we compare diverse preprocessing methods and state-of-the-art Machine Learning (ML) algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted by adopting seven extremely imbalanced public domain datasets. Overall, the IF and AE unsupervised methods obtained competitive anomaly detection results, which also have the advantage of not requiring labeled data. This work has been supported by the European Regional Development Fund (FEDER) through a grant of the Operational Programme for Competitivity and Internationalization of Portugal 2020 Partnership Agreement (PRODUTECH4S&C, POCI-01-0247-FEDER-046102).
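
    The fragment below is a minimal illustrative sketch, not the paper's experimental setup: it contrasts an unsupervised Isolation Forest with a supervised Random Forest on a synthetic, extremely imbalanced dataset and scores both with ROC AUC; the deep AutoEncoder and AutoML variants are omitted, and all dataset parameters are placeholders.

        # Minimal sketch (not the paper's setup): unsupervised Isolation Forest
        # vs. supervised Random Forest on a synthetic, extremely imbalanced dataset.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import IsolationForest, RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        # Synthetic data with roughly 1% anomalies (class 1).
        X, y = make_classification(n_samples=20_000, n_features=20,
                                   weights=[0.99, 0.01], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Unsupervised: Isolation Forest is trained without labels.
        iso = IsolationForest(n_estimators=200, random_state=0).fit(X_tr)
        iso_scores = -iso.score_samples(X_te)      # higher = more anomalous

        # Supervised baseline: Random Forest trained with labels.
        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        rf_scores = rf.predict_proba(X_te)[:, 1]

        print("Isolation Forest AUC:", roc_auc_score(y_te, iso_scores))
        print("Random Forest AUC:   ", roc_auc_score(y_te, rf_scores))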

    Predicting yarn breaks in textile fabrics: a machine learning approach

    In this paper, we propose a Machine Learning (ML) approach to predict faults that may occur during the production of fabrics and that often cause production downtime delays. We worked with a textile company that produces fabrics under the Industry 4.0 concept. In particular, we deal with a client customization requisite that impacts production planning and scheduling, where there is a crucial need to limit machine stoppage. Thus, the prediction of machine stops enables the manufacturer to react to such situations. If a specific loom is expected to have more breaks, several measures can be taken: slower loom speed, special attention by the operator, a change in the used yarn, a stronger sizing recipe, etc. The goal is to model three regression tasks related to the number of weft breaks, warp breaks, and yarn bursts. To reduce the modeling effort, we adopt several Automated Machine Learning (AutoML) tools (H2O, AutoGluon, AutoKeras), allowing us to compare distinct ML approaches: using single-target (one model per task) and Multi-Target Regression (MTR) models; and using the direct output target or a logarithm-transformed one. Several experiments were conducted by considering Internet of Things (IoT) historical data from a Portuguese textile company. Overall, the best results for the three tasks were obtained by the single-target approach with the H2O tool using logarithm-transformed data, achieving an R2 of 0.73 for weft breaks. Furthermore, a Sensitivity Analysis eXplainable Artificial Intelligence (SA XAI) approach was executed over the selected H2O AutoML model, showing its potential value to extract useful explanatory knowledge for the analyzed textile domain. This work is supported by the European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project PPC4.0 - Production Planning Control 4.0; Funding Reference: POCI-01-0247-FEDER-069803].
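
    As a rough illustration of the single-target, log-transformed setup described above, the sketch below runs H2O AutoML on one break-count target; the file name and column names (such as weft_breaks and the derived log target) are hypothetical placeholders, not the company's actual data schema, and this is not the authors' pipeline.

        # Hedged sketch (not the authors' pipeline): single-target regression of one
        # break count with H2O AutoML on a log1p-transformed target.
        import h2o
        import numpy as np
        import pandas as pd
        from h2o.automl import H2OAutoML

        h2o.init()

        df = pd.read_csv("loom_iot_history.csv")              # hypothetical IoT export
        df["log_weft_breaks"] = np.log1p(df["weft_breaks"])    # log-transformed target

        frame = h2o.H2OFrame(df)
        train, test = frame.split_frame(ratios=[0.8], seed=1)
        features = [c for c in frame.columns
                    if c not in ("weft_breaks", "log_weft_breaks")]

        aml = H2OAutoML(max_runtime_secs=600, sort_metric="RMSE", seed=1)
        aml.train(x=features, y="log_weft_breaks", training_frame=train)

        # Predict in log space, then invert the transform to recover break counts.
        pred_log = aml.leader.predict(test).as_data_frame()["predict"]
        pred_counts = np.expm1(pred_log)
        print(aml.leaderboard.head())
        print("first predicted counts:", pred_counts.head().tolist())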

    SMS-I: Intelligent Security for Cyber–Physical Systems

    Critical infrastructures are an attractive target for attackers, mainly due to the catastrophic impact of these attacks on society. In addition, the cyber–physical nature of these infrastructures makes them more vulnerable to cyber–physical threats and makes the detection, investigation, and remediation of security attacks more difficult. Therefore, improving cyber–physical correlations, forensics investigations, and Incident response tasks is of paramount importance. This work describes the SMS-I tool that allows the improvement of these security aspects in critical infrastructures. Data from heterogeneous systems, over different time frames, are received and correlated. Both physical and logical security are unified and additional security details are analysed to find attack evidence. Different Artificial Intelligence (AI) methodologies are used to process and analyse the multi-dimensional data, exploring the temporal correlation between cyber and physical Alerts and going beyond traditional techniques to detect unusual Events and then find evidence of attacks. SMS-I's Intelligent Dashboard supports decision makers in a deep analysis of how the breaches and the assets were explored and compromised. It assists security analysts through graphical dashboards and Alert classification suggestions, so they can more easily identify anomalous situations that can be related to possible Incident occurrences. Users can also explore information, with different levels of detail, including logical information and technical specifications. SMS-I also integrates with a scalable and open Security Incident Response Platform (TheHive) that enables the sharing of information about security Incidents and helps different organizations better understand threats and proactively defend their systems and networks. This research was funded by the Horizon 2020 Framework Programme under grant agreement No 832969. This output reflects the views only of the author(s), and the European Union cannot be held responsible for any use which may be made of the information contained therein. For more information on the project see: http://satie-h2020.eu/.
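
    The following fragment is only an illustrative stand-in for the cyber–physical temporal correlation idea, not the SMS-I implementation: it pairs cyber and physical alerts that occur within a short time window of each other, and all alert fields and values are assumed.

        # Illustrative stand-in (not SMS-I code): pair cyber and physical alerts that
        # occur close together in time. All field names and values are assumed.
        import pandas as pd

        cyber = pd.DataFrame({
            "time": pd.to_datetime(["2023-05-01 10:00:05", "2023-05-01 11:30:00"]),
            "cyber_alert": ["failed SSH logins", "IDS signature match"],
        })
        physical = pd.DataFrame({
            "time": pd.to_datetime(["2023-05-01 10:00:40", "2023-05-01 14:00:00"]),
            "physical_alert": ["badge access denied", "door forced open"],
        })

        # Join each cyber alert to the nearest physical alert within a 2-minute window.
        correlated = pd.merge_asof(cyber.sort_values("time"),
                                   physical.sort_values("time"),
                                   on="time", direction="nearest",
                                   tolerance=pd.Timedelta("2min"))
        print(correlated)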

    Coupling traffic originated urban air pollution estimation with an atmospheric chemistry model

    Due to increasing air pollution issues in urban areas, continuous research is being conducted to upgrade models that can predict the distribution of pollutants and thus enable timely interventions to mitigate their negative effects. To support these efforts, traffic data from an integrated transport model was used to drive the COPERT traffic emission model and the WRF-Chem atmospheric chemistry model. With reliable macroscopic traffic data from the Budapest region, traffic state estimations were calculated for every fifteen minutes of the day using dynamic assignment with predefined and time-varying static demand matrices. Then the COPERT average-speed vehicular emission model was applied to provide the emission factors, so that the macroscopic emissions for the traffic network could be calculated. As a next step, the WRF-Chem online coupled weather and atmospheric chemistry model was adapted to estimate the atmospheric dispersion of pollutants (CO, NOx, O3). The coupled models are presented in a 2-day case study with a qualitative comparison of the obtained results against measurements. As a result, it can be stated that combining macroscopic road traffic modeling with atmospheric models can enhance the efficiency of urban air pollution estimation.
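
    A minimal sketch of the macroscopic emission step is shown below; it is not the actual COPERT or WRF-Chem code, and the emission-factor curve, link attributes, and 15-minute flows are illustrative placeholders only.

        # Minimal sketch (not COPERT/WRF-Chem): per-link emissions for one 15-minute
        # traffic state as flow * length * speed-dependent emission factor.
        import pandas as pd

        def nox_emission_factor(avg_speed_kmh: float) -> float:
            """Placeholder NOx factor in g/(veh*km), shaped like COPERT average-speed
            curves (higher at very low and very high speeds)."""
            return 0.5 + 0.0002 * (avg_speed_kmh - 60.0) ** 2

        # Hypothetical links from one 15-minute assignment interval.
        links = pd.DataFrame({
            "link_id": ["A1", "A2", "B7"],
            "length_km": [1.2, 0.8, 2.5],
            "flow_veh": [450, 300, 900],
            "avg_speed_kmh": [35.0, 55.0, 80.0],
        })

        links["nox_g"] = (links["flow_veh"] * links["length_km"]
                          * links["avg_speed_kmh"].map(nox_emission_factor))

        # The network total per interval would then be gridded as WRF-Chem input.
        print(links)
        print("Total NOx (g):", links["nox_g"].sum())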

    Exploring the Integration of Agent-Based Modelling, Process Mining, and Business Process Management through a Text Analytics–Based Literature Review

    Agent-based modelling and business process management are two interrelated yet distinct concepts. To explore the relationship between these two fields, we conducted a systematic literature review to investigate existing methods and identify research gaps in the integration of agent-based modelling, process mining, and business process management. Our search yielded 359 research papers, which were evaluated using predefined criteria and quality measures, resulting in a final selection of forty-two papers. Our findings reveal several research gaps, including the need for enhanced validation methods, the modelling of complex agents and environments, and the integration of process mining and business process management with emerging technologies. Existing agent-based approaches within process mining and business process management have paved the way for identifying validation methods for performance evaluation. The identified gaps primarily concern validation, which should be addressed before delving deeper into more specific research topics such as modelling complex agents and environments or integrating process mining and business process management with emerging technologies.

    Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

    Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representative selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional roles starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the folded protein. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that is likewise able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
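
    A heavily simplified sketch of the kernel-combination idea follows; it fixes the kernel weights instead of jointly optimising them with the representative selection as the paper does, and the dissimilarity-space data are random stand-ins for the eight protein representations.

        # Simplified sketch (weights fixed, not jointly optimised as in the paper):
        # one RBF kernel per dissimilarity-space representation, combined as a
        # convex combination and fed to an SVM as a precomputed kernel.
        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        n = 200
        y = rng.integers(0, 2, size=n)

        # Each "representation": distances from every object to a few representatives,
        # i.e. a dissimilarity-space embedding (random stand-ins here).
        representations = [rng.random((n, 10)) for _ in range(3)]

        weights = np.array([0.5, 0.3, 0.2])        # would be learned in the real system
        kernels = [rbf_kernel(D, gamma=0.1) for D in representations]
        K = sum(w * Kr for w, Kr in zip(weights, kernels))

        clf = SVC(kernel="precomputed").fit(K, y)
        print("training accuracy:", clf.score(K, y))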