
    Anomaly Detection Methods to Improve Supply Chain Data Quality and Operations

    Supply chain operations drive the planning, manufacture, and distribution of billions of semiconductors a year, spanning thousands of products across many supply chain configurations. The customizations span from wafer technology to die stacking and chip feature enablement. Data quality drives efficiency in these processes, and anomalies in the data can be very disruptive and, at times, consequential. Developing preventative measures that automate the detection of anomalies before they reach downstream execution systems would yield significant efficiency gains for the organization. The purpose of this research is to identify an effective, actionable, and computationally efficient approach to highlighting anomalies in a sparse and highly variable supply chain data structure. This research highlights the application of ensemble unsupervised learning algorithms for anomaly detection on supply chain demand data. The outlier detection algorithms explored include Angle-Based Outlier Detection, Isolation Forest, Local Outlier Factor, and K-Nearest Neighbors. Applying an ensemble technique to the unconstrained forecast signal, which is traditionally a consistent demand line, demonstrated a dramatic decrease in false positives. When the ensemble technique is applied to the sales-order-netted demand forecast, a signal that is irregular in structure, the algorithm identifies true anomalous observations relative to historical observations across time. The research team concluded that assessing an outlier is not limited to the most recent forecast's observations but must be considered in the context of historical demand patterns across time.
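    A minimal sketch of the ensemble idea follows, combining the four named detectors and averaging their standardized outlier scores. The PyOD library, the synthetic data, and the 99th-percentile threshold are assumptions for illustration; the abstract does not name an implementation.

```python
# Sketch of an unsupervised ensemble over the four detectors named above,
# using PyOD (a hypothetical library choice). Data and threshold are
# synthetic stand-ins, not details from the paper.
import numpy as np
from pyod.models.abod import ABOD
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF
from pyod.models.combination import average
from pyod.utils.utility import standardizer

# Hypothetical stand-in for featurized demand observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

detectors = [ABOD(), IForest(random_state=0), LOF(), KNN()]
scores = np.empty((X.shape[0], len(detectors)))
for i, det in enumerate(detectors):
    det.fit(X)
    scores[:, i] = det.decision_function(X)  # higher score = more outlying

# Standardize each detector's scores, then average across detectors;
# requiring agreement across detectors is what suppresses false positives.
combined = average(standardizer(scores))
anomalies = np.where(combined > np.percentile(combined, 99))[0]
```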

    Machine-learning-based condition assessment of gas turbine: a review

    Condition monitoring, diagnostics, and prognostics are key factors in today's competitive industrial sector. Equipment digitalisation has increased the amount of data available throughout the industrial process, and the development of new and more advanced techniques has significantly improved the performance of industrial machines. This publication surveys the last decade of evolution of condition monitoring, diagnostic, and prognostic techniques using machine-learning (ML)-based models for the improvement of the operational performance of gas turbines. A comprehensive review of the literature led to a performance assessment of ML models and their applications to gas turbines, as well as a discussion of the major challenges and opportunities for research on these kinds of engines. This paper further concludes that combining the information captured through the collectors with ML techniques shows promising results in increasing the accuracy, robustness, precision, and generalisation of condition assessment for industrial gas turbine equipment. This research was funded by Siemens Energy.

    Health-PRIOR: An Intelligent Ensemble Architecture to Identify Risk Cases in Healthcare

    Smart city environments, when applied to healthcare, improve the quality of people's lives, enabling, for instance, disease prediction and treatment monitoring. In medical settings, case prioritization is of great importance, with beneficial outcomes both for patient health and for physicians' daily work. Recommender systems are an alternative to automatically integrate the data generated in such environments with predictive models and recommend actions, content, or services. The data produced by smart devices are accurate and reliable for predictive and decision-making contexts. The main purpose of this study is to assist patients and doctors in the early detection of disease or the prediction of postoperative worsening through constant monitoring. To achieve this objective, this study proposes an architecture for recommender systems applied to healthcare, which can prioritize emergency cases. The architecture brings an ensemble approach for prediction, which adopts multiple Machine Learning algorithms. The methodology used to carry out the study followed three steps: first, a systematic literature mapping; second, the construction and development of the architecture; and third, an evaluation through two case studies. The results demonstrated the feasibility of the proposal. The predictions are promising and adherent to the application context for accurate datasets with little noise and few missing values.
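    A minimal sketch of the multi-algorithm ensemble idea at the core of the architecture is shown below, assuming a soft-voting combination in scikit-learn. The base models, synthetic data, and risk-ranking step are illustrative assumptions, not the paper's exact design.

```python
# Sketch of an ensemble that ranks cases by predicted risk; models, data,
# and feature semantics are hypothetical stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical vital-sign features -> binary "risk case" label.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Soft voting averages predicted probabilities across the base models,
# so cases can be ranked by predicted risk for prioritization.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
risk_scores = ensemble.predict_proba(X_test)[:, 1]  # higher = more urgent
```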

    Multistage feature selection methods for data classification

    In the data analysis process, a good decision can be made with the assistance of several sub-processes and methods. The most common processes are feature selection and classification. Various methods and processes have been proposed to solve issues such as low classification accuracy and the long processing times faced by decision-makers. The analysis process becomes more complicated when dealing with complex datasets that are large and problematic. One solution is to employ an effective feature selection method to reduce the data processing time, decrease the memory used, and increase the accuracy of decisions. However, not all existing methods are capable of dealing with these issues. The aim of this research was to assist the classifier in giving a better performance when dealing with problematic datasets by generating an optimised attribute set. The proposed method comprises two stages of feature selection: a correlation-based feature selection method using a best-first search algorithm (CFS-BFS), as well as a soft set and rough set parameter selection method (SSRS). CFS-BFS is used to eliminate uncorrelated attributes in a dataset, while SSRS is utilized to manage problematic values such as uncertainty. Several benchmarking feature selection methods, such as classifier subset evaluation (CSE) and principal component analysis (PCA), and different classifiers, such as support vector machine (SVM) and neural network (NN), were used to validate the obtained results. ANOVA and t-tests were also conducted to verify the results. The averages obtained in two experimental works proved that the proposed method matches the performance of the benchmarking methods in assisting the classifier to achieve high classification performance on complex datasets. The average obtained in a further experimental work showed that the proposed method outperformed the other benchmarking methods. In conclusion, the proposed method is a significant alternative feature selection method, able to assist classifiers in achieving better accuracy in the classification process, especially when dealing with problematic datasets.
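    To make the first stage concrete, the sketch below implements a greedy correlation-based filter in the spirit of CFS: it rewards correlation with the target and penalizes redundancy with already-selected features. It is an illustrative approximation, not the paper's exact CFS-BFS search, and the SSRS stage is not reproduced.

```python
# Greedy approximation of correlation-based feature selection (CFS).
import numpy as np

def greedy_cfs(X: np.ndarray, y: np.ndarray, k: int) -> list[int]:
    """Greedily pick k features that correlate with y but not with each other."""
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    selected: list[int] = []
    for _ in range(k):
        best, best_score = -1, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Penalize redundancy with already-selected features.
            redundancy = np.mean(
                [abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected]
            ) if selected else 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Usage with synthetic data: features 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
print(greedy_cfs(X, y, k=3))
```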

    DEK-Forecaster: A Novel Deep Learning Model Integrated with EMD-KNN for Traffic Prediction

    Internet traffic volume estimation has a significant impact on the business policies of the ISP (Internet Service Provider) industry and on business success. Forecasting internet traffic demand sheds light on future traffic trends, which is often helpful for ISPs' decision-making in network planning activities and investments. In addition, the capability to understand future trends contributes to managing regular and long-term operations. This study aims to predict network traffic volume demand using deep sequence methods that incorporate Empirical Mode Decomposition (EMD)-based noise reduction, empirical-rule-based outlier detection, and K-Nearest Neighbour (KNN)-based outlier mitigation. In contrast to former studies, the proposed model does not rely on a particular EMD-decomposed component, called an Intrinsic Mode Function (IMF), for signal denoising. In our proposed traffic prediction model, we used the average of all IMF components for signal denoising. Moreover, abnormal data points are replaced by the average of their K nearest data points, and the value of K has been optimized based on the KNN regressor prediction error measured in Root Mean Squared Error (RMSE). Finally, we selected the best time-lagged feature subset for our prediction model based on the AutoRegressive Integrated Moving Average (ARIMA) model and the Akaike Information Criterion (AIC). Our experiments are conducted on real-world internet traffic datasets from industry, and the proposed method is compared with various traditional deep sequence baseline models. Our results show that the proposed EMD-KNN integrated prediction models outperform the comparative models.
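    The preprocessing chain can be sketched as below, assuming the PyEMD package for the decomposition. K is fixed here for brevity, whereas the paper tunes it against the KNN regressor's RMSE; the signal is synthetic.

```python
# Sketch of the preprocessing described above: EMD denoising by averaging
# all IMFs (as the abstract states), 3-sigma (empirical rule) outlier
# detection, and replacement by the mean of the K nearest-in-time points.
import numpy as np
from PyEMD import EMD  # pip install EMD-signal (library choice assumed)

def preprocess(traffic: np.ndarray, k: int = 5) -> np.ndarray:
    # 1) Decompose into Intrinsic Mode Functions and average them.
    imfs = EMD().emd(traffic)
    denoised = imfs.mean(axis=0)

    # 2) Empirical rule: flag points more than 3 std devs from the mean.
    mu, sigma = denoised.mean(), denoised.std()
    outliers = np.where(np.abs(denoised - mu) > 3 * sigma)[0]

    # 3) Replace each outlier with the average of its k nearest-in-time
    #    non-outlier neighbours.
    clean = denoised.copy()
    normal_idx = np.setdiff1d(np.arange(len(denoised)), outliers)
    for i in outliers:
        nearest = normal_idx[np.argsort(np.abs(normal_idx - i))[:k]]
        clean[i] = denoised[nearest].mean()
    return clean

# Usage with a synthetic daily-cycle signal and one injected spike.
rng = np.random.default_rng(1)
t = np.arange(1000)
traffic = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
traffic[100] += 5.0
clean = preprocess(traffic)
```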

    Cyber Security of Critical Infrastructures

    Critical infrastructures are vital assets for public safety, economic welfare, and the national security of countries. The vulnerabilities of critical infrastructures have increased with the widespread use of information technologies. As Critical National Infrastructures become more vulnerable to cyberattacks, their protection becomes a significant issue for organizations as well as nations. The risks to continued operations, whether from failing to upgrade aging infrastructure or from not meeting mandated regulatory regimes, are considered highly significant, given the demonstrable impact of such circumstances. Due to the rapid increase of sophisticated cyber threats targeting critical infrastructures with significant destructive effects, the cybersecurity of critical infrastructures has become an agenda item for academics, practitioners, and policymakers. A holistic view that covers technical, policy, human, and behavioural aspects is essential to handle the cybersecurity of critical infrastructures effectively. Moreover, the ability to attribute crimes to criminals is a vital element of avoiding impunity in cyberspace. This book presents both research and practical aspects of cybersecurity considerations in critical infrastructures. Aligned with the interdisciplinary nature of cybersecurity, authors from academia, government, and industry have contributed 13 chapters. The issues discussed and analysed include cybersecurity training, maturity assessment frameworks, malware analysis techniques, ransomware attacks, security solutions for industrial control systems, and privacy preservation methods.

    Review of automated time series forecasting pipelines

    Time series forecasting is fundamental for various use cases in different domains, such as energy systems and economics. Creating a forecasting model for a specific use case requires an iterative and complex design process. The typical design process includes five sections: (1) data pre-processing, (2) feature engineering, (3) hyperparameter optimization, (4) forecasting method selection, and (5) forecast ensembling, which are commonly organized in a pipeline structure. One promising approach to handling the ever-growing demand for time series forecasts is automating this design process. The present paper therefore analyzes the existing literature on automated time series forecasting pipelines to investigate how to automate the design process of forecasting models. We consider both Automated Machine Learning (AutoML) and automated statistical forecasting methods in a single forecasting pipeline. For this purpose, we first present and compare the proposed automation methods for each pipeline section. Second, we analyze the automation methods regarding their interaction, combination, and coverage of the five pipeline sections. For both, we discuss the literature, identify problems, give recommendations, and suggest future research. This review reveals that the majority of papers cover only two or three of the five pipeline sections. We conclude that future research must consider the automation of the forecasting pipeline holistically to enable the large-scale application of time series forecasting.
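    A schematic, runnable sketch of the five pipeline sections is given below. The models, tuning grids, and lag features are hypothetical stand-ins chosen for illustration, not methods endorsed by the review.

```python
# Toy pipeline touching all five sections the review names.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

def run_pipeline(series: pd.Series) -> np.ndarray:
    # (1) Pre-processing: fill gaps by interpolation.
    series = series.interpolate()
    # (2) Feature engineering: lagged values as predictors.
    df = pd.DataFrame({f"lag_{l}": series.shift(l) for l in (1, 2, 3)})
    df["y"] = series
    df = df.dropna()
    X, y = df.drop(columns="y").values, df["y"].values
    cv = TimeSeriesSplit(n_splits=3)
    # (3) Hyperparameter optimization + (4) method selection pool.
    searches = [
        GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=cv),
        GridSearchCV(KNeighborsRegressor(), {"n_neighbors": [3, 5, 7]}, cv=cv),
    ]
    preds = []
    for s in searches:
        s.fit(X, y)
        preds.append(s.predict(X))
    # (5) Forecast ensembling: simple average of the tuned models.
    return np.mean(preds, axis=0)

# Usage with a synthetic hourly series.
idx = pd.date_range("2024-01-01", periods=200, freq="h")
fitted = run_pipeline(pd.Series(np.sin(np.arange(200) / 12.0), index=idx))
```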

    Information Theory and Its Application in Machine Condition Monitoring

    Condition monitoring of machinery is one of the most important aspects of many modern industries. With the rapid advancement of science and technology, machines are becoming increasingly complex. Moreover, an exponential increase in demand is leading to increasing requirements on machine output. As a result, in most modern industries, machines have to work 24 hours a day. All these factors are causing machine health to deteriorate at a higher rate than before. Breakdown of key components of a machine, such as a bearing, gearbox, or roller, can have catastrophic effects in terms of both financial and human costs. From this perspective, it is important not only to detect a fault at its earliest point of inception but also to design the overall monitoring process, covering fault classification, fault severity assessment, and remaining useful life (RUL) prediction, for better planning of the maintenance schedule. Information theory is one of the pioneering contributions of modern science and has evolved into various forms and algorithms over time. Due to its ability to address the non-linearity and non-stationarity of machine health deterioration, it has become a popular choice among researchers. Information theory is an effective technique for extracting features of machines under different health conditions. In this context, this book discusses the potential applications, research results, and latest developments of information theory-based condition monitoring of machinery.
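    As one concrete example of an information-theoretic health feature, the sketch below computes the Shannon entropy of a signal's normalized power spectrum, which typically rises as a developing fault broadens the spectral content. The signal and parameters are hypothetical.

```python
# Spectral entropy as a simple condition-monitoring feature.
import numpy as np

def spectral_entropy(signal: np.ndarray) -> float:
    """Shannon entropy of the normalized power spectrum, in bits."""
    psd = np.abs(np.fft.rfft(signal)) ** 2
    p = psd / psd.sum()          # normalize to a probability distribution
    p = p[p > 0]                 # avoid log(0)
    return float(-(p * np.log2(p)).sum())

# Usage: a healthy tone vs. the same tone with broadband fault-like noise.
t = np.linspace(0, 1, 4096)
healthy = np.sin(2 * np.pi * 50 * t)
faulty = healthy + 0.5 * np.random.default_rng(0).normal(size=t.size)
print(spectral_entropy(healthy), spectral_entropy(faulty))  # faulty is higher
```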

    Innovative Two-Stage Fuzzy Classification for Unknown Intrusion Detection

    Intrusion detection is an essential part of network security in combating illegal network access and malicious cyberattacks. Due to the constantly evolving nature of cyberattacks, it has been a technical challenge for an intrusion detection system (IDS) to effectively recognize unknown attacks, or known attacks with inadequate training data. Therefore, in this dissertation, an innovative two-stage classifier is developed for accurately and efficiently detecting both unknown attacks and known attacks with insufficient or inaccurate training information. The novel two-stage fuzzy classification scheme is based on advanced machine learning techniques designed specifically to handle the ambiguity of traffic connections and network data. In the first stage of the classification, a fuzzy C-means (FCM) algorithm is employed to softly compute and optimize the clustering centers of the training datasets, with a degree of fuzziness accounting for feature inaccuracy and ambiguity in the training data. Subsequently, a distance-weighted k-NN (k-nearest neighbors) classifier, combined with Dempster-Shafer Theory (DST), is introduced to assess the belief functions and pignistic probabilities of the incoming data with respect to each of the known classes, further addressing the data uncertainty in the cyberattack data. In the second stage of the proposed algorithm, a subsequent classification scheme is implemented based on the obtained pignistic probabilities and their entropy functions to determine whether the input data are normal, one of the known attacks, or an unknown attack. Furthermore, to strengthen robustness to attacks, we form a three-layer hierarchical ensemble classifier based on the FCM-weighted k-NN DST classifier to produce more precise inferences than a single classifier. The proposed intrusion detection algorithm is evaluated on the KDD'99 datasets and their variants containing known and unknown attacks. The experimental results show that the new two-stage fuzzy KNN-DST classifier outperforms other well-known classifiers in intrusion detection and is especially effective in detecting unknown attacks.
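    A condensed sketch of the two-stage idea follows: fuzzy C-means centers learned per class, distance-weighted class scores, and an entropy test to flag unknown attacks. The belief-function and pignistic-probability machinery of the dissertation is simplified away here, and the threshold and data are hypothetical.

```python
# Simplified two-stage flavor: FCM centers -> distance-weighted scores ->
# entropy test for "unknown". Not the dissertation's full DST pipeline.
import numpy as np
from scipy.spatial.distance import cdist

def fcm_centers(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means: returns c cluster centers of X."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m                                   # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = cdist(X, centers) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))               # membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers

def classify(x, centers_per_class, entropy_threshold=0.9):
    """Distance-weighted class scores plus an entropy test for unknowns."""
    scores = np.array([1.0 / (np.linalg.norm(c - x, axis=1).min() + 1e-12)
                       for c in centers_per_class])
    p = scores / scores.sum()
    entropy = -(p * np.log2(p + 1e-12)).sum() / np.log2(len(p))
    return "unknown attack" if entropy > entropy_threshold else int(p.argmax())

# Usage: per-class centers from labelled training data, then classify a point.
rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(200, 4))
attack = rng.normal(4, 1, size=(200, 4))
centers = [fcm_centers(normal, c=3), fcm_centers(attack, c=3)]
print(classify(attack[0], centers))
```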