3,522 research outputs found

    Improving Knowledge-Based Systems with statistical techniques, text mining, and neural networks for non-technical loss detection

    Currently, power distribution companies face several problems related to energy losses. For example, energy that is used may go unbilled because of illegal manipulation or a breakdown in the customer’s measurement equipment. These losses are called non-technical losses (NTLs), and they are usually greater than the losses due to the distribution infrastructure (technical losses). Traditionally, a large number of studies have used data mining to detect NTLs, but to the best of our knowledge, none has involved a Knowledge-Based System (KBS) created from the knowledge and expertise of the inspectors. In the present study, a KBS was built on the inspectors’ knowledge and expertise that uses text mining, neural networks, and statistical techniques to detect NTLs. These techniques were used to extract information from samples, and this information was translated into rules, which were combined with the rules generated from the inspectors’ knowledge. The system was tested with real samples extracted from the databases of Endesa, one of the most important distribution companies in Spain, which also plays an important role in international markets in both Europe and South America and serves more than 73 million customers.
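The abstract describes joining expert rules with rules derived from sample statistics. A minimal sketch of that idea, in which every field name and threshold is an illustrative assumption rather than anything from the paper:

```python
# Hypothetical NTL screen: rules encoding inspector expertise are combined
# with rules derived from statistics over customer samples. All field names
# and thresholds are illustrative assumptions, not from the Endesa study.

def expert_rules(customer):
    """Rules encoding inspector knowledge (illustrative)."""
    flags = []
    if customer["billed_kwh"] == 0 and customer["contracted_kw"] > 0:
        flags.append("zero billing with an active contract")
    if customer["meter_age_years"] > 25:
        flags.append("very old measurement equipment")
    return flags

def statistical_rules(customer, mean_kwh, std_kwh):
    """Rules derived from sample statistics (illustrative)."""
    flags = []
    # Consumption far below the population mean may indicate manipulation.
    if customer["billed_kwh"] < mean_kwh - 2 * std_kwh:
        flags.append("consumption more than 2 std. dev. below the mean")
    return flags

def screen(customer, mean_kwh, std_kwh):
    # Join both rule sets, as the abstract describes.
    return expert_rules(customer) + statistical_rules(customer, mean_kwh, std_kwh)

suspect = {"billed_kwh": 0, "contracted_kw": 5.5, "meter_age_years": 30}
print(screen(suspect, mean_kwh=300.0, std_kwh=80.0))
```

In the study itself, further rules were produced by text mining and neural networks; this sketch only shows how heterogeneous rule sources can be merged into one screening pass.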

    Shallow and deep networks intrusion detection system : a taxonomy and survey

    Intrusion detection has attracted considerable interest from researchers and industry. After many years of research, the community still faces the problem of building reliable and efficient IDSs that can handle large quantities of data with changing patterns in real-time situations. The work presented in this manuscript classifies intrusion detection systems (IDSs). Moreover, a taxonomy and survey of shallow and deep network intrusion detection systems is presented, based on previous and current work. This taxonomy and survey reviews machine learning techniques and their performance in detecting anomalies. Feature selection, which influences the effectiveness of machine learning (ML) IDSs, is discussed to explain its role in the classification and training phases of an ML IDS. Finally, a discussion of false and true positive alarm rates is presented to help researchers model reliable and efficient machine learning based intrusion detection systems.
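The survey highlights feature selection as a step that shapes ML IDS effectiveness. A toy sketch of one common filter-style approach, assuming made-up flow features and ranking them by absolute correlation with the attack label before training:

```python
# Hedged sketch of filter-style feature selection for an ML IDS: rank each
# feature by absolute Pearson correlation with the attack label and keep the
# top k. The flow records and feature names below are invented toy data.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy) if vx and vy else 0.0

def select_features(rows, labels, names, k):
    """Keep the k features most correlated with the label."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), name))
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]

# Toy flow records: [bytes_sent, duration_s, failed_logins]
rows = [[100, 1.0, 0], [120, 1.2, 0], [90, 0.9, 1],
        [5000, 0.1, 9], [4800, 0.2, 8], [5100, 0.1, 9]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = attack
selected = select_features(rows, labels,
                           ["bytes_sent", "duration_s", "failed_logins"], k=2)
print(selected)
```

Real IDS pipelines typically use richer criteria (mutual information, wrapper methods), but the shape of the step, score, rank, and prune before training, is the same.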

    Machine learning-based anomaly detection in release testing of 5G mobile networks

    Abstract. The need for high-quality phone and internet connections, high-speed streaming, and reliable, uninterrupted traffic has grown with the advances in wireless communication since the start of 5G (fifth-generation) networks. The amount of data generated, not just every day but every second, has made most of the traditional statistical approaches previously used for data manipulation and modeling inefficient and unscalable. Machine learning (ML), and especially deep learning (DL) models, achieve state-of-the-art results because of their ability to recognize complex patterns that even human experts cannot. Machine learning-based anomaly detection is a current hot topic in both research and industry because of its practical applications in almost all domains. Anomaly detection serves two main purposes. The first is to understand why anomalous behavior happens and, by addressing the root cause, prevent it from recurring. The second is likewise to understand why the behavior happens, but then to prepare for it as predictable behavior, such as the increased traffic over weekends or during specific hours of the day. In this work, we apply anomaly detection to a univariate time series target, the block error rate (BLER). We experiment with statistical approaches, classic supervised machine learning models, unsupervised machine learning models, and deep learning models, and benchmark the final results. The main goal is to select the model that best balances performance against resource use and to apply it in a multivariate time series context, where we can test the relationships between the different time series features and their influence on each other.
In the final phase, the selected model will be used, integrated, and deployed as part of an automatic system that detects and flags anomalies in real time. The simple proposed deep learning model outperforms the other models on the accuracy-related metrics. We also emphasize the acceptable performance of the statistical approach, which remains competitive with the best model because of its low training time and modest computational requirements.
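A statistical baseline of the kind the abstract praises for its low cost can be very small. The sketch below assumes a rolling-window z-score detector on a univariate BLER series; the window size, threshold, and data are illustrative, not the thesis's actual configuration:

```python
# Minimal statistical anomaly detector for a univariate series (e.g. BLER):
# flag any point deviating more than `threshold` standard deviations from the
# mean of the preceding `window` points. Parameters here are assumptions.
import statistics

def zscore_anomalies(series, window=8, threshold=3.0):
    """Return indices of points anomalous w.r.t. the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu = statistics.fmean(hist)
        sigma = statistics.pstdev(hist)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Synthetic BLER trace with a single spike at index 9.
bler = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.012, 0.010,
        0.011, 0.150, 0.012, 0.010]
print(zscore_anomalies(bler))  # → [9]
```

Such a detector needs no training phase at all, which is exactly the trade-off against the deep learning model that the benchmark weighs.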

    The Application of Deep Learning and Cloud Technologies to Data Science

    Machine Learning and Cloud Computing have become staples of businesses and educational institutions in recent years. These two forefronts of big-data solutions have technology giants racing for superior implementations of both. The objective of this thesis is to test and utilize AWS SageMaker in three different applications: time-series forecasting with sentiment analysis, automated Machine Learning (AutoML), and anomaly detection. The first study is a sentiment-based LSTM for stock price prediction. The LSTM was created with two methods, the first using SQL Server Data Tools and the second an implementation of the LSTM using the Keras library. The results were evaluated using accuracy, precision, recall, F1 score, mean absolute error (MAE), root mean squared error (RMSE), and symmetric mean absolute percentage error (SMAPE). The sentiment models all outperformed the control LSTM; the public model for Facebook on SQL Server Data Tools performed best overall, with 0.9743 accuracy and 0.9940 precision. The second study is an application of AWS SageMaker AutoPilot, an AutoML platform designed to make Machine Learning more accessible to those without programming backgrounds. The methodology of this study follows the application of AWS Data Wrangler and AutoPilot from the beginning of the process to completion. The results were evaluated using accuracy, precision, recall, and F1 score. The best accuracy was achieved by the LightGBM model on the AI4I Maintenance dataset, with an accuracy of 0.983; this model also scored best on precision, recall, and F1 score. The final study is an anomaly detection system for cyber security intrusion detection system data.
Rule-based Intrusion Detection Systems are able to catch most of the cyber threats prevalent in network traffic; however, the copious alerts they generate are nearly impossible for humans to keep up with. The methodology of this study follows a typical taxonomy of data collection, data processing, model creation, and model evaluation. Both Random Cut Forest and XGBoost are implemented using AWS SageMaker. The supervised XGBoost algorithm achieved the highest accuracy of all models, with Model 2 giving an accuracy of 0.6183; this model also showed a precision of 0.5902, a recall of 0.9649, and an F1 score of 0.7324.
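The metrics reported across these studies all derive from the confusion matrix in the standard way, and the figures for the XGBoost model are internally consistent: the reported F1 score follows from the reported precision and recall. A short sketch, assuming only the textbook definitions:

```python
# Standard classification metrics from a confusion matrix, plus a sanity
# check: the F1 reported above (~0.7324) is the harmonic mean of the
# reported precision (0.5902) and recall (0.9649).

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_from(0.5902, 0.9649), 4))  # → 0.7324
```

The high recall with modest precision suggests the detector favors catching attacks at the cost of more false alarms, which matches the thesis's motivation of triaging copious IDS alerts.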