1,797 research outputs found

A Review of using Data Mining Techniques in Power Plants

Author: Salim Naomie
Publication venue: Sudan University of Science and Technology
Publication date: 14/11/2017

Data mining techniques and their applications have developed rapidly over the last two decades. This paper reviews the application of data mining techniques in power systems, especially power plants, through a survey of literature published between 2000 and 2015. Keyword indices, abstracts, and conclusions were used to classify more than 86 articles on the application of data mining in power plants, drawn from many academic journals and research centers. Because the paper spans two different disciplines, it begins with a brief introduction to data mining and power systems. It then presents a comprehensive survey of the collected articles and classifies them by three categories: the technique used, the problem, and the application area. The review finds that data mining techniques (classification, regression, clustering, and association rules) can be used to solve many types of problems in power plants, such as predicting the amount of generated power, failure prediction, failure diagnosis, and failure detection. It also finds that no standard technique exists for a specific problem. The application of data mining in power plants is a rich research area that still needs more exploration.

SUST Journal Systems (Sudan Univ. of Science and Technology)

A Study on the Reduction of Casualty Crashes Following the Installation of Section Speed Enforcement Systems Using Interpretable Machine Learning

Author: 홍경식
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2020

Doctoral thesis -- Seoul National University Graduate School: Department of Civil and Environmental Engineering, College of Engineering, August 2020. 김동규.

This study develops a prediction model for casualty crash occurrence that accounts for whether a Section Speed Enforcement System (SSES) is installed, and quantifies the effect of SSES installation by separating it into direct and indirect effects through mediation analysis. It also recommends what to consider when selecting candidate sites for SSES installation. The crash prediction model uses machine learning for binary classification of whether a casualty crash occurred, and the effects of SSES installation are analyzed using crash and speed-related variables. In particular, the study applies the Interpretable Machine Learning (IML) methodology, which treats the interpretability of predictions as being as important as predictive performance. The IML consists of a black-box model and an interpretable model: KNN, RF, and SVM were reviewed as black-box models, and DT and BLR as interpretable models. The tunable hyperparameters of each method were optimized through k-fold cross-validation. An SVM with a polynomial kernel was selected as the black-box model and BLR as the interpretable model to predict the probability of casualty crash occurrence. The developed IML model was evaluated against a typical BLR under the PDR (Predictive accuracy, Descriptive accuracy, and Relevancy) framework. The evaluation confirmed that the IML model outperformed the typical BLR in predictive accuracy, descriptive accuracy, and relevancy from the perspective of human understanding.

Based on the resulting model, the effect of SSES installation was quantified using the probability equation of casualty crash occurrence, a logistic function of SSES, SOR, SV, TVL, HVR, and CR. The analysis confirmed that SSES installation reduced the probability of casualty crash occurrence by about 28%. A mediation analysis of the variables affected by SSES installation (SOR and SV) then separated this reduction into direct and indirect effects. The indirect effect through a lower ratio of vehicles exceeding the speed limit (SOR) accounted for about 30%, while the indirect effect through reduced speed variance (SV) was not statistically significant at the 95% confidence level. Finally, the probability equation was applied to sections of the Yeongdong Expressway, and the predicted high-risk sections were compared with actual crash data to examine the model's applicability. The comparison indicated that the equation was reasonable. The study therefore suggests first selecting dangerous sites based on casualty crashes and speeding, and then installing SSES at sections where traffic volume (TVL), heavy vehicle ratio (HVR), and curve ratio (CR) are higher than in other sections.

1. Introduction
1.1. Background of research
1.2. Objective of research
1.3. Research Flow
2. Literature Review
2.1. Research related to SSES
2.1.1. Effectiveness of SSES
2.1.2. Installation criteria of SSES
2.2. Machine learning about transportation
2.2.1. Machine learning algorithm
2.2.2. Machine learning algorithm about transportation
2.3. Crash prediction model
2.3.1. Frequency of crashes
2.3.2. Severity of crash
2.4. Interpretable Machine Learning (IML)
2.4.1. Introduction
2.4.2. Application of IML
3. Model Specification
3.1. Analysis of SSES effectiveness
3.1.1. Crashes analysis
3.1.2. Speed analysis
3.2. Data collection & pre-analysis
3.2.1. Data collection
3.2.2. Basic statistics of variables
3.3. Response variable selection
3.4. Model selection
3.4.1. Binary classification
3.4.2. Accuracy vs. Interpretability
3.4.3. Overview of IML
3.4.4. Process of model specification
4. Model development
4.1. Black-box and interpretable model
4.1.1. Consists of IML
4.1.2. Black-box model
4.1.3. Interpretable model
4.2. Model development
4.2.1. Procedure
4.2.2. Measures of effectiveness
4.2.3. K-fold cross validation
4.3. Result of model development
4.3.1. Result of black-box model
4.3.2. Result of interpretable model
5. Evaluation & Application
5.1. Evaluation
5.1.1. The PDR framework for IML
5.1.2. Predictive accuracy
5.1.3. Descriptive accuracy
5.1.4. Relevancy
5.2. Impact of Casualty Crash Reduction
5.2.1. Quantification of the effectiveness
5.2.2. Mediation effect analysis
5.3. Application for the Korean expressway
6. Conclusion
6.1. Summary and Findings
6.2. Further Research
Docto
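The thesis abstract above describes the crash-probability equation as a logistic function of SSES, SOR, SV, TVL, HVR, and CR, but it gives no coefficients. A minimal sketch of that functional form follows. Every coefficient and input value is a made-up placeholder, not a result from the thesis, and the helper names `crash_probability` and `relative_reduction` are hypothetical.

```python
import math

# Hypothetical coefficients: the abstract names the variables but not their weights.
COEFFS = {"intercept": -2.0, "SSES": -0.4, "SOR": 1.5, "SV": 0.02,
          "TVL": 0.00001, "HVR": 0.8, "CR": 0.6}

def crash_probability(features, coeffs=COEFFS):
    """Logistic function: p = 1 / (1 + exp(-z)), z = intercept + sum(coef * value)."""
    z = coeffs["intercept"] + sum(coeffs[name] * value
                                  for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def relative_reduction(features):
    """Relative drop in predicted probability when SSES switches from 0 to 1,
    holding every other variable fixed."""
    without = crash_probability({**features, "SSES": 0})
    with_sses = crash_probability({**features, "SSES": 1})
    return (without - with_sses) / without

# Made-up road section, for illustration only.
section = {"SSES": 0, "SOR": 0.3, "SV": 10.0, "TVL": 50000, "HVR": 0.2, "CR": 0.1}
print(f"p(no SSES) = {crash_probability(section):.3f}")
print(f"relative reduction = {relative_reduction(section):.1%}")
```

Because the other variables are held fixed, this sketch captures only a direct effect. The mediation analysis described in the abstract additionally accounts for SSES changing SOR and SV, which would require separate models for those mediators.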

SNU Open Repository and Archive

FORECASTING MODEL FOR HEATING SYSTEMS OF PUBLIC BUILDINGS: COMPARISON OF THE SUPPORT VECTOR MACHINE AND RANDOM FOREST METHODS

Author: Amirgaliyev Yedilkhan
Chencheva Olga
Chenchevoi Vladimir
Kalizhanova Aliya
Kovalenko Alexandr
Kushch-Zhyrko Mykhailo
Perekrest Andrii
Publication venue: 'Politechnika Lubelska'
Publication date: 01/01/2022

Data analysis and prediction play an important role in managing heat supply systems. Models that predict system parameters support quality management and appropriate control decisions aimed at increasing energy efficiency, reducing the amount of power source consumed, and diagnosing and detecting atypical processes in system operation. The article compares two machine learning methods, random forest (RF) and support vector machine (SVM), for predicting the temperature of the heat-carrying agent in a heating system based on data from an electronic weather-dependent controller. The authors compare the models on accuracy, source cost, and the ability to interpret results and non-obvious interrelations. The time spent finding the optimal hyperparameters and training the SVM model is found to significantly exceed that of the RF model, despite close root mean square error (RMSE) values. The data interval was changed from 15 minutes to one minute to improve the RF model accuracy. The RMSE of the RF model on the test data is 0.41°C. The article also studies the contribution of each variable to prediction accuracy.
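The abstract above compares the two models by root mean square error (RMSE). As a reminder of how that metric is computed, here is a minimal sketch. The temperature values and the two prediction lists are invented for illustration and are not data from the article.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up heat-carrier temperatures in °C (hypothetical, not from the article).
measured = [61.0, 62.5, 63.0, 61.5]
rf_pred = [61.4, 62.1, 63.3, 61.0]
svm_pred = [60.5, 62.9, 63.5, 61.2]

print(f"RF  RMSE: {rmse(measured, rf_pred):.2f} °C")
print(f"SVM RMSE: {rmse(measured, svm_pred):.2f} °C")
```

RMSE is expressed in the same unit as the predicted quantity, which is why the article can report it directly in degrees Celsius.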

Biblioteka Nauki - repozytorium artykułów

Lublin University of Technology Journals

MDEA: malware detection with evolutionary adversarial learning

Author: Wang Xiruo
Publication date: 20/03/2020

Many applications have used machine learning as a tool to detect malware. These applications feed raw or processed binary data to neural network models that classify files as benign or malicious. Although this approach has proved effective against dynamic changes such as encryption, obfuscation, and packing techniques, it is vulnerable to specific evasion attacks in which small changes to the input data cause misclassification at test time. In this paper, I propose MDEA, an Adversarial Malware Detection model that combines a neural network with attack samples produced by evolutionary optimization to make the network robust against evasion attacks. Retraining the model with the evolved malware samples improves network performance by a large margin.

Computer Science
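The abstract above pairs a detector with evolutionary optimization that searches for evasive samples. The toy sketch below illustrates only the general idea of such a search and is not MDEA itself. It makes several assumptions: a fixed linear scorer stands in for the neural network, samples are short binary feature vectors, mutation is a random bit flip, and the names `detector_score`, `mutate`, and `evolve_evasive` are hypothetical. Real attacks must also preserve the malware's functionality, which this sketch ignores.

```python
import random

def detector_score(sample, weights):
    """Toy linear detector: a higher score means 'more likely malicious'."""
    return sum(w * x for w, x in zip(weights, sample))

def mutate(sample, rng, rate=0.2):
    """Flip each binary feature independently with probability `rate`."""
    return [1 - x if rng.random() < rate else x for x in sample]

def evolve_evasive(sample, weights, rng, generations=20, population=10):
    """Elitist evolutionary search for a variant with a lower detector score.
    The current best is always kept, so the score never increases."""
    best = list(sample)
    for _ in range(generations):
        candidates = [best] + [mutate(best, rng) for _ in range(population)]
        best = min(candidates, key=lambda s: detector_score(s, weights))
    return best

rng = random.Random(0)                        # seeded for repeatability
weights = [0.9, -0.2, 0.7, 0.1, -0.5, 0.6]    # hypothetical detector weights
original = [1, 0, 1, 1, 0, 1]                 # hypothetical malicious sample

evolved = evolve_evasive(original, weights, rng)
print("original score:", detector_score(original, weights))
print("evolved score: ", detector_score(evolved, weights))
```

In an adversarial retraining loop, evolved samples like `evolved` would be labeled malicious and added to the training set before the detector is trained again.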

Texas ScholarWorks

Neural malware detection

Author: Park Sean
Publication venue: 'Federation University Australia'
Publication date: 01/01/2019

At the heart of today’s malware problem lies the theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem under the assumptions that a sufficiently large number of training samples exists and that the training set is independent and identically distributed. However, the lack of semantic features, combined with models built on these wrong assumptions, results largely in overfitting with many false positives against real-world samples, leaving systems vulnerable to various adversarial attacks. A key observation is that modern malware authors write a script that automatically generates an arbitrarily large number of diverse samples sharing similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this paradigm of an economic malware manufacturing model, the samples within a campaign are likely to share coherent semantic characteristics. This opens up the possibility of one-to-many detection. It is therefore crucial to capture the non-linear metamorphic pattern unique to a campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including a generative static malware outbreak detection model, a generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. A generative adversarial autoencoder (AAE) over a convolutional network with global average pooling is introduced as a fundamental deep learning framework for malware detection, which captures highly complex non-linear metamorphism through translation invariance and local variation insensitivity. A Generative Adversarial Network (GAN) used as part of the framework enables one-shot training, where semantically isomorphic malware campaigns are identified from a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no approach has been found for this challenging training objective against a malware distribution that consists of a large number of very sparse groups artificially driven by the arms race between attackers and defenders. In addition, we propose a novel method that extracts an instruction cognitive representation from uninterpreted raw binary executables, which can be used for one-to-many malware detection via one-shot training against the frequency spectrum of the Transformer’s encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks that mostly use random perturbation against raw binaries. Comprehensive performance analyses, including mathematical formulations and experimental evaluations, are provided, with the proposed deep learning framework for malware detection exhibiting superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments where artificially formed sparse distributions arise at the cyber battle fronts.

Doctor of Philosophy

Federation ResearchOnline