1,797 research outputs found

    A Review of using Data Mining Techniques in Power Plants

    Data mining techniques and their applications have developed rapidly over the last two decades. This paper reviews the application of data mining techniques in power systems, especially in power plants, through a survey of literature published between 2000 and 2015. Keyword indices, abstracts and conclusions were used to classify more than 86 articles on the application of data mining in power plants, drawn from many academic journals and research centers. Because the paper spans two different disciplines, it begins with a brief introduction to data mining and to power systems. It then presents a comprehensive survey of the collected articles and classifies them according to three categories: the technique used, the problem addressed and the application area. The review finds that data mining techniques (classification, regression, clustering and association rules) can be used to solve many types of problems in power plants, such as predicting the amount of generated power, failure prediction, failure diagnosis and failure detection. It also finds that no standard technique exists for any specific problem. The application of data mining in power plants is a rich research area that still needs more exploration.

    A Study on the Reduction of Casualty Crashes Following Section Speed Enforcement System Installation Using Interpretable Machine Learning

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Civil and Environmental Engineering, August 2020. Kim Dong-kyu.
    In this study, a prediction model for casualty crash occurrence was developed that accounts for whether a Section Speed Enforcement System (SSES) is installed, and the effect of SSES installation was quantified by separating it into direct and indirect effects through mediation analysis. The study also recommends what to consider when selecting candidate sites for SSES installation. The crash prediction model uses machine learning for binary classification of whether or not a casualty crash occurred, and the effects of SSES installation were analyzed using crash and speed-related variables. In particular, the Interpretable Machine Learning (IML) methodology was applied, which treats the interpretability of predictions as being as important as predictive performance. The IML consists of a black-box model and an interpretable model: KNN, RF and SVM were reviewed as black-box models, and DT and BLR as interpretable models. During model development, the tunable hyper-parameters of each method were optimized through k-fold cross validation. The SVM with a polynomial kernel was selected as the black-box model and the BLR as the interpretable model to predict the probability of casualty crash occurrence. The developed IML model was evaluated against a typical BLR under the PDR (Predictive accuracy, Descriptive accuracy and Relevancy) framework. The evaluation confirmed that the IML outperformed the typical BLR in predictive accuracy, descriptive accuracy and relevancy to a human audience. Based on the IML model, the effect of SSES installation was quantified using the probability equation of casualty crash occurrence, a logistic function of SSES, SOR, SV, TVL, HVR and CR.
    The analysis confirmed that SSES installation reduced the probability of casualty crash occurrence by about 28%. In addition, a mediation analysis of the variables affected by SSES installation (SOR and SV) was conducted to separate the direct and indirect effects of the installation on crash reduction. About 30% of the effect was indirect, acting through a lower ratio of vehicles exceeding the speed limit (SOR), whereas the indirect effect through reduced speed variance (SV) was not statistically significant at the 95% confidence level. Finally, the probability equation was applied to sections of the Yeongdong Expressway, and the predicted high-risk sections were compared with actual crash data to examine the applicability of the model. The comparison indicated that the equation was reasonable. The study therefore suggests first selecting dangerous sites based on casualty crashes and speeding, and then prioritizing SSES installation on sections where traffic volume (TVL), heavy vehicle ratio (HVR) and curve ratio (CR) are higher than on other sections.
    Contents:
    1. Introduction: 1.1. Background of research; 1.2. Objective of research; 1.3. Research Flow
    2. Literature Review: 2.1. Research related to SSES (2.1.1. Effectiveness of SSES; 2.1.2. Installation criteria of SSES); 2.2. Machine learning about transportation (2.2.1. Machine learning algorithm; 2.2.2. Machine learning algorithm about transportation); 2.3. Crash prediction model (2.3.1. Frequency of crashes; 2.3.2. Severity of crash); 2.4. Interpretable Machine Learning (IML) (2.4.1. Introduction; 2.4.2. Application of IML)
    3. Model Specification: 3.1. Analysis of SSES effectiveness (3.1.1. Crashes analysis; 3.1.2. Speed analysis); 3.2. Data collection & pre-analysis (3.2.1. Data collection; 3.2.2. Basic statistics of variables); 3.3. Response variable selection; 3.4. Model selection (3.4.1. Binary classification; 3.4.2. Accuracy vs. Interpretability; 3.4.3. Overview of IML; 3.4.4. Process of model specification)
    4. Model development: 4.1. Black-box and interpretable model (4.1.1. Consists of IML; 4.1.2. Black-box model; 4.1.3. Interpretable model); 4.2. Model development (4.2.1. Procedure; 4.2.2. Measures of effectiveness; 4.2.3. K-fold cross validation); 4.3. Result of model development (4.3.1. Result of black-box model; 4.3.2. Result of interpretable model)
    5. Evaluation & Application: 5.1. Evaluation (5.1.1. The PDR framework for IML; 5.1.2. Predictive accuracy; 5.1.3. Descriptive accuracy; 5.1.4. Relevancy); 5.2. Impact of Casualty Crash Reduction (5.2.1. Quantification of the effectiveness; 5.2.2. Mediation effect analysis); 5.3. Application for the Korean expressway
    6. Conclusion: 6.1. Summary and Findings; 6.2. Further Research
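The probability equation described in this abstract is a logistic function of six predictors. The following is a minimal sketch of how such an equation could be evaluated and how a relative reduction from SSES installation could be computed. The coefficient values, the `COEF` dictionary and both function names are hypothetical placeholders; the abstract does not report the fitted coefficients, so the numbers below do not reproduce the reported 28% figure.

```python
import math

# Hypothetical coefficients: the abstract names the predictors but not their
# fitted values, so these numbers are placeholders for illustration only.
COEF = {
    "intercept": -2.0,
    "SSES": -0.4,  # 1 if a section speed enforcement system is installed
    "SOR": 1.5,    # ratio of vehicles exceeding the speed limit
    "SV": 0.02,    # speed variance
    "TVL": 0.3,    # traffic volume
    "HVR": 0.8,    # heavy vehicle ratio
    "CR": 0.6,     # curve ratio
}


def crash_probability(x, coef=COEF):
    """Logistic function of a linear combination of the predictors in x."""
    z = coef["intercept"] + sum(coef[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def relative_reduction(x, coef=COEF):
    """Relative drop in crash probability when SSES switches from 0 to 1."""
    p_without = crash_probability({**x, "SSES": 0}, coef)
    p_with = crash_probability({**x, "SSES": 1}, coef)
    return (p_without - p_with) / p_without


# Illustrative road section (made-up values).
section = {"SSES": 0, "SOR": 0.2, "SV": 30.0, "TVL": 1.2, "HVR": 0.25, "CR": 0.4}
print(round(crash_probability(section), 3))
print(round(relative_reduction(section), 3))
```

Because the model is logistic, the relative reduction depends on the other predictor values as well as on the SSES coefficient, which is one reason a single percentage has to be read as a summary over the studied sections.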

    PREDICTION MODEL FOR HEATING SYSTEMS OF PUBLIC BUILDINGS: A COMPARISON OF THE SUPPORT VECTOR MACHINE AND RANDOM FOREST METHODS

    Data analysis and prediction play an important role in managing heat supply systems. Models that predict system parameters support better management: they inform control decisions aimed at increasing energy efficiency and reducing the amount of energy consumed, and they help diagnose and detect atypical processes in the operation of the system. The article compares two machine learning methods, random forest (RF) and support vector machine (SVM), for predicting the temperature of the heat-carrying agent in a heating system based on data from an electronic weather-dependent controller. The authors compare the models on accuracy, resource cost, and the ability to interpret the results and non-obvious interrelations. The time needed to find the optimal hyperparameters and train the SVM model was found to significantly exceed that of the RF model, even though the two reach similar values of root mean square error (RMSE). To improve the accuracy of the RF model, the data were changed from 15-minute to one-minute intervals. The RMSE of the RF model on the test data is 0.41 °C. The article also studies how much each variable contributes to prediction accuracy.
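The models in this abstract are compared by root mean square error on test data. As a minimal sketch of that metric, assuming temperatures in degrees Celsius and using made-up readings (the function name `rmse` and all sample values are illustrative, not taken from the article):

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up heat-carrier temperatures (degrees Celsius) and two sets of
# predictions standing in for two competing models.
measured = [60.0, 62.0, 61.0, 59.0]
model_a = [60.5, 61.5, 61.5, 58.5]
model_b = [61.0, 61.0, 62.0, 58.0]

print(rmse(measured, model_a))
print(rmse(measured, model_b))
```

RMSE is expressed in the same unit as the predicted quantity, which is why the article can report it directly in degrees Celsius.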

    Neural malware detection

    At the heart of today's malware problem lies the theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem under the assumptions that a sufficiently large number of training samples exist and that the training set is independent and identically distributed. However, the lack of semantic features, combined with models built on these wrong assumptions, largely results in overfitting with many false positives against real-world samples, leaving systems vulnerable to various adversarial attacks. A key observation is that modern malware authors write scripts that automatically generate an arbitrarily large number of diverse samples sharing similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this economic malware manufacturing model, the samples within a campaign are likely to share coherent semantic characteristics. This opens up the possibility of one-to-many detection. It is therefore crucial to capture the non-linear metamorphic pattern unique to a campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including a generative static malware outbreak detection model, a generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. A generative adversarial autoencoder (AAE) over a convolutional network with global average pooling is introduced as the fundamental deep learning framework for malware detection; it captures highly complex non-linear metamorphism through translation invariance and insensitivity to local variation.
    The Generative Adversarial Network (GAN) used as part of the framework enables one-shot training, in which semantically isomorphic malware campaigns are identified from a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no existing approach addresses this challenging training objective against a malware distribution that consists of a large number of very sparse groups artificially driven by the arms race between attackers and defenders. In addition, we propose a novel method that extracts an instruction cognitive representation from uninterpreted raw binary executables, which can be used for one-to-many malware detection via one-shot training against the frequency spectrum of the Transformer's encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks that mostly use random perturbation against raw binaries. Comprehensive performance analyses, including mathematical formulations and experimental evaluations, are provided, and the proposed deep learning framework for malware detection exhibits superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments where artificially formed sparse distributions arise at the cyber battle fronts.
    Doctor of Philosophy
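The abstract credits global average pooling with contributing translation invariance. The sketch below illustrates only that one property on a toy one-dimensional feature map; it is not the dissertation's model. The `[channel][position]` layout, the function names and the sample activations are assumptions made for illustration, and the invariance shown is exact only for cyclic shifts.

```python
def global_average_pool(feature_map):
    """Reduce a [channel][position] feature map to one average per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]


def cyclic_shift(feature_map, offset):
    """Shift every channel right by `offset` positions, wrapping around."""
    shifted = []
    for channel in feature_map:
        k = offset % len(channel)
        shifted.append(channel[-k:] + channel[:-k] if k else list(channel))
    return shifted


# Toy activations: the same pattern appearing at different positions
# yields the same pooled descriptor.
features = [[0, 0, 3, 1], [2, 0, 0, 0]]
print(global_average_pool(features))
print(global_average_pool(cyclic_shift(features, 1)))
```

Pooling discards where an activation occurred and keeps only how strongly each channel responded overall, so relocating a code pattern within a sample leaves the pooled descriptor unchanged in this idealized setting.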