    Forecasting Carbon Dioxide Emission in Thailand Using Machine Learning Techniques

    Machine Learning (ML) models and the massive quantity of data now accessible provide useful tools for analyzing climate change trends and identifying major contributors. Random Forest (RF), Gradient Boosting Regression (GBR), XGBoost (XGB), Support Vector Machines (SVC), Decision Trees (DT), K-Nearest Neighbors (KNN), Principal Component Analysis (PCA), ensemble methods, and Genetic Algorithms (GA) are used in this study to predict CO2 emissions in Thailand. A variety of evaluation criteria are used to determine how well these models work, including R-squared (R2), mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and accuracy. The results show that the RF and XGB algorithms perform exceptionally well, with high R-squared values and low error rates. KNN, PCA, ensemble methods, and GA, on the other hand, are outperformed by the top-performing models; their lower R-squared values and higher error scores indicate that they cannot accurately anticipate CO2 emissions. This paper contributes to the field of environmental modeling by comparing the effectiveness of various machine learning approaches in forecasting CO2 emissions. The findings can assist Thailand in promoting sustainable development and in developing policies consistent with worldwide efforts to combat climate change.
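    As an illustration of the kind of comparison the abstract describes (not the paper's code), the sketch below fits Random Forest and Gradient Boosting regressors on synthetic stand-in data and reports R2, MAE, RMSE, and MAPE with scikit-learn; scikit-learn's GradientBoostingRegressor stands in for XGBoost here, and all data and parameters are placeholders.

```python
# Illustrative sketch (not the paper's code): comparing two regressors on
# synthetic stand-in data with the metrics named in the abstract.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import (r2_score, mean_absolute_error,
                             mean_squared_error, mean_absolute_percentage_error)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))          # placeholder predictors (e.g. energy use, GDP, population)
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=500) + 10.0  # placeholder CO2 series

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                    ("Gradient Boosting", GradientBoostingRegressor(random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          "R2=%.3f" % r2_score(y_te, pred),
          "MAE=%.3f" % mean_absolute_error(y_te, pred),
          "RMSE=%.3f" % np.sqrt(mean_squared_error(y_te, pred)),
          "MAPE=%.3f" % mean_absolute_percentage_error(y_te, pred))
```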

    Rough Sets Clustering and Markov model for Web Access Prediction

    Discovering user access patterns from web access logs provides important information for building an adaptive web server that responds to individual users' behavior. The variety of user behaviors in accessing information also keeps growing, which has a great impact on network utilization. In this paper, we present a rough set clustering approach to cluster web transactions from web access logs, and we use a Markov model for next-access prediction. Using this approach, users can effectively mine web log records to discover and predict access patterns. We perform experiments using real web trace logs collected from the www.dusit.ac.th servers. To improve its prediction ratio, the model includes a rough sets scheme in which a similarity measure between two sequences is computed using the upper approximation.
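    A minimal sketch of the first-order Markov prediction step, assuming sessionized logs and hypothetical page names; the rough-set clustering and the upper-approximation similarity measure are not reproduced here.

```python
# Minimal sketch (not the paper's implementation): a first-order Markov model
# that predicts the next page from counted transitions in sessionized web logs.
from collections import defaultdict

sessions = [                      # hypothetical clickstream sessions
    ["home", "courses", "admission", "courses"],
    ["home", "news", "courses"],
    ["home", "courses", "courses", "contact"],
]

transitions = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for current_page, next_page in zip(session, session[1:]):
        transitions[current_page][next_page] += 1

def predict_next(page):
    """Return the most likely next page, or None if the page was never seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return max(followers, key=followers.get)

print(predict_next("home"))       # most frequent successor of "home"
```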

    Unsupervised Anomaly Detection with Unlabeled Data Using Clustering

    Intrusions pose a serious security risk in a network environment. New intrusion types, of which detection systems are unaware, are the most difficult to detect. The amount of available network audit data is usually large, and human labeling is tedious, time-consuming, and expensive. Traditional anomaly detection algorithms require a set of purely normal data on which to train their models. We present a clustering-based intrusion detection algorithm, unsupervised anomaly detection, which trains on unlabeled data in order to detect new intrusions. Our method is able to detect many different types of intrusions while maintaining a low false positive rate, as verified on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset.
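    The sketch below illustrates the general clustering-then-flag-small-clusters idea on synthetic stand-in data; it is not the paper's exact algorithm, and the cluster count and 5% threshold are illustrative assumptions.

```python
# Hedged sketch of the general idea: cluster unlabeled traffic records and flag
# points in small, outlying clusters as candidate intrusions, assuming attacks
# form a minority of the data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
normal = rng.normal(loc=0.0, scale=1.0, size=(950, 4))     # placeholder "normal" records
attacks = rng.normal(loc=6.0, scale=0.5, size=(50, 4))     # placeholder rare anomalies
X = StandardScaler().fit_transform(np.vstack([normal, attacks]))

k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Label the smallest clusters (below 5% of the data) as anomalous.
sizes = np.bincount(labels, minlength=k)
anomalous_clusters = np.where(sizes < 0.05 * len(X))[0]
flags = np.isin(labels, anomalous_clusters)
print("flagged records:", int(flags.sum()))
```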

    Using Markov Model and Association Rules for Web Access Prediction

    Mining user patterns from log files can provide significant and useful knowledge. A large amount of research has been done on correctly predicting the pages a user will request, a task which requires models that can predict a user's next request to a web server. In this paper, we propose a method for constructing first-order and second-order Markov models of Web site access prediction based on past visitor behavior, and we compare it with the association rules technique. The algorithm clusters similar transition behaviors, which is used to further improve the efficiency of prediction. From this comparison we propose the best overall method and empirically test the proposed model on real web logs.
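    For contrast with the Markov sketch shown earlier, a toy illustration of the association-rules side (not the paper's implementation): confidence of single-page rules of the form {page A} -> {page B} over hypothetical sessions.

```python
# Illustrative sketch: association-rule confidence over sessionized web logs,
# using page co-occurrence counts within sessions. All sessions are hypothetical.
from collections import defaultdict
from itertools import combinations

sessions = [                              # hypothetical sessionized web log
    ["home", "courses", "admission"],
    ["home", "news", "courses"],
    ["home", "courses", "contact"],
]

pair_count = defaultdict(int)
item_count = defaultdict(int)
for session in sessions:
    pages = set(session)
    for page in pages:
        item_count[page] += 1
    for a, b in combinations(sorted(pages), 2):
        pair_count[(a, b)] += 1

def confidence(antecedent, consequent):
    """confidence(A -> B) = support(A, B) / support(A)."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_count[pair] / item_count[antecedent]

print(confidence("home", "courses"))      # 3/3 = 1.0 on the toy sessions
```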

    Mining Usage Web Log Via Independent Component Analysis And Rough Fuzzy

    In the past few years, web usage mining techniques have grown rapidly, together with the explosive growth of the web, in both the research and commercial areas. Web Usage Mining is the area of Web Mining that deals with the extraction of interesting knowledge from logging information produced by Web servers. A challenge in web classification is how to deal with the high dimensionality of the feature space. In this paper we present Independent Component Analysis (ICA) for feature selection and use Rough Fuzzy clustering for web user sessions. Our experiments indicate that the approach can improve predictive performance when the original feature set representing the web log is large, and that it can handle different groups of uncertainty and imprecision accurately.
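    A hedged sketch of the dimensionality-reduction step: scikit-learn's FastICA applied to a placeholder session-by-feature matrix, with k-means standing in for the Rough Fuzzy clustering described in the abstract.

```python
# Hedged sketch (not the paper's pipeline): FastICA reduces a high-dimensional
# session-feature matrix to a few independent components before clustering.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.poisson(lam=1.0, size=(200, 50)).astype(float)   # placeholder: 200 sessions x 50 page-hit counts

ica = FastICA(n_components=5, random_state=0, max_iter=1000)
components = ica.fit_transform(X)          # 200 sessions x 5 independent components

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)
print(components.shape, np.bincount(labels))
```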

    Independent Component Analysis And Rough Fuzzy Based Approach To Web Usage Mining

    Web Usage Mining is the area of Web Mining that deals with the extraction of interesting knowledge from logging information produced by Web servers. A challenge in web classification is how to deal with the high dimensionality of the feature space. In this paper we present Independent Component Analysis (ICA) for feature selection and use Rough Fuzzy clustering for web user sessions. The approach aims at discovering trends and regularities in web users' access patterns. ICA is a very general-purpose statistical technique in which observed random data are linearly transformed into components that are maximally independent from each other and simultaneously have "interesting" distributions. Our experiments indicate that the approach can improve predictive performance when the original feature set representing the web log is large, and that it can handle different groups of uncertainty and imprecision accurately.

    Hybrid fuzzy techniques for unsupervised intrusion detection system

    Network intrusion detection is a complex research problem, especially when it deals with unknown patterns. Furthermore, if the amount of audit data is large, human labelling becomes tedious, time-consuming, and expensive. A technique that can enhance the learning capability of an anomaly intrusion detection system is required. Unsupervised anomaly detection methods have been deployed to address the weaknesses of both signature-based and supervised anomaly detection. These methods take a set of unlabelled data as input, in which the majority of the data is normal traffic, and attempt to find intrusions hidden in the data. Although unsupervised anomaly detection has received a lot of attention from many researchers, it still has many drawbacks which can be improved. This thesis proposes a framework which comprises three components: feature selection, a new clustering technique, and a novel cluster labelling algorithm. The task of feature selection is to choose relevant features, which are obtained through statistical testing. The new clustering technique is called F2ART, a hybrid of Fuzzy c-means and Fuzzy Adaptive Resonance Theory. It incorporates a modified similarity measure and a new learning rule which also includes a fuzzy membership value to improve the detection rate. Finally, this thesis also proposes a new cluster labelling algorithm called Normal Membership Factor (NMF), which introduces a weighting degree of cluster probability and can decrease the false positive rate. Experimental results obtained with the KDD Cup 1999 data set indicate that the framework provides the best detection rate compared to current unsupervised anomaly detection approaches. Unlike traditional anomaly detection methods that require 98 percent of the unlabelled data to be normal, this framework can still work when only 80 percent of the data follows the normal pattern. In addition, it can also improve the analysis of new data over time without the need to retrain over all the previous and new data.
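    The thesis's F2ART clustering and NMF labelling algorithms are not reproduced here; the sketch below shows only the conventional size-based cluster-labelling baseline that such labelling schemes refine, with an illustrative 5% threshold.

```python
# Hedged sketch: size-based cluster labelling (the baseline, not the thesis's NMF).
# Clusters holding less than a threshold share of the data are labelled as attacks.
import numpy as np

def label_clusters(cluster_ids, threshold=0.05):
    """Return a dict cluster_id -> 'normal' or 'attack' based on cluster size."""
    ids, sizes = np.unique(cluster_ids, return_counts=True)
    shares = sizes / cluster_ids.size
    return {int(c): ("attack" if share < threshold else "normal")
            for c, share in zip(ids, shares)}

# Toy assignment: cluster 2 is tiny, so it is labelled as an attack cluster.
assignments = np.array([0] * 500 + [1] * 480 + [2] * 20)
print(label_clusters(assignments))
```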

    Anomaly detection of intrusion based on integration of rough sets and fuzzy c-means

    As malicious intrusions are a growing problem, we need a solution to detect intrusions accurately. Network administrators are continuously looking for new ways to protect their resources from harm, both internal and external. Intrusion detection systems look for unusual or suspicious activity, such as patterns of network traffic that are likely indicators of unauthorized activity. New intrusion types, of which detection systems are unaware, are the most difficult to detect. The amount of available network audit data is usually large, and human labeling is tedious, time-consuming, and expensive. The objective of this paper is to describe rough sets and fuzzy c-means algorithms and discuss their use to detect intrusions in a computer network. Fuzzy systems have demonstrated their ability to solve different kinds of problems in various application domains. We use Rough Sets to select a subset of input features for clustering, with the goal of increasing the detection rate and decreasing the false alarm rate in network intrusion detection. Fuzzy c-Means allows objects to belong to several clusters simultaneously, with different degrees of membership. Experiments were performed with DARPA data sets, which contain information on computer network traffic during normal and intrusive behavior.
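    A compact NumPy sketch of plain fuzzy c-means, showing the per-record membership degrees the abstract refers to; the rough-sets feature-selection step and the DARPA data are not included, and all parameters are illustrative.

```python
# Minimal NumPy sketch of fuzzy c-means (not the paper's rough-sets-assisted
# version): each record receives a membership degree in every cluster.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)             # memberships sum to 1 per record
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]        # weighted cluster prototypes
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U = 1.0 / (dist ** (2 / (m - 1)))          # closer clusters get higher membership
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(5, 1, (100, 3))])
centers, U = fuzzy_c_means(X, c=2)
print(centers.round(2))                            # two cluster prototypes
print(U[:3].round(2))                              # soft memberships of the first records
```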

    Integrating genetic algorithms and fuzzy c-means for anomaly detection

    The goal of intrusion detection is to discover unauthorized use of computer systems. New intrusion types, of which detection systems are unaware, are the most difficult to detect. The amount of available network audit data is usually large, and human labeling is tedious, time-consuming, and expensive. Traditional anomaly detection algorithms require a set of purely normal data on which to train their models. In this paper we propose an intrusion detection method that combines Fuzzy Clustering and Genetic Algorithms. The clustering-based intrusion detection algorithm trains on unlabeled data in order to detect new intrusions. Fuzzy c-Means allows objects to belong to several clusters simultaneously, with different degrees of membership. Genetic Algorithms (GA) are applied to the problem of selecting optimized feature subsets to reduce the error caused by using hand-selected features. Our method is able to detect many different types of intrusions while maintaining a low false positive rate. We used the data set from the 1999 KDD intrusion detection contest.
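    A hedged sketch of GA-based feature-subset selection (not the paper's implementation): binary feature masks evolve under a simple cluster-compactness fitness, with k-means standing in for fuzzy c-means; population size, generations, and mutation rate are arbitrary illustrative choices.

```python
# Hedged sketch: a small genetic algorithm searching binary feature masks.
# Fitness is a simple cluster-compactness proxy on the selected features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))
X[:150, :3] += 4.0                               # only the first 3 features separate the groups

def fitness(mask):
    if mask.sum() == 0:
        return -np.inf
    sub = X[:, mask.astype(bool)]
    km = KMeans(n_clusters=2, n_init=5, random_state=0).fit(sub)
    return -km.inertia_ / mask.sum()             # reward compact clusters per selected feature

pop = rng.integers(0, 2, size=(20, X.shape[1]))  # 20 random feature masks
for _ in range(15):                              # a few generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]      # keep the best half
    children = parents.copy()
    cut = rng.integers(1, X.shape[1], size=len(children))
    for i, c in enumerate(cut):                  # one-point crossover with a shifted partner
        children[i, c:] = parents[(i + 1) % len(parents), c:]
    mutate = rng.random(children.shape) < 0.05   # bit-flip mutation
    children[mutate] ^= 1
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```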