From feature selection to building of bayesian classifiers: A network intrusion detection perspective
Abstract: Problem statement: Implementing a single or multiple classifiers that involve a Bayesian Network (BN) is a rising research interest in the network intrusion detection domain. Approach: However, little attention has been given to evaluating the performance of BN classifiers before they are implemented in a real system. In this research, we proposed a novel approach to select important features by utilizing two feature selection algorithms that follow the filter approach. Results: The selected features were further validated by domain experts, and extra features were added into the final proposed feature set. We then constructed three types of BN, namely Naive Bayes Classifiers (NBC), Learned BN and Expert-elicited BN, by utilizing a standard network intrusion dataset. The performance of each classifier was recorded. We found that there was no difference in overall performance of the BNs and therefore concluded that the BNs performed equally well in detecting network attacks. Conclusion/Recommendations: The results of the study indicated that the BN built using the proposed feature set has fewer features, but its performance was comparable to that of BNs built using other feature sets generated by the two algorithms.
Evaluation on rapid profiling with clustering algorithms for plantation stocks on Bursa Malaysia
Building a stock portfolio often requires extensive financial knowledge and Herculean effort, given the amount of financial data to analyse. In this study, we utilized Expectation Maximization (EM), K-Means (KM), and Hierarchical Clustering (HC) algorithms to cluster the 38 plantation stocks listed on Bursa Malaysia using 14 financial ratios derived from fundamental analysis. The clustering allows investors to profile each resulting cluster statistically and assists them in selecting stocks for their stock portfolios rapidly. The performance of each cluster was then assessed using 1-year stock price movement. The results showed that a cluster produced by EM had a better profile and obtained a higher average capital gain than the other clusters.
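As a rough illustration of the clustering step, the sketch below implements a plain K-Means loop in Python. It is not the study's implementation: the two-ratio data points, the choice of k = 2, and the naive "first k points" initialisation are all assumptions made for the example.

```python
def kmeans(points, k, iters=20):
    """Minimal K-Means: returns (cluster index per point, centroids)."""
    # Naive initialisation (assumption): use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical stocks described by two financial ratios each.
stocks = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1)]
assignment, centroids = kmeans(stocks, k=2)
```

In practice the ratios would normally be scaled to comparable ranges before clustering, since K-Means relies on Euclidean distance.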
From Feature Selection to Building of Cascaded Classifiers: A Network Intrusion Detection Perspective
This study proposed a cascaded classifier approach, which demonstrated highly improved performance in detecting rare classes and gave competitive results in detecting dominant classes compared with previous Intrusion Detection System (IDS) work.
An Improvement to StockProF: Profiling Clustered Stocks with Class Association Rule Mining
Using StockProF, developed in our previous work, we are able to identify outliers from a pool of stocks and form clusters with the remaining stocks based on their financial performance. The financial performance is measured using financial ratios obtained directly or derived from financial reports. The resulting clusters are then profiled manually using the mean and 5-number summary calculated from the financial ratios. However, this is time consuming and a disadvantage to novice investors who lack skills in interpreting financial ratios. In this study, we utilized class association rule mining to overcome these problems. Class association rule mining was used to form rules by finding financial ratios that were strongly associated with a particular cluster. The resulting rules were more intuitive to investors than the profiles in our previous work. Thus, the profiling process became easier. The evaluation results also showed that profiling stocks using class association rules helps investors make better investment decisions.
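To make the idea of a class association rule concrete, the following Python sketch computes the support and confidence of one rule of the form "ratio conditions => cluster". The discretised ratio labels, cluster names, and records are hypothetical, and the abstract does not say which mining algorithm or thresholds were used.

```python
def rule_metrics(records, antecedent, target_class):
    """Support and confidence of the rule: antecedent => target_class.

    records is a list of (set_of_items, class_label) pairs.
    """
    n = len(records)
    # Classes of all records whose items contain the antecedent.
    matches = [cls for items, cls in records if antecedent <= items]
    hits = sum(1 for cls in matches if cls == target_class)
    support = hits / n
    confidence = hits / len(matches) if matches else 0.0
    return support, confidence

# Hypothetical stocks with discretised financial ratios and cluster labels.
records = [
    ({"ROE=high", "DebtRatio=low"}, "cluster_1"),
    ({"ROE=high", "DebtRatio=low"}, "cluster_1"),
    ({"ROE=high", "DebtRatio=high"}, "cluster_2"),
    ({"ROE=low", "DebtRatio=high"}, "cluster_2"),
]
support, confidence = rule_metrics(records, {"ROE=high"}, "cluster_1")
```

Here the rule "ROE=high => cluster_1" covers half of all records (support) and holds for two of the three records with a high ROE (confidence).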
Rapid identification of outstanding real estate investment trusts with outlier detection algorithms
Finding outstanding stocks is always the primary goal of an investor, because outstanding stocks tend to outperform others in investment return. However, uncovering this type of stock from a stock pool requires extensive financial knowledge and consistent effort in analyzing the abundant financial data. Thus, it is impractical for an amateur investor. The objective of this study is to rapidly identify outstanding stocks from the Real Estate Investment Trust (REIT) sector. We adopted two outlier detection algorithms, i.e. Interquartile Range (IQR) and Local Outlier Factor (LOF), to trace REIT stocks that deviated from the average performers. Subsequently, the outstanding REIT stocks can be identified from the small number of outliers. The entire process is speedy and can be done on the fly. The identified outstanding stocks were assessed based on their 1-year average total return as compared with the non-outlier stocks. The preliminary result showed that their average total return is better than that of their non-outlier peers.
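The IQR rule mentioned above can be sketched in a few lines of Python. This is only an illustration: the ratio values are hypothetical, and the multiplier k = 1.5 is the conventional choice rather than a value stated in the abstract.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical values of one financial ratio across a pool of REIT stocks.
ratios = [4.1, 5.0, 5.2, 5.5, 6.0, 6.3, 6.8, 7.1, 19.4]
outliers = iqr_outliers(ratios)
```

An outlier flagged this way may be either outstanding or poor, so it still has to be inspected before being treated as a candidate stock.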
Evaluation of Cost Sensitive Learning for Imbalanced Bank Direct Marketing Data
Objectives: The imbalanced bank direct marketing data set utilized in this study is a two-class data mining problem, where a customer may or may not subscribe to a product from a bank. Methods/Statistical Analysis: The data set inherited the rare class problem, where the classification rate attained for the rare class is low. In this study, we attempted cost sensitive learning to mitigate the problem and to account for the various costs involved when misclassification occurs. Three learning algorithms, namely, Naive Bayes (NB), C4.5 and Naive Bayes Tree (NBT), were involved in the cost sensitive learning and their results were empirically evaluated. Findings: The results were also compared with two previous studies that utilized the cost insensitive SVM and over-sampling, respectively. Although cost sensitive learning is claimed to be able to handle imbalanced data sets, we noticed that the learning is less effective overall for the bank direct marketing data set. Cost sensitive learning provides a way of “wrapping” learning algorithms that are not designed to handle imbalanced class distributions. Therefore, it may not work well for certain imbalanced data sets. Over-sampling, on the other hand, worked well for the data set. Improvements/Applications: Over-sampling helped to generalize the decision region of the rare class clearly and subsequently improved the classification result.
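One common way to "wrap" a cost-insensitive classifier is to take its class probabilities and predict the class with the lowest expected misclassification cost. The Python sketch below shows that decision rule only. The probabilities and cost matrix are hypothetical, and the abstract does not confirm that this particular wrapping scheme was the one used.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected cost.

    probs[c] is the classifier's estimate of P(c | x);
    cost[actual][predicted] is the cost of that outcome.
    """
    classes = list(probs)

    def expected_cost(predicted):
        return sum(probs[actual] * cost[actual][predicted] for actual in classes)

    return min(classes, key=expected_cost)

# Hypothetical posterior for one customer: subscribing ("yes") is the rare class.
probs = {"no": 0.8, "yes": 0.2}
# Hypothetical costs: missing a real subscriber is five times worse
# than contacting a customer who does not subscribe.
cost = {"no": {"no": 0, "yes": 1}, "yes": {"no": 5, "yes": 0}}
decision = min_expected_cost_class(probs, cost)
```

With these numbers the rule predicts "yes" even though "no" is more probable, which is how a cost matrix shifts decisions toward the rare class.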
The effectiveness of sampling methods for the imbalanced network intrusion detection data set
One of the countermeasures taken by security experts against network attacks is implementing Intrusion Detection Systems (IDS) in computer networks. Researchers often utilize the de facto network intrusion detection data set, KDD Cup 1999, to evaluate proposed IDS in the context of data mining. However, the imbalanced class distribution of the data set leads to a rare class problem. The problem causes low detection (classification) rates for the rare classes, particularly R2L and U2R. Two commonly used sampling methods to mitigate the rare class problem were evaluated in this research, namely, (1) under-sampling and (2) over-sampling. However, these two methods were less effective in mitigating the problem. The reasons for such performance are presented in this paper.
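The two sampling methods can be sketched in Python as follows. The sketch assumes the simplest random variants, which the abstract does not specify, and uses made-up rows and labels rather than KDD Cup 1999 records.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until all match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced data: 8 "normal" rows and 2 "u2r" rows.
rows = list(range(10))
labels = ["normal"] * 8 + ["u2r"] * 2
_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
```

Over-sampling only repeats existing rare rows and under-sampling discards majority rows, so neither adds new information about the rare classes.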
StockProF: a stock profiling framework using data mining approaches
Analysing stock financial data and producing insights from it are not easy tasks for many stock investors, particularly individual investors. Therefore, building a good stock portfolio from a pool of stocks often requires Herculean effort. This paper proposes a stock profiling framework, StockProF, for building stock portfolios rapidly. StockProF utilizes data mining approaches, namely, (1) Local Outlier Factor (LOF) and (2) Expectation Maximization (EM). LOF first detects outliers (stocks) that are superior or poor in financial performance. After removing the outliers, EM clusters the remaining stocks. The investors can then profile the resulting clusters using the mean and 5-number summary. This study utilized the financial data of the plantation stocks listed on Bursa Malaysia. The authors used 1-year stock price movements to evaluate the performance of the outliers as well as the clusters. The results showed that StockProF is effective, as the profiling corresponded to the average capital gain or loss of the plantation stocks.
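The final profiling step, summarising a cluster by its mean and 5-number summary, might look like the Python sketch below. The ratio values are hypothetical, and the quartile convention (the "inclusive" method) is an assumption, since the abstract does not state one.

```python
import statistics

def profile(values):
    """Mean plus 5-number summary (min, Q1, median, Q3, max) of one ratio."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical values of one financial ratio for the stocks in a cluster.
cluster_ratio = [2.0, 4.0, 6.0, 8.0, 10.0]
summary = profile(cluster_ratio)
```

Running the same function over every ratio in every cluster gives the table an investor would compare across clusters.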
A Bayesian approach to classify conference papers
This article presents a methodological approach for classifying educational conference papers by employing a Bayesian Network (BN). A total of 400 conference papers were collected and categorized into 4 major topics (Intelligent Tutoring System, Cognition, e-Learning, and Teacher Education). In this study, we implemented an 80-20 split of the collected papers: 80% of the papers were used for keyword extraction and BN parameter learning, whereas the other 20% were reserved for evaluating predictive accuracy. A feature selection algorithm was applied to automatically extract keywords for each topic. The extracted keywords were then used for constructing the BN. The prior probabilities were subsequently learned using the Expectation Maximization (EM) algorithm. The network went through a series of validations by human experts and an experimental evaluation to analyze its predictive accuracy. The results demonstrated that the proposed BN outperformed both the Naive Bayesian Classifier and a BN learned from the training data.
Class Association Rules for Profiling Outlier Stocks
Finding a stock with superior financial performance demands not only an abundance of time but also a lot of financial knowledge from retail investors. Consequently, they often end up empty-handed. This research aims to help them “recognize” this type of stock quickly, even if they are not financially savvy. In this study, we started by identifying outliers in a pool of construction stocks. Then, these outliers were manually classified into two classes, i.e. outstanding or poor outliers. Class association rule mining was performed on these classes to generate sets of association rules, which were used to profile each outlier class. Investors may use the rules of the profiles to pick potential outstanding stocks or avoid poorly performing stocks.