21,728 research outputs found
Automating Requirements Traceability: Two Decades of Learning from KDD
This paper summarizes our experience with using Knowledge Discovery in Data
(KDD) methodology for automated requirements tracing, and discusses our
insights.Comment: The work of the second author has been supported in part by NSF
grants CCF-1511117 and CICI 1642134; 4 pages; in Proceedings of IEEE
Requirements Engineering 201
Evaluation of Machine Learning Algorithms for Intrusion Detection System
Intrusion detection system (IDS) is one of the implemented solutions against
harmful attacks. Furthermore, attackers always keep changing their tools and
techniques. However, implementing an accepted IDS system is also a challenging
task. In this paper, several experiments have been performed and evaluated to
assess various machine learning classifiers based on KDD intrusion dataset. It
succeeded to compute several performance metrics in order to evaluate the
selected classifiers. The focus was on false negative and false positive
performance metrics in order to enhance the detection rate of the intrusion
detection system. The implemented experiments demonstrated that the decision
table classifier achieved the lowest value of false negative while the random
forest classifier has achieved the highest average accuracy rate
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
A Clustering-Based Algorithm for Data Reduction
Finding an efficient data reduction method for large-scale
problems is an imperative task. In this paper, we propose a similarity-based self-constructing fuzzy clustering algorithm to do the sampling of instances for the classification task. Instances that are similar to each other are grouped into the same cluster. When all the instances have been fed in, a number of clusters are formed automatically. Then the statistical mean for each cluster will be regarded as representing all the instances covered in the cluster. This approach has two advantages. One is that it can be faster and uses less storage memory. The other is that the number of new representative instances need not be specified in advance by the user. Experiments on real-world datasets show that our method can run faster and obtain better reduction rate than other methods
- …