4 research outputs found

    Performance Evaluation of Manhattan and Euclidean Distance Measures For Clustering Based Automatic Text Summarization

    In the past few years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is a valuable source of information and knowledge that needs to be effectively summarized to be useful. In this paper, automatic text summarization with k-means clustering is presented, employing two different distance measures (Euclidean and Manhattan). The dataset, extracted from African prose, was preprocessed using stopword removal and tokenization. The preprocessed document is converted into a vector representation using the tf-idf technique, and k-means clustering is applied using Euclidean and Manhattan distance measures to generate a summary. Different distance measures for k-means have been used in several works. However, there is a dearth of work on the performance evaluation of these distance measures in text summarization. The experimental analysis was performed in the Waikato Environment for Knowledge Analysis (WEKA). Using compression ratio as the performance metric, the results showed that the Euclidean variation produced an extractive summary of sentences amounting to 72% from three different clusters, while the Manhattan variation produced an extractive summary of sentences that made up 94% of the total document, all in one cluster. Keywords: text summarization, Euclidean distance, k-means clustering, Manhattan distance
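
    The pipeline described in this abstract (tf-idf vectors, k-means under two distance measures, one representative sentence per cluster) can be sketched roughly as follows. This is an illustrative pure-Python sketch, not the authors' WEKA setup: the function names, the whitespace tokenizer, the first-k initialization, the pairing of Manhattan distance with a component-wise median (the usual k-medians choice), and the sentence-count definition of compression ratio are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import mean, median

def tfidf(sentences):
    """Return one dense tf-idf vector per sentence (naive whitespace tokens)."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * math.log(n / df[w]) for w in vocab])
    return vectors

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(vectors, k, dist, center, iters=20):
    """Lloyd-style clustering with a pluggable distance and centre function."""
    centroids = [list(v) for v in vectors[:k]]  # assumption: first k as seeds
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Update step: recompute each centroid column by column.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [center(col) for col in zip(*members)]
    return labels, centroids

def summarize(sentences, k, dist, center):
    """Pick the sentence closest to each non-empty cluster centroid."""
    vectors = tfidf(sentences)
    labels, centroids = kmeans(vectors, k, dist, center)
    picked = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            picked.append(min(idx, key=lambda i: dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(picked)]

def compression_ratio(summary, sentences):
    """Assumed definition: summary sentences divided by document sentences."""
    return len(summary) / len(sentences)

if __name__ == "__main__":
    doc = ["the cat sat", "the cat ran", "dogs bark loudly", "dogs bark often"]
    for name, dist, center in [("euclidean", euclidean, mean),
                               ("manhattan", manhattan, median)]:
        summary = summarize(doc, 2, dist, center)
        print(name, summary, compression_ratio(summary, doc))
```

    Passing the distance and the centre function as parameters keeps the two variants directly comparable: only `dist` and `center` change between runs, while vectorization and sentence selection stay the same.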

    Empirical analysis of tree-based classification models for customer churn prediction

    Customer churn is a vital and recurring problem facing most business industries, particularly the telecommunications industry. Considering the fierce competition among telecommunications firms and the high expenses of attracting and gaining new subscribers, keeping existing loyal subscribers becomes crucial. Early prediction of disgruntled subscribers can assist telecommunications firms in identifying the reasons for churn and in deploying applicable innovative policies to boost productivity, maintain market competitiveness, and reduce monetary damages. Controlling customer churn through the development of efficient and dependable customer churn prediction (CCP) solutions is imperative to attaining this goal. According to the outcomes of current CCP research, several strategies, including rule-based and machine-learning (ML) processes, have been proposed to handle the CCP phenomenon. However, the lack of flexibility and robustness of rule-based CCP solutions is a fundamental shortcoming, and the lopsided distribution of churn datasets is deleterious to the efficacy of most traditional ML techniques in CCP. Regardless, ML-based CCP solutions have been reported to be more effective than other forms of CCP solutions. Unlike linear-based, instance-based, and function-based ML classifiers, tree-based ML classifiers are known to generate predictive models with high accuracy, high stability, and ease of interpretation. However, the deployment of tree-based classifiers for CCP is limited in most cases to the decision tree (DT) and random forest (RF). Hence, this research investigated the effectiveness of tree-based classifiers with diverse computational properties in CCP. Specifically, the CCP performances of diverse tree-based classifiers, such as single, ensemble, enhanced, and hybrid tree-based classifiers, are investigated. Also, the effects of data quality problems such as the class imbalance problem (CIP) on the predictive performances of tree-based classifiers and their homogeneous ensemble variants in CCP were assessed. From the experimental results, it was observed that the investigated tree-based classifiers outperformed other forms of classifiers, such as linear-based (Support Vector Machine (SVM)), instance-based (K-Nearest Neighbour (KNN)), Bayesian-based (Naïve Bayes (NB)), and function-based (MultiLayer Perceptron (MLP)) classifiers, in most cases with or without the CIP. Also, it was observed that the CIP has a significant effect on the CCP performances of the investigated tree-based classifiers, but the combination of a data sampling technique and a homogeneous ensemble method can be an effective solution to the CIP and can also generate efficient CCP models.
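
    The closing observation, that a data sampling technique combined with a homogeneous ensemble of tree-based learners can counter class imbalance, can be illustrated with a small sketch. The abstract does not name the specific sampler or ensemble, so random oversampling and bagging of depth-1 decision trees (stumps) are used here purely as stand-ins; all function names are hypothetical and labels are assumed to be binary (0 = stays, 1 = churns).

```python
import random
from collections import Counter

def oversample(X, y, rng):
    """Random oversampling: duplicate minority rows until classes balance."""
    counts = Counter(y)
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            Xb.append(rng.choice(rows))
            yb.append(label)
    return Xb, yb

def fit_stump(X, y):
    """Depth-1 tree: choose (feature, threshold, label above threshold)
    with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for hi in (0, 1):
                errors = sum((hi if x[f] > t else 1 - hi) != lab
                             for x, lab in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, hi)
    return best[1:]

def predict_stump(stump, x):
    f, t, hi = stump
    return hi if x[f] > t else 1 - hi

def fit_bagging(X, y, n_estimators=11, seed=0):
    """Balance the training data, then fit one stump per bootstrap sample."""
    rng = random.Random(seed)
    Xb, yb = oversample(X, y, rng)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(Xb)) for _ in Xb]  # sample with replacement
        models.append(fit_stump([Xb[i] for i in idx], [yb[i] for i in idx]))
    return models

def predict(models, x):
    """Majority vote across the homogeneous ensemble."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy imbalanced data: 8 non-churners, 2 churners, one numeric feature.
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    models = fit_bagging(X, y)
    print(predict(models, [9]), predict(models, [0]))
```

    Sampling is applied to the training data only, before the bootstrap step, so every base learner sees a roughly balanced sample while any held-out test data keeps its original class distribution.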

    Empirical Analysis of Forest Penalizing Attribute and Its Enhanced Variations for Android Malware Detection

    As a result of the rapid advancement of mobile and internet technology, a plethora of new mobile security risks has recently emerged. Many techniques have been developed to address the risks associated with Android malware. The most extensively used method for identifying Android malware is signature-based detection. The drawback of this method, however, is that it is unable to detect unknown malware. As a consequence of this problem, machine learning (ML) methods for detecting and classifying malware applications were developed. The goal of conventional ML approaches is to improve classification accuracy. However, owing to imbalanced real-world datasets, the traditional classification algorithms perform poorly in detecting malicious apps. As a result, in this study, we developed a meta-learning approach based on the forest penalizing attribute (FPA) classification algorithm for detecting malware applications. In other words, we investigated how to improve Android malware detection through an empirical analysis of FPA and its enhanced variants (Cas_FPA and RoF_FPA). The proposed FPA and its enhanced variants were tested using the Malgenome and Drebin Android malware datasets, which contain features gathered from both static and dynamic Android malware analysis. Furthermore, the findings obtained using the proposed technique were compared with baseline classifiers and existing malware detection methods to validate their effectiveness in detecting malware application families. Based on the findings, FPA outperforms the baseline classifiers and existing ML-based Android malware detection models in dealing with the imbalanced family categorization of Android malware apps, with an accuracy of 98.94% and an area under curve (AUC) value of 0.999. Hence, further development and deployment of FPA-based meta-learners for Android malware detection and other cybersecurity threats are recommended.
