11 research outputs found

    Text documents clustering using modified multi-verse optimizer

    In this study, a multi-verse optimizer (MVO) is utilised for the text document clustering (TDC) problem. TDC is treated as a discrete optimization problem, and an objective function based on the Euclidean distance is applied as the similarity measure. TDC is tackled by dividing the documents into clusters; documents belonging to the same cluster are similar, whereas those belonging to different clusters are dissimilar. MVO, which is a recent metaheuristic optimization algorithm established for continuous optimization problems, can intelligently navigate different areas in the search space and search deeply in each area using a particular learning mechanism. The proposed algorithm is called MVOTDC, and it adopts the convergence behaviour of MVO operators to deal with discrete, rather than continuous, optimization problems. For evaluating MVOTDC, a comprehensive comparative study is conducted on six text document datasets with various numbers of documents and clusters. The quality of the final results is assessed using precision, recall, F-measure, entropy, accuracy, and purity measures. Experimental results reveal that the proposed method performs competitively in comparison with state-of-the-art algorithms. Statistical analysis is also conducted and shows that MVOTDC can produce significant results in comparison with three well-established methods.
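A minimal sketch of what a Euclidean-distance objective over a discrete cluster assignment could look like. The abstract does not give the exact formula, so the function below (mean distance of each document vector to its cluster centroid, lower is better) and the names `clustering_cost`, `centroid`, and `euclidean` are illustrative assumptions, not the MVOTDC implementation.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clustering_cost(docs, assignment, k):
    """Mean distance from each document to the centroid of its cluster.

    docs       -- list of document vectors (e.g. term weights)
    assignment -- assignment[i] is the cluster index (0..k-1) of docs[i];
                  this discrete vector is the candidate solution
    k          -- number of clusters (assumed known in advance)
    """
    total = 0.0
    for cluster in range(k):
        members = [d for d, a in zip(docs, assignment) if a == cluster]
        if not members:
            continue  # an empty cluster contributes nothing
        center = centroid(members)
        total += sum(euclidean(d, center) for d in members)
    return total / len(docs)


docs = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = clustering_cost(docs, [0, 0, 1, 1], k=2)  # groups nearby documents
bad = clustering_cost(docs, [0, 1, 0, 1], k=2)   # mixes distant documents
print(good, bad)
```

A metaheuristic such as MVO would search over `assignment` vectors and keep those with the lowest cost.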

    Earlier stage for straggler detection and handling using combined CPU test and LATE methodology

    Using MapReduce in Hadoop helps lower the execution time and power consumption of large-scale data processing. However, job processing can be delayed when tasks are assigned to bad or congested machines. Such "straggler tasks" increase execution time and power consumption, which in turn raises costs and degrades the performance of computing systems. This research proposes a hybrid MapReduce framework referred to as the combinatory late-machine (CLM) framework. Implementing this framework facilitates early and timely detection and identification of stragglers, thereby enabling prompt, appropriate and effective action.

    Improved Multi-Verse Optimizer Feature Selection Technique With Application To Phishing, Spam, and Denial Of Service Attacks

    Intelligent classification systems have proved their merits in different fields, including cybersecurity. However, most cybercrime issues are dynamic rather than static classification problems, in which the set of discriminative features keeps changing with time. This requires revising the cybercrime classification system and picking a group of features that preserves or enhances its performance. In addition, system compactness is regarded as an important factor in judging the capability of any classification system, and cybercrime classification systems are no exception. The current research proposes an improved feature selection algorithm inspired by the well-known multi-verse optimizer (MVO) algorithm. The algorithm is then applied to three different cybercrime classification problems, namely phishing websites, spam, and denial of service attacks. MVO is a population-based approach that simulates a well-known theory in physics, namely multi-verse theory. MVO uses the black hole and white hole principles for exploration, and the wormhole principle for exploitation. A roulette selection scheme is used to model the white hole and black hole principles in the exploration phase. This scheme is biased toward good solutions: solutions are moved toward the best solution and diversity is likely to be lost, while other solutions that may contain important information get no chance to be improved. Thus, this research improves the exploration of MVO by introducing adaptive neighborhood search operations when updating the MVO solutions. In the classification phase, a classifier is used to evaluate the results and to validate the selected features. Empirical outcomes confirmed that the improved MVO (IMVO) algorithm is capable of enhancing the search capability of MVO and outperforms the other algorithms involved in the comparison.
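The bias of roulette selection toward good solutions can be illustrated with a generic fitness-proportionate selection routine. This is a sketch under assumptions: `roulette_select` is a hypothetical helper, the fitness values are invented, and MVO's own procedure (which works on normalized inflation rates) is not reproduced here.

```python
import random


def roulette_select(fitness, rng):
    """Return an index chosen with probability proportional to its fitness.

    fitness -- non-negative scores, higher is better (illustrative values)
    rng     -- a random.Random instance, passed in for reproducibility
    """
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.random() * total  # a point on the wheel in [0, total)
    cumulative = 0.0
    for index, score in enumerate(fitness):
        cumulative += score
        if pick < cumulative:
            return index
    return len(fitness) - 1  # guard against floating-point round-off


rng = random.Random(0)
fitness = [1.0, 1.0, 8.0]  # the third solution is much fitter
counts = [0, 0, 0]
for _ in range(10_000):
    counts[roulette_select(fitness, rng)] += 1
print(counts)  # the third index dominates the draws
```

Because the fittest solution takes most of the draws, weaker solutions are rarely selected, which is the loss of diversity the abstract describes.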

    S-Divergence-Based Internal Clustering Validation Index

    A clustering validation index (CVI) is employed to evaluate an algorithm’s clustering results. Generally, CVI statistics can be split into three classes, namely internal, external, and relative cluster validations. Most of the existing internal CVIs were designed based on compactness (CM) and separation (SM). The distance between cluster centers is calculated by SM, whereas the CM measures the variance of the cluster. However, the SM between groups is not always captured accurately in highly overlapping classes. In this article, we devise a novel internal CVI that can be regarded as a complementary measure to the landscape of available internal CVIs. Initially, a database’s clusters are modeled as a non-parametric density function estimated using kernel density estimation. Then the S-divergence (SD) and S-distance are introduced for measuring the SM and the CM, respectively. The SD is defined based on the concept of Hermitian positive definite matrices applied to density functions. The proposed internal CVI (PM) is the ratio of CM to SM. According to empirical results from four popular clustering algorithms (fuzzy k-means, spectral clustering, density peak clustering, and density-based spatial clustering of applications with noise), the PM outperforms the legacy measures presented in the literature on both superficial and realistic databases in various scenarios.

    Classifying spam emails using agglomerative hierarchical clustering and a topic-based approach

    Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques (Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT) and four classifiers: Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF) and Logistic Regression (LR). Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with an F1 score of 0.953 and an accuracy of 94.6%, while for the Spanish dataset, TF-IDF with NB yields an F1 score of 0.945 and 98.5% accuracy. Regarding the processing time, TF-IDF with LR leads to the fastest classification, processing an English and a Spanish spam email in 2 ms and 2.2 ms on average, respectively.
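To make the TF-IDF representation concrete, here is a small sketch using one common textbook variant (term frequency normalized by document length, `idf = log(N / df)`). The abstract does not say which TF-IDF variant or library the authors used, so the weighting scheme, the `tf_idf` helper, and the toy messages are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dictionary per document.

    Uses tf = count / document length and idf = log(N / df), where df is
    the number of documents containing the term. Library implementations
    often add smoothing and normalization that this sketch omits.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


messages = ["win money now", "cheap pills now", "meeting now"]
w = tf_idf(messages)
print(w[0])  # "now" occurs in every message, so its weight is zero
```

Terms that appear in every message carry no weight, while terms specific to one message are emphasized; a classifier such as logistic regression would then be trained on these weight vectors.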

    The buttressed walls problem: An application of a hybrid clustering particle swarm optimization algorithm

    The design of reinforced earth retaining walls is a combinatorial optimization problem of interest due to practical applications regarding the cost savings involved in the design and the optimization of the amount of CO2 emissions generated in its construction. On the other hand, this problem presents important challenges in computational complexity, since it involves 32 design variables and therefore on the order of 10^20 possible combinations. In this article, we propose a hybrid algorithm in which the particle swarm optimization method, which solves optimization problems in continuous spaces, is integrated with the db-scan clustering technique, with the aim of addressing the combinatorial problem of the design of reinforced earth retaining walls. This algorithm optimizes two objective functions: the embedded carbon emissions and the economic cost of reinforced concrete walls. To assess the contribution of the db-scan operator in the optimization process, a random operator was designed. The best solutions, the averages, and the interquartile ranges of the obtained distributions are compared. The db-scan algorithm was then compared with a hybrid version that uses k-means as the discretization method and with a discrete implementation of the harmony search algorithm. The results indicate that the db-scan operator significantly improves the quality of the solutions and that the proposed metaheuristic shows competitive results with respect to the harmony search algorithm. The first author was supported by the Grant CONICYT/FONDECYT/INICIACION/11180056; the other two authors were supported by the Spanish Ministry of Economy and Competitiveness, along with FEDER funding (Project: BIA2017-85098-R). Garcia, J.; Martí Albiñana, JV.; Yepes, V. (2020). The buttressed walls problem: An application of a hybrid clustering particle swarm optimization algorithm. Mathematics. 8(6):862-01-862-22. https://doi.org/10.3390/math8060862

    Identification of continuous-time model of hammerstein system using modified multi-verse optimizer

    This thesis implements a novel nature-inspired metaheuristic optimization algorithm, namely the modified Multi-Verse Optimizer (mMVO) algorithm, to identify the continuous-time model of a Hammerstein system. Multi-Verse Optimizer (MVO) is one of the most recent robust nature-inspired metaheuristic algorithms. It has been successfully implemented and used in various areas such as machine learning applications, engineering applications, network applications, parameter control, and other similar applications to solve optimization problems. However, such metaheuristics have some limitations, such as the local optima problem, low searching capability and an imbalance between exploration and exploitation. To address these limitations, two modifications were made to the conventional MVO in the proposed mMVO algorithm. The first modification is an average design parameter updating mechanism to solve the local optima issue of the traditional MVO. The essential feature of this mechanism is that it helps any trapped design parameter jump out of the local optima region and continue on a new search track. The second modification is the hybridization of MVO with the Sine Cosine Algorithm (SCA) to improve the low searching capability of the conventional MVO. Hybridization aims to combine the advantages of the MVO and SCA algorithms and minimize their disadvantages, such as low searching capability and an imbalance between exploration and exploitation. In particular, the search capacity of the MVO algorithm is improved using the sine and cosine functions of the SCA, which help balance the processes of exploration and exploitation. The mMVO-based method is then used for identifying the parameters of the linear and nonlinear subsystems in the Hammerstein model using the given input and output data. Note that the structure of the linear and nonlinear subsystems is assumed to be known.
    Moreover, a continuous-time linear subsystem is considered in this study, although only a few methods utilize such models. Two numerical examples and one real-world application, the Twin Rotor System (TRS), are used to illustrate the efficiency of the mMVO-based method. Various nonlinear subsystems such as quadratic and hyperbolic functions (sine and tangent) are used in those experiments. Numerical and experimental results are analyzed with a focus on the convergence curve of the fitness function, the parameter variation index, the frequency and time domain responses and the Wilcoxon rank test. For the numerical identifications, three different levels of white noise variance were used. The mean of the parameter deviation index was used to quantify how much the proposed algorithm improves on the alternatives. For Example 1, the improvements are 29%, 33.15% and 36.68% for noise variances of 0.01, 0.25 and 1.0, respectively. For Example 2, the improvements are 39.36%, 39.61% and 66.18% for noise variances of 0.01, 0.25 and 1.0, respectively. Finally, for the real TRS application, the improvement is 7%. The numerical and experimental results also showed that both Hammerstein model subsystems are identified effectively using the mMVO-based method, particularly in terms of the quadratic output estimation error and a differentiation parameter index. The results further confirmed that the proposed mMVO-based method provided better solutions than other optimization techniques, such as PSO, GWO, ALO, MVO and SCA.
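For readers unfamiliar with the model class, a Hammerstein system is a static nonlinearity followed by a linear dynamic subsystem. The sketch below simulates one with a quadratic nonlinearity and a first-order continuous-time linear part, integrated by forward Euler. The specific structure, the coefficients `a`, `b`, `c1`, `c2`, and the function name `simulate_hammerstein` are hypothetical choices for illustration; they are not the systems studied in the thesis.

```python
def simulate_hammerstein(inputs, dt, a=2.0, b=4.0, c1=1.0, c2=0.5):
    """Simulate v = c1*u + c2*u**2 followed by dy/dt = -a*y + b*v.

    inputs -- sampled input signal u
    dt     -- integration step in seconds (forward Euler, assumed small)
    a, b   -- hypothetical linear-subsystem parameters
    c1, c2 -- hypothetical coefficients of the static nonlinearity
    Returns the output y sampled after each step.
    """
    y = 0.0
    outputs = []
    for u in inputs:
        v = c1 * u + c2 * u * u       # static nonlinear block
        y += dt * (-a * y + b * v)    # one Euler step of the linear block
        outputs.append(y)
    return outputs


# Constant input u = 1 gives v = 1.5, so y settles at b * v / a = 3.0.
step_response = simulate_hammerstein([1.0] * 10_000, dt=0.001)
print(step_response[-1])
```

In an identification setting, an optimizer such as mMVO would adjust parameters like `a`, `b`, `c1`, and `c2` so that the simulated output matches measured input and output data.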

    An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

    Event Detection (ED) aims to identify events from various types of data. Building an ED model for news text documents greatly helps decision-makers in various disciplines improve their strategies. However, identifying and summarizing events from such data is a non-trivial task due to the large volume of published heterogeneous news text documents. Such documents create a high-dimensional feature space that degrades the overall performance of the baseline methods in the ED model. To address this problem, this research presents an enhanced ED model that includes improved methods for the crucial phases of the ED model, namely Feature Selection (FS), ED, and summarization. This work focuses on the FS problem by automatically detecting events through a novel wrapper FS method based on the Adapted Binary Bat Algorithm (ABBA) and the Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptive techniques were developed to overcome the premature convergence of BBA and the fast convergence rate of MCL. Furthermore, this study proposes four summarizing methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets. The effectiveness of ABBA-AMCL was compared to 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results showed that ABBA-AMCL surpassed the other methods on most datasets. The key representative features demonstrated that the ABBA-AMCL method successfully detects real-world events from the Facebook news datasets, with 0.96 Precision and 1 Recall for dataset 11, while for dataset 12 the Precision is 1 and the Recall is 0.76. To conclude, the novel ABBA-AMCL presented in this research has successfully bridged the research gap and resolved the curse of dimensionality in the high-dimensional feature space of heterogeneous news text documents.
    Hence, the enhanced ED model can organize news documents into distinct events and provide policymakers with valuable information for decision making.