391 research outputs found

    Slime Mold Optimization with Relational Graph Convolutional Network for Big Data Classification on Apache Spark Environment

    Lately, Big Data (BD) classification has become an active research area in fields such as finance, healthcare, and e-commerce. Feature Selection (FS) is a crucial task in text classification. Text FS aims to characterize documents using the most relevant features. This can reduce the dataset size and improve the efficiency of the machine learning method. Many researchers focus on developing effective FS techniques, but most of the presented techniques are assessed on smaller datasets and validated on a single machine. As textual data dimensionality becomes high, conventional FS methodologies should be parallelized and improved to manage textual big datasets. This article develops a Slime Mold Optimization based FS with Optimal Relational Graph Convolutional Network (SMOFS-ORGCN) for BD classification in an Apache Spark environment. The presented SMOFS-ORGCN model mainly focuses on classifying BD accurately and rapidly. To handle BD, the SMOFS-ORGCN model uses an Apache Spark environment. In the SMOFS-ORGCN model, the SMOFS technique is executed to reduce the curse of dimensionality and to improve classification accuracy. The RGCN technique is employed for BD classification. In addition, the Grey Wolf Optimizer (GWO) technique is used as a hyperparameter optimizer for the RGCN technique to enhance classification performance. To demonstrate the improved performance of the SMOFS-ORGCN technique, extensive experiments were conducted. The comparison results showed improved outcomes for the SMOFS-ORGCN technique over current models.

    Methodology for modified whale optimization algorithm for solving appliances scheduling problem

    Whale Optimization Algorithm (WOA) is considered one of the newest metaheuristic algorithms for solving certain NP-hard problems. WOA is known to converge slowly, and its computational cost increases exponentially with multiple objectives and a large number of requests from n users. These constraints limit its ability to solve and optimize Demand Side Management (DSM) cases, such as the energy consumption associated with indoor comfort index parameters, which consist of thermal comfort, air quality, humidity and visual comfort. To address these issues, the proposed work will first justify and validate the constraints related to the appliances scheduling problem, and then propose a new model, the Cluster-based Multi-Objective WOA with a multiple restart strategy. To achieve these objectives, different initialization strategies and cluster-based approaches will be used to tune the main parameter of WOA under different MapReduce applications, which helps to control exploration and exploitation. The proposed model will be tested on a set of well-known test functions and finally applied to a real case project, i.e. the appliances scheduling problem. It is anticipated that the approach can expedite the convergence of the metaheuristic technique while yielding quality solutions.
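For orientation, the baseline algorithm that the abstract above sets out to modify can be sketched as follows. This is a minimal single-objective WOA under stated assumptions, not the cluster-based multi-objective variant the authors propose: the `sphere` test function, the scalar coefficients `A` and `C`, the immediate update of the best whale, and all parameter defaults are illustrative choices that do not come from the listing.

```python
import math
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


def woa(objective, dim, lower, upper, pop_size=20, max_iter=100, b=1.0, seed=0):
    """Minimal baseline Whale Optimization Algorithm for minimization."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    whales = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(pop_size)]
    best = min(whales, key=objective)[:]
    best_fit = objective(best)
    initial_fit = best_fit

    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)  # control parameter, decays from 2 toward 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # otherwise move relative to a random whale (exploration).
                ref = best if abs(A) < 1.0 else whales[rng.randrange(pop_size)]
                new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Logarithmic spiral around the best whale.
                ell = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * ell) * math.cos(2.0 * math.pi * ell)
                new = [abs(best[j] - x[j]) * spiral + best[j] for j in range(dim)]
            whales[i] = [clip(v) for v in new]
            fit = objective(whales[i])
            if fit < best_fit:  # simplification: update the best immediately
                best, best_fit = whales[i][:], fit
    return best, best_fit, initial_fit


best, best_fit, initial_fit = woa(sphere, dim=5, lower=-10.0, upper=10.0)
```

The parameter `a` is the one that governs the exploration/exploitation balance mentioned in the abstract; a restart or clustering strategy would wrap around this loop rather than replace it.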

    Bio-inspired computation for big data fusion, storage, processing, learning and visualization: state of the art and future directions

    This overview focuses on research achievements that have recently emerged from the confluence of Big Data technologies and bio-inspired computation. Many reasons can be identified for the profitable synergy between these two paradigms, all rooted in the adaptability, intelligence and robustness that biologically inspired principles can provide to technologies that aim to manage, retrieve, fuse and process Big Data efficiently. We delve into this research field by first analyzing the existing literature in depth, with a focus on advances reported in the last few years. This literature analysis is complemented by an identification of the new trends and open challenges in Big Data that remain unsolved to date and that can be effectively addressed by bio-inspired algorithms. As a second contribution, this work elaborates on how bio-inspired algorithms need to be adapted for use in a Big Data context, in which data fusion becomes crucial as a preliminary step that allows several, potentially heterogeneous data sources to be processed and mined. This analysis allows exploring and comparing the scope and efficiency of existing approaches across different problems and domains, with the purpose of identifying new potential applications and research niches. Finally, this survey highlights open issues that remain unsolved to date in this research avenue, alongside recommendations for future research. This work has received funding support from the Basque Government (Eusko Jaurlaritza) through the Consolidated Research Group MATHMODE (IT1294-19), EMAITEK and ELK ARTEK programs. D. Camacho also acknowledges support from the Spanish Ministry of Science and Education under PID2020-117263GB-100 grant (FightDIS), the Comunidad Autonoma de Madrid under S2018/TCS-4566 grant (CYNAMON), and the CHIST ERA 2017 BDSI PACMEL Project (PCI2019-103623, Spain).

    An improved Arabic text classification method using word embedding

    Feature selection (FS) is a widely used method for removing redundant or irrelevant features to improve classification accuracy and decrease the model’s computational cost. In this paper, we present an improved method (referred to hereafter as RARF) for Arabic text classification (ATC) that employs term frequency-inverse document frequency (TF-IDF) and the Word2Vec embedding technique to identify words that have a particular semantic relationship. In addition, we compare our method with four benchmark FS methods, namely principal component analysis (PCA), linear discriminant analysis (LDA), chi-square, and mutual information (MI). Three machine learning algorithms are used in this work: support vector machine (SVM), k-nearest neighbors (K-NN), and naive Bayes (NB). Two different Arabic datasets are used to perform a comparative analysis of these algorithms. This paper also evaluates the efficiency of our method for ATC using the performance metrics accuracy, precision, recall, and F-measure. Results revealed that the highest accuracy, 94.75%, was achieved by the SVM classifier on the Khaleej-2004 Arabic dataset, while the same classifier recorded an accuracy of 94.01% on the Watan-2004 Arabic dataset.
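As a point of reference for the TF-IDF weighting named in this abstract, the sketch below computes plain TF-IDF weights and keeps the highest-weighted terms as a simple filter. It is a generic illustration, not the RARF method, which the listing does not describe in detail; the function names, the toy English corpus, and the "peak weight" ranking rule are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_k_terms(docs, k):
    """Keep the k terms with the highest peak TF-IDF weight in the corpus."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, 0.0), weight)
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(best, key=lambda term: (-best[term], term))[:k]


# Hypothetical toy corpus; a real ATC pipeline would tokenize Arabic text.
corpus = [
    ["market", "bank", "loan", "the"],
    ["match", "goal", "team", "the"],
    ["bank", "rate", "loan", "the"],
]
selected = top_k_terms(corpus, 3)
```

A term that occurs in every document, such as "the" here, gets an inverse document frequency of log(1) = 0 and therefore never survives the filter.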

    Investigations on Machining Aspects of Inconel 718 During Wire Electro-Discharge Machining (WEDM): Experimental and Numerical Analysis

    Wire electro-discharge machining (WEDM) is a distinctive cutting process in manufacturing, valued in the die industry for achieving good tolerances on intricate shapes. In this study, Inconel 718 was chosen as the workpiece material. Inconel 718 superalloy is widely used in aerospace industries. This nickel-based superalloy has excellent resistance to high temperature and to mechanical and chemical degradation, along with toughness and work-hardening characteristics. Because of these properties, this study examines the machinability of the material. Inconel 718 was machined using two wire electrode materials (brass wire and zinc-coated brass wire), each with a diameter of 0.20 mm. The main objective is to investigate the various WEDM process parameters and the performance of the wire electrode materials on Inconel 718 with various types of cutting. The optimal process parameter setting for each wire electrode material has been obtained for a multi-objective response. The responses are kerf width, Material Removal Rate (MRR), surface finish, corner error, corner deviation and angular error, which are functions of the process variables pulse-on time, discharge current, wire speed, flushing pressure and taper angle. Non-linear regression analysis has been used to model the relationship between the process parameters and the process characteristics. The optimal parameter settings have been obtained using multi-objective nature-inspired metaheuristic optimization algorithms, namely the Whale Optimization Algorithm (WOA) and the Gray Wolf Optimizer (GWO). Lastly, a numerical model analysis has been carried out in ANSYS software to determine MRR and residual stress, and the MRR model has been validated against the experimental results. An overlapping approach has been adopted for solving the multi-spark problem and validated against the experimental results.

    Análisis estadístico del rendimiento de cuatro algoritmos de Apache Spark ML (Statistical analysis of the performance of four Apache Spark ML algorithms)

    Feature selection (FS) techniques generally require repeatedly training and evaluating models to assess the importance of each feature for a particular task. However, due to the increasing size of currently available databases, distributed processing has become a necessity for many tasks. In this context, the Apache Spark ML library is one of the most widely used libraries for performing classification and other tasks with large datasets. Therefore, knowing both the predictive performance and the efficiency of its main algorithms before applying an FS technique is crucial for planning computations and saving time. In this work, a comparative study of four Spark ML classification algorithms is carried out, statistically measuring execution times and predictive power as a function of the number of attributes from a colon cancer database. Results were statistically analyzed, showing that, although Random Forest and Naive Bayes are the algorithms with the shortest execution times, Support Vector Machine obtains the models with the best predictive power. The performance of these algorithms is of interest because they are applied to many different problems, such as classification of pathologies from epigenomic data, image classification, and prediction of computer attacks in network security problems, among others. Facultad de Informátic

    Review of Big Data Analytics, Artificial Intelligence and Nature-Inspired Computing Models towards Accurate Detection of COVID-19 Pandemic Cases and Contact Tracing

    The 2019 novel coronavirus (COVID-19), which was declared a pandemic, has spread to 210 countries worldwide. It has had a significant impact on health systems and on the economic, educational and social facets of contemporary society. As the rate of transmission increases, various collaborative approaches among stakeholders have evolved to develop innovative means of screening, detecting and diagnosing COVID-19 cases among human beings at a commensurate rate. Further, the utility of computing models associated with the fourth industrial revolution technologies in achieving this has been highlighted. However, there is a gap in terms of the accuracy of detection and prediction of COVID-19 cases and of tracing contacts of infected persons. This paper presents a review of computing models that can be adopted to enhance the performance of detecting and predicting COVID-19 pandemic cases. We focus on big data, artificial intelligence (AI) and nature-inspired computing (NIC) models that can be adopted in the current pandemic. The review suggests that AI models have been used for COVID-19 case detection. Similarly, big data platforms have been applied for tracing contacts. However, the NIC models that have demonstrated good performance in feature selection for medical problems are yet to be explored for case detection and contact tracing in the current COVID-19 pandemic. This study holds salient implications for practitioners and researchers alike, as it elucidates the potential of NIC for the accurate detection of pandemic cases and optimized contact tracing.

    Q-Learnheuristics: towards data-driven balanced metaheuristics

    One of the central issues that must be resolved for a metaheuristic optimization process to work well is the balance between exploration and exploitation. Metaheuristics (MH) that achieve this balance can be called balanced MH. A Q-Learning (QL) integration framework was previously proposed for selecting metaheuristic operators conducive to this balance, particularly for selecting binarization schemes when a continuous metaheuristic solves binary combinatorial problems. In this work, the use of this framework is extended to other recent metaheuristics, demonstrating that integrating QL into the selection of operators improves the exploration-exploitation balance. Specifically, the Whale Optimization Algorithm and the Sine-Cosine Algorithm are tested by solving the Set Covering Problem, showing statistical improvements in this balance and in the quality of the solutions.
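The idea of letting Q-Learning pick an operator at each iteration can be illustrated with a small tabular agent. This is a rough sketch under assumptions, not the framework from the paper: the two search states, the operator names "S-shaped" and "V-shaped" (standing in for binarization schemes), the learning parameters, and the `toy_reward` function are all hypothetical placeholders for "did the chosen operator improve the best solution?".

```python
import random


class OperatorSelector:
    """Tabular Q-learning over (search state, operator) pairs."""

    def __init__(self, states, operators, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = {s: {op: 0.0 for op in self.operators} for s in states}

    def choose(self, state):
        # Epsilon-greedy: usually exploit the best-known operator,
        # occasionally try a random one.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        row = self.q[state]
        return max(self.operators, key=row.get)

    def update(self, state, operator, reward, next_state):
        # Standard Q-learning update toward reward + discounted best next value.
        best_next = max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][operator] += self.alpha * (target - self.q[state][operator])


def toy_reward(state, operator):
    # Hypothetical reward: each state has one operator that "helps".
    preferred = {"explore": "V-shaped", "exploit": "S-shaped"}
    return 1.0 if preferred[state] == operator else 0.0


selector = OperatorSelector(["explore", "exploit"],
                            ["S-shaped", "V-shaped"], epsilon=0.2)
n_iter = 400
for t in range(n_iter):
    # First half of the run counts as exploration, second half as exploitation.
    state = "explore" if t < n_iter // 2 else "exploit"
    next_state = "explore" if t + 1 < n_iter // 2 else "exploit"
    op = selector.choose(state)
    selector.update(state, op, toy_reward(state, op), next_state)
```

In a real metaheuristic, the reward would come from the change in fitness or population diversity after applying the chosen operator, and the state would be derived from a diversity measure rather than from the iteration counter.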

    Reliable Machine Learning Model for IIoT Botnet Detection

    Due to the growing number of Internet of Things (IoT) devices, network attacks such as denial of service (DoS) and floods are on the rise, creating security and reliability issues. As a result of these attacks, IoT devices suffer from denial of service and network disruption. Researchers have implemented different techniques to identify attacks aimed at vulnerable IoT devices. In this study, we propose a novel feature selection algorithm, FGOA-kNN, based on a hybrid of filter and wrapper selection approaches to select the most relevant features. The novel approach, integrated with clustering, ranks the features and then applies the Grasshopper algorithm (GOA) to minimize the top-ranked features. Moreover, a proposed algorithm, IHHO, selects and adapts the neural network’s hyperparameters to detect botnets efficiently. The proposed Harris Hawks algorithm includes three improvements to strengthen the global search for optimal solutions. To tackle the problem of population diversity, a chaotic map function is used for initialization. The escape energy of the hawks is updated with a new nonlinear formula to avoid local minima and better balance exploration and exploitation. Furthermore, the exploitation phase of HHO is enhanced using a new elite operator, ROBL. The proposed model combines unsupervised, clustering, and supervised approaches to detect intrusion behaviors. The N-BaIoT dataset is used to validate the proposed model. Many recent techniques were used to assess and compare the proposed model’s performance. The results demonstrate that the proposed model is better than other variations at detecting multiclass botnet attacks.
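Chaotic-map initialization, mentioned in this abstract as the remedy for low population diversity, can be sketched in a few lines. The abstract does not say which chaotic map is used, so the logistic map below is an assumption, as are the function name, the seed value `x0`, and the bounds in the usage line.

```python
def logistic_map_population(pop_size, dim, lower, upper, x0=0.7, r=4.0):
    """Build a pop_size x dim population from a logistic-map sequence.

    The logistic map x <- r * x * (1 - x) with r = 4 is an assumed choice;
    x0 must avoid fixed or periodic points such as 0, 0.25, 0.5, 0.75 and 1.
    """
    population = []
    x = x0
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            x = r * x * (1.0 - x)  # next chaotic value in [0, 1]
            individual.append(lower + x * (upper - lower))  # scale to bounds
        population.append(individual)
    return population


# Hypothetical usage: 10 candidate solutions with 4 hyperparameters each.
hawks = logistic_map_population(pop_size=10, dim=4, lower=-1.0, upper=1.0)
```

Compared with uniform random sampling, the sequence is deterministic for a given `x0`, which makes runs reproducible while still spreading candidates across the search range.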

    From past to present: spam detection and identifying opinion leaders in social networks

    On microblogging sites, which are gaining more and more users every day, a wide range of ideas quickly emerge, spread, and create interactive environments. In some cases, in Turkey as well as in the rest of the world, events were published on microblogging sites before appearing in visual, audio and printed news sources. Thanks to the rapid flow of information in social networks, news can reach millions of people in seconds. In this context, social media can be seen as one of the most important sources of information affecting public opinion. Once the information in social networks became accessible, researchers began to use it in their studies. As studies on spam detection and identification of opinion leaders gained popularity, surveys on these topics began to be published. This study also shows the importance of spam detection and identification of opinion leaders in social networks. The data collected from social platforms, especially in recent years, has been the source for many state-of-the-art applications. There are independent surveys that focus on filtering spam content and detecting influencers on social networks. This survey analyzes both spam detection studies and opinion leader identification studies and categorizes them by their methodologies. As far as we know, there is no survey that covers approaches for both spam detection and opinion leader identification in social networks. This survey contains an overview of the past and recent advances in both spam detection and opinion leader identification studies in social networks. Furthermore, readers of this survey have the opportunity to understand general aspects of different studies on spam detection and opinion leader identification while observing key points and comparisons of these studies. This work is supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) through grant number 118E315 and grant number 120E187. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of TUBITAK. Publisher's Version. Emerging Sources Citation Index (ESCI). Q4. WOS:00080858480001