150 research outputs found

    Parallel particle swarm optimization based on spark for academic paper co-authorship prediction

    The particle swarm optimization (PSO) algorithm has been widely used for a variety of optimization problems. Although PSO has been successful in many fields, optimization problems in big data applications often require processing massive amounts of data, which traditional single-machine PSO cannot handle. Several Spark-based parallel PSO algorithms have been proposed, but almost all of them target numerical optimization problems, and few address big data optimization problems. In this paper, we propose a new Spark-based parallel PSO algorithm to predict the co-authorship of academic papers, which we formulate as an optimization problem over massive academic data. Experimental results show that the proposed parallel PSO achieves good prediction accuracy.
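
    A minimal sketch of the general idea described above, assuming only that fitness evaluation is the expensive, data-dependent step that gets distributed over Spark; the objective function, swarm size, and coefficients below are placeholders, not the paper's actual co-authorship model.

        import numpy as np
        from pyspark import SparkContext

        sc = SparkContext(appName="parallel-pso-sketch")

        DIM, N_PARTICLES = 10, 64
        W, C1, C2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)

        def fitness(x):
            # Placeholder objective; the paper's co-authorship model would score a
            # candidate parameter vector against the academic data here.
            return float(np.sum(x ** 2))

        rng = np.random.default_rng(0)
        pos = rng.uniform(-1.0, 1.0, (N_PARTICLES, DIM))
        vel = np.zeros_like(pos)
        pbest = pos.copy()
        pbest_val = np.full(N_PARTICLES, np.inf)

        for _ in range(20):
            # Distribute the expensive fitness evaluations: one Spark task per particle.
            values = np.array(
                sc.parallelize(pos.tolist()).map(lambda x: fitness(np.array(x))).collect()
            )
            improved = values < pbest_val
            pbest[improved] = pos[improved]
            pbest_val[improved] = values[improved]
            gbest = pbest[np.argmin(pbest_val)]

            # Standard PSO velocity and position update on the driver.
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
            pos = pos + vel

        print("best fitness found:", pbest_val.min())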

    Slime Mold Optimization with Relational Graph Convolutional Network for Big Data Classification on Apache Spark Environment

    Lately, Big Data (BD) classification has become an active research area in different fields, namely finance, healthcare, e-commerce, and so on. Feature Selection (FS) is a crucial task for text classification challenges. Text FS aims to characterize documents using the most relevant features. This method can reduce the dataset size and maximize the efficiency of the machine learning method. Various researchers have focused on developing effective FS techniques, but most of the presented techniques are assessed on smaller datasets and validated on a single machine. As textual data dimensionality grows, conventional FS methodologies must be parallelized and improved to manage textual big datasets. This article develops a Slime Mold Optimization based FS with Optimal Relational Graph Convolutional Network (SMOFS-ORGCN) for BD classification in an Apache Spark environment. The presented SMOFS-ORGCN model mainly focuses on classifying BD accurately and rapidly. To handle BD, the SMOFS-ORGCN model uses an Apache Spark environment. In the SMOFS-ORGCN model, the SMOFS technique is executed to reduce the curse of dimensionality and to improve classification accuracy. In this article, the RGCN technique is employed for BD classification. In addition, the Grey Wolf Optimizer (GWO) technique is utilized as a hyperparameter optimizer of the RGCN technique to enhance classification performance. To exhibit the improved performance of the SMOFS-ORGCN technique, far-reaching experiments were conducted. The comparison results report enhanced outcomes of the SMOFS-ORGCN technique over current models.
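
    As an illustration of the wrapper-style feature selection that such a pipeline performs on Spark, the sketch below evaluates a binary feature mask by training a classifier on the selected columns only; the dataset path and column names are assumptions, and a logistic regression stands in for the RGCN purely to keep the example self-contained.

        from pyspark.sql import SparkSession
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LogisticRegression
        from pyspark.ml.evaluation import MulticlassClassificationEvaluator

        spark = SparkSession.builder.appName("fs-fitness-sketch").getOrCreate()
        df = spark.read.parquet("hdfs:///data/text_features.parquet")  # assumed path
        ALL_FEATURES = [c for c in df.columns if c != "label"]

        def fitness(mask):
            """Accuracy obtained when training only on the features selected by `mask`."""
            selected = [f for f, keep in zip(ALL_FEATURES, mask) if keep]
            if not selected:
                return 0.0
            assembled = VectorAssembler(inputCols=selected, outputCol="features").transform(df)
            train, test = assembled.randomSplit([0.8, 0.2], seed=42)
            model = LogisticRegression(labelCol="label", featuresCol="features").fit(train)
            preds = model.transform(test)
            return MulticlassClassificationEvaluator(
                labelCol="label", metricName="accuracy").evaluate(preds)

        # A slime-mould (or any other) optimizer would call fitness() on candidate
        # binary masks and keep the best-scoring feature subset.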

    An approach to support generic topologies in distributed PSO algorithms in Spark

    Particle Swarm Optimization (PSO) is a popular population-based search algorithm that has been applied to all kinds of complex optimization problems. Although the performance of the algorithm strongly depends on the social topology that determines the interaction between the particles during the search, current Metaheuristic Optimization Frameworks (MOFs) provide limited support for topologies. In this paper, we present an approach to support generic topologies in distributed PSO algorithms within a framework for the development and execution of population-based metaheuristics in Spark, which is currently under development. (Facultad de Informática)
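
    The framework's API is not given in the abstract, but the core idea of a pluggable topology can be sketched as follows: a topology is just a mapping from a particle to its neighbourhood, so the velocity update can query the best neighbour without knowing whether the swarm is fully connected, a ring, or any other structure. The function names below are illustrative, not the framework's actual interface.

        from typing import Callable, List, Sequence

        # A topology maps (particle_id, swarm_size) to the indices of its neighbours.
        Topology = Callable[[int, int], List[int]]

        def global_topology(i: int, n: int) -> List[int]:
            # Fully connected swarm: classic gbest PSO.
            return list(range(n))

        def ring_topology(i: int, n: int) -> List[int]:
            # Each particle only sees its two ring neighbours: lbest PSO.
            return [(i - 1) % n, i, (i + 1) % n]

        def best_neighbour(i: int, n: int, topology: Topology,
                           pbest_val: Sequence[float]) -> int:
            # Index of the best personal best within particle i's neighbourhood;
            # this is the only place the update rule needs to touch the topology.
            return min(topology(i, n), key=lambda j: pbest_val[j])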

    Short Papers of the 11th Conference on Cloud Computing Conference, Big Data & Emerging Topics (JCC-BD&ET 2023)

    Compilation of the short papers presented at the 11th Conference on Cloud Computing, Big Data & Emerging Topics (JCC-BD&ET 2023), held in hybrid mode in June 2023 and organized by the Instituto de Investigación en Informática LIDI (III-LIDI) and the Secretaría de Posgrado of the Facultad de Informática, UNLP, in collaboration with universities from Argentina and abroad. (Facultad de Informática)

    Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams

    The large-scale data stream problem refers to a high-speed information flow that cannot be processed in a scalable manner on a traditional computing platform. This problem also imposes an expensive labelling cost, making the deployment of fully supervised algorithms unfeasible. On the other hand, the problem of semi-supervised large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised approaches. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is crafted on the distributed computing platform of Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems while integrating a data augmentation, annotation and auto-correction (DA^3) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only 25% label proportions. It shows highly competitive performance even when compared with fully supervised learners trained with 100% label proportions. Comment: This paper has been accepted for publication in Information Sciences.
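
    WeScatterNet itself is not reproduced here; the sketch below only illustrates the generic pattern of learning from a partially labelled micro-batch with pseudo-labelling and a confidence filter, a loose analogue of the annotation and auto-correction steps. The SGDClassifier and the 0.9 threshold are stand-ins, not the paper's components.

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        model = SGDClassifier(loss="log_loss")   # stand-in for WeScatterNet's network
        CLASSES = np.array([0, 1])               # assumed binary labels
        CONFIDENCE = 0.9                         # assumed acceptance threshold

        def process_batch(X, y):
            """Process one micro-batch; y holds -1 for unlabelled samples."""
            labelled = y != -1
            if labelled.any():
                # Supervised update on the scarce labelled portion of the stream.
                model.partial_fit(X[labelled], y[labelled], classes=CLASSES)
            if (~labelled).any() and hasattr(model, "classes_"):
                # Pseudo-label the unlabelled portion and keep only confident
                # predictions, roughly mimicking annotation + auto-correction.
                proba = model.predict_proba(X[~labelled])
                confident = proba.max(axis=1) >= CONFIDENCE
                if confident.any():
                    model.partial_fit(X[~labelled][confident],
                                      model.classes_[proba[confident].argmax(axis=1)])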

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016. The PhD Symposium was a very good opportunity for young researchers to share information and knowledge, to present their current research, and to discuss topics with other students in order to look for synergies and common research topics. The idea was very successful and the assessment made by the PhD students was very good. It also helped to achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable solutions for ultrascale computing, aiming at cross-fertilization among HPC, large-scale distributed systems, and big data management, as well as training, helping to bring together disparate researchers working across different areas and providing a meeting ground for researchers in these separate areas to exchange ideas, identify synergies, and pursue common activities in research topics such as sustainable software solutions (applications and the system software stack), data management, energy efficiency, and resilience. European Cooperation in Science and Technology (COST).

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a final publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to provide better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Scheduling Problems

    Scheduling is defined as the process of assigning operations to resources over time to optimize a criterion. Scheduling problems comprise both a set of resources and a set of consumers; as such, managing scheduling problems involves managing the use of resources by several consumers. This book presents some new applications and trends related to task and data scheduling. In particular, chapters focus on data science, big data, high-performance computing, and Cloud computing environments. In addition, this book presents novel algorithms and literature reviews that will guide current and new researchers who work with load balancing, scheduling, and allocation problems.

    Parallel Exchange of Randomized SubGraphs for Optimization of Network Alignment: PERSONA

    The aim of Network Alignment in Protein-Protein Interaction Networks is to discover functionally similar regions between the compared organisms. One major compromise in solving a network alignment problem is the trade-off among multiple similarity objectives while applying an alignment strategy. An alignment may lose its biological relevance when it favors certain objectives over others, because the unfavored objectives may be the biologically relevant ones. One possible solution to this issue is blending the stronger aspects of various alignment strategies until mature solutions are achieved. This study proposes a parallel approach called PERSONA that allows aligners to share their partial solutions continuously as they progress. All these aligners pursue their particular heuristics as part of a particle swarm that searches for multi-objective solutions of the same alignment problem in a reactive actor environment. The actors use the stronger portion of a solution, received as a subgraph from leading or other actors, and send their own stronger subgraphs back upon evaluation of those partial solutions. Moreover, the individual heuristics of each actor take randomized parameter values at each cycle of parallel execution so that the problem search space can be thoroughly investigated. The results achieved with PERSONA are remarkably well optimized and balanced for both topological and node similarity objectives.
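
    A toy sketch of the exchange pattern described above, assuming only a shared mailbox between parallel workers: each worker runs a randomized heuristic, publishes its strongest fragment whenever it improves, and adopts stronger fragments published by others. The scoring and the "subgraph" fragments below are placeholders, not PERSONA's actual actor system or objectives.

        import queue
        import random
        from multiprocessing import Process, Queue

        def worker(worker_id, mailbox, cycles=50):
            best_score, best_fragment = float("-inf"), None
            for _ in range(cycles):
                # Heuristic parameters are re-randomized every cycle, as in PERSONA.
                weight = random.uniform(0.0, 1.0)
                fragment = [random.randrange(100) for _ in range(5)]  # stand-in "subgraph"
                score = weight * sum(fragment)                        # stand-in objective
                if score > best_score:
                    best_score, best_fragment = score, fragment
                    mailbox.put((worker_id, best_score, best_fragment))  # share stronger part
                # Adopt any stronger fragment another worker has published.
                while True:
                    try:
                        _, other_score, other_fragment = mailbox.get_nowait()
                    except queue.Empty:
                        break
                    if other_score > best_score:
                        best_score, best_fragment = other_score, other_fragment

        if __name__ == "__main__":
            mailbox = Queue()  # single shared pool standing in for actor messaging
            workers = [Process(target=worker, args=(i, mailbox)) for i in range(4)]
            for p in workers:
                p.start()
            for p in workers:
                p.join()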