
    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which in turn has called for a paradigm shift in computing architectures and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework but target different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
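
    The core of the programming model described above can be illustrated with a small, self-contained sketch: the user supplies only a map function and a reduce function, while the framework handles data distribution, shuffling, scheduling, and fault tolerance. The runner, function names, and word-count task below are purely illustrative and are not taken from any of the surveyed systems.

        from itertools import groupby
        from operator import itemgetter

        def map_fn(_, line):
            # Emit (word, 1) for every word in one input record.
            for word in line.split():
                yield word.lower(), 1

        def reduce_fn(word, counts):
            # Sum all partial counts collected for one key.
            yield word, sum(counts)

        def run_mapreduce(records, map_fn, reduce_fn):
            # Toy single-process runner; a real framework shards the map output
            # across workers, sorts it in parallel, and re-executes failed tasks.
            intermediate = [kv for rec in records for kv in map_fn(*rec)]
            intermediate.sort(key=itemgetter(0))  # the shuffle/sort phase
            return [out
                    for key, group in groupby(intermediate, key=itemgetter(0))
                    for out in reduce_fn(key, (v for _, v in group))]

        print(run_mapreduce(enumerate(["map reduce map"]), map_fn, reduce_fn))
        # [('map', 2), ('reduce', 1)]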

    The iPlant Collaborative: Cyberinfrastructure for Plant Biology

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists in the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products, and the expectation that natural ecosystems be managed sustainably, will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure, enabling plant scientists to perform complex analyses on large datasets without the need to master the command line or high-performance computational services.

    GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP

    Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with the luminosity increase, compensated only partially by an increase in computing resources. The extension of fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes in order to make them benefit from fine-grained parallelism features such as vectorization, but also from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype. Comment: 34 pages, 26 figures, 24 tables
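
    The central idea of the redesign, grouping many tracks so that the same operation can be applied to all of them in a vectorized way, can be sketched in a few lines. The basket size, physics, and names below are hypothetical toy values for illustration only and are not GeantV code.

        import numpy as np

        rng = np.random.default_rng(0)
        n_tracks = 1024                                   # one "basket" of tracks
        energy = rng.uniform(1.0, 100.0, n_tracks)        # toy energies
        position = rng.uniform(-1.0, 1.0, (n_tracks, 3))
        direction = rng.normal(size=(n_tracks, 3))
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)

        def step_basket(position, direction, energy, interaction_length=5.0):
            # Advance every track in the basket in one array operation, so the
            # arithmetic maps onto SIMD units and the data stays cache-friendly,
            # instead of looping over tracks one at a time as in the legacy code.
            step = rng.exponential(interaction_length, len(energy))
            new_position = position + direction * step[:, None]
            new_energy = energy * np.exp(-step / 50.0)    # toy continuous energy loss
            return new_position, new_energy

        position, energy = step_basket(position, direction, energy)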

    Agroforestry Opportunities for Enhancing Resilience to Climate Change in Rainfed Areas

    Agroforestry provides a unique opportunity to achieve the objectives of enhancing productivity and improving soil quality. Tree systems can also play an important role in adapting to climate variability and serve as important carbon sinks, which helps to decrease the pressure on natural forests. Recognizing the importance of agroforestry in meeting the twin objectives of mitigation and adaptation to climate change, as well as in making rainfed agriculture more climate resilient, ICAR-CRIDA, in pursuance of the National Agroforestry Policy 2014, has taken up the challenge of preparing a book on Agroforestry Opportunities for Enhancing Resilience to Climate Change in Rainfed Areas to sharpen the skills of stakeholders at the national, state, and district levels in rainfed areas and to increase agricultural productivity in response to climate change.

    Federated Learning and Meta Learning: Approaches, Applications, and Directions

    Over the past few years, significant advancements have been made in the field of machine learning (ML) to address resource management, interference management, autonomy, and decision-making in wireless networks. Traditional ML approaches rely on centralized methods, where data is collected at a central server for training. However, this approach poses a challenge in terms of preserving the data privacy of devices. To address this issue, federated learning (FL) has emerged as an effective solution that allows edge devices to collaboratively train ML models without compromising data privacy. In FL, local datasets are not shared, and the focus is on learning a global model for a specific task involving all devices. However, FL has limitations when it comes to adapting the model to devices with different data distributions. In such cases, meta learning is considered, as it enables the adaptation of learning models to different data distributions using only a few data samples. In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta). Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks. We also analyze the relationships among these learning algorithms and examine their advantages and disadvantages in real-world applications.
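
    A minimal sketch of the federated averaging idea discussed in the tutorial, under the assumption of a simple linear-regression task and synthetic client data (all names and hyperparameters below are illustrative): each device trains locally on data it never shares, and the server only averages the resulting model weights.

        import numpy as np

        def local_update(weights, X, y, lr=0.1, epochs=5):
            # One client's local training on its private data (linear regression).
            w = weights.copy()
            for _ in range(epochs):
                grad = 2 * X.T @ (X @ w - y) / len(y)
                w -= lr * grad
            return w

        def federated_round(global_w, clients):
            # The server aggregates client models weighted by local dataset size;
            # raw data never leaves the devices, only updated weights do.
            sizes = np.array([len(y) for _, y in clients])
            updates = np.stack([local_update(global_w, X, y) for X, y in clients])
            return (updates * (sizes / sizes.sum())[:, None]).sum(axis=0)

        rng = np.random.default_rng(1)
        true_w = np.array([2.0, -1.0])
        clients = []
        for _ in range(3):                       # three devices, different data
            X = rng.normal(size=(40, 2))
            clients.append((X, X @ true_w + rng.normal(scale=0.1, size=40)))

        w = np.zeros(2)
        for _ in range(20):                      # communication rounds
            w = federated_round(w, clients)
        print(w)                                 # approaches [2.0, -1.0]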

    Adapting Swarm Intelligence For The Self-Assembly And Optimization Of Networks

    While self-assembly is a fairly active area of research in swarm intelligence and robotics, relatively little attention has been paid to the issues surrounding the construction of network structures. Here, methods developed previously for modeling and controlling the collective movements of groups of agents are extended to serve as the basis for the self-assembly or "growth" of networks, using neural networks as a concrete application to evaluate this novel approach. One of the central innovations incorporated into the model presented here is having network connections arise as persistent "trails" left behind moving agents, trails that are reminiscent of pheromone deposits made by agents in ant colony optimization models. The resulting network connections are thus essentially a record of agent movements. The model's effectiveness is demonstrated by using it to produce two large networks that support subsequent learning of topographic and feature maps. Improvements produced by the incorporation of collective movements are also examined through computational experiments. These results indicate that methods for directing collective movements can be extended to support and facilitate network self-assembly. Additionally, the traditional self-assembly problem is extended to include the generation of network structures based on optimality criteria, rather than on target structures that are specified a priori. It is demonstrated that endowing the network components involved in the self-assembly process with the ability to engage in collective movements can be an effective means of generating computationally optimal network structures. This is confirmed on a number of challenging test problems from the domains of trajectory generation, time-series forecasting, and control. Further, this extension of the model is used to illuminate an important relationship between particle swarm optimization, which usually occurs in high dimensional abstract spaces, and self-assembly, which is normally grounded in real and simulated 2D and 3D physical spaces.
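
    The trail-laying mechanism described above can be caricatured in a short toy script (this is not the dissertation's actual model; every parameter and name is illustrative): agents walk among fixed nodes, each traversal deposits a persistent "trail", and the accumulated trails become the weighted adjacency of the self-assembled network.

        import numpy as np

        rng = np.random.default_rng(2)
        n_nodes, n_agents, n_steps = 10, 5, 200
        nodes = rng.uniform(0.0, 1.0, (n_nodes, 2))    # node positions in 2D space
        trail = np.zeros((n_nodes, n_nodes))           # trail strength = edge weight
        agent_at = rng.integers(0, n_nodes, n_agents)  # current node of each agent

        for _ in range(n_steps):
            for a in range(n_agents):
                i = agent_at[a]
                dist = np.linalg.norm(nodes - nodes[i], axis=1)
                dist[i] = np.inf                       # never "move" to the same node
                # Prefer nearby nodes and already-reinforced trails.
                score = (1.0 + trail[i]) / dist
                j = rng.choice(n_nodes, p=score / score.sum())
                trail[i, j] += 1.0                     # movement leaves a persistent trail
                agent_at[a] = j
            trail *= 0.999                             # slow evaporation of old trails

        adjacency = trail + trail.T                    # the self-assembled network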

    Evolutionary Computation 2020

    Intelligent optimization is based on the mechanisms of computational intelligence: refine a suitable feature model, design an effective optimization algorithm, and then obtain an optimal or satisfactory solution to a complex problem. Intelligent algorithms are key tools for ensuring global optimization quality, fast optimization efficiency, and robust optimization performance. Intelligent optimization algorithms have been studied by many researchers, leading to improvements in the performance of algorithms such as the evolutionary algorithm, whale optimization algorithm, differential evolution algorithm, and particle swarm optimization. Studies in this arena have also resulted in breakthroughs in solving complex problems, including the green shop scheduling problem, the severe nonlinear problem in one-dimensional geodesic electromagnetic inversion, error and bug finding in software, the 0-1 knapsack problem, the traveling salesman problem, and the logistics distribution center siting problem. The editors are confident that this book can open a new avenue for further improvement and discoveries in the area of intelligent algorithms. The book is a valuable resource for researchers interested in understanding the principles and design of intelligent algorithms.
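
    As a concrete flavour of the algorithm families listed above, the following minimal differential evolution sketch minimises a toy sphere function; the population size, mutation factor, and other settings are illustrative defaults and are not taken from the book.

        import numpy as np

        rng = np.random.default_rng(3)

        def sphere(x):
            # Toy benchmark: global minimum 0 at the origin.
            return np.sum(x ** 2, axis=-1)

        def differential_evolution(fn, dim=5, pop_size=30, F=0.7, CR=0.9, gens=200):
            pop = rng.uniform(-5.0, 5.0, (pop_size, dim))
            fitness = fn(pop)
            for _ in range(gens):
                for i in range(pop_size):
                    a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
                    mutant = a + F * (b - c)                # mutation
                    cross = rng.random(dim) < CR            # crossover mask
                    trial = np.where(cross, mutant, pop[i])
                    if fn(trial) <= fitness[i]:             # greedy selection
                        pop[i], fitness[i] = trial, fn(trial)
            return pop[fitness.argmin()], fitness.min()

        best_x, best_f = differential_evolution(sphere)
        print(best_f)   # close to 0 for the sphere benchmark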