
    A Hierarchical Distributed Processing Framework for Big Image Data

    Abstract—This paper introduces an effective processing framework named ICP (Image Cloud Processing) to cope with the data explosion in the image processing field. While most previous research focuses on optimizing image processing algorithms for higher efficiency, our work is dedicated to providing a general framework under which such algorithms can be implemented in parallel, achieving a boost in time efficiency without compromising result quality as the image scale grows. The proposed ICP framework consists of two mechanisms, i.e. SICP (Static ICP) and DICP (Dynamic ICP). Specifically, SICP is aimed at processing big image data pre-stored in the distributed system, while DICP is proposed for dynamic input. To accomplish SICP, two novel data representations named P-Image and Big-Image are designed to cooperate with MapReduce to achieve a more optimized configuration and higher efficiency. DICP is implemented through a parallel processing procedure working with the traditional processing mechanism of the distributed system. Representative results of comprehensive experiments on the challenging ImageNet dataset are selected to validate the capacity of the proposed ICP framework over traditional state-of-the-art methods, both in time efficiency and quality of results.
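The abstract does not spell out the P-Image and Big-Image representations, but the map/reduce split that SICP relies on can be sketched. Below is a minimal, hypothetical illustration in Python: each "image" is processed independently in the map step and the per-image results are merged in the reduce step; the histogram feature, the flat-list image format, and the thread pool are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an SICP-style map/reduce pass over pre-stored images.
# The histogram operator and flat-list "images" are stand-ins for any
# per-image computation that needs no data from neighbouring images.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_histogram(image):
    """Map step: grayscale histogram of a single image."""
    return Counter(image)

def reduce_histograms(partials):
    """Reduce step: merge per-image histograms into a global histogram."""
    total = Counter()
    for h in partials:
        total.update(h)
    return total

def icp_static(images, workers=4):
    """Run the map step in parallel, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_histogram, images)
        return reduce_histograms(partials)

# Tiny synthetic "images": flat lists of 8-bit intensities.
images = [[0, 0, 255], [255, 128], [128, 128, 0]]
hist = icp_static(images)
```

Because the map step is embarrassingly parallel, throughput scales with the number of workers, which is the efficiency claim the framework rests on.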

    Development of a New Framework for Distributed Processing of Geospatial Big Data

    Geospatial technology is still facing a lack of “out of the box” distributed processing solutions which are suitable for the amount and heterogeneity of geodata, and particularly for use cases requiring a rapid response. Moreover, most current distributed computing frameworks have important limitations hindering the transparent and flexible control of processing (and/or storage) nodes and of the distribution of data chunks. We investigated the design of distributed processing systems and existing solutions related to Geospatial Big Data. This research area is highly dynamic, with new developments and the re-use of existing solutions (certain modules being re-used to implement further specific developments), and with new implementations continuously emerging in areas such as disaster management, environmental monitoring and earth observation. The distributed processing of raster data sets is the focus of this paper, as we believe that the problem of raster data partitioning is far from trivial: a number of tiling and stitching requirements need to be addressed to fulfil the needs of efficient image processing beyond the pixel level. We compare the terms Big Data, Geospatial Big Data and traditional Geospatial Data in order to clarify the typical differences, to compare them in terms of storage and processing backgrounds for different data representations, and to categorize the common processing systems from the aspect of distributed raster processing. This clarification is necessary because they behave differently on the processing side, and particular processing solutions need to be developed according to their characteristics. Furthermore, we compare parallel and distributed computing, since the two terms are often used interchangeably. We also briefly assess the widely-known MapReduce paradigm in the context of geospatial applications.
The second half of the article reports on a new processing framework initiative, currently at the concept and early development stages, which aims to be capable of processing raster, vector and point cloud data in a distributed IT ecosystem. The developed system is modular, has no limitations on programming language environment, and can execute scripts written in any development language (e.g. Python, R or C#).
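The tiling-and-stitching requirement the abstract raises can be made concrete with a small sketch: a neighbourhood operator (here a 3×3 mean filter, an invented stand-in) only produces correct results near tile borders if each tile carries a halo of overlapping rows, so that a tiled run stitches back to exactly the single-pass result. The row-wise tiling, halo width and filter below are all assumptions for illustration, not the paper's framework.

```python
# Illustrative sketch: tiling a raster by rows with a one-row halo so a 3x3
# neighbourhood operator can run per tile, and the stitched cores match a
# single-pass run over the whole raster.

def mean3x3(grid):
    """3x3 mean filter with edge clamping (neighbourhood shrinks at borders)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [grid[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def tile_rows(grid, tile_h, halo=1):
    """Yield (start, lo, core_h, subgrid): each tile's core rows plus up to
    `halo` extra rows on each side for the neighbourhood operator."""
    h = len(grid)
    for start in range(0, h, tile_h):
        core_h = min(tile_h, h - start)
        lo = max(0, start - halo)
        hi = min(h, start + core_h + halo)
        yield start, lo, core_h, grid[lo:hi]

def tiled_mean3x3(grid, tile_h=2):
    """Filter each haloed tile independently, then stitch only the cores."""
    out = [None] * len(grid)
    for start, lo, core_h, sub in tile_rows(grid, tile_h):
        filtered = mean3x3(sub)
        for i in range(core_h):
            out[start + i] = filtered[(start - lo) + i]
    return out

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1], [2, 2, 2]]
stitched = tiled_mean3x3(grid)
```

The halo width must match the operator's neighbourhood radius; a wider stencil would need a wider overlap, which is exactly why raster partitioning "beyond pixel level" is non-trivial.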

    A Framework for XML-based Integration of Data, Visualization and Analysis in a Biomedical Domain

    Biomedical data are becoming increasingly complex and heterogeneous in nature. The data are stored in distributed information systems, using a variety of data models, and are processed by increasingly more complex tools that analyze and visualize them. We present in this paper our framework for integrating biomedical research data and tools into a unique Web front end. Our framework is applied to the University of Washington’s Human Brain Project. Specifically, we present solutions to four integration tasks: definition of complex mappings from relational sources to XML, distributed XQuery processing, generation of heterogeneous output formats, and the integration of heterogeneous data visualization and analysis tools.
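The first of the four integration tasks, mapping relational sources to XML, can be illustrated with a minimal sketch: flat relational rows are grouped into a nested XML view. The patients/trials schema below is a made-up stand-in, not the Human Brain Project's actual data model, and the paper's mapping language is not reproduced here.

```python
# Illustrative relational-to-XML mapping: group flat rows by patient into
# a nested document. Schema and values are hypothetical.
import xml.etree.ElementTree as ET

rows = [  # flat relational result: (patient_id, trial, measurement)
    ("p1", "t1", 0.42),
    ("p1", "t2", 0.37),
    ("p2", "t1", 0.55),
]

root = ET.Element("patients")
by_patient = {}
for pid, trial, value in rows:
    if pid not in by_patient:  # one <patient> element per distinct id
        by_patient[pid] = ET.SubElement(root, "patient", id=pid)
    t = ET.SubElement(by_patient[pid], "trial", name=trial)
    t.text = str(value)

xml_text = ET.tostring(root, encoding="unicode")
```

Once sources are exposed as XML views of this kind, a distributed XQuery processor can query them uniformly, which is the second integration task the abstract lists.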

    DIAMOnDS - DIstributed Agents for MObile & Dynamic Services

    A Distributed Services Architecture with support for mobile agents between services offers significantly improved communication and computational flexibility. The use of agents allows complex operations involving large amounts of data to be executed effectively using distributed resources. The prototype system Distributed Agents for Mobile and Dynamic Services (DIAMOnDS) allows a service to send agents, on its behalf, to other services to perform data manipulation and processing. Agents have been implemented as mobile services that are discovered using the Jini Lookup mechanism and used by other services for task management and communication. Agents provide proxies for interaction with other services as well as a specific GUI to monitor and control the agent activity. Thus agents acting on behalf of one service cooperate with other services to carry out a job, providing inter-operation of loosely coupled services in a semi-autonomous way. Remote file system access functionality has been incorporated into the agent framework and allows services to dynamically share and browse the file system resources of the hosts running the services. Generic database access functionality has been implemented in the mobile agent framework, allowing complex data mining and processing operations to be performed efficiently in a distributed system. A basic data searching agent is also implemented that performs a query-based search in a file system. The testing of the framework was carried out over a WAN by moving Connectivity Test agents between AgentStations at CERN, Switzerland and NUST, Pakistan.
    Comment: 7 pages, 4 figures, CHEP03, La Jolla, California, March 24-28, 200
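The data-searching agent mentioned at the end of the abstract is a useful anchor for a sketch: the kind of query-based file-system search an agent might carry to a remote AgentStation. The Jini discovery and agent-migration machinery is omitted entirely; `search` below is a hypothetical stand-in for the agent's payload, run against a throwaway directory.

```python
# Minimal sketch of a query-based file search, standing in for the payload a
# DIAMOnDS-style searching agent would execute on a host's file system.
import os
import tempfile

def search(root, query):
    """Return sorted paths of files under `root` whose contents contain `query`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as fh:
                if query in fh.read():
                    hits.append(path)
    return sorted(hits)

# Demo on a throwaway directory with two small files.
root = tempfile.mkdtemp()
for name, text in [("a.txt", "muon events"), ("b.txt", "calibration")]:
    with open(os.path.join(root, name), "w") as fh:
        fh.write(text)
hits = search(root, "muon")
```

In the real system the agent, not the query, moves: the search logic is shipped to the host holding the data, and only the (typically small) result set travels back.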

    Simulating Distributed Systems

    The simulation framework developed within the "Models of Networked Analysis at Regional Centers" (MONARC) project as a design and optimization tool for large scale distributed systems is presented. The goals are to provide a realistic simulation of distributed computing systems, customized for specific physics data processing tasks, and to offer a flexible and dynamic environment to evaluate the performance of a range of possible distributed computing architectures. A detailed simulation of a large system, the CMS High Level Trigger (HLT) production farm, is also presented.
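The kind of question such a simulator answers can be illustrated with a toy model (not MONARC itself): given a set of data-processing jobs and a number of nodes, what is the makespan under a simple scheduling policy? The greedy least-loaded assignment, the job lengths and the node count below are invented inputs for the sketch.

```python
# Toy model of a distributed-farm scheduling question: assign each job to the
# node that becomes free earliest and report the makespan.
import heapq

def simulate(job_times, n_nodes):
    """Greedy list scheduling over n_nodes identical nodes."""
    nodes = [0.0] * n_nodes          # next-free time per node
    heapq.heapify(nodes)
    for t in job_times:
        free = heapq.heappop(nodes)  # earliest-available node
        heapq.heappush(nodes, free + t)
    return max(nodes)                # time the last node finishes

makespan = simulate([4, 3, 3, 2], n_nodes=2)
```

A full simulator like MONARC layers realistic network, storage and task models on top of this skeleton, but the design question, how architecture choices change completion time, is the same.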

    Streaming, Distributed Variational Inference for Bayesian Nonparametrics

    This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models. In the proposed framework, processing nodes receive a sequence of data minibatches, compute a variational posterior for each, and make asynchronous streaming updates to a central model. In contrast to previous algorithms, the proposed framework is truly streaming, distributed, asynchronous, learning-rate-free, and truncation-free. The key challenge in developing the framework, arising from the fact that BNP models do not impose an inherent ordering on their components, is finding the correspondence between minibatch and central BNP posterior components before performing each update. To address this, the paper develops a combinatorial optimization problem over component correspondences, and provides an efficient solution technique. The paper concludes with an application of the methodology to the DP mixture model, with experimental results demonstrating its practical scalability and performance.
    Comment: This paper was presented at NIPS 2015. Please use the following BibTeX citation: @inproceedings{Campbell15_NIPS, Author = {Trevor Campbell and Julian Straub and John W. {Fisher III} and Jonathan P. How}, Title = {Streaming, Distributed Variational Inference for Bayesian Nonparametrics}, Booktitle = {Advances in Neural Information Processing Systems (NIPS)}, Year = {2015}}
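The correspondence problem the abstract describes can be illustrated in a heavily simplified form: before merging a minibatch posterior into the central model, its components must be aligned with the central ones, since BNP components carry no inherent order. The sketch below reduces each component to a (mean, count) pair and replaces the paper's efficient combinatorial optimization with brute-force permutation search; both simplifications are assumptions made purely for illustration.

```python
# Highly simplified sketch of the matching-then-merging step in a streaming
# update. Components are (mean, count) pairs; matching is brute force over
# permutations, standing in for the paper's optimization technique.
from itertools import permutations

def best_matching(central, mini):
    """Permutation of minibatch components minimising total mean distance."""
    idx = range(len(mini))
    return min(permutations(idx),
               key=lambda p: sum(abs(central[i][0] - mini[p[i]][0])
                                 for i in idx))

def merge(central, mini):
    """Merge matched minibatch sufficient statistics into the central model."""
    p = best_matching(central, mini)
    out = []
    for i, (mu, n) in enumerate(central):
        m_mu, m_n = mini[p[i]]
        tot = n + m_n
        out.append(((mu * n + m_mu * m_n) / tot, tot))
    return out

central = [(0.0, 10), (5.0, 10)]
mini = [(4.8, 5), (0.2, 5)]   # same clusters, arriving in a different order
merged = merge(central, mini)
```

Without the matching step, the minibatch's first component would be merged into the wrong central cluster; with it, each update lands on the corresponding component regardless of arrival order.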