
    Object level physics data replication in the Grid

    To support distributed physics analysis on a scale as foreseen by the LHC experiments, 'Grid' systems are needed that manage and streamline data distribution, replication, and synchronization. We report on the development of a tool that allows large physics datasets to be managed and replicated at the granularity of single objects. Efficient and convenient support for data extraction and replication at the level of individual objects and events will enable types of interactive data analysis that would be too inconvenient or costly to perform with tools that work on a file level only. Our tool development effort is intended both as a demonstrator project for various types of existing Grid technology and as a research effort to develop Grid technology further. The basic use case supported by our tool is one in which a physicist repeatedly selects some physics objects located at a central repository and replicates them to a local site. The selection can be done using 'tag' or 'ntuple' analysis at the local site. The tool replicates the selected objects and merges all replicated objects into a single coherent 'virtual' dataset. This allows all objects to be used together seamlessly, even if they were replicated at different times or from different locations. The version of the tool reported on in this paper replicates ORCA-based physics data created by CMS in its ongoing high-level trigger design studies. The basic capabilities and limitations of the tool are discussed, together with some performance results. Some tool internals are also presented. Finally, we report on experiences so far and on future plans.
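
    The selection-and-merge workflow described above can be pictured with a short sketch. Nothing here comes from the actual tool or the ORCA/CMS software: the repository dictionary, the tag rows and the VirtualDataset class are hypothetical stand-ins, meant only to show how repeated object-level selections at a local site can accumulate into one coherent dataset.

```python
# Minimal sketch of object-level selection and replication; the repository,
# tag rows and merge logic below are hypothetical, not the ORCA/CMS tool.
from dataclasses import dataclass, field

@dataclass
class VirtualDataset:
    """Local index that merges objects replicated at different times."""
    objects: dict = field(default_factory=dict)   # object_id -> payload

    def merge(self, replicated):
        # A later replication of the same object overwrites the earlier copy,
        # so the dataset stays coherent no matter when the objects arrived.
        self.objects.update(replicated)

def select_ids(tag_rows, predicate):
    """'Tag'/'ntuple'-style selection performed entirely at the local site."""
    return [row["object_id"] for row in tag_rows if predicate(row)]

def replicate(central_repo, object_ids):
    """Fetch only the selected objects from the central repository."""
    return {oid: central_repo[oid] for oid in object_ids if oid in central_repo}

# Repeated selections accumulate into one seamless local dataset.
central_repo = {i: f"event-{i}" for i in range(100)}          # stand-in payloads
tags = [{"object_id": i, "pt": i * 0.5} for i in range(100)]  # stand-in tag data

local = VirtualDataset()
local.merge(replicate(central_repo, select_ids(tags, lambda r: r["pt"] > 40)))
local.merge(replicate(central_repo, select_ids(tags, lambda r: r["pt"] > 45)))
print(len(local.objects))  # 19: overlapping selections merge instead of duplicating
```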

    Clouder: a flexible large scale decentralized object store - architecture overview

    The current exponential growth of data calls for massive-scale storage and processing capabilities. Such large volumes of data tend to preclude centralized storage and processing, making extensive and flexible data partitioning unavoidable. This is being acknowledged by several major Internet players embracing the Cloud computing model and offering first-generation remote storage services with simple processing capabilities. In this position paper we present preliminary ideas for the architecture of a flexible, efficient and dependable fully decentralized object store able to manage very large sets of variable-size objects and to coordinate in-place processing. Our target is large local-area computing facilities composed of tens of thousands of nodes under the same administrative domain. The system should be capable of leveraging massive replication of data to balance read scalability and fault tolerance.
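
    The abstract only sketches the architecture, so the snippet below is purely illustrative rather than Clouder's design: it shows one conventional way a decentralized store can place R replicas of each object on a ring of nodes, which is the kind of massive replication that lets reads scale out while tolerating node failures. All class and node names are invented.

```python
# Illustrative only (not Clouder's actual architecture): consistent hashing
# places R replicas of every object, so any live replica can serve a read.
import hashlib
from bisect import bisect_right

class ReplicatedStore:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Hash ring: each node sits at the position given by hashing its name.
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(str(key).encode()).hexdigest(), 16)

    def owners(self, key):
        """The R nodes clockwise from the key's position hold its replicas."""
        start = bisect_right(self.positions, self._hash(key))
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replicas)]

store = ReplicatedStore(nodes=[f"node-{i}" for i in range(10)], replicas=3)
print(store.owners("object-42"))   # three nodes that would hold this object
```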

    Fragmented ARES: Dynamic Storage for Large Objects

    Data availability is one of the most important features of distributed storage systems, made possible by data replication. Data are now generated rapidly, and developing efficient, scalable and reliable storage systems has become one of the major challenges for high-performance computing. In this work, we develop and prove correct a dynamic, robust and strongly consistent distributed shared memory suitable for handling large objects (such as files) and utilizing erasure coding. We do so by integrating an Adaptive, Reconfigurable, Atomic memory framework, called Ares, with the CoBFS framework, which relies on a block fragmentation technique to handle large objects. With the addition of Ares, we also enable the use of an erasure-coding algorithm to further split the data and potentially improve storage efficiency at the replica servers as well as operation latency. Our development is complemented by an in-depth experimental evaluation on the Emulab and AWS EC2 testbeds, illustrating the benefits of our approach as well as interesting tradeoffs.
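
    As a rough illustration of the two-level layout described above (and not the actual Ares/CoBFS code), the sketch below fragments a large object into blocks and then stripes each block across data shards plus a single XOR parity shard. The XOR parity is a deliberately simplified stand-in for a real erasure code: it tolerates the loss of one shard per block, which is enough to show why the approach can reduce per-server storage while keeping data recoverable.

```python
# Simplified sketch: fragment a large object into blocks (as CoBFS does), then
# "erasure code" each block with striping plus one XOR parity shard. The XOR
# parity is a stand-in for a real erasure code and tolerates one lost shard.
from functools import reduce

BLOCK_SIZE = 4    # bytes per block; tiny on purpose for the demonstration
DATA_SHARDS = 2   # data shards each block is striped across

def fragment(obj: bytes, size: int = BLOCK_SIZE):
    return [obj[i:i + size] for i in range(0, len(obj), size)]

def xor(shards):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shards))

def encode_block(block: bytes, k: int = DATA_SHARDS):
    block = block.ljust(k * ((len(block) + k - 1) // k), b"\0")  # pad to k shards
    m = len(block) // k
    shards = [block[i * m:(i + 1) * m] for i in range(k)]
    return shards + [xor(shards)]             # k data shards + 1 parity shard

def decode_block(shards, k: int = DATA_SHARDS):
    if None in shards:                        # recover a single missing shard
        missing = shards.index(None)
        shards[missing] = xor([s for s in shards if s is not None])
    return b"".join(shards[:k])

obj = b"large physics object payload"
coded = [encode_block(b) for b in fragment(obj)]
coded[0][1] = None                            # simulate losing one shard
restored = b"".join(decode_block(list(s)) for s in coded).rstrip(b"\0")
assert restored == obj                        # (real code would record true length)
```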

    Brain-inspired automated visual object discovery and detection

    Despite significant recent progress, machine vision systems lag considerably behind their biological counterparts in performance, scalability, and robustness. A distinctive hallmark of the brain is its ability to automatically discover and model objects, at multiscale resolutions, from repeated exposures to unlabeled contextual data, and then to robustly detect the learned objects under various nonideal circumstances, such as partial occlusion and different view angles. Replication of such capabilities in a machine would require three key ingredients: (i) access to large-scale perceptual data of the kind that humans experience, (ii) flexible representations of objects, and (iii) an efficient unsupervised learning algorithm. The Internet fortunately provides unprecedented access to vast amounts of visual data. This paper leverages the availability of such data to develop a scalable framework for unsupervised learning of object prototypes—brain-inspired flexible, scale, and shift invariant representations of deformable objects (e.g., humans, motorcycles, cars, airplanes) composed of parts, their different configurations and views, and their spatial relationships. Computationally, the object prototypes are represented as geometric associative networks using probabilistic constructs such as Markov random fields. We apply our framework to various datasets and show that our approach is computationally scalable and can construct accurate and operational part-aware object models much more efficiently than much of the recent computer vision literature. We also present efficient algorithms for detection and localization in new scenes of objects and their partial views.
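
    The paper's geometric associative networks are far richer than anything reproducible here; the toy sketch below only conveys the flavor of a part-aware model, scoring a candidate detection by combining per-part appearance scores with Gaussian penalties on how far each part sits from its expected offset relative to a root. The part names, offsets and tolerance are invented for illustration.

```python
# Toy sketch only: the paper's object prototypes are geometric associative
# networks (Markov random fields); this star model merely combines per-part
# appearance scores with a Gaussian penalty on deviation from expected offsets.
# Part names, offsets and the tolerance below are invented for illustration.

PROTOTYPE = {                      # expected (dx, dy) offset of each part from the root
    "wheel_front": (-40, 20),
    "wheel_back":  (40, 20),
    "seat":        (0, -15),
}
SIGMA = 10.0                       # tolerance for geometric deformation, in pixels

def score(root_xy, detected_parts):
    """detected_parts maps part name -> (x, y, appearance_score)."""
    total = 0.0
    for name, (dx, dy) in PROTOTYPE.items():
        if name not in detected_parts:
            continue               # an occluded part simply contributes nothing
        x, y, appearance = detected_parts[name]
        off_x, off_y = x - root_xy[0], y - root_xy[1]
        spatial = -((off_x - dx) ** 2 + (off_y - dy) ** 2) / (2 * SIGMA ** 2)
        total += appearance + spatial
    return total

detections = {"wheel_front": (62, 118, 1.2), "wheel_back": (141, 121, 1.0)}
print(score((100, 100), detections))  # higher when parts sit where the prototype expects
```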

    Simplified Distributed Programming with Micro Objects

    Developing large-scale distributed applications can be a daunting task. Object-based environments have attempted to alleviate problems by providing distributed objects that look like local objects. We advocate that this approach has actually only made matters worse, as the developer needs to be aware of many intricate internal details in order to adequately handle partial failures. The result is an increase in application complexity. We present an alternative in which distribution transparency is lessened in favor of clearer semantics. In particular, we argue that a developer should always be offered the unambiguous semantics of local objects, and that distribution comes from copying those objects to where they are needed. We claim that it is often sufficient to provide only small, immutable objects, along with facilities to group objects into clusters. Comment: In Proceedings FOCLASA 2010, arXiv:1007.499
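
    A minimal sketch of this programming model (with invented API names, not anything from the paper): objects are small and immutable, clusters group objects that travel together, and 'remote access' is nothing more than shipping a serialized copy, so the receiver always works with plain local objects.

```python
# Sketch of the idea with invented API names: objects are small and immutable,
# and "distribution" is just copying them (serialized) to where they are needed,
# so the receiving side only ever deals with ordinary local objects.
from dataclasses import dataclass
import pickle

@dataclass(frozen=True)            # immutability: a copy can never drift from the original
class MicroObject:
    key: str
    payload: bytes

@dataclass(frozen=True)
class Cluster:
    """Objects that are used together travel together, as one shipped unit."""
    members: tuple                 # tuple of MicroObject

def ship(cluster: Cluster) -> bytes:
    # "Remote access" is serialization plus transfer of a copy; there is no
    # remote reference whose method calls can partially fail.
    return pickle.dumps(cluster)

def receive(blob: bytes) -> Cluster:
    return pickle.loads(blob)

c = Cluster(members=(MicroObject("cfg", b"a=1"), MicroObject("doc", b"hello")))
local_copy = receive(ship(c))      # the receiver works on plain local objects
print(local_copy.members[0].payload)
```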

    Efficient Processing of k Nearest Neighbor Joins using MapReduce

    k nearest neighbor join (kNN join), designed to find the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join on a centralized machine efficiently. In this paper, we investigate how to perform kNN join using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups; the reducers then perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To further reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable. Comment: VLDB201
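
    The map/reduce structure described above can be simulated in a few lines. The sketch below is a single-machine toy, not the paper's algorithm: objects are routed to the group of their nearest pivot and each reducer runs an independent kNN join inside its group, which gives the approximate flavor of the scheme while omitting the pruning rules and replica-minimizing refinements.

```python
# Single-machine toy of the map/reduce structure (not the paper's algorithm):
# nearest-pivot grouping stands in for its partitioning, and the pruning rules
# and replica-minimizing variants are omitted, so the result is approximate.
import heapq
from collections import defaultdict

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_pivot(p, pivots):
    return min(range(len(pivots)), key=lambda i: dist(p, pivots[i]))

def map_phase(R, S, pivots):
    """Mappers: route every object of R and S to the group of its closest pivot."""
    groups = defaultdict(lambda: {"R": [], "S": []})
    for r in R:
        groups[nearest_pivot(r, pivots)]["R"].append(r)
    for s in S:
        groups[nearest_pivot(s, pivots)]["S"].append(s)
    return groups

def reduce_phase(groups, k):
    """Reducers: an independent kNN join inside each group (approximate, since
    an object's true neighbors may have been routed to another group)."""
    result = {}
    for group in groups.values():
        for r in group["R"]:
            result[r] = heapq.nsmallest(k, group["S"], key=lambda s: dist(r, s))
    return result

R = [(1, 1), (8, 9)]
S = [(0, 0), (2, 2), (5, 5), (7, 8), (9, 9)]
pivots = [(0, 0), (10, 10)]
print(reduce_phase(map_phase(R, S, pivots), k=2))
```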