A Benchmark for Image Retrieval using Distributed Systems over the Internet: BIRDS-I
The performance of CBIR algorithms is usually measured on an isolated
workstation. In a real-world environment the algorithms would only constitute a
minor component among the many interacting components. The Internet
dramatically changes many of the usual assumptions about measuring CBIR
performance. Any CBIR benchmark should be designed from a networked systems
standpoint. These benchmarks typically introduce communication overhead because
the real systems they model are distributed applications. We present our
implementation of a client/server benchmark called BIRDS-I to measure image
retrieval performance over the Internet. It has been designed with the trend
toward the use of small personalized wireless systems in mind. Web-based CBIR
implies the use of heterogeneous image sets, imposing certain constraints on
how the images are organized and the type of performance metrics applicable.
BIRDS-I only requires controlled human intervention for the compilation of the
image collection and none for the generation of ground truth in the measurement
of retrieval accuracy. Benchmark image collections need to be evolved
incrementally toward the storage of millions of images and that scaleup can
only be achieved through the use of computer-aided compilation. Finally, our
scoring metric introduces a tightly optimized image-ranking window.
Comment: 24 pages, To appear in the Proc. SPIE Internet Imaging Conference
200
DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores
We present our framework DKVF that enables one to quickly prototype and
evaluate new protocols for key-value stores and compare them with existing
protocols based on selected benchmarks. Due to the limitations imposed by the CAP theorem, new
protocols must be developed that achieve the desired trade-off between
consistency and availability for the given application at hand. Hence, both
academic and industrial communities focus on developing new protocols that
identify a different (and hopefully better in one or more aspects) point on this
trade-off curve. While these protocols are often based on a simple intuition,
evaluating them to ensure that they indeed provide increased availability,
consistency, or performance is a tedious task. Our framework, DKVF, enables one
to quickly prototype a new protocol as well as identify how it performs
compared to existing protocols for pre-specified benchmarks. Our framework
relies on YCSB (Yahoo! Cloud Serving Benchmark) for benchmarking. We
demonstrate DKVF by implementing four existing protocols --eventual
consistency, COPS, GentleRain and CausalSpartan-- with it. We compare the
performance of these protocols under different load conditions. We find
that the performance is similar to our implementation of these protocols from
scratch, and the comparison of these protocols is consistent with what has
been reported in the literature. Moreover, implementation of these protocols
was much more natural as we only needed to translate the pseudocode into Java
(and add the necessary error handling). Hence, it was possible to achieve this
in just 1-2 days per protocol. Finally, our framework is extensible. It is
possible to replace individual components in the framework (e.g., the storage
component)
CloudScope: diagnosing and managing performance interference in multi-tenant clouds
© 2015 IEEE. Virtual machine consolidation is attractive in cloud computing platforms for several reasons including reduced infrastructure costs, lower energy consumption and ease of management. However, the interference between co-resident workloads caused by virtualization can violate the service level objectives (SLOs) that the cloud platform guarantees. Existing solutions to minimize interference between virtual machines (VMs) are mostly based on comprehensive micro-benchmarks or online training which makes them computationally intensive. In this paper, we present CloudScope, a system for diagnosing interference for multi-tenant cloud systems in a lightweight way. CloudScope employs a discrete-time Markov Chain model for the online prediction of performance interference of co-resident VMs. It uses the results to optimally (re)assign VMs to physical machines and to optimize the hypervisor configuration, e.g. the CPU share it can use, for different workloads. We have implemented CloudScope on top of the Xen hypervisor and conducted experiments using a set of CPU, disk, and network intensive workloads and a real system (MapReduce). Our results show that CloudScope interference prediction achieves an average error of 9%. The interference-aware scheduler improves VM performance by up to 10% compared to the default scheduler. In addition, the hypervisor reconfiguration can improve network throughput by up to 30%
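The abstract above mentions a discrete-time Markov chain used for online prediction. As a rough illustration of that general technique only (not CloudScope's actual model), the sketch below advances a state distribution through a transition matrix. The two states, the transition probabilities, and the function names are all hypothetical and chosen purely for the example.

```python
# Minimal sketch of k-step prediction with a discrete-time Markov chain.
# Assumption: states and probabilities below are invented for illustration;
# they are not taken from the CloudScope paper.

def step(dist, transition):
    """Advance a state distribution by one time step: dist' = dist * P."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def predict(dist, transition, k):
    """Return the state distribution after k time steps."""
    for _ in range(k):
        dist = step(dist, transition)
    return dist

# Hypothetical two-state model: 0 = "no interference", 1 = "interference".
# Row i holds the probabilities of moving from state i to each state.
P = [[0.9, 0.1],
     [0.5, 0.5]]

start = [1.0, 0.0]                 # currently known to be in state 0
after_two = predict(start, P, 2)   # distribution two steps ahead
```

Each row of the transition matrix sums to 1, so the predicted distribution stays a valid probability distribution at every step.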
Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
The NEMO High Performance Computing Cluster at the University of Freiburg has
been made available to researchers of the ATLAS and CMS experiments. Users
access the cluster from external machines connected to the Worldwide LHC
Computing Grid (WLCG). This paper describes how the full software environment
of the WLCG is provided in a virtual machine image. The interplay between the
schedulers for NEMO and for the external clusters is coordinated through the
ROCED service. A cloud computing infrastructure is deployed at NEMO to
orchestrate the simultaneous usage by bare metal and virtualized jobs. Through
the setup, resources are provided to users in a transparent, automated, and
on-demand way. The performance of the virtualized environment has been
evaluated for particle physics applications
Benchmarking database systems for Genomic Selection implementation
Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Making effective use of this information requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix
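One plausible reason orientation matters, sketched under simplifying assumptions: if the allele matrix is stored as a flat array, pulling every sample's value for one marker touches strided offsets in a sample-major layout but a contiguous run in a marker-major layout. The function names and the tiny matrix below are hypothetical, and this is not how any of the benchmarked systems is implemented.

```python
# Sketch: offsets read when extracting one marker from a flat-stored matrix.
# Assumption: a simple flat-array layout, used only to illustrate orientation.

def flatten(matrix):
    """Store a list-of-rows matrix as one flat list, row after row."""
    return [value for row in matrix for value in row]

def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]

def marker_offsets_sample_major(marker, n_samples, n_markers):
    """Offsets of one marker when each row is a sample (strided access)."""
    return [s * n_markers + marker for s in range(n_samples)]

def marker_offsets_marker_major(marker, n_samples):
    """Offsets of one marker when each row is a marker (contiguous access)."""
    first = marker * n_samples
    return list(range(first, first + n_samples))

# Hypothetical 3 samples x 4 markers allele matrix.
by_sample = [[0, 1, 2, 0],
             [1, 1, 0, 2],
             [2, 0, 1, 1]]
flat_sample_major = flatten(by_sample)
flat_marker_major = flatten(transpose(by_sample))

strided = marker_offsets_sample_major(1, 3, 4)
contiguous = marker_offsets_marker_major(1, 3)
values_a = [flat_sample_major[i] for i in strided]
values_b = [flat_marker_major[i] for i in contiguous]
```

Both layouts return the same values for the marker; only the access pattern differs, and extracting one sample's markers reverses which layout is contiguous.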