A fast stable sorting algorithm with absolutely minimum storage
Abstract: An algorithm is described which sorts n numbers in place with the property of stability, i.e., preserving the original order of equal elements. The algorithm requires absolutely minimum storage, O(log2 n) bits, for program variables, and a computation time of at most O(n (log2 n)^2).
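As a quick illustration of the stability property the abstract refers to (this sketch does not reproduce the paper's in-place algorithm; Python's built-in stable sort is used instead):

```python
# Stability: records with equal sort keys keep their original relative order.
# The paper achieves this in place with only O(log2 n) extra bits; Python's
# sorted() (Timsort) is merely used here to demonstrate the property itself.
records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]
by_key = sorted(records, key=lambda r: r[1])
# Among equal keys, original order survives: "a" before "d", "b" before "c".
print(by_key)  # -> [('a', 1), ('d', 1), ('b', 2), ('c', 2)]
```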
From Cooperative Scans to Predictive Buffer Management
In analytical applications, database systems often need to sustain workloads
with multiple concurrent scans hitting the same table. The Cooperative Scans
(CScans) framework, which introduces an Active Buffer Manager (ABM) component
into the database architecture, has been the most effective and elaborate
response to this problem, and was initially developed in the X100 research
prototype. We now report on the experiences of integrating Cooperative
Scans into its industrial-strength successor, the Vectorwise database product.
During this implementation we invented a simpler optimization of concurrent
scan buffer management, called Predictive Buffer Management (PBM). PBM is based
on the observation that in a workload with long-running scans, the buffer
manager has quite a bit of information on the workload in the immediate future,
such that an approximation of the ideal OPT algorithm becomes feasible. In the
evaluation on both synthetic benchmarks as well as a TPC-H throughput run we
compare the benefits of naive buffer management (LRU) versus CScans, PBM and
OPT; showing that PBM achieves benefits close to Cooperative Scans, while
incurring much lower architectural impact.
Comment: VLDB201
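The core observation behind PBM can be sketched as follows; the names and the simple forward-scan model here are illustrative assumptions, not the Vectorwise implementation:

```python
import math

# With long-running forward scans, the next access of each cached page can be
# predicted from the active scan cursors, so eviction can approximate
# Belady's OPT policy: evict the page whose next use lies farthest ahead.

def next_use_distance(page, scan_cursors):
    """Pages until some forward scan reaches `page`; inf if none will."""
    ahead = [page - cur for cur in scan_cursors if cur <= page]
    return min(ahead) if ahead else math.inf

def choose_victim(buffered_pages, scan_cursors):
    """OPT-style eviction: pick the buffered page needed farthest in the future."""
    return max(buffered_pages, key=lambda p: next_use_distance(p, scan_cursors))

# Two scans positioned at pages 10 and 50; pages 12, 55, 90 are buffered.
# Page 90 is needed last (40 pages away for the scan at 50), so it is evicted.
print(choose_victim([12, 55, 90], [10, 50]))  # -> 90
```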
Efficient Aerial Data Collection with UAV in Large-Scale Wireless Sensor Networks
Data collection from deployed sensor networks can be performed with a static sink, a ground-based mobile sink, or an Unmanned Aerial Vehicle (UAV)-based mobile aerial data collector. Considering large-scale sensor networks and the peculiarities of the deployment environments, aerial data collection based on a controllable UAV has more advantages. In this paper, we have designed a basic framework for aerial data collection, which includes the following five components: deployment of networks, node positioning, anchor point searching, fast path planning for the UAV, and data collection from the network. We have identified the key challenges in each of them and have proposed efficient solutions. This includes the proposal of a Fast Path Planning with Rules (FPPWR) algorithm based on grid division, which increases the efficiency of path planning while guaranteeing a relatively short path. We have designed and implemented a simulation platform for aerial data collection from sensor networks and have validated the performance efficiency of the proposed framework based on the following parameters: time consumption of the aerial data collection, flight path distance, and volume of collected data.
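The grid-division idea can be illustrated with a toy shortest-path search over grid cells; this is only a plain BFS sketch, not the FPPWR rule set, which the abstract does not specify:

```python
from collections import deque

# Plan a short path for a UAV over a coarse grid of cells rather than in
# continuous space. Cells: 0 = free, 1 = blocked. BFS finds a shortest
# 4-connected cell path; FPPWR's rule-based refinements are not modeled here.
def grid_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}           # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:           # reconstruct path back to start
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                    # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(grid_path(grid, (0, 0), (2, 0)))
```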
A pipeline for targeted metagenomics of environmental bacteria.
Background: Metagenomics and single cell genomics provide a window into the genetic repertoire of yet uncultivated microorganisms, but both methods are usually taxonomically untargeted. The combination of fluorescence in situ hybridization (FISH) and fluorescence activated cell sorting (FACS) has the potential to enrich taxonomically well-defined clades for genomic analyses.
Methods: Cells hybridized with a taxon-specific FISH probe are enriched based on their fluorescence signal via flow cytometric cell sorting. A recently developed FISH procedure, the hybridization chain reaction (HCR)-FISH, provides the high signal intensities required for flow cytometric sorting while maintaining the integrity of the cellular DNA for subsequent genome sequencing. Sorted cells are subjected to shotgun sequencing, resulting in targeted metagenomes of low diversity.
Results: Pure cultures of different taxonomic groups were used to (1) adapt and optimize the HCR-FISH protocol and (2) assess the effects of various cell fixation methods on both the signal intensity for cell sorting and the quality of subsequent genome amplification and sequencing. Best results were obtained for ethanol-fixed cells in terms of both HCR-FISH signal intensity and genome assembly quality. Our newly developed pipeline was successfully applied to a marine plankton sample from the North Sea, yielding good-quality metagenome-assembled genomes from a yet uncultivated flavobacterial clade.
Conclusions: With the developed pipeline, targeted metagenomes at various taxonomic levels can be efficiently retrieved from environmental samples. The resulting metagenome-assembled genomes allow for the description of yet uncharacterized microbial clades.
Data Generation, Distribution & Management
BNP Paribas requires a high volume of calculations to support its front office. To perform those calculations more efficiently, BNP Paribas requested the implementation of a distributed system. The project outcome was a distributed system built on the Oracle Coherence framework, with .NET as the main development framework. The resulting structure provided a flexible system of task distribution for BNP Paribas.
Efficient Management of Short-Lived Data
Motivated by the increasing prominence of loosely-coupled systems, such as
mobile and sensor networks, which are characterised by intermittent
connectivity and volatile data, we study the tagging of data with so-called
expiration times. More specifically, when data are inserted into a database,
they may be tagged with time values indicating when they expire, i.e., when
they are regarded as stale or invalid and thus are no longer considered part of
the database. In a number of applications, expiration times are known and can
be assigned at insertion time. We present data structures and algorithms for
online management of data tagged with expiration times. The algorithms are
based on fully functional, persistent treaps, which are a combination of binary
search trees with respect to a primary attribute and heaps with respect to a
secondary attribute. The primary attribute implements primary keys, and the
secondary attribute stores expiration times in a minimum heap, thus keeping a
priority queue of tuples to expire. A detailed and comprehensive experimental
study demonstrates the well-behavedness and scalability of the approach as well
as its efficiency with respect to a number of competitors.
Comment: switched to TimeCenter latex styl
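The treap structure described above can be sketched as follows; this is a simplified mutable version (the paper's treaps are fully functional and persistent), with illustrative names:

```python
# A treap over tuples: binary search tree on the primary key, MIN-heap on the
# expiration time, so the root always holds the tuple that expires soonest.

class Node:
    def __init__(self, key, expires):
        self.key, self.expires = key, expires
        self.left = self.right = None

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(root, key, expires):
    if root is None:
        return Node(key, expires)
    if key < root.key:
        root.left = insert(root.left, key, expires)
        if root.left.expires < root.expires:    # restore min-heap property
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, expires)
        if root.right.expires < root.expires:
            root = rotate_left(root)
    return root

def expire(root, now):
    """Remove every tuple whose expiration time is <= now."""
    if root is None or root.expires > now:
        return root
    # Root has expired: standard treap deletion, rotating the child that
    # expires sooner up and retrying until the expired node falls out.
    if root.left is None:
        return expire(root.right, now)
    if root.right is None:
        return expire(root.left, now)
    if root.left.expires < root.right.expires:
        root = rotate_right(root)
        root.right = expire(root.right, now)
    else:
        root = rotate_left(root)
        root.left = expire(root.left, now)
    return expire(root, now)    # new root may itself be expired

# Usage: insert four tuples, then expire everything stale at time 25.
root = None
for key, exp in [(5, 30), (2, 10), (8, 20), (1, 40)]:
    root = insert(root, key, exp)
root = expire(root, now=25)     # removes (2, 10) and (8, 20)
print(root.key, root.expires)   # -> 5 30  (soonest-to-expire survivor)
```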
Analytical Query Execution Optimized for all Layers of Modern Hardware
Analytical database queries are at the core of business intelligence and decision support. To analyze the vast amounts of data available today, query execution needs to be orders of magnitude faster. Hardware advances have made a profound impact on database design and implementation. The large main memory capacity allows queries to execute exclusively in memory and shifts the bottleneck from disk access to memory bandwidth. In the new setting, to optimize query performance, databases must be aware of an unprecedented multitude of complicated hardware features. This thesis focuses on the design and implementation of highly efficient database systems by optimizing analytical query execution for all layers of modern hardware. The hardware layers include the network across multiple machines, main memory and the NUMA interconnection across multiple processors, the multiple levels of caches across multiple processor cores, and the execution pipeline within each core. For the network layer, we introduce a distributed join algorithm that minimizes the network traffic. For the memory hierarchy, we describe partitioning variants aware of the dynamics of the CPU caches and the NUMA interconnection. To improve the memory access rate of linear scans, we optimize lightweight compression variants and evaluate their trade-offs. To accelerate query execution within the core pipeline, we introduce advanced SIMD vectorization techniques generalizable across multiple operators. We evaluate our algorithms and techniques on both mainstream hardware and on many-integrated-core platforms, and combine our techniques in a new query engine design that can better utilize the features of many-core CPUs. In the era of hardware becoming increasingly parallel and datasets consistently growing in size, this thesis can serve as a compass for developing hardware-conscious databases with truly high-performance analytical query execution.