Accelerating Nearest Neighbor Search on Manycore Systems
We develop methods for accelerating metric similarity search that are
effective on modern hardware. Our algorithms factor into easily parallelizable
components, making them simple to deploy and efficient on multicore CPUs and
GPUs. Despite the simple structure of our algorithms, their search performance
is provably sublinear in the size of the database, with a factor dependent only
on its intrinsic dimensionality. We demonstrate that our methods provide
substantial speedups on a range of datasets and hardware platforms. In
particular, we present results on a 48-core server machine, on graphics
hardware, and on a multicore desktop.
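The abstract does not name its index structure, so the following is only a generic sketch, in Python with NumPy, of how exact metric search can be factored into brute-force stages that parallelize well. Choosing random representatives, the Euclidean metric, and the names `build_index` and `query` are assumptions for illustration, not the paper's method; the pruning rule relies only on the triangle inequality, so any metric would work.

```python
import numpy as np

def build_index(X, n_reps, seed=0):
    """Pick random representatives and assign every point to its nearest one."""
    rng = np.random.default_rng(seed)
    rep_idx = rng.choice(len(X), size=n_reps, replace=False)
    R = X[rep_idx]
    # Brute-force point-to-representative distances: one dense, parallel-friendly step.
    D = np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2)
    owner = D.argmin(axis=1)
    lists = [np.flatnonzero(owner == j) for j in range(n_reps)]
    # Radius of each list: distance from its representative to its farthest member.
    radii = np.array([D[l, j].max() if len(l) else 0.0 for j, l in enumerate(lists)])
    return R, lists, radii

def query(X, index, q):
    """Exact nearest neighbour of q, skipping lists ruled out by the triangle inequality."""
    R, lists, radii = index
    d_rep = np.linalg.norm(R - q, axis=1)  # second brute-force step
    best_i, best_d = -1, np.inf
    for j in np.argsort(d_rep):  # visit the closest representatives first
        l = lists[j]
        # Every x in list j satisfies d(q, x) >= d(q, r_j) - radius_j.
        if len(l) == 0 or d_rep[j] - radii[j] >= best_d:
            continue
        d = np.linalg.norm(X[l] - q, axis=1)
        k = d.argmin()
        if d[k] < best_d:
            best_i, best_d = int(l[k]), float(d[k])
    return best_i, best_d
```

The distance computations are dense array operations, which is the part that maps cleanly onto many cores or a GPU; the short sequential loop only decides which lists are worth scanning.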
Trip-Based Public Transit Routing
We study the problem of computing all Pareto-optimal journeys in a public
transit network with respect to the two criteria of arrival time and number of
transfers taken. We take a novel approach, focusing on trips and transfers
between them, allowing fine-grained modeling. Our experiments on the
metropolitan network of London show that the algorithm computes full 24-hour
profiles in 70 ms after a preprocessing phase of 30 s, allowing fast queries in
dynamic scenarios. Comment: Minor corrections, no substantial changes. To be presented at ESA 201
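To make the two-criterion objective concrete, the sketch below reduces a set of candidate journeys to its Pareto set over (arrival time, number of transfers). It illustrates only the dominance rule; it does not model trips, transfers between them, or the preprocessing the abstract mentions, and the function name and minute-based times are hypothetical.

```python
def pareto_front(journeys):
    """Keep the journeys not dominated in (arrival time, number of transfers).

    journeys: iterable of (arrival, transfers) pairs; lower is better for both.
    Exact duplicates are kept once.
    """
    front = []
    best_arrival = float("inf")
    # Process fewer transfers first (ties broken by earlier arrival). A journey
    # is then Pareto-optimal exactly when it arrives strictly earlier than
    # every journey kept so far.
    for arrival, transfers in sorted(journeys, key=lambda j: (j[1], j[0])):
        if arrival < best_arrival:
            front.append((arrival, transfers))
            best_arrival = arrival
    return front
```

For example, with arrivals in minutes after midnight, `[(580, 0), (565, 1), (570, 2), (550, 3)]` reduces to `[(580, 0), (565, 1), (550, 3)]`: the two-transfer journey is dominated by the one-transfer journey that arrives earlier.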
Discriminative Scale Space Tracking
Accurate scale estimation of a target is a challenging research problem in
visual object tracking. Most state-of-the-art methods employ an exhaustive
scale search to estimate the target size. The exhaustive search strategy is
computationally expensive and struggles when confronted with large scale
variations. This paper investigates the problem of accurate and robust scale
estimation in a tracking-by-detection framework. We propose a novel scale-adaptive
tracking approach by learning separate discriminative correlation
filters for translation and scale estimation. The explicit scale filter is
learned online using the target appearance sampled at a set of different
scales. Unlike standard approaches, our method directly learns the
appearance change induced by variations in the target scale. Additionally, we
investigate strategies to reduce the computational cost of our approach.
Extensive experiments are performed on the OTB and the VOT2014 datasets.
Compared to the standard exhaustive scale search, our approach achieves a gain
of 2.5% in average overlap precision on the OTB dataset. Additionally, our
method is computationally efficient, operating at a 50% higher frame rate
compared to the exhaustive scale search. Our method obtains the top rank in
performance by outperforming 19 state-of-the-art trackers on OTB and 37
state-of-the-art trackers on VOT2014. Comment: To appear in TPAMI. This is the journal extension of the
VOT2014-winning DSST tracking method.
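A minimal NumPy sketch of a one-dimensional, multi-channel discriminative correlation filter over scale samples, solved in closed form in the Fourier domain. Assumptions: random vectors stand in for the per-scale appearance features, the filter is trained from a single sample with no online update or windowing, and the function names and the regularizer `lam` are illustrative rather than taken from the paper.

```python
import numpy as np

def gaussian_response(n, sigma=1.0):
    """Desired 1-D output: a Gaussian over scale samples, peaked at the centre."""
    offsets = np.arange(n) - n // 2
    return np.exp(-0.5 * (offsets / sigma) ** 2)

def train_scale_filter(f, g):
    """Closed-form multi-channel correlation filter in the Fourier domain.

    f: (d, n) array, one d-dimensional feature column per scale sample.
    g: (n,) desired response.
    Returns the per-channel numerator A and the shared denominator B.
    """
    F = np.fft.fft(f, axis=1)
    G = np.fft.fft(g)
    A = np.conj(G)[None, :] * F
    B = np.sum(np.conj(F) * F, axis=0).real
    return A, B

def scale_response(A, B, z, lam=1e-2):
    """Correlation output for new scale samples z; its argmax is the scale index."""
    Z = np.fft.fft(z, axis=1)
    return np.real(np.fft.ifft(np.sum(np.conj(A) * Z, axis=0) / (B + lam)))

rng = np.random.default_rng(0)
f = rng.normal(size=(10, 33))  # 10 feature channels, 33 scale samples
A, B = train_scale_filter(f, gaussian_response(33))
z = np.roll(f, 3, axis=1)  # same appearance, shifted by 3 scale steps
print(int(np.argmax(scale_response(A, B, z))))  # 19: centre index 16 plus 3
```

The peak of the response tells the tracker which scale sample best matches the learned appearance; in this toy setting, shifting the samples by three scale steps moves the peak by three.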
Dynamic re-optimization techniques for stream processing engines and object stores
Large-scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems in multi-tenant cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems.
In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses pipeline bottlenecks. We present novel heuristics to optimize overlays for group communication operations in the streaming model.
In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores, designed to increase transaction throughput. Lock localization refers to the dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition lock acquisition. The service leverages observed object-access patterns to cluster locks and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures.
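Lock localization can be illustrated with a greedy placement that homes each lock on the node that acquires it most often in an observed access log. This is a hypothetical sketch, not M-Lock's algorithm: `localize_locks`, `remote_acquisitions`, and the log format are assumptions, and the sketch ignores migration cost, load balance, and locks held by in-flight transactions.

```python
from collections import Counter, defaultdict

def localize_locks(accesses):
    """Assign each lock to the node that acquired it most often.

    accesses: iterable of (lock_id, node_id) acquisition events.
    Returns {lock_id: node_id}; ties go to the node seen first for that lock.
    """
    counts = defaultdict(Counter)
    for lock, node in accesses:
        counts[lock][node] += 1
    return {lock: c.most_common(1)[0][0] for lock, c in counts.items()}

def remote_acquisitions(accesses, placement):
    """Count acquisitions issued from a node other than the lock's home node."""
    return sum(1 for lock, node in accesses if placement[lock] != node)
```

With the log `[("a", 0), ("a", 0), ("a", 1), ("b", 1), ("b", 1), ("c", 2), ("c", 0), ("c", 2)]`, a static placement `{"a": 2, "b": 0, "c": 1}` makes all 8 acquisitions remote, while the access-driven placement `{"a": 0, "b": 1, "c": 2}` leaves only 2.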
Architecture aware parallel programming in Glasgow parallel Haskell (GPH)
General-purpose computing architectures are evolving quickly to become manycore
and hierarchical, i.e. a core can communicate more quickly locally than
globally. To be effective on such architectures, programming models must be
aware of the communications hierarchy. This thesis investigates a programming
model that aims to share the responsibility of task placement, load balance, thread
creation, and synchronisation between the application developer and the runtime
system.
The main contribution of this thesis is the development of four new architecture-aware
constructs for Glasgow parallel Haskell that exploit information about task
size and aim to reduce communication for small tasks, preserve data locality, or
distribute large units of work. We define a semantics for the constructs that specifies the sets of processing elements (PEs) that each construct identifies, and we check four properties
of the semantics using QuickCheck.
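A semantics of this kind can be prototyped as a function from a PE and a distance bound to a set of PEs, with properties checked on randomly generated architectures in the style of QuickCheck. The three-level (machine, socket, core) hierarchy, the distance measure, `pes_within`, and the properties below are assumptions for illustration; they are not the thesis's constructs or its QuickCheck properties, and Python's `random` module stands in for QuickCheck's generators (there is no shrinking).

```python
import itertools
import random

def distance(p, q):
    """Levels to climb before p and q share an ancestor; PEs are (machine, socket, core) paths."""
    depth = len(p)
    for i in range(depth):
        if p[i] != q[i]:
            return depth - i
    return 0

def pes_within(pe, d, all_pes):
    """Hypothetical construct semantics: the set of PEs at distance <= d from pe."""
    return {q for q in all_pes if distance(pe, q) <= d}

def check_properties(trials=200, seed=0):
    """QuickCheck-style random testing of a few properties of pes_within."""
    rng = random.Random(seed)
    for _ in range(trials):
        shape = [rng.randint(1, 4) for _ in range(3)]  # machines, sockets, cores
        all_pes = set(itertools.product(*(range(n) for n in shape)))
        pe = rng.choice(sorted(all_pes))
        other = rng.choice(sorted(all_pes))
        d = rng.randint(0, 3)
        assert pe in pes_within(pe, d, all_pes)  # a PE is always within reach of itself
        assert pes_within(pe, d, all_pes) <= pes_within(pe, d + 1, all_pes)  # monotone in d
        assert pes_within(pe, 3, all_pes) == all_pes  # the widest level covers every PE
        assert distance(pe, other) == distance(other, pe)  # distance is symmetric
    return True
```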
We report a preliminary investigation of architecture aware programming
models that abstract over the new constructs. In particular, we propose architecture
aware evaluation strategies and skeletons. We investigate three common
paradigms, namely data parallelism, divide-and-conquer, and nested parallelism,
on hierarchical architectures with up to 224 cores. The results show that the
architecture-aware programming model consistently delivers better speedup and
scalability than existing constructs, together with a dramatic reduction in the
execution time variability.
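A size-driven placement policy inside a divide-and-conquer skeleton might look like the sketch below: large subtasks may be sent anywhere, medium ones stay on the same machine, and small ones stay close to their parent. The thresholds, the three placement levels, and both function names are hypothetical.

```python
def placement_level(size, far=512, mid=64):
    """Hypothetical policy: the larger the task, the farther it may travel."""
    if size >= far:
        return 3  # any PE in the system
    if size >= mid:
        return 2  # any PE on the same machine
    return 1  # any PE on the same socket

def divide_and_conquer(size, leaf=16, spawned=None):
    """Split a task of `size` work units in half recursively, recording the
    placement level each spawned subtask would be given. Returns the levels."""
    if spawned is None:
        spawned = []
    if size <= leaf:
        return spawned
    half = size // 2
    for part in (half, size - half):
        spawned.append(placement_level(part))
        divide_and_conquer(part, leaf, spawned)
    return spawned
```

For a task of 2048 units with the default thresholds, 6 subtasks get level 3, 56 get level 2, and 192 get level 1, so most spawns stay local and only the few large ones pay for distant communication.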
We present a comparison of functional multicore technologies and report
some of the first-ever multicore results for the Feedback Directed Implicit Parallelism
(FDIP) and semi-explicit parallelism (GpH and Eden) languages. The
comparison reflects the growing maturity of the field by systematically evaluating
four parallel Haskell implementations on a common multicore architecture.
The comparison contrasts the programming effort each language requires with
the parallel performance delivered.
We investigate the minimum thread granularity required to achieve satisfactory
performance for three implementations of parallel functional languages on a
multicore platform. The results show that GHC-GUM requires a larger thread
granularity than Eden and GHC-SMP. The minimum thread granularity also rises with
the number of cores.
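Thread granularity is usually controlled by grouping work into chunks, as `parListChunk` from Haskell's `Control.Parallel.Strategies` does. The Python sketch below exposes the same knob: each task processes `grain` elements, so the number of tasks falls as `grain` rises. The function name and the use of a thread pool are assumptions; because of CPython's global interpreter lock, the threads here illustrate task creation rather than real parallel speedup, and a granularity study would time a sweep of `grain` values at each core count.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_parallel_map_sum(f, xs, grain, workers=4):
    """Apply f to every element of xs and sum the results, creating one task
    per chunk of `grain` elements. Returns (total, number_of_tasks)."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    chunks = [xs[i:i + grain] for i in range(0, len(xs), grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: sum(f(x) for x in chunk), chunks))
    return sum(partials), len(chunks)
```

With 1000 elements, `grain=1` creates 1000 tasks while `grain=100` creates 10; the result is identical, and only the per-task overhead relative to useful work changes.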