Case for holistic query evaluation
In this thesis we present the holistic query evaluation model. We propose a novel
query engine design that exploits the characteristics of modern processors when queries
execute inside main memory. The holistic model (a) is based on template-based code
generation for each executed query, (b) uses multithreading to adapt to multicore processor
architectures and (c) addresses the optimization problem of scheduling multiple
threads for intra-query parallelism.
Main-memory query execution is now commonplace in database servers
equipped with tens or hundreds of gigabytes of RAM. In such an execution environment,
the query engine needs to adapt to the CPU characteristics to boost performance.
For this purpose, holistic query evaluation applies customized code generation
to database query evaluation. The idea is to use a collection of highly efficient code
templates and dynamically instantiate them to create query- and hardware-specific
source code. The source code is compiled and dynamically linked to the database
server for processing. Code generation eliminates the overhead of the high-level programming
abstractions needed to implement generic, interpreted SQL query engines.
At the same time, the generated code is customized for the hardware it will run on. The
holistic model supports the most frequently used query processing algorithms, namely
sorting, partitioning, join evaluation, and aggregation, thus allowing the efficient evaluation
of complex DSS or OLAP queries.
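The template-instantiation idea can be sketched as follows. This is a minimal hypothetical Python illustration, not the thesis's engine (which produces compiled, hardware-specific source code): a generic filter template is specialized with query-specific details, compiled once, and then run over in-memory rows without per-tuple interpretation.

```python
# Hypothetical sketch of template-based code generation for query
# evaluation: a code template is instantiated per query, compiled,
# and executed over in-memory data.

FILTER_TEMPLATE = """
def query(rows):
    # Specialized inner loop: the predicate and projected column are
    # baked into the source instead of being interpreted per row.
    out = []
    for row in rows:
        if row[{pred_col}] {pred_op} {pred_val}:
            out.append(row[{proj_col}])
    return out
"""

def instantiate(pred_col, pred_op, pred_val, proj_col):
    """Fill in the template and compile it into a callable."""
    source = FILTER_TEMPLATE.format(pred_col=pred_col, pred_op=pred_op,
                                    pred_val=pred_val, proj_col=proj_col)
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["query"]

# SELECT col1 FROM t WHERE col0 > 10, specialized at query-compile time.
query = instantiate(pred_col=0, pred_op=">", pred_val=10, proj_col=1)
print(query([(5, "a"), (11, "b"), (42, "c")]))  # ['b', 'c']
```

The one-time compilation cost is amortized over the many tuples the specialized loop processes.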
Modern CPUs follow multicore designs with multiple threads running in parallel.
The dataflow of query engine algorithms needs to be adapted to exploit such designs.
We identify memory accesses and thread synchronization as the main bottlenecks in
a multicore execution environment. We extend the holistic query evaluation model
and propose techniques to mitigate the impact of these bottlenecks on multithreaded
query evaluation. We analytically model the expected performance and scalability of
the proposed algorithms according to the hardware specifications. The analytical performance
expressions can be used by the optimizer to statically estimate the speedup
of multithreaded query execution.
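The flavor of such a static estimate can be shown with an Amdahl-style model. This is an illustrative sketch only; the thesis derives its actual expressions from hardware specifications, which are not reproduced here.

```python
def estimated_speedup(threads, serial_frac, sync_cost_per_thread=0.0):
    """Hypothetical Amdahl-style estimate of multithreaded speedup.

    serial_frac: fraction of the query plan that cannot be parallelized.
    sync_cost_per_thread: modeled synchronization overhead, which grows
    with the thread count and eventually caps scalability.
    """
    parallel_frac = 1.0 - serial_frac
    relative_time = (serial_frac
                     + parallel_frac / threads
                     + sync_cost_per_thread * threads)
    return 1.0 / relative_time

# With 10% serial work and a small per-thread synchronization cost,
# the estimate rises with core count and then flattens out.
for n in (1, 2, 4, 8):
    print(n, round(estimated_speedup(n, 0.10, 0.005), 2))
```

An optimizer can evaluate such a closed-form expression statically, without running the query, to decide how many threads are worth spawning.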
Finally, we examine the problem of thread scheduling in the context of multithreaded
query evaluation on multicore CPUs. The search space for possible operator
execution schedules grows rapidly, ruling out exhaustive enumeration. We
model intra-query parallelism on multicore systems and present scheduling heuristics
that result in different degrees of schedule quality and optimization cost. We identify
cases where each of our proposed algorithms, or combinations of them, is expected
to generate high-quality schedules at an acceptable optimization cost.
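One inexpensive heuristic of this general kind is greedy list scheduling. The sketch below is hypothetical and not necessarily one of the thesis's algorithms: independent operator threads with estimated costs are assigned longest-first to the currently least-loaded core.

```python
import heapq

def lpt_schedule(op_costs, cores):
    """Longest-processing-time-first list scheduling (hypothetical sketch).

    op_costs: {operator name: estimated cost}; returns a per-core
    assignment and the resulting makespan.
    """
    heap = [(0.0, c) for c in range(cores)]   # (current load, core id)
    heapq.heapify(heap)
    assignment = {c: [] for c in range(cores)}
    # Place the most expensive operators first, each on the least-loaded core.
    for op, op_cost in sorted(op_costs.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)
        assignment[core].append(op)
        heapq.heappush(heap, (load + op_cost, core))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

plan, makespan = lpt_schedule({"scan": 4.0, "join": 3.0,
                               "sort": 3.0, "agg": 2.0}, cores=2)
print(plan, makespan)
```

The heuristic runs in O(n log n) time, avoiding the exhaustive search over all operator-to-core assignments.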
Optimization of patch antennas via multithreaded simulated annealing based design exploration
In this paper, we present a new software framework for optimizing the design of microstrip patch antennas. The proposed simulation and optimization framework implements a simulated annealing algorithm that performs design space exploration to identify the optimal patch antenna design. During each iteration of the optimization loop, we employ the popular MEEP simulation tool to evaluate explored design solutions. To speed up the design space exploration, the framework runs multiple MEEP simulations concurrently, using multithreading to implement a manager-workers execution strategy in which the number of worker threads matches the number of cores of the host machine. The resulting reduction in runtime makes effective design space exploration practical. Simulations demonstrate the effectiveness of the proposed software framework.
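The manager-workers pattern described above can be sketched as follows. This is a hypothetical illustration: the `cost` function is a stand-in for a MEEP simulation, the design is reduced to two parameters, and the annealing schedule is arbitrary.

```python
import math
import os
import random
from concurrent.futures import ThreadPoolExecutor

def cost(design):
    # Stand-in for a MEEP electromagnetic simulation (hypothetical):
    # score a two-parameter patch design against a fictitious target.
    width, length = design
    return (width - 3.1) ** 2 + (length - 4.7) ** 2

def neighbors(design, step, count):
    width, length = design
    return [(width + random.uniform(-step, step),
             length + random.uniform(-step, step)) for _ in range(count)]

def anneal(start, iters=200, temp=1.0, cooling=0.95, step=0.5):
    workers = os.cpu_count() or 1   # one worker thread per core
    best = current = start
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            candidates = neighbors(current, step, workers)
            # Manager thread: farm out one "simulation" per worker thread.
            costs = list(pool.map(cost, candidates))
            i = min(range(len(candidates)), key=costs.__getitem__)
            delta = costs[i] - cost(current)
            # Standard simulated-annealing acceptance rule.
            if delta < 0 or random.random() < math.exp(-delta / temp):
                current = candidates[i]
            if cost(current) < cost(best):
                best = current
            temp *= cooling
    return best
```

In the real framework each worker launches an independent MEEP run, so one annealing iteration evaluates as many candidate designs as there are cores.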
A Fast Causal Profiler for Task Parallel Programs
This paper proposes TASKPROF, a profiler that identifies parallelism
bottlenecks in task parallel programs. It leverages the structure of a task
parallel execution to perform fine-grained attribution of work to various parts
of the program. TASKPROF's use of hardware performance counters to perform
fine-grained measurements minimizes perturbation. TASKPROF's profile execution
runs in parallel using multi-cores. TASKPROF's causal profile enables users to
estimate improvements in parallelism when a region of code is optimized even
when concrete optimizations are not yet known. We have used TASKPROF to isolate
parallelism bottlenecks in twenty-three applications that use the Intel
Threading Building Blocks library. We have designed parallelization techniques
in five applications to increase parallelism by an order of magnitude using
TASKPROF. Our user study indicates that developers are able to isolate
performance bottlenecks with ease using TASKPROF.
Comment: 11 pages.
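The work/span view that underlies this kind of attribution can be sketched in a toy model. This is not TASKPROF's implementation (which attributes work via hardware performance counters); here each task simply carries a given cost.

```python
class Task:
    """Node in a task tree: its own work plus tasks it spawns in parallel."""
    def __init__(self, cost, children=()):
        self.cost = cost                   # work done by this task body
        self.children = list(children)     # tasks spawned to run in parallel

def work(task):
    # Total work: everything executed, summed over the whole tree.
    return task.cost + sum(work(c) for c in task.children)

def span(task):
    # Critical path: parallel children cost only as much as the longest one.
    return task.cost + max((span(c) for c in task.children), default=0)

def parallelism(task):
    # Available parallelism; a value near 1 flags a serial bottleneck.
    return work(task) / span(task)

root = Task(1, [Task(4), Task(4), Task(4)])
print(work(root), span(root), parallelism(root))  # 13 5 2.6
```

Regions whose subtrees show low parallelism relative to the core count are the ones a profiler of this kind would flag.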
A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem
Branch-and-Bound (B&B) algorithms are time-intensive tree-based exploration
methods for solving combinatorial optimization problems to optimality. In this
paper, we investigate the use of GPU computing as a major complementary way to
speed up those methods. The focus is put on the bounding mechanism of B&B
algorithms, which is the most time consuming part of their exploration process.
We propose a parallel B&B algorithm based on a GPU-accelerated bounding model.
The proposed approach concentrates on optimizing data access management to
further improve the performance of the bounding mechanism, which uses large
intermediate data sets that do not completely fit in GPU memory. Extensive
experiments have been carried out on well-known FSP benchmarks using an
Nvidia Tesla C2050 GPU card. We compared the obtained performance to
single-threaded and multithreaded CPU-based executions. Speedups of up to
100x are achieved for large problem instances.
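The role of the bounding step can be seen in a small CPU-only sketch of B&B for a two-machine flow shop. This is a hypothetical simplification, not the paper's algorithm: the point is that `lower_bound` is the cheap, uniform computation invoked at every tree node, which is what makes it a natural candidate for batched GPU evaluation.

```python
def makespan(seq, p):
    """Makespan of a job sequence on a two-machine flow shop;
    p[j] = (time of job j on machine 0, time on machine 1)."""
    end = [0, 0]
    for j in seq:
        end[0] += p[j][0]
        end[1] = max(end[1], end[0]) + p[j][1]
    return end[1]

def lower_bound(partial, remaining, p):
    """Cheap machine-based bound on any completion of `partial`;
    this is the hot computation a GPU evaluates for many nodes at once."""
    end = [0, 0]
    for j in partial:
        end[0] += p[j][0]
        end[1] = max(end[1], end[0]) + p[j][1]
    rem0 = sum(p[j][0] for j in remaining)
    rem1 = sum(p[j][1] for j in remaining)
    # Machine 0 must run all remaining jobs; the last one still needs machine 1.
    lb_m0 = end[0] + rem0 + min((p[j][1] for j in remaining), default=0)
    # Machine 1 must run all remaining jobs after it becomes reachable.
    lb_m1 = (max(end[1], end[0] + min((p[j][0] for j in remaining), default=0))
             + rem1)
    return max(lb_m0, lb_m1)

def branch_and_bound(p):
    best = [float("inf")]
    def explore(partial, remaining):
        if not remaining:
            best[0] = min(best[0], makespan(partial, p))
            return
        # Visit the most promising children first; prune the rest.
        for j in sorted(remaining,
                        key=lambda j: lower_bound(partial + [j],
                                                  [r for r in remaining
                                                   if r != j], p)):
            rest = [r for r in remaining if r != j]
            if lower_bound(partial + [j], rest, p) < best[0]:
                explore(partial + [j], rest)
    explore([], list(range(len(p))))
    return best[0]

print(branch_and_bound([(3, 2), (1, 4), (2, 3)]))  # 10
```

Since every node's bound is computed from the same small inputs by the same code, thousands of bounds can be evaluated in one data-parallel batch, which is the part the paper offloads to the GPU.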
Coz: Finding Code that Counts with Causal Profiling
Improving performance is a central concern for software developers. To locate
optimization opportunities, developers rely on software profilers. However,
these profilers only report where programs spent their time: optimizing that
code may have no impact on performance. Past profilers thus both waste
developer time and make it difficult for them to uncover significant
optimization opportunities.
This paper introduces causal profiling. Unlike past profiling approaches,
causal profiling indicates exactly where programmers should focus their
optimization efforts, and quantifies their potential impact. Causal profiling
works by running performance experiments during program execution. Each
experiment calculates the impact of any potential optimization by virtually
speeding up code: inserting pauses that slow down all other code running
concurrently. The key insight is that this slowdown has the same relative
effect as running that line faster, thus "virtually" speeding it up.
We present Coz, a causal profiler, which we evaluate on a range of
highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite.
Coz identifies previously unknown optimization opportunities that are both
significant and targeted. Guided by Coz, we improve the performance of
Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as
much as 68%; in most cases, these optimizations involve modifying under 10
lines of code.
Comment: Published at SOSP 2015 (Best Paper Award).
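The core equivalence behind virtual speedup can be checked in a toy model. This is hypothetical and not Coz's mechanism, which operates on real running programs: assume two threads A and B run concurrently on separate cores for a and b time units, and the program ends when both finish.

```python
def real_speedup_runtime(a, b, d):
    """Program runtime if thread A's code were actually made a
    fraction d faster while B runs concurrently on another core."""
    return max(a * (1 - d), b)

def virtual_speedup_runtime(a, b, d):
    """Causal-profiling estimate: run A at full speed, pause B for a
    total of d * a (the slowdown inserted while A runs), then subtract
    the inserted pauses from the measured runtime."""
    pause = d * a
    measured = max(a, b + pause)   # what the experiment observes
    return measured - pause        # estimated sped-up runtime

# In this toy model the two coincide exactly:
#   max(a, b + d*a) - d*a  ==  max(a - d*a, b)  ==  max(a*(1-d), b)
print(real_speedup_runtime(10, 6, 0.5),
      virtual_speedup_runtime(10, 6, 0.5))  # 6.0 6.0
```

Pausing everything else while a line runs thus has the same relative effect on end-to-end runtime as making that line faster, without needing the optimization to exist.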