Accelerating boosting-based face detection on GPUs
The goal of face detection is to determine the
presence of faces in arbitrary images, along with their locations
and dimensions. As with any graphics workload,
these algorithms benefit from data-level parallelism. Existing
parallelization efforts focus strictly on mapping different
divide-and-conquer strategies onto multicore CPUs and GPUs.
However, even the most advanced single-chip many-core
processors to date still struggle to handle real-time
face detection under high-definition video workloads. To
address this challenge, face detection algorithms typically avoid
computations by dynamically evaluating a boosted cascade
of classifiers. Unfortunately, this technique yields a low ALU
occupancy in architectures such as GPUs, which heavily rely
on large SIMD widths for maximizing data-level parallelism.
In this paper we present several techniques to increase the
performance of the cascade evaluation kernel, which is the
most resource-intensive part of the face detection pipeline.
In particular, using concurrent kernel execution in
combination with cascades generated with the GentleBoost
algorithm solves the problem of GPU underutilization and
achieves a 5X average speedup on 1080p video over
the fastest known implementations, while slightly improving
accuracy. Finally, we also study the parallelization of
the cascade training process and its scalability on SMP
platforms. The proposed parallelization strategy exploits both
task and data-level parallelism and achieves a 3.5X speedup
over single-threaded implementations.
Peer Reviewed
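
The underutilization problem described in the abstract can be illustrated with a small model. The sketch below is not the paper's GPU kernel: it is a sequential Python toy, and the names `evaluate_cascade` and `simd_lane_utilization`, the stage layout, and the lockstep cost model are all assumptions made for illustration. It shows a boosted cascade rejecting windows early, and estimates how many SIMD lane-steps do useful work when a group of windows must wait for its deepest member.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage layout: a list of weak classifiers plus a stage threshold.
# Each weak classifier maps a window (a sequence of feature values) to a score.
WeakClassifier = Callable[[Sequence[float]], float]
Stage = Tuple[Sequence[WeakClassifier], float]


def evaluate_cascade(window: Sequence[float], stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (accepted, stages_evaluated) for a single window.

    A window is rejected as soon as one stage's summed score falls below
    that stage's threshold, so later stages are never computed for it.
    """
    for depth, (weak_classifiers, threshold) in enumerate(stages, start=1):
        score = sum(h(window) for h in weak_classifiers)
        if score < threshold:
            return False, depth  # early exit: remaining stages are skipped
    return True, len(stages)


def simd_lane_utilization(depths: Sequence[int], simd_width: int) -> float:
    """Estimate the fraction of lane-steps that do useful work.

    Assumed cost model: windows are packed into groups of `simd_width`,
    and each group runs in lockstep until its deepest window finishes.
    """
    useful = sum(depths)
    issued = 0
    for start in range(0, len(depths), simd_width):
        group = depths[start:start + simd_width]
        issued += max(group) * len(group)
    return useful / issued


# Toy three-stage cascade: stage i thresholds feature i at 0.5.
stages: List[Stage] = [
    ([lambda w: w[0]], 0.5),
    ([lambda w: w[1]], 0.5),
    ([lambda w: w[2]], 0.5),
]
windows = [
    [0.1, 0.9, 0.9],  # rejected at stage 1
    [0.9, 0.1, 0.9],  # rejected at stage 2
    [0.9, 0.9, 0.9],  # accepted after all 3 stages
    [0.2, 0.0, 0.0],  # rejected at stage 1
]
depths = [evaluate_cascade(w, stages)[1] for w in windows]
utilization = simd_lane_utilization(depths, simd_width=4)
```

In this toy run the four windows exit at depths 1, 2, 3 and 1, so a 4-wide group issues 12 lane-steps for 7 useful ones. The model says nothing about how concurrent kernel execution or GentleBoost cascades recover that idle capacity; it only makes the source of the idle lanes concrete.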