Parallel waveform extraction algorithms for the Cherenkov Telescope Array Real-Time Analysis
The Cherenkov Telescope Array (CTA) is the next-generation observatory for
the study of very high-energy gamma rays from about 20 GeV up to 300 TeV.
Thanks to its large effective area and field of view, the CTA observatory will
have an unprecedented sensitivity to transient flaring gamma-ray phenomena
compared to both current ground-based (e.g. MAGIC, VERITAS, H.E.S.S.) and
space-based (e.g. Fermi) gamma-ray telescopes. A fast analysis pipeline is
crucial both to alert the astrophysics community for follow-up observations
and to respond quickly to external science alerts. This will be accomplished
by a Real-Time Analysis (RTA) pipeline, a fast and automated science alert
trigger system that will be a key system of the CTA observatory. Among the key
CTA design requirements for the RTA system, the most challenging is the
generation of alerts within 30 seconds of the last acquired event, while
keeping the flux sensitivity within a factor of 3 of that of the final
analysis. A dedicated software and hardware architecture for the RTA pipeline
must be designed and tested. We present a comparison of OpenCL solutions on
different kinds of devices, such as CPUs, Graphics Processing Units (GPUs),
and Field Programmable Gate Array (FPGA) cards, for the real-time data
reduction of CTA triggered data.
Comment: In Proceedings of the 34th International Cosmic Ray Conference
(ICRC2015), The Hague, The Netherlands. All CTA contributions at
arXiv:1508.0589
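As a hedged illustration of the kind of per-pixel kernel that waveform extraction involves, the sketch below assumes a simple sliding-window integration (an assumption, not necessarily one of the algorithms compared in the paper): for each pixel trace, the window of `window` consecutive samples with the largest sum gives the charge, and its start index approximates the pulse arrival time. Pixels are independent, which is why such a step maps naturally onto one OpenCL work-item per pixel. The code is plain Python rather than OpenCL, and the helper names `extract_pixel` and `extract_camera` and the toy traces are hypothetical.

```python
def extract_pixel(samples, window):
    """Sliding-window extraction for one pixel trace (hypothetical helper).

    Returns (charge, start): the largest sum of `window` consecutive samples
    and the index at which that window begins.
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    current = sum(samples[:window])  # sum of the first window
    best, best_start = current, 0
    for start in range(1, len(samples) - window + 1):
        # Slide by one sample: add the newest sample, drop the oldest one.
        current += samples[start + window - 1] - samples[start - 1]
        if current > best:
            best, best_start = current, start
    return best, best_start


def extract_camera(traces, window):
    """Apply the extraction to every pixel. Pixels do not depend on each
    other, so on a GPU or FPGA each could be handled by its own work-item."""
    return [extract_pixel(trace, window) for trace in traces]


# Toy camera with two pixels and 8 samples each (made-up numbers).
traces = [
    [0, 1, 2, 9, 7, 2, 1, 0],
    [3, 3, 0, 0, 0, 0, 4, 5],
]
print(extract_camera(traces, window=3))  # [(18, 2), (9, 5)]
```

The running-sum update keeps the cost per pixel linear in the number of samples, rather than proportional to samples times window width.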
Real-Time Hand Shape Classification
The problem of hand shape classification is challenging since a hand is
characterized by a large number of degrees of freedom. Numerous shape
descriptors have been proposed and applied over the years to estimate and
classify hand poses in reasonable time. In this paper, we discuss our parallel
framework for hand shape classification, suitable for real-time
applications. We show how the number of gallery images influences the
classification accuracy and execution time of the parallel algorithm. We
present speedup and efficiency analyses that demonstrate the efficacy of the
parallel implementation. Notably, different methods can be used at each step
of our parallel framework. Here, we combine shape contexts with
appearance-based techniques to enhance the robustness of the algorithm and to
increase the classification score. An extensive experimental study shows the
superiority of the proposed approach over existing state-of-the-art methods.
Comment: 11 pages
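Speedup and efficiency are conventionally defined as S_p = T_1 / T_p and E_p = S_p / p, where T_1 is the serial execution time and T_p the execution time on p processing units. The snippet below is a minimal sketch computing both from a timing table; the timings are illustrative placeholders, not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Efficiency E_p = S_p / p; 1.0 corresponds to perfect linear scaling."""
    return speedup(t_serial, t_parallel) / workers


# Illustrative timings in seconds, keyed by number of workers (placeholders).
timings = {1: 40.0, 2: 21.0, 4: 11.0, 8: 6.25}
t1 = timings[1]
for p, tp in sorted(timings.items()):
    print(f"p={p}: S={speedup(t1, tp):.2f}, E={efficiency(t1, tp, p):.2f}")
```

Efficiency below 1.0 quantifies how much of the ideal p-fold gain is lost to serial portions and parallel overhead.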
A Cascade Neural Network Architecture investigating Surface Plasmon Polaritons propagation for thin metals in OpenMP
Surface plasmon polaritons (SPPs) confined along metal-dielectric interfaces
have attracted considerable interest in the area of ultracompact photonic
circuits, photovoltaic devices, and other applications due to their strong field
confinement and enhancement. This paper investigates a novel cascade neural
network (NN) architecture to determine how SPP propagation depends on the
metal thickness. Additionally, a novel training procedure for the proposed
cascade NN has been developed using an OpenMP-based framework, greatly reducing
training time. The experiments performed confirm the effectiveness of the
proposed NN architecture for this problem.
Sentetik açıklıklı radar görüntülerinde alan tabanlı hedef tespiti ve paralel gerçekleştirmesi (Region based target detection in synthetic aperture radar images and its parallel implementation)
Automatic target detection methods for synthetic aperture radar (SAR) images are sensitive to the image resolution, the target size, the clutter complexity, and the speckle noise level. A robust target detection method, by contrast, should be less sensitive to such factors. The proposed method works on the image after feature-preserving despeckling (FPD): it locates candidate target regions and their surrounding clutter, and thresholds them so as to obtain a constant false alarm rate. Computational efficiency was improved using OpenMP and NVIDIA CUDA, and the resulting speedups are reported.
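To illustrate the constant false alarm rate (CFAR) principle, the sketch below uses the textbook one-dimensional cell-averaging CFAR detector, not the paper's region-based method: the clutter level around each cell under test is estimated from training cells separated from it by guard cells, and the threshold is that estimate times a factor alpha. For N training cells and square-law detected, exponentially distributed clutter, alpha = N(P_fa^(-1/N) - 1) yields false-alarm probability P_fa. The function name, parameter values, and test signal are assumptions.

```python
def ca_cfar(power, num_train, num_guard, pfa):
    """Textbook 1-D cell-averaging CFAR (not the paper's region-based method).

    For each cell under test, average `num_train` training cells on each side,
    skipping `num_guard` guard cells next to it, and flag a detection when the
    cell exceeds alpha * (clutter estimate). Edge cells without a full window
    are never flagged.
    """
    n = 2 * num_train  # training cells in total
    # Threshold factor for square-law detected, exponentially distributed
    # clutter; gives false-alarm probability `pfa` per cell.
    alpha = n * (pfa ** (-1.0 / n) - 1.0)
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        left = power[i - half:i - num_guard]
        right = power[i + num_guard + 1:i + half + 1]
        clutter = (sum(left) + sum(right)) / n
        if power[i] > alpha * clutter:
            detections.append(i)
    return detections


# Flat clutter of unit power with one bright target (made-up data).
power = [1.0] * 40
power[20] = 50.0
print(ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3))  # [20]
```

Because the threshold scales with the local clutter estimate, the false-alarm rate stays constant even when the clutter level varies across the image; the same idea extends to 2-D windows around candidate regions.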
Performance enhancement of an immersed boundary method based FSI solver using OpenMP
This work presents a high-fidelity in-house Fluid-Structure Interaction (FSI) solver developed by combining a discrete-forcing Immersed Boundary Method (IBM) with an RK-4-based structural solver. The classification of grid points as fluid, solid, and IB points in the IBM framework and the solution of the pressure correction equations are the two most computationally expensive sections of the numerical solver. Their cost can be significantly reduced with OpenMP parallelization. However, the successive over-relaxation (SOR) iterative method used in the serial code is not suitable for OpenMP parallelization, because each point's update depends on neighboring values already updated earlier in the same sweep. Therefore, Red-Black (RB) SOR is implemented to remove these data dependencies.
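A minimal sketch of the red-black idea, assuming a five-point discretization of a 2-D Poisson equation ∇²p = f with p = 0 on the boundary (the solver's actual pressure-correction equations and grid are not described here). On a five-point stencil every neighbor of a red point (i + j even) is black and vice versa, so all red points can be updated at once from black values, then all black points from the new red values. Each color sweep therefore has no loop-carried dependency and is the loop an OpenMP `parallel for` would cover; the Python below runs it serially. The function name, relaxation factor, tolerance, and test problem are assumptions.

```python
def red_black_sor(f, h, omega=1.5, tol=1e-10, max_iter=10_000):
    """Red-black SOR for laplacian(p) = f on an n x n grid, p = 0 on boundary.

    f is a list of lists; its boundary entries are ignored.
    Returns (p, iterations).
    """
    n = len(f)
    p = [[0.0] * n for _ in range(n)]
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for color in (0, 1):  # 0 = red (i + j even), 1 = black (i + j odd)
            # Points of one color read only the other color, so the order of
            # updates inside this double loop does not matter: in a compiled
            # code it could be an OpenMP parallel for.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (p[i + 1][j] + p[i - 1][j]
                                           + p[i][j + 1] + p[i][j - 1]
                                           - h * h * f[i][j])
                    change = omega * (gauss_seidel - p[i][j])
                    p[i][j] += change
                    max_change = max(max_change, abs(change))
        if max_change < tol:
            return p, it
    return p, max_iter


# Manufactured test: p = x(1-x)y(1-y) satisfies the five-point discrete
# Laplacian with f = -2[x(1-x) + y(1-y)] exactly, and vanishes on the boundary.
n = 17
h = 1.0 / (n - 1)
xs = [k * h for k in range(n)]
f = [[-2.0 * (x * (1 - x) + y * (1 - y)) for y in xs] for x in xs]
p, iters = red_black_sor(f, h)
err = max(abs(p[i][j] - xs[i] * (1 - xs[i]) * xs[j] * (1 - xs[j]))
          for i in range(n) for j in range(n))
print(iters, err)
```

Because the five-point stencil reproduces the quadratic manufactured solution exactly, the printed error reflects only the iteration tolerance, not discretization error.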
Efficient implementation of the EM algorithm using GPGPU technology
Reducing the running time of data processing algorithms is very important, especially when they are used in real time, for example in real-time image processing, process control systems, and speech recognition. The paper considers the possibility of reducing the running time of the expectation-maximization (EM) algorithm using modern computing systems. The proposed modified EM algorithm is designed to expose more parallelism on a general-purpose graphics processing unit (GPGPU). The experimental results are obtained by solving the classical problem of separating a mixture of Gaussian random variables. The proposed implementation was run on setups with one and with two 8-core processors (CPUs), as well as on a GPGPU. Owing to its parallel computing capabilities and to the properties of the EM algorithm considered, the graphics processor was substantially more effective in all computational experiments. Moreover, for large sample sizes (5 million values and above), the modified EM algorithm ran almost twice as fast on the GPGPU as on one or two CPUs. The lower price of graphics processors is an additional advantage of the proposed approach to parallelizing such algorithms on GPGPUs.
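For reference, a minimal serial sketch of the standard EM algorithm for separating a one-dimensional mixture of two Gaussians, the test problem named in the abstract. The paper's specific modification is not described here, so this is the unmodified algorithm; the function names, initialization, iteration count, and synthetic data are assumptions. The per-sample E-step and the sums in the M-step are independent across samples, which is the data parallelism a GPGPU implementation can exploit.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return (math.exp(-((x - mean) ** 2) / (2.0 * var))
            / math.sqrt(2.0 * math.pi * var))


def em_two_gaussians(data, iters=100):
    """Standard (unmodified) EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances) as lists of length 2.
    """
    lo, hi = min(data), max(data)
    # Simple initialization from the data range (an assumption).
    w = [0.5, 0.5]
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    var = [((hi - lo) / 4.0) ** 2] * 2
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability that each sample came from component 0.
        # Samples are independent, so a GPU can assign one thread per sample.
        resp0 = []
        for x in data:
            a = w[0] * gaussian_pdf(x, mu[0], var[0])
            b = w[1] * gaussian_pdf(x, mu[1], var[1])
            resp0.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: weighted sums over all samples (parallel reductions on a GPU).
        for k, resp in ((0, resp0), (1, [1.0 - r for r in resp0])):
            nk = sum(resp)
            w[k] = nk / n
            mu[k] = sum(r * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
    return w, mu, var


# Synthetic mixture (made-up parameters): 2000 draws from N(-2, 1) and
# 1000 draws from N(3, 0.5^2).
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(2000)]
data += [random.gauss(3.0, 0.5) for _ in range(1000)]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

Both steps touch every sample once per iteration, so the work per iteration grows linearly with the sample size; that per-sample independence is what a GPGPU implementation parallelizes.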
- …