9 research outputs found

    Irregular alignment of arbitrarily long DNA sequences on GPU

    The use of Graphics Processing Units to accelerate computational applications is increasingly being adopted due to their affordability, flexibility and performance. However, achieving top performance comes at the price of restricted data-parallelism models. In the case of sequence alignment, most GPU-based approaches focus on accelerating the Smith-Waterman dynamic programming algorithm because of its regularity. Nevertheless, its quadratic complexity makes it impractical for comparing long sequences, so heuristic methods are required to reduce the search space. We present GPUGECKO, a CUDA implementation of the sequential, seed-and-extend sequence-comparison algorithm GECKO. Our proposal includes optimized kernels based on collective operations that can produce arbitrarily long alignments while dealing with heterogeneous and unpredictable load. In contrast to other state-of-the-art methods, GPUGECKO employs a batching mechanism that prevents memory exhaustion: all alignments need not fit into device memory at once, which enables massive, exhaustive comparisons with improved sensitivity while also providing up to a 6x average speedup over the CUDA acceleration of BLASTN. Funding for open access publishing: Universidad Málaga/CBUA. This work has been partially supported by the European project ELIXIR-EXCELERATE (grant no. 676559), the Spanish national project Plataforma de Recursos Biomoleculares y Bioinformáticos (ISCIII-PT13.0001.0012 and ISCIII-PT17.0009.0022), the Fondo Europeo de Desarrollo Regional (UMA18-FEDERJA-156, UMA20-FEDERJA-059), the Junta de Andalucía (P18-FR-3130), the Instituto de Investigación Biomédica de Málaga IBIMA and the University of Málaga.
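
    As an illustration of the seed-and-extend idea and of batching extensions so that they never all have to be resident at once, below is a minimal CPU sketch in Python. The k-mer size, scoring values and batch size are assumptions for illustration; GPUGECKO's actual CUDA kernels and data layout are considerably more involved.

```python
# Minimal CPU sketch of seed-and-extend with batched processing.
# Illustrative only: k, the scoring scheme and batch_size are assumptions.
from collections import defaultdict

def find_seeds(query, target, k=12):
    """Index k-mers of the target, then stream query k-mers against it."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            yield (i, j)

def extend_ungapped(query, target, i, j, match=1, mismatch=-2, xdrop=20):
    """Greedy ungapped extension to the right with an X-drop cutoff."""
    best = score = 0
    while i < len(query) and j < len(target):
        score += match if query[i] == target[j] else mismatch
        if score < best - xdrop:
            break
        best = max(best, score)
        i += 1
        j += 1
    return best

def align(query, target, batch_size=10_000):
    """Process seeds in fixed-size batches so that at most batch_size
    extensions are ever in flight at once (the analogue of not fitting
    every alignment into device memory simultaneously)."""
    batch, results = [], []
    for seed in find_seeds(query, target):
        batch.append(seed)
        if len(batch) == batch_size:
            results.extend(extend_ungapped(query, target, i, j) for i, j in batch)
            batch.clear()
    results.extend(extend_ungapped(query, target, i, j) for i, j in batch)
    return results
```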

    Real-Time Unsupervised Object Localization on the Edge for Airport Video Surveillance.

    Object localization is vital in computer vision for solving object detection and classification problems. Typically, this task is performed on expensive GPU devices, but edge computing is gaining importance in real-time applications. In this work, we propose a real-time implementation of unsupervised object localization on a low-power device for airport video surveillance. We automatically find object regions in video using a region proposal network (RPN) together with an optical flow region proposal (OFRP) based on optical flow maps between frames. In addition, we study the deployment of our solution on an embedded architecture, a Jetson AGX Xavier, simultaneously using the CPU, GPU and specific hardware accelerators. Three different data representations (FP32, FP16 and INT8) are also employed for the RPN. The results show that these optimizations reduce energy consumption by up to 4.1× and execution time by up to 2.2× while maintaining good accuracy with respect to the baseline model. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
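
    To illustrate the optical-flow-region-proposal idea, the following sketch thresholds the flow magnitude between consecutive frames and boxes the moving blobs with OpenCV. The Farneback parameters, motion threshold and minimum blob area are assumptions, not the paper's values.

```python
# Sketch of an optical flow region proposal: threshold the flow
# magnitude between consecutive gray frames, then box the moving blobs.
import cv2
import numpy as np

def flow_region_proposals(prev_gray, next_gray, motion_thresh=2.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    mask = (magnitude > motion_thresh).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = []
    for i in range(1, num):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area > 50:  # drop tiny blobs (assumed noise floor)
            boxes.append((x, y, w, h))
    return boxes
```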

    Multimodal features fusion for gait, gender and shoes recognition

    This paper evaluates how fusing multimodal features (audio, RGB, and depth) can enhance gait recognition, as well as gender and shoe recognition. While most previous research has focused on visual descriptors like binary silhouettes, little attention has been given to audio or depth data associated with walking. The proposed multimodal system is tested on the TUM GAID dataset, which includes audio, depth, and image sequences. Results show that combining features from these modalities using early or late fusion techniques improves state-of-the-art performance in gait, gender, and shoe recognition. Additional experiments on CASIA-B (which only includes visual data) further support the advantages of feature fusion.
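
    The early/late distinction can be made concrete with a small sketch: early fusion concatenates the per-modality features before a single classifier, while late fusion combines per-modality scores. The classifiers here are placeholders for whatever model is trained for each task.

```python
# Early vs. late fusion of per-modality features, in the simplest form.
# `features` is a list of per-modality vectors (e.g. audio, RGB, depth).
import numpy as np

def early_fusion(features, classifier):
    """Concatenate modality features, then classify the joint vector."""
    joint = np.concatenate(features)
    return classifier(joint)

def late_fusion(features, classifiers, weights=None):
    """Classify each modality separately, then (weighted-)average scores."""
    scores = np.stack([clf(f) for clf, f in zip(classifiers, features)])
    return np.average(scores, axis=0, weights=weights)
```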

    Multimodal feature fusion for CNN-based gait recognition: an empirical comparison

    This paper focuses on identifying people based on their gait using a non-invasive approach. Traditional methods rely on gait signatures derived from binary energy maps, which introduce noise. Instead, the authors explore the use of raw pixel data and compare different Convolutional Neural Network (CNN) architectures across three modalities: gray pixels, optical flow, and depth maps. Tested on the TUM-GAID and CASIA-B datasets, the study finds that (i) raw pixel values are competitive with traditional silhouette-based features, (ii) combining pixel data with optical flow and depth maps yields state-of-the-art results even at lower image resolutions, and (iii) the choice of CNN architecture significantly impacts performance.
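
    A minimal sketch of a one-branch-per-modality design follows, assuming feature-level fusion by concatenation; the channel counts and layer sizes are invented for illustration and do not reproduce the architectures compared in the paper.

```python
# One-branch-per-modality CNN with feature-level fusion, in PyTorch.
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):
        return self.net(x)

class FusionCNN(nn.Module):
    def __init__(self, num_ids):
        super().__init__()
        self.gray = ModalityBranch(1)   # gray pixels
        self.flow = ModalityBranch(2)   # optical flow (dx, dy)
        self.depth = ModalityBranch(1)  # depth maps
        self.head = nn.Linear(64 * 3, num_ids)

    def forward(self, gray, flow, depth):
        feats = torch.cat([self.gray(gray), self.flow(flow),
                           self.depth(depth)], dim=1)
        return self.head(feats)
```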

    Multimodal Human Pose Feature Fusion for Gait Recognition.

    Gait recognition allows identifying people at a distance, in a non-invasive way, based on how they walk (i.e., their gait). Most of the approaches published in recent decades are dominated by silhouettes or other appearance-based modalities to describe the gait cycle. In an attempt to exclude appearance data, many works address the use of the human pose as a modality to describe the walking movement. However, since the pose contains less information when used as a single modality, the performance achieved by such models is generally poorer. To overcome these limitations, we propose a multimodal setup that combines multiple pose representation models. To this end, we evaluate multiple fusion strategies to aggregate the features derived from each pose modality at every model stage. Moreover, we introduce a weighted sum with trainable weights that can adaptively learn the optimal balance among pose modalities. Our experimental results show that (a) our fusion strategies can effectively combine different pose modalities, improving on their baseline performance; and (b) using only human pose, our approach outperforms most silhouette-based state-of-the-art approaches. Concretely, we obtain 92.8% mean Top-1 accuracy on CASIA-B. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
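
    The trainable weighted sum can be written as a tiny module; normalizing the weights with a softmax is an assumption here, since the abstract only states that the weights are learned.

```python
# Weighted sum of pose-modality features with trainable weights.
import torch
import torch.nn as nn

class WeightedModalitySum(nn.Module):
    def __init__(self, num_modalities):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_modalities))

    def forward(self, feats):  # feats: (num_modalities, batch, dim)
        w = torch.softmax(self.logits, dim=0)         # weights sum to 1
        return (w.view(-1, 1, 1) * feats).sum(dim=0)  # (batch, dim)
```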

    On how to improve tracklet-based gait recognition systems

    Recently, short-term dense trajectory features (DTF) have shown state-of-the-art results in video recognition and retrieval. However, their use has not been extensively studied for gait recognition. The goal of this work is therefore to propose and evaluate diverse strategies to improve recognition performance in DTF-based gait recognition. In particular, this paper shows that (i) the proposed RootDCS descriptor improves on DCS in most tested cases; (ii) automatically selecting relevant trajectories improves recognition performance in several situations; (iii) applying a metric-learning technique to reduce the dimensionality of feature vectors improves on standard PCA; and (iv) binarizing low-dimensional feature vectors not only reduces storage needs but also improves recognition performance in many cases. The experiments are carried out on the popular datasets CASIA (parts B and C) and TUM-GAID, showing improvements on state-of-the-art results for most scenarios.
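
    Assuming RootDCS follows the RootSIFT-style recipe (L1-normalize the descriptor, then take a signed elementwise square root, so that dot products approximate the Hellinger kernel), the transform looks like the sketch below; the abstract does not spell out the exact definition.

```python
# RootSIFT-style transform, which RootDCS presumably applies to DCS
# descriptors (an assumption: the abstract does not give the formula).
import numpy as np

def root_normalize(desc, eps=1e-12):
    desc = np.asarray(desc, dtype=np.float64)
    desc = desc / (np.abs(desc).sum() + eps)      # L1-normalize
    return np.sign(desc) * np.sqrt(np.abs(desc))  # signed square root
```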

    Energy-based tuning of convolution neural networks on multi-GPUs

    Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs demonstrated remarkable skill at accelerating their challenging computational requirements. Within this context, Convolutional Neural Network (CNN) models constitute a representative example of success on a wide set of complex applications, particularly on datasets where the target can be represented through a hierarchy of local features of increasing semantic complexity. In most real scenarios, the roadmap to improve results relies on CNN settings involving brute-force computation, and researchers have lately proven Nvidia GPUs to be one of the best hardware counterparts for acceleration. Our work complements those findings with an energy study of critical parameters for the deployment of CNNs on flagship image and video applications, i.e., object recognition and people identification by gait, respectively. We evaluate energy consumption on four different networks based on the two most popular architectures (ResNet/AlexNet), i.e., a ResNet (167 layers), a 2D CNN (15 layers), a CaffeNet (25 layers), and a ResNetIm (94 layers), using batch sizes of 64, 128, and 256, and then correlate those measurements with speed-up and accuracy to determine optimal settings. Experimental results on a multi-GPU server equipped with twin Maxwell and twin Pascal Titan X GPUs demonstrate that energy correlates with performance and that Pascal may achieve up to 40% gains versus Maxwell. Larger batch sizes extend performance gains and energy savings, but accuracy must be monitored, since it sometimes favors small batches. We expect this work to provide preliminary guidance for a wide set of CNN and DL applications in modern HPC times, where the GFLOPS/W ratio constitutes the primary goal. Ministry of Education of Spain, Grant/Award Number: TIN2013-42253-P and TIN2016-78799-P; Consejería de Economía, Innovación, Ciencia y Empleo, Junta de Andalucía, Grant/Award Number: P12-TIC-1741 and TIC-169.
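
    One common way to obtain such energy figures is to sample the instantaneous GPU power draw while the workload runs and integrate over time; the sketch below does this with NVML via pynvml. How the measurements were actually taken in the paper is not stated here.

```python
# Estimate GPU energy as mean power x runtime, sampling NVML while a
# workload runs. This is one standard approach, not the paper's setup.
import threading
import time
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetPowerUsage)

def measure_energy_joules(workload, gpu_index=0, period=0.1):
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(gpu_index)
    samples, done = [], threading.Event()

    def sampler():
        while not done.is_set():
            samples.append(nvmlDeviceGetPowerUsage(handle) / 1000.0)  # W
            time.sleep(period)

    t = threading.Thread(target=sampler)
    start = time.time()
    t.start()
    try:
        workload()  # e.g. one training epoch at a given batch size
    finally:
        done.set()
        t.join()
        nvmlShutdown()
    elapsed = time.time() - start
    mean_power = sum(samples) / max(len(samples), 1)
    return mean_power * elapsed  # joules
```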

    Concurrent Calculations on Reconfigurable Logic Devices Applied to the Analysis of Video Images

    This paper presents the design and implementation on FPGA devices of an algorithm that computes similarities between neighboring frames of a video sequence using luminance information. Taking advantage of the well-known flexibility of reconfigurable logic devices, we have designed a hardware implementation of the algorithm used in video segmentation and indexing. The experimental results show the tradeoff between concurrent and sequential resources and the functional blocks needed to achieve maximum operational speed with minimum silicon area. To evaluate system efficiency, we compare the performance of the hardware solution against software implementations running on general-purpose processors with and without a SIMD instruction set.
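
    A software reference for the accelerated computation might look as follows, assuming similarity is measured as the mean absolute difference of the luma channel (the abstract does not give the exact metric), which is a standard choice for shot-boundary detection.

```python
# Software reference for frame-to-frame luminance similarity.
import numpy as np

def luma(rgb):
    """BT.601 luminance from an (H, W, 3) uint8 RGB frame."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def frame_similarity(frame_a, frame_b):
    """Lower mean absolute luma difference = more similar frames;
    a spike between neighbors suggests a shot boundary."""
    return float(np.mean(np.abs(luma(frame_a) - luma(frame_b))))
```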

    A cross-dataset deep learning-based classifier for people fall detection and identification

    This paper addresses fall detection, particularly for elderly individuals who may live alone and be unable to call for help after a fall. The objective is to develop a deep learning-based approach that detects falls and identifies individuals without requiring model fine-tuning for different datasets. The proposed method uses a multi-task learning model that processes raw inertial data to simultaneously detect falls and identify people. The model achieves over 98% accuracy in fall detection across four datasets, with fewer than 1.6% false positives, and identifies people with an average accuracy of 79.6%. It operates in real time and requires no retraining for new subjects, making it suitable for practical deployment.
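
    The multi-task setup can be sketched as a shared encoder over raw inertial windows with two classification heads, one per task. The window format, channel counts and layer sizes below are illustrative guesses, not the paper's architecture.

```python
# Multi-task model over raw inertial windows: one shared encoder, one
# head for fall detection, one for person identification.
import torch
import torch.nn as nn

class FallIdNet(nn.Module):
    def __init__(self, num_subjects, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(  # input: (batch, 3, T) accel x/y/z
            nn.Conv1d(in_channels, 32, 5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.fall_head = nn.Linear(64, 2)           # fall / no fall
        self.id_head = nn.Linear(64, num_subjects)  # who is wearing it

    def forward(self, x):
        z = self.encoder(x)
        return self.fall_head(z), self.id_head(z)

# Training would minimize the sum of both cross-entropy losses, e.g.:
#   loss = ce(fall_logits, fall_y) + ce(id_logits, id_y)
```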