14 research outputs found

    IMU Based Deep Stride Length Estimation With Self-Supervised Learning

    Full text link
    Stride length estimation using inertial measurement unit (IMU) sensors has recently become popular, as stride length is a representative gait parameter for health care and sports training. Traditional estimation methods require explicit calibration and design assumptions, while current deep learning methods suffer from a shortage of labeled data. To address these problems, this paper proposes a single convolutional neural network (CNN) model that predicts the stride length of running and walking and classifies the gait type (running or walking) per stride. The model is first trained on a pretext task with self-supervised learning on a large unlabeled dataset for feature learning, and then on the downstream stride length estimation and classification tasks with supervised learning on a small labeled dataset. The proposed model achieves a better average percent error of 4.78% on running and walking stride length regression, compared to 7.44% for the previous approach, and 99.83% accuracy on running and walking classification.
    Comment: 8 pages, 11 figures
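    As an illustrative sketch (our code, not the paper's), the average percent error metric reported above can be computed per stride as follows; the sample lengths are hypothetical:

```python
def average_percent_error(predicted, actual):
    """Mean absolute percent error between predicted and ground-truth
    stride lengths (lower is better)."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical per-stride lengths in meters.
pred = [1.02, 0.98, 1.10]
true = [1.00, 1.00, 1.05]
ape = average_percent_error(pred, true)
```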

    Low Memory Bandwidth Prediction Method for H.264/AVC Scalable Video Extension

    Get PDF
    Memory bandwidth has become increasingly critical in the design of video coding systems, especially in scalable video coding due to its extra inter-layer prediction. This paper proposes a low-memory-bandwidth prediction method for inter and inter-layer residual prediction. The proposed method combines the two predictions into one prediction process and reuses its data to lower memory bandwidth requirements. Simulation results show that 67% of the memory bandwidth in the enhancement layer can be saved with negligible rate-distortion loss.
    APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Advanced Circuits and Systems/VLSI (5 October 2009)

    Memory Analysis for H.264/AVC Scalable Extension Decoder

    Get PDF
    In this paper, a systematic analysis of memory usage in the H.264/AVC scalable extension (SVC) decoder is presented. The paper analyzes the memory requirements of three different decoding flows (macroblock-, row-, and frame-based) to find the method that achieves the best trade-off between internal memory usage and external memory access. The analysis shows that SVC decoding needs 88% to 110% extra memory bandwidth compared to single-layer H.264 decoding due to inter-layer prediction, regardless of the decoding flow. However, the extra internal memory storage required by inter-layer prediction varies greatly with the flow. This analysis can serve as a foundation for further SVC decoder design.
    APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Advanced Circuits and Systems/VLSI (5 October 2009)

    Low memory cost block-based belief propagation for stereo correspondence

    No full text
    Typical belief propagation has good accuracy for stereo correspondence but suffers from a large run-time memory cost. In this paper, we propose a block-based belief propagation algorithm for stereo correspondence that partitions an image into regular blocks for optimization. With independently partitioned blocks of size 32x32, the required memory can be reduced by 99% with only slightly degraded performance compared to the original algorithm. Besides, such blocks are also well suited to parallel hardware implementation. Experimental results on the Middlebury stereo test bed demonstrate the performance of the proposed method.
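    A back-of-the-envelope sketch (our illustration, not the paper's implementation) of why block partitioning shrinks the message memory: standard belief propagation keeps four directional messages per pixel, each with one entry per disparity label, for the whole image at once, while the block-based variant only needs them for one block at a time.

```python
def bp_message_bytes(width, height, labels, bytes_per_entry=4):
    """Memory for belief-propagation messages: four directional messages
    per pixel, one entry per disparity label."""
    return 4 * width * height * labels * bytes_per_entry

# Whole-image BP on a 450x375 Middlebury-sized pair with 60 disparity labels.
full_image = bp_message_bytes(450, 375, 60)
# Block-based BP keeps messages only for one 32x32 block at a time.
one_block = bp_message_bytes(32, 32, 60)
reduction = 1.0 - one_block / full_image  # fraction of message memory saved
```

    For these (assumed) dimensions the saving exceeds 99%, consistent with the figure quoted in the abstract.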

    A Low Cost Context Adaptive Arithmetic Coder for H.264/MPEG-4 AVC Video Coding

    No full text
    This paper presents a fast, low-cost context-adaptive binary arithmetic encoder for the H.264/MPEG-4 AVC video coding standard, obtained through both algorithm-level and architecture-level optimizations. First, at the algorithm level, we process binarization and context generation in parallel to reduce the encoding iteration from five cycles in the previous design to three or four cycles. Second, at the architecture level, we reduce the cycles spent in renormalization loops by employing one-skipping and bit-parallelism, and save hardware cost in the arithmetic coder by merging three different modes. The implemented design achieves a frequency of 333 MHz with a gate count of only 13.3K.

    VSA: Reconfigurable Vectorwise Spiking Neural Network Accelerator

    Full text link
    Spiking neural networks (SNNs), which enable low-power design on edge devices, have recently attracted significant research interest. However, the temporal characteristic of SNNs causes high latency, high bandwidth, and high energy consumption in hardware. In this work, we propose a binary-weight spiking model with IF-based batch normalization for small time steps and low hardware cost, trained directly with an input encoding layer and spatio-temporal back-propagation (STBP). In addition, we propose a vectorwise hardware accelerator that is reconfigurable for different models and inference time steps, and that even supports the encoding layer receiving multi-bit input. The required memory bandwidth is further reduced by a two-layer fusion mechanism. The implementation shows competitive accuracy on the MNIST and CIFAR-10 datasets with only 8 time steps, and achieves a power efficiency of 25.9 TOPS/W.
    Comment: 5 pages, 8 figures, published in IEEE ISCAS 202
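    As a rough sketch of the integrate-and-fire (IF) dynamics underlying such models (simplified; the paper's IF-based batch normalization and binary-weight training are not reproduced here), a single IF neuron accumulates weighted input spikes each time step and fires with a subtractive reset when its membrane potential crosses the threshold:

```python
def if_neuron(spike_trains, weights, threshold):
    """Integrate-and-fire neuron: accumulate membrane potential per time
    step, emit a spike (1) and reset by subtraction when the threshold
    is crossed, otherwise emit 0."""
    v = 0.0
    out = []
    for spikes in spike_trains:  # one binary input vector per time step
        v += sum(w * s for w, s in zip(weights, spikes))
        if v >= threshold:
            out.append(1)
            v -= threshold  # subtractive reset keeps the residual potential
        else:
            out.append(0)
    return out
```

    With binary weights (e.g. +1/-1, as in the proposed model) the accumulation reduces to additions, which is what makes the hardware cheap.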

    PMRME: A Parallel Multi-Resolution Motion Estimation Algorithm and Architecture for HDTV Sized H.264 Video Coding

    No full text
    This paper presents a hardware-efficient fast algorithm and its architecture for large-search-range motion estimation (ME) in HDTV-sized H.264 video coding. To address the high cost and latency of the large-search-range case, the proposed algorithm processes ME at multiple resolution levels in parallel instead of serially as in the previous approach. This enables high data reuse for lower bandwidth and low memory cost. Further combined with our previously proposed mode filtering and bit truncation, the algorithm increases the bit rate by at most 1.85% and 2.48%, with at most 0.04 dB and 0.05 dB PSNR degradation, for 720p and 1080p sequences respectively. The hardware implementation saves up to 49.5% of area cost and 65% of memory cost compared to the previous approach for a large search range of [-128, 127].
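    To see why multi-resolution search pays off at large ranges, here is a simplified candidate-count comparison (our illustration; the paper's parallel scheduling, mode filtering, and bit truncation are not modeled): an exhaustive coarse search at reduced resolution plus small refinement windows visits far fewer candidates than a full search over [-128, 127].

```python
def full_search_candidates(search_range):
    """Candidates for exhaustive search over [-r, r-1] in both dimensions."""
    return (2 * search_range) ** 2

def multires_candidates(search_range, levels=3, refine=2):
    """Exhaustive search at 1/2^(levels-1) resolution over the scaled range,
    then a small +/-refine window at each finer level (hypothetical
    parameters for illustration)."""
    coarse_range = search_range >> (levels - 1)
    coarse = (2 * coarse_range) ** 2
    refinement = (levels - 1) * (2 * refine + 1) ** 2
    return coarse + refinement
```

    For a [-128, 127] range and three levels, the multi-resolution count is more than an order of magnitude smaller, which is the source of the area and memory savings the abstract reports.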