14 research outputs found
IMU Based Deep Stride Length Estimation With Self-Supervised Learning
Stride length estimation using inertial measurement unit (IMU) sensors has recently gained popularity, as stride length is a representative gait parameter for health care and sports training. Traditional estimation methods require explicit calibration and design assumptions, while current deep learning methods suffer from the scarcity of labeled data. To address these problems, this paper proposes a single convolutional neural network (CNN) model that predicts the stride length of running and walking and classifies each stride as running or walking. The model is first trained on a pretext task with self-supervised learning on a large unlabeled dataset for feature learning, and then on the downstream stride length estimation and classification tasks with supervised learning on a small labeled dataset. The proposed model achieves an average percent error of 4.78% on running and walking stride length regression and 99.83% accuracy on running and walking classification, compared to 7.44% stride length estimation error for the previous approach.
Comment: 8 pages, 11 figures
Low Memory Bandwidth Prediction Method for H.264/AVC Scalable Video Extension
The memory bandwidth issue is becoming increasingly critical in the design of video coding systems, especially for scalable video coding due to its extra inter-layer prediction. This paper proposes a low memory bandwidth prediction method for inter and inter-layer residual prediction. The proposed method combines the two predictions into one prediction process and reuses its data to lower the memory bandwidth requirement. Simulation results show that 67% of the memory bandwidth in the enhancement layer can be saved with negligible rate-distortion loss.
APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Advanced Circuits and Systems/VLSI (5 October 2009)
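A toy model of where the saving comes from: count external-memory reads when the two predictions run as separate passes (the second pass re-reading the first pass's output) versus one combined pass that fetches each operand once and reuses it on-chip. The fetch counter, block sizes, and data are illustrative, not the paper's actual SVC data flow:

```python
import numpy as np

fetches = {"n": 0}                       # counts external-memory reads (samples)

def fetch(buf, y, x, h, w):
    """Model an external-memory read and count the traffic."""
    fetches["n"] += h * w
    return buf[y:y+h, x:x+w]

rng = np.random.default_rng(1)
ref = rng.integers(0, 255, (64, 64))         # reference data (enhancement layer)
base_res = rng.integers(-32, 32, (64, 64))   # co-located base-layer residual

def separate_passes(y, x):
    # Pass 1: inter prediction fetches the reference block.
    pred = fetch(ref, y, x, 16, 16)
    # Pass 2: inter-layer residual prediction fetches the base residual and
    # re-reads the pass-1 output (modeled here as re-reading the same samples).
    res = fetch(base_res, y, x, 16, 16)
    pred2 = fetch(ref, y, x, 16, 16)         # duplicated external traffic
    return pred2 + res

def combined_pass(y, x):
    # One merged process: fetch each operand once and reuse it on-chip.
    pred = fetch(ref, y, x, 16, 16)
    res = fetch(base_res, y, x, 16, 16)
    return pred + res

fetches["n"] = 0; out_a = separate_passes(0, 0); traffic_sep = fetches["n"]
fetches["n"] = 0; out_b = combined_pass(0, 0);  traffic_cmb = fetches["n"]
assert np.array_equal(out_a, out_b)          # identical result, less traffic
print(traffic_sep, traffic_cmb)              # 768 vs 512 samples per macroblock
```

In this toy accounting the merged process removes one of three block reads; the paper's 67% figure comes from the real SVC prediction structure, which this sketch does not reproduce.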
Memory Analysis for H.264/AVC Scalable Extension Decoder
In this paper, a systematic analysis of memory usage in the H.264/AVC scalable extension (SVC) decoder is presented. The paper analyzes the memory requirements of three decoding flows, macroblock based, row based, and frame based, to identify the one that achieves the best trade-off between internal memory usage and external memory access. The analysis shows that SVC decoding needs 88% to 110% extra memory bandwidth compared to single layer H.264 decoding due to inter-layer prediction, regardless of the decoding flow. However, the extra internal memory storage required by inter-layer prediction varies greatly with the flow. This analysis can serve as a foundation for the design of an SVC decoder.
APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Advanced Circuits and Systems/VLSI (5 October 2009)
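The flavor of the trade-off being analyzed can be shown with back-of-envelope numbers for a hypothetical 720p base layer: buffering more of the inter-layer data on chip cuts external re-fetches at the cost of SRAM. The 1.5 bytes/sample for 4:2:0 storage is standard; the per-flow re-fetch factors are illustrative assumptions, not the paper's measured results:

```python
# Back-of-envelope trade-off between on-chip buffer size and external
# traffic for inter-layer data, per decoded 1280x720 base-layer frame.
W, H, MB = 1280, 720, 16
BPS = 1.5                                  # bytes per sample, YUV 4:2:0

flows = {
    # flow: (on-chip buffer in samples, assumed external re-fetch factor)
    "macroblock": (MB * MB, 2.0),          # tiny buffer, frequent re-fetch
    "row":        (W * MB,  1.2),          # one macroblock row buffered
    "frame":      (W * H,   1.0),          # whole frame buffered once
}

for name, (buf, refetch) in flows.items():
    sram_kb = buf * BPS / 1024
    traffic_mb = W * H * BPS * refetch / 1e6   # per decoded frame
    print(f"{name:10s} SRAM {sram_kb:7.1f} KB, traffic {traffic_mb:.2f} MB/frame")
```

The qualitative conclusion matches the abstract: external bandwidth varies comparatively little across flows, while internal storage spans orders of magnitude.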
Low memory cost block-based belief propagation for stereo correspondence
The typical belief propagation algorithm has good accuracy for stereo correspondence but suffers from a large run-time memory cost. In this paper, we propose a block-based belief propagation algorithm for stereo correspondence that partitions an image into regular blocks and optimizes each block independently. With independently partitioned blocks, the required memory size can be reduced by 99% with only slightly degraded performance for a 32x32 block size, compared to the original algorithm. Moreover, such blocks are also well suited to parallel hardware implementation. Experimental results on the Middlebury stereo test bed demonstrate the performance of the proposed method.
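The memory saving follows directly from keeping only one block's messages resident at a time. A quick estimate with typical Middlebury-scale numbers (the image size, disparity label count, and float32 message storage are assumptions for the sketch):

```python
# Message-memory estimate: full-image loopy BP keeps four directed messages
# per pixel over all disparity labels, while block-based BP only needs the
# messages of the 32x32 block currently being optimized.
W, H, L = 512, 384, 32          # image size and disparity labels (assumed)
B = 32                          # block size from the abstract
BYTES = 4                       # float32 per message entry

full_mem  = 4 * W * H * L * BYTES     # messages for the whole image
block_mem = 4 * B * B * L * BYTES     # one resident block at a time

print(f"full: {full_mem / 2**20:.0f} MiB, block: {block_mem / 2**10:.0f} KiB")
print(f"reduction: {(1 - block_mem / full_mem) * 100:.1f}%")
```

The ratio is just the block-to-image area ratio, which for these numbers gives the roughly 99% reduction the abstract reports; the slight accuracy loss comes from severing message passing across block boundaries.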
A Low Cost Context Adaptive Arithmetic Coder for H.264/MPEG-4 AVC Video Coding
This paper presents a fast and low cost context adaptive binary arithmetic encoder for the H.264/MPEG-4 AVC video coding standard through optimizations at both the algorithm and architecture levels. First, at the algorithm level, we process binarization and context generation in parallel to reduce the encoding iteration from five cycles in the previous design to three or four cycles. Second, at the architecture level, we reduce the cycle count of the renormalization loops by employing one-skipping and bit-parallelism, and save hardware cost in the arithmetic coder by merging three different modes. The implemented design achieves a 333 MHz operating frequency with a gate count of only 13.3K.
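The renormalization idea can be illustrated in isolation: the serial CABAC loop shifts the range register one bit per iteration, while a bit-parallel normalizer derives the whole shift count in one step from the leading-zero count. This is a simplified sketch of that single mechanism; a full CABAC engine also updates the low register and handles outstanding bits, which are omitted here:

```python
def renorm_iterative(rng):
    """Reference behavior: one shift per loop iteration (serial renorm)."""
    shifts = 0
    while rng < 0x100:          # CABAC keeps the range in [0x100, 0x1FF]
        rng <<= 1
        shifts += 1
    return rng, shifts

def renorm_parallel(rng):
    """Bit-parallel form: compute the shift count in one step from the
    position of the leading one, as a hardware normalizer would."""
    shifts = max(0, 9 - rng.bit_length())
    return rng << shifts, shifts

# The one-step form matches the serial loop for every possible range value.
for r in range(1, 0x200):
    assert renorm_iterative(r) == renorm_parallel(r)
print("parallel renorm matches the serial loop for all range values")
```

Collapsing the loop this way is what turns a variable multi-cycle renormalization into a fixed single-cycle operation in hardware.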
VSA: Reconfigurable Vectorwise Spiking Neural Network Accelerator
Spiking neural networks (SNNs), which enable low-power design on edge devices, have recently attracted significant research interest. However, the temporal characteristic of SNNs causes high latency, high bandwidth, and high energy consumption in hardware. In this work, we propose a binary weight spiking model with IF-based Batch Normalization that achieves small time steps and low hardware cost when trained directly with an input encoding layer and spatio-temporal back propagation (STBP). In addition, we propose a vectorwise hardware accelerator that is reconfigurable for different models and inference time steps, and that even supports the encoding layer receiving multi-bit input. The required memory bandwidth is further reduced by a two-layer fusion mechanism. The implementation shows competitive accuracy on the MNIST and CIFAR-10 datasets with only 8 time steps, and achieves a power efficiency of 25.9 TOPS/W.
Comment: 5 pages, 8 figures, published in IEEE ISCAS 202
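A minimal sketch of the compute pattern such an accelerator targets: integrate-and-fire neurons with binary weights, stepped over 8 time steps with reset-by-subtraction. The layer sizes, thresholds, and input spike rate are illustrative assumptions; in particular, the fixed thresholds stand in for constants that folding the IF-based Batch Normalization would produce, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N_IN, N_OUT = 8, 32, 4               # 8 time steps, as in the abstract

W = rng.choice([-1.0, 1.0], size=(N_OUT, N_IN))  # binary weights in {-1, +1}
theta = np.full(N_OUT, 4.0)                      # firing thresholds (assumed)

x_spikes = (rng.random((T, N_IN)) < 0.3).astype(float)  # input spike trains

v = np.zeros(N_OUT)                     # membrane potentials
out = np.zeros((T, N_OUT))              # output spike trains
for t in range(T):
    v += W @ x_spikes[t]                # integrate weighted input spikes
    fired = v >= theta
    out[t] = fired
    v[fired] -= theta[fired]            # reset by subtraction
print("output spike counts per neuron:", out.sum(axis=0))
```

Because the weights are binary and the inputs are spikes, the inner product reduces to sign-controlled accumulation, which is what makes a low-cost vectorwise datapath attractive; only the first encoding layer needs genuine multi-bit multiply-accumulate.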
PMRME: A Parallel Multi-Resolution Motion Estimation Algorithm and Architecture for HDTV Sized H.264 Video Coding
This paper presents a hardware-efficient fast algorithm and its architecture for large search range motion estimation (ME) in HDTV sized H.264 video coding. To solve the high cost and latency of the large search range case, the proposed algorithm processes the multi-resolution levels of ME in parallel, instead of the serial processing of the previous approach. This enables high data reuse for lower bandwidth and low memory cost. Further combined with our previously proposed mode filtering and bit truncation, the algorithm increases the bit rate by at most 1.85% and 2.48%, with at most 0.04 dB and 0.05 dB PSNR degradation, for 720p and 1080p sequences respectively. The hardware implementation saves up to 49.5% of area cost and 65% of memory cost compared to the previous approach for a large search range of [-128, 127].
Index Terms — H.264, motion estimation, multi
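The multi-resolution principle both approaches build on can be sketched in software: search a small radius at each pyramid level and scale the vector up, instead of one huge full-resolution search. Note this sketch runs the levels sequentially coarse-to-fine, i.e. the serial baseline; the paper's contribution is evaluating the levels in parallel in hardware. The pyramid depth, radii, block position, and synthetic motion are all illustrative:

```python
import numpy as np

def downsample(img):
    return img[0::2, 0::2]              # decimation: one pyramid level

def sad_search(cur, ref, cy, cx, bs, mv, radius):
    """Exhaustive SAD search in a +/-radius window around vector mv."""
    blk = cur[cy:cy + bs, cx:cx + bs]
    best_cost, best_mv = np.inf, mv
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + mv[0] + dy, cx + mv[1] + dx
            if 0 <= y and 0 <= x and y + bs <= ref.shape[0] and x + bs <= ref.shape[1]:
                cost = np.abs(blk - ref[y:y + bs, x:x + bs]).sum()
                if cost < best_cost:
                    best_cost, best_mv = cost, (y - cy, x - cx)
    return best_mv

rng = np.random.default_rng(3)
ref = rng.integers(0, 256, (64, 64)).astype(float)
cur = np.roll(ref, (-8, 4), axis=(0, 1))    # true motion vector is (8, -4)

# Three-level pyramid; radius 2 at the coarsest level and 1 elsewhere
# replaces a wide full search at full resolution.
pyr = [(cur, ref)]
for _ in range(2):
    pyr.append((downsample(pyr[-1][0]), downsample(pyr[-1][1])))

mv = (0, 0)
for lvl in range(2, -1, -1):                # coarsest level first
    c, r = pyr[lvl]
    radius = 2 if lvl == 2 else 1
    mv = sad_search(c, r, 16 >> lvl, 16 >> lvl, 16 >> lvl, mv, radius)
    if lvl:                                 # scale the vector up one level
        mv = (mv[0] * 2, mv[1] * 2)
print("estimated motion vector:", mv)       # (8, -4)
```

Each level only ever touches a small window around the predicted vector, which is the data-reuse property that the parallel hardware organization exploits for lower bandwidth.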