
    Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices

    Convolutional Neural Networks (CNNs) have revolutionized research in computer vision due to their ability to capture complex patterns, resulting in high inference accuracies. However, the increasingly complex nature of these neural networks means that they are particularly suited to server computers with powerful GPUs. We envision that deep learning applications will eventually be widely deployed on mobile devices, e.g., smartphones, self-driving cars, and drones. Therefore, in this paper, we aim to understand the resource requirements (time, memory) of CNNs on mobile devices. First, by deploying several popular CNNs on mobile CPUs and GPUs, we measure and analyze the performance and resource usage of every layer of the CNNs. Our findings point out potential ways of optimizing performance on mobile devices. Second, we model the resource requirements of the different CNN computations. Finally, based on the measurement, profiling, and modeling, we build and evaluate our modeling tool, Augur, which takes a CNN configuration (descriptor) as input and estimates the compute time and resource usage of the CNN, to give insights about whether and how efficiently a CNN can be run on a given mobile platform. In doing so, Augur tackles several challenges: (i) how to overcome profiling and measurement overhead; (ii) how to capture the variance across mobile platforms with different processors, memory, and cache sizes; and (iii) how to account for the variance in the number, type, and size of layers across different CNN configurations.
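    To make the per-layer cost modeling concrete, the sketch below computes the standard analytical multiply-accumulate and memory counts for a single convolutional layer. This is a minimal illustration of the kind of quantity a tool like Augur could calibrate against measured runtimes, not Augur's actual model; the function name and parameters are hypothetical.

```python
# Minimal sketch of per-layer resource estimation for one conv layer.
# Not Augur's actual model: it shows the standard analytical counts
# (multiply-accumulates, weight/activation memory) such a tool could
# start from before calibrating against measured runtimes.

def conv2d_cost(h_in, w_in, c_in, c_out, k, stride=1, pad=0, bytes_per_elem=4):
    """Estimate MACs and memory (bytes) for a single 2D convolution layer."""
    h_out = (h_in + 2 * pad - k) // stride + 1
    w_out = (w_in + 2 * pad - k) // stride + 1
    macs = h_out * w_out * c_out * (k * k * c_in)   # multiply-accumulates
    weights = c_out * c_in * k * k                  # parameter count
    activations = h_out * w_out * c_out             # output feature map size
    mem = (weights + activations) * bytes_per_elem
    return macs, mem

# Example: a VGG-style first conv layer (3x3, 64 filters) on a 224x224 RGB input.
macs, mem = conv2d_cost(224, 224, 3, 64, k=3, pad=1)
print(f"{macs / 1e6:.1f} M MACs, {mem / 1e6:.2f} MB")  # ~86.7 M MACs, ~12.85 MB
```

    Such closed-form counts explain why per-layer measurement matters: two layers with identical MAC counts can still differ in runtime on a mobile CPU or GPU because of memory traffic and cache behavior.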

    DFT-Spread Spectrally Overlapped Hybrid OFDM-Digital Filter Multiple Access IMDD PONs

    A novel transmission technique—namely, a DFT-spread spectrally overlapped hybrid OFDM–digital filter multiple access (DFMA) PON based on intensity modulation and direct detection (IMDD)—is proposed here by employing the discrete Fourier transform (DFT)-spread technique in each optical network unit (ONU) and the optical line terminal (OLT). Detailed numerical simulations are carried out to identify optimal ONU transceiver parameters and explore the maximum achievable upstream transmission performance of IMDD PON systems. The results show that the DFT-spread technique in the proposed PON is effective in enhancing the upstream transmission performance to its maximum potential, whilst still maintaining all of the salient features associated with previously reported PONs. Compared with previously reported PONs excluding DFT-spread, a significant peak-to-average power ratio (PAPR) reduction of over 2 dB is achieved, leading to a 1 dB reduction in the optimal signal clipping ratio (CR). As a direct consequence of the PAPR reduction, the proposed PON has excellent tolerance to reduced digital-to-analogue converter/analogue-to-digital converter (DAC/ADC) bit resolution, and can therefore operate with a minimum DAC/ADC resolution of only 6 bits at the forward error correction (FEC) limit (1 × 10⁻³). In addition, the proposed PON can improve the upstream power budget by >1.4 dB and increase the aggregate upstream signal transmission rate by up to 10% without degrading nonlinearity tolerances.
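    To illustrate why DFT-spreading reduces PAPR, the sketch below compares simulated PAPR distributions of conventional OFDM and DFT-spread OFDM. It is a minimal baseband illustration, not the paper's simulation: the subcarrier counts, 16-QAM format, and symbol count are assumptions, and the real-valued signal construction required by IMDD is omitted for brevity.

```python
# Minimal sketch: PAPR of conventional OFDM vs. DFT-spread OFDM.
# All parameters (64 data subcarriers, 256-point IFFT, 16-QAM) are
# illustrative assumptions, not the paper's simulation setup.
import numpy as np

rng = np.random.default_rng(0)
M, N, n_sym = 64, 256, 2000  # data subcarriers, IFFT size, number of symbols

# Random 16-QAM symbols, normalized to unit average power.
qam = (rng.integers(0, 4, (n_sym, M)) * 2 - 3 +
       1j * (rng.integers(0, 4, (n_sym, M)) * 2 - 3)) / np.sqrt(10)

def papr_db(x):
    """Per-symbol peak-to-average power ratio in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max(axis=1) / p.mean(axis=1))

# Conventional OFDM: map QAM symbols straight onto M subcarriers.
grid = np.zeros((n_sym, N), dtype=complex)
grid[:, :M] = qam
ofdm = np.fft.ifft(grid, axis=1)

# DFT-spread OFDM: pre-spread each block with an M-point DFT first,
# which makes the time-domain signal single-carrier-like.
grid_ds = np.zeros((n_sym, N), dtype=complex)
grid_ds[:, :M] = np.fft.fft(qam, axis=1) / np.sqrt(M)
dft_spread = np.fft.ifft(grid_ds, axis=1)

# The high-percentile PAPR is typically a few dB lower with DFT-spreading.
print("OFDM       PAPR (99.9th pct): %.1f dB" % np.percentile(papr_db(ofdm), 99.9))
print("DFT-spread PAPR (99.9th pct): %.1f dB" % np.percentile(papr_db(dft_spread), 99.9))
```

    A lower PAPR directly relaxes the clipping ratio and DAC/ADC resolution requirements reported above, since fewer quantization levels are wasted on rare signal peaks.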

    Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

    While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, their slow training and inference speed severely obstructs their practical use. In this paper, we propose an efficient NeRF-based framework that enables real-time synthesis of talking portraits and faster convergence by leveraging the recent success of grid-based NeRF. Our key insight is to decompose the inherently high-dimensional talking portrait representation into three low-dimensional feature grids. Specifically, a Decomposed Audio-spatial Encoding Module models the dynamic head with a 3D spatial grid and a 2D audio grid. The torso is handled with another 2D grid in a lightweight Pseudo-3D Deformable Module. Both modules focus on efficiency under the premise of good rendering quality. Extensive experiments demonstrate that our method can generate realistic, audio-lip-synchronized talking portrait videos while being highly efficient compared to previous methods.
    Project page: https://me.kiui.moe/radnerf
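    To illustrate the grid-decomposition idea, the sketch below looks up learned features from two low-resolution grids by bilinear interpolation and concatenates them, standing in for the paper's 3D spatial grid and 2D audio grid. The grid resolutions and feature dimensions are assumptions for illustration, not the paper's architecture.

```python
# Minimal sketch of the grid-decomposition idea: instead of feeding a
# high-dimensional (position, audio) input to a large MLP, look up learned
# features from low-dimensional grids and concatenate them. Grid sizes and
# feature dimensions here are illustrative, not the paper's architecture.
import numpy as np

def grid_lookup(grid, coords):
    """Bilinearly interpolate a (R, R, F) feature grid at coords in [0, 1]^2."""
    r = grid.shape[0] - 1
    xy = np.clip(coords, 0.0, 1.0) * r
    x0, y0 = np.floor(xy).astype(int)
    x1, y1 = min(x0 + 1, r), min(y0 + 1, r)
    tx, ty = xy[0] - x0, xy[1] - y0
    top = (1 - tx) * grid[x0, y0] + tx * grid[x1, y0]
    bot = (1 - tx) * grid[x0, y1] + tx * grid[x1, y1]
    return (1 - ty) * top + ty * bot

rng = np.random.default_rng(0)
spatial_grid = rng.normal(size=(128, 128, 16))  # 2D stand-in for the 3D spatial grid
audio_grid = rng.normal(size=(32, 32, 8))       # 2D audio-coordinate grid

pos_feat = grid_lookup(spatial_grid, np.array([0.4, 0.7]))
aud_feat = grid_lookup(audio_grid, np.array([0.1, 0.9]))

# The concatenated low-dimensional features would feed a small MLP that
# predicts density and color, which is what makes rendering fast.
feat = np.concatenate([pos_feat, aud_feat])  # shape (24,)
print(feat.shape)
```

    Because each lookup touches only a handful of grid cells and the downstream MLP stays small, querying such a representation is far cheaper than evaluating a large MLP on a high-dimensional input, which is the source of the real-time speed claimed above.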