NICE 2023 Zero-shot Image Captioning Challenge
In this report, we introduce the NICE project
(https://nice.lgresearch.ai/) and share the results and
outcomes of the NICE challenge 2023. This project is designed to challenge the
computer vision community to develop robust image captioning models that
advance the state-of-the-art both in terms of accuracy and fairness. Through
the challenge, the image captioning models were tested using a new evaluation
dataset that includes a large variety of visual concepts from many domains.
There was no specific training data provided for the challenge, and therefore
the challenge entries were required to adapt to new types of image descriptions
that had not been seen during training. This report includes information on the
newly proposed NICE dataset, evaluation methods, challenge results, and
technical details of top-ranking entries. We expect that the outcomes of the
challenge will contribute to the improvement of AI models on various
vision-language tasks.
Comment: Tech report, project page https://nice.lgresearch.ai
Hardware-Friendly Compression Methods for Low-Cost Deep Neural Networks
Department of Electrical Engineering
Design Space Exploration of FPGA Accelerators for Convolutional Neural Networks
The increasing use of machine learning algorithms, such as Convolutional Neural Networks (CNNs), makes the hardware accelerator approach very compelling. However, the question of how to best design an accelerator for a given CNN has not been answered yet, even on a very fundamental level. This paper addresses that challenge by providing a novel framework that can universally and accurately evaluate and explore various architectural choices for CNN accelerators on FPGAs. Our exploration framework is more extensive than that of any previous work in terms of the design space, and takes into account various FPGA resources to maximize performance, including DSP resources, on-chip memory, and off-chip memory bandwidth. Our experimental results using some of the largest CNN models, including one that has 16 convolutional layers, demonstrate the efficacy of our framework, as well as the need for such a high-level architecture exploration approach to find the best architecture for a CNN model.
RRNet: Repetition-Reduction Network for Energy Efficient Depth Estimation
Lightweight neural networks that employ depthwise convolution have a significant computational advantage over those that use standard convolution because they involve fewer parameters; however, they also require more time, even with graphics processing units (GPUs). We propose a Repetition-Reduction Network (RRNet) in which the number of depthwise channels is large enough to reduce computation time while simultaneously being small enough to reduce GPU latency. RRNet also reduces power consumption and memory usage, not only in the encoder but also in the residual connections to the decoder. We apply RRNet to the problem of resource-constrained depth estimation, where it proves to be significantly more efficient than other methods in terms of energy consumption, memory usage, and computation. It has two key modules: the Repetition-Reduction (RR) block, which is a set of repeated lightweight convolutions that can be used for feature extraction in the encoder, and the Condensed Decoding Connection (CDC), which can replace the skip connection, delivering features to the decoder while significantly reducing the channel depth of the decoder layers. Experimental results on the KITTI dataset show that RRNet consumes less energy and less memory than conventional schemes, and that it is faster on a commercial mobile GPU without increasing the demand on hardware resources relative to the baseline network. Furthermore, RRNet outperforms state-of-the-art lightweight models such as MobileNets, PyDNet, DiCENet, DABNet, and EfficientNet.
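
The parameter saving that the RRNet abstract attributes to depthwise convolution can be illustrated with a short count. This is a minimal sketch, not taken from the RRNet paper: it assumes the common depthwise-separable arrangement (a depthwise convolution followed by a 1x1 pointwise convolution, as in MobileNets), ignores bias terms, and uses illustrative layer sizes (64 input channels, 128 output channels, 3x3 kernels) and hypothetical function names.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1x1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only; these are assumptions, not values from RRNet.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard:            {std} weights")
    print(f"depthwise separable: {sep} weights")
    print(f"reduction factor:    {std / sep:.2f}x")
```

A lower parameter count does not by itself imply lower latency, which is the tension the abstract describes for GPUs.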