
365 research outputs found

Non-Autoregressive Coarse-to-Fine Video Captioning

Authors: Liu Fenglin, Yang Bang, Zhang Can, Zou Yuexian
Publication date: 24/03/2021

It is encouraging to see that progress has been made to bridge videos and natural language. However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and tend to generate generic descriptions due to the insufficient training of visual words (e.g., nouns and verbs) and an inadequate decoding paradigm. In this paper, we propose a non-autoregressive decoding based model with a coarse-to-fine captioning procedure to alleviate these defects. In our implementation, we employ a bi-directional self-attention based network as our language model to speed up inference, and on that basis we decompose the captioning procedure into two stages in which the model has different focuses. Specifically, given that visual words determine the semantic correctness of captions, we design a mechanism for generating visual words that not only promotes the training of scene-related words but also captures relevant details from videos to construct a coarse-grained sentence "template". Thereafter, we devise dedicated decoding algorithms that fill in the "template" with suitable words and modify inappropriate phrasing via iterative refinement to obtain a fine-grained description. Extensive experiments on two mainstream video captioning benchmarks, i.e., MSVD and MSR-VTT, demonstrate that our approach achieves state-of-the-art performance, generates diverse descriptions, and obtains high inference efficiency. Our code is available at https://github.com/yangbang18/Non-Autoregressive-Video-Captioning.

Comment: 9 pages, 6 figures, to be published in AAAI 2021.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

[Lecture] Development of Human Capital in China

Author: Zhang Fenglin (張鳳林)
Publication venue: The Economic Society of Kansai University (關西大学經済學會)
Publication date: 31/10/1997

Lecture: Zhang Fenglin (張鳳林); translation: Zou Xiao (鄒暁)

Kansai University Repository

OpenPerf: A Benchmarking Framework for the Sustainable Development of the Open-Source Ecosystem

Authors: Bi Fenglin, Han Fanyu, Li Jinlu, Wang Wei, Zhang Yanbin, Zhao Shengyu
Publication date: 26/11/2023

Benchmarking involves designing scientific test methods, tools, and frameworks to quantitatively and comparably assess specific performance indicators of certain test subjects. With the development of artificial intelligence, AI benchmarking datasets such as ImageNet and DataPerf have gradually become consensus standards in both academic and industrial fields. However, constructing a benchmarking framework remains a significant challenge in the open-source domain due to the diverse range of data types, the wide array of research issues, and the intricate nature of collaboration networks. This paper introduces OpenPerf, a benchmarking framework designed for the sustainable development of the open-source ecosystem. The framework defines 9 benchmarking tasks in open-source research, encompassing 3 data types (time series, text, and graphics) and addressing 6 research problems: regression, classification, recommendation, ranking, network building, and anomaly detection. Based on these tasks, we implemented 3 data science task benchmarks, 2 index-based benchmarks, and 1 standard benchmark. Notably, the index-based benchmarks have been adopted by the China Electronics Standardization Institute as evaluation criteria for open-source community governance. Additionally, we have developed a comprehensive toolkit for OpenPerf, which not only offers robust data management, tool integration, and user interface capabilities but also adopts a Benchmarking-as-a-Service (BaaS) model to serve academic institutions, industries, and foundations. Through its application in renowned companies and institutions such as Alibaba, Ant Group, and East China Normal University, we have validated OpenPerf's pivotal role in the healthy evolution of the open-source ecosystem.

arXiv.org e-Print Archive

VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition

Authors: Cai Yingfeng, Mu Wenjie, Tian Gengxuan, Ye Chen, Zhang Fenglin, Zhao Junqiao
Publication date: 24/08/2023

LiDAR-based place recognition plays a crucial role in Simultaneous Localization and Mapping (SLAM) and LiDAR localization. Despite the emergence of various deep learning-based and hand-crafting-based methods, rotation-induced place recognition failure remains a critical challenge. Existing studies address this limitation through specific training strategies or network structures. However, the former does not produce satisfactory results, while the latter focuses mainly on the reduced problem of SO(2) rotation invariance. Methods targeting SO(3) rotation invariance suffer from limited discrimination capability. In this paper, we propose a new method that employs a Vector Neurons Network (VNN) to achieve SO(3) rotation invariance. We first extract rotation-equivariant features from neighboring points and map low-dimensional features to a high-dimensional space through the VNN. Afterwards, we calculate the Euclidean and cosine distances in the rotation-equivariant feature space as rotation-invariant feature descriptors. Finally, we aggregate the features using GeM pooling to obtain global descriptors. To address the significant information loss when formulating rotation-invariant descriptors, we propose computing distances between features at different layers within the Euclidean space neighborhood. This greatly improves the discriminability of the point cloud descriptors while ensuring computational efficiency. Experimental results on public datasets show that our approach significantly outperforms other baseline methods implementing rotation invariance, while achieving comparable results with current state-of-the-art place recognition methods that do not consider rotation issues.
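
The two ideas the abstract combines, distances that do not change under rotation and GeM (generalized-mean) pooling, can be illustrated with a toy sketch. This is not the VNI-Net implementation: it treats each rotation-equivariant feature as a plain 3D vector, uses no learned network, and all function names (`rotate`, `invariant_descriptor`, `global_descriptor`) are hypothetical. Cosine distance is assumed here to mean one minus cosine similarity.

```python
import math

def rotate(v, a, b):
    """Apply an SO(3) rotation: angle a about the z-axis, then angle b about the x-axis."""
    x, y, z = v
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    y, z = y * math.cos(b) - z * math.sin(b), y * math.sin(b) + z * math.cos(b)
    return (x, y, z)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def euclidean_distance(u, v):
    return norm(tuple(ui - vi for ui, vi in zip(u, v)))

def cosine_distance(u, v):
    # Assumption: "cosine distance" = 1 - cosine similarity, which lies in [0, 2].
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return 1.0 - dot / (norm(u) * norm(v))

def invariant_descriptor(center, neighbors):
    """Per-neighbor (Euclidean, cosine) distances to the center feature.

    Both distances are unchanged when every vector is rotated by the same rotation.
    """
    return [(euclidean_distance(center, n), cosine_distance(center, n))
            for n in neighbors]

def gem_pool(values, p=3.0, eps=1e-6):
    """GeM pooling: (mean(max(x, eps) ** p)) ** (1 / p)."""
    clamped = [max(x, eps) for x in values]
    return (sum(x ** p for x in clamped) / len(clamped)) ** (1.0 / p)

def global_descriptor(center, neighbors, p=3.0):
    """Pool each distance channel over the neighborhood with GeM."""
    local = invariant_descriptor(center, neighbors)
    return (gem_pool([e for e, _ in local], p),
            gem_pool([c for _, c in local], p))

center = (1.0, 0.5, -0.2)
neighbors = [(0.3, 1.2, 0.7), (-0.8, 0.1, 0.4), (0.9, -0.6, 1.1)]
before = global_descriptor(center, neighbors)
after = global_descriptor(rotate(center, 0.7, -1.3),
                          [rotate(n, 0.7, -1.3) for n in neighbors])
print(before)
print(after)  # matches `before` up to floating-point error
```

With `p = 1` GeM reduces to average pooling, and as `p` grows it approaches max pooling, which is why it is a common choice for aggregating local features into a global descriptor.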

arXiv.org e-Print Archive