21 research outputs found

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

Authors: Shigeto Yutaro, Takeuchi Akikazu, Yoshikawa Yuya
Publication date: 01/01/2017

In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions than those obtained by first generating English captions and then applying English-Japanese machine translation.

Comment: Accepted as ACL2017 short paper. 5 page

arXiv.org e-Print Archive

Crossref

Learning Decorrelated Representations Efficiently Using Fast Fourier Transform

Authors: Shigeto Yutaro, Shimbo Masashi, Takeuchi Akikazu, Yoshikawa Yuya
Publication date: 01/06/2023

Barlow Twins and VICReg are self-supervised representation learning models that use regularizers to decorrelate features. Although these models are as effective as conventional representation learning models, their training can be computationally demanding if the dimension d of the projected embeddings is high. As the regularizers are defined in terms of individual elements of a cross-correlation or covariance matrix, computing the loss for n samples takes O(n d^2) time. In this paper, we propose a relaxed decorrelating regularizer that can be computed in O(n d log d) time by Fast Fourier Transform. We also propose an inexpensive technique to mitigate undesirable local minima that develop with the relaxation. The proposed regularizer exhibits accuracy comparable to that of existing regularizers in downstream tasks, while training with it requires less memory and is faster for large d. The source code is available.

Comment: Accepted for CVPR 202
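The complexity gap mentioned in this abstract can be illustrated with a small NumPy sketch. The abstract does not spell out the relaxation, so the code assumes one plausible reading: instead of touching each of the d^2 entries of the cross-correlation matrix, it works with the sums of entries along wrapped diagonals, which equal a circular cross-correlation and can therefore be obtained by FFT. The function names are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np


def wrapped_diagonal_sums_naive(a, b):
    """O(n d^2): build the d x d matrix C = a^T b, then sum each wrapped diagonal."""
    d = a.shape[1]
    c = a.T @ b  # c[i, j] = sum over samples of a[:, i] * b[:, j]
    idx = np.arange(d)
    # Entry k is the sum of c[i, (i + k) mod d] over all i.
    return np.array([c[idx, (idx + k) % d].sum() for k in range(d)])


def wrapped_diagonal_sums_fft(a, b):
    """O(n d log d): the same sums via per-sample circular cross-correlation."""
    d = a.shape[1]
    fa = np.fft.rfft(a, axis=1)
    fb = np.fft.rfft(b, axis=1)
    # conj(FFT(a)) * FFT(b) is the spectrum of the circular cross-correlation.
    per_sample = np.fft.irfft(np.conj(fa) * fb, n=d, axis=1)
    return per_sample.sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((16, 7))  # n = 16 samples, d = 7 dimensions (assumed sizes)
    b = rng.standard_normal((16, 7))
    print(np.allclose(wrapped_diagonal_sums_naive(a, b),
                      wrapped_diagonal_sums_fft(a, b)))
```

The FFT version never materializes the d x d matrix, which is consistent with the lower memory use the abstract reports for large d.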

arXiv.org e-Print Archive

Action Class Relation Detection and Classification Across Multiple Video Datasets

Authors: Shigeto Yutaro, Shimbo Masashi, Takeuchi Akikazu, Yoshikawa Yuya
Publication date: 14/08/2023

The Meta Video Dataset (MetaVD) provides annotated relations between action classes in major datasets for human action recognition in videos. Although these annotated relations enable dataset augmentation, the augmentation is applicable only to the datasets covered by MetaVD. For an external dataset to enjoy the same benefit, the relations between its action classes and those in MetaVD need to be determined. To address this issue, we consider two new machine learning tasks: action class relation detection and classification. We propose a unified model to predict relations between action classes, using language and visual information associated with classes. Experimental results show that (i) recent pre-trained neural network models for texts and videos contribute to high predictive performance, (ii) relation prediction based on action label texts is more accurate than that based on videos, and (iii) a blending approach that combines predictions by both modalities can further improve the predictive performance in some cases.

Comment: Accepted to Pattern Recognition Letters. 12 pages, 4 figure

arXiv.org e-Print Archive

Parallel logic programming

Author: Akikazu TAKEUCHI
Publication venue: John Wiley & Sons, Inc.

Open Library

Bounded buffer communication in Concurrent Prolog

Authors: Akikazu Takeuchi, Koichi Furukawa
Publication venue: Springer Science and Business Media LLC

Crossref

Object oriented programming in Concurrent Prolog

Authors: Akikazu Takeuchi, Ehud Shapiro
Publication venue: Springer Science and Business Media LLC

Crossref