Search CORE

3 research outputs found

Towards Visual Syntactical Understanding

Author: Chandra Soumyadeep
Chowdhury Sayeed Shafayet
Roy Kaushik
Publication venue
Publication date: 30/01/2024
Field of study

Syntax is usually studied in the realm of linguistics and refers to the arrangement of words in a sentence. Similarly, an image can be considered as a visual 'sentence', with the semantic parts of the image acting as 'words'. While visual syntactic understanding occurs naturally to humans, it is interesting to explore whether deep neural networks (DNNs) are equipped with such reasoning. To that end, we alter the syntax of natural images (e.g. swapping the eye and nose of a face), referred to as 'incorrect' images, to investigate the sensitivity of DNNs to such syntactic anomaly. Through our experiments, we discover an intriguing property of DNNs where we observe that state-of-the-art convolutional neural networks, as well as vision transformers, fail to discriminate between syntactically correct and incorrect images when trained on only correct ones. To counter this issue and enable visual syntactic understanding with DNNs, we propose a three-stage framework- (i) the 'words' (or the sub-features) in the image are detected, (ii) the detected words are sequentially masked and reconstructed using an autoencoder, (iii) the original and reconstructed parts are compared at each location to determine syntactic correctness. The reconstruction module is trained with BERT-like masked autoencoding for images, with the motivation to leverage language model inspired training to better capture the syntax. Note, our proposed approach is unsupervised in the sense that the incorrect images are only used during testing and the correct versus incorrect labels are never used for training. We perform experiments on CelebA, and AFHQ datasets and obtain classification accuracy of 92.10%, and 90.89%, respectively. Notably, the approach generalizes well to ImageNet samples which share common classes with CelebA and AFHQ without explicitly training on them

arXiv.org e-Print Archive

Application of stochastic grammars to understanding action

Author: Ivanov Yuri A., 1967-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998
Field of study

Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998.Includes bibliographical references (leaves 69-72).by Yuri A. Ivanov.M.S

CiteSeerX

DSpace@MIT

A paradigm for semantic picture recognition

Author: Baird Michael Lloyd
Publication venue: Georgia Institute of Technology
Publication date: 01/05/1973
Field of study

Ph.D.Michael D. Kell

Scholarly Materials And Research @ Georgia Tech