31 research outputs found

    Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images

    Progress in image captioning is becoming increasingly complex as researchers try to generalize the model and define the representation linking visual features and natural language processing. This work defines such a relationship in the form of a representation called Tensor Product Representation (TPR), which generalizes the scheme of language modeling and structures the linguistic attributes (related to the grammar and parts of speech of the language), providing a much better structure and more grammatically correct sentences. TPR enables a better and unique representation and structuring of the feature space, and thereby better sentence composition from these representations. Many of the different ways of defining and improving these TPRs are discussed, and their performance relative to traditional procedures and feature representations is evaluated for the image captioning application. The new models achieved considerable improvement over the corresponding previous architectures. Comment: 7 pages
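    As a general illustration of the idea (not the authors' captioning model), a Tensor Product Representation binds each filler vector f_i (for example, a word embedding) to a role vector r_i (for example, a grammatical slot) with an outer product and sums the results, T = Σ_i f_i ⊗ r_i. When the role vectors are orthonormal, multiplying T by a role vector recovers the bound filler exactly. The embeddings, roles, dimensions, and helper names below are hypothetical placeholders.

```python
import numpy as np

def bind(fillers: np.ndarray, roles: np.ndarray) -> np.ndarray:
    """Build a TPR T = sum_i outer(fillers[i], roles[i]).

    fillers: (n, d_f) array, roles: (n, d_r) array -> T: (d_f, d_r) array.
    """
    return np.einsum("nf,nr->fr", fillers, roles)

def unbind(T: np.ndarray, role: np.ndarray) -> np.ndarray:
    """Recover the filler bound to `role`; exact when the roles are orthonormal."""
    return T @ role

rng = np.random.default_rng(0)

# Hypothetical example: the words "a", "dog", "runs" bound to the roles
# determiner, subject, verb. Embeddings are random 8-d placeholders.
fillers = rng.standard_normal((3, 8))

# Orthonormal role vectors: the first three rows of a random 4x4 orthogonal matrix.
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
roles = q[:3]                             # (3, 4), rows are orthonormal

T = bind(fillers, roles)                  # (8, 4) tensor product representation
subject = unbind(T, roles[1])             # should equal the "dog" embedding
print(T.shape, np.allclose(subject, fillers[1]))
```

    With non-orthogonal (for example, random) role vectors, unbinding is only approximate, which is one reason the choice of role vectors matters in TPR-based models.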

    Angular Correlation in Double Photoionization of Atoms and the Role of the Observer

    The problem of angular correlation in the double photoionization (DPI) of rare gas atoms is considered in some depth. We refer particularly to the efficiency operator for the detection of an electron by a detector having the shape of a right circular cylinder. The different factors in the efficiency operator are discussed in detail, keeping in mind the fundamental epistemological question of the role of the observer (or his equipment) in such experiments. Comment: 4 pages, LaTeX

    An angular correlation theory for double photoionization in a rare gas atom

    We consider the process of double photoionization (DPI) in a rare gas atom as a two-step process, namely (i) photoionization in an inner shell followed by (ii) the emission of an Auger electron from an outer shell. An angular correlation function for the two emitted electrons is defined by analogy with the theory of angular correlation in nuclear physics. An expression is obtained for this angular correlation function by a statistical method which makes use of the density and efficiency operators. The latter accounts for the attenuation of the probability of detecting an electron due to the geometrical properties of the detector. Theoretical values of the angular correlation function are obtained for DPI in xenon, and these are shown to be in good agreement with the experimental results given by Kämmerling and Schmidt.
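    In the nuclear-physics theory of angular correlation that the abstract draws on, a correlation function is commonly expanded in Legendre polynomials, W(θ) = Σ_k A_k Q_k P_k(cos θ), where θ is the angle between the two emission directions, the A_k carry the physics of the process, and the attenuation factors Q_k ≤ 1 describe a detector of finite size (Q_k = 1 for an ideal point detector). The sketch below evaluates this generic form; the coefficient values are hypothetical placeholders, not the xenon results of the paper, and the paper's own expression, derived from the density and efficiency operators, may differ in detail.

```python
import numpy as np
from numpy.polynomial import legendre

def angular_correlation(theta, A, Q):
    """Evaluate W(theta) = sum_k A[k] * Q[k] * P_k(cos theta).

    A[k]: hypothetical expansion coefficients (A[0] = 1 normalizes W).
    Q[k]: hypothetical attenuation factors (all 1 for a point detector).
    """
    coeffs = np.asarray(A, dtype=float) * np.asarray(Q, dtype=float)
    return legendre.legval(np.cos(theta), coeffs)

# Placeholder coefficients indexed by k = 0..4 (odd terms set to zero).
A = [1.0, 0.0, 0.5, 0.0, 0.1]
Q_point = [1.0] * 5                       # ideal point detector
Q_finite = [1.0, 1.0, 0.9, 1.0, 0.7]      # finite detector damps higher-order terms

for name, Q in [("point", Q_point), ("finite", Q_finite)]:
    w0 = angular_correlation(0.0, A, Q)
    w90 = angular_correlation(np.pi / 2, A, Q)
    print(f"{name}: W(0) = {w0:.4f}, W(90 deg) = {w90:.4f}")
```

    Because the Q_k scale down the higher-order terms, a finite detector reduces the measured anisotropy W(0) − W(90°), which is the kind of geometric attenuation the efficiency operator is meant to capture.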

    Green Heron Swarm Optimization Algorithm - State-of-the-Art of a New Nature Inspired Discrete Meta-Heuristics

    Many real-world problems are NP-hard, and a very large part of them can be represented as graph-based problems. This makes graph theory a very important and prevalent field of study. In this work, a new bio-inspired meta-heuristic called the Green Heron Swarm Optimization (GHOSA) Algorithm is introduced, inspired by the fishing skills of the bird. The algorithm is primarily suited to graph-based problems such as combinatorial optimization. However, the introduction of an adaptive mathematical variation operator called Location Based Neighbour Influenced Variation (LBNIV) also makes it suitable for high-dimensional continuous-domain problems. The new algorithm is applied to the traditional benchmark equations, and the results are compared with the Genetic Algorithm and Particle Swarm Optimization. The algorithm is also applied to Travelling Salesman Problem, Quadratic Assignment Problem, and Knapsack Problem datasets. The procedure for applying the algorithm to the Resource Constrained Shortest Path problem and to road network optimization is also discussed. The results clearly demarcate GHOSA as an efficient algorithm, which matters because very few algorithms exist for discrete optimization, and robust, more explorative algorithms are needed in this age of social networking, where most problem scenarios are graph based. Comment: 20 pages, pre-print copy, submitted to a peer-reviewed journal

    Angular Correlation in Double photoionization of Atoms and the Role of the Detection Process

    The problem of angular correlation in the double photoionization (DPI) of rare gas atoms is considered in some depth. We refer particularly to the efficiency operator for the detection of an electron by a detector having cylindrical symmetry. The different factors in the efficiency operator are discussed in detail, keeping in mind the fundamental epistemological question of the role of the detection process in such experiments. Comment: 5 pages, RevTeX

    Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework

    In this work, we have introduced Gaussian Smoothen Semantic Features (GSSF) for better semantic selection in Indian regional-language image captioning, along with a training procedure that uses existing translation together with English crowd-sourced sentences. We have shown that this architecture is a promising alternative where resources are scarce. The main contribution of this work is the development of deep learning architectures for the Bengali language (the fifth most widely spoken language in the world), which has a completely different grammar and different language attributes. We have shown that these architectures work well for complex applications such as language generation from image contexts, and that they can diversify the representation by introducing constraints, more extensive features, and unique feature spaces. We also established that we could achieve absolute precision and diversity when using a smoothened semantic tensor with traditional LSTM and feature decomposition networks. With a better learning architecture, we succeeded in establishing an automated algorithm and assessment procedure that can help evaluate competent applications without requiring expertise or human intervention.

    TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning

    Image captioning can be improved if the structure of graphical representations is formulated with conceptual positional binding. In this work, we have introduced a novel technique for caption generation using a neural-symbolic encoding of scene graphs derived from the regional visual information of images, which we call Tensor Product Scene-Graph-Triplet Representation (TPsgtR). While most previous works concentrated on identifying object features in images, we introduce a neuro-symbolic embedding that embeds the identified relationships among different regions of the image into concrete forms, instead of relying on the model to compose any or all combinations. These neural-symbolic representations help to better define the neural-symbolic space for neuro-symbolic attention and can be transformed into better captions. With this approach, we introduced two novel architectures (TPsgtR-TDBU and TPsgtR-sTDBU) for comparison, and the experimental results demonstrate that our approaches outperform the other models and that the generated captions are more comprehensive and natural.
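    To make positional binding concrete, the following sketch (a generic TPR illustration under stated assumptions, not the TPsgtR-TDBU or TPsgtR-sTDBU architectures) binds the three elements of a scene-graph triplet, (subject, predicate, object), to three orthonormal role vectors and then decodes each slot back to the most similar vocabulary item. The vocabulary, embedding size, and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical vocabulary with unit-norm random embeddings, so that a
# vector's dot product with itself (1.0) exceeds its dot product with others.
vocab = ["man", "horse", "riding", "on", "grass", "dog"]
E = rng.standard_normal((len(vocab), 16))
E /= np.linalg.norm(E, axis=1, keepdims=True)
index = {w: i for i, w in enumerate(vocab)}

# One orthonormal role vector per triplet slot: subject, predicate, object.
ROLES, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # rows are orthonormal

def encode_triplet(triplet):
    """TPR of a (subject, predicate, object) triplet: sum of outer(filler, role)."""
    return sum(np.outer(E[index[tok]], ROLES[slot]) for slot, tok in enumerate(triplet))

def decode_triplet(T):
    """Unbind each slot and map it to the most similar vocabulary entry."""
    decoded = []
    for role in ROLES:
        filler = T @ role                   # exact filler when roles are orthonormal
        decoded.append(vocab[int(np.argmax(E @ filler))])
    return tuple(decoded)

T = encode_triplet(("man", "riding", "horse"))
print(T.shape, decode_triplet(T))
```

    Because each slot has its own role vector, swapping subject and object yields a different tensor, so the direction of the relationship is preserved in a way a simple sum of object embeddings would lose.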

    SACT: Self-Aware Multi-Space Feature Composition Transformer for Multinomial Attention for Video Captioning

    Video captioning rests on two fundamental concepts: feature detection and feature composition. While modern transformers are effective at composing features, they do not address the fundamental problem of selecting and understanding the content. As the feature length increases, it becomes increasingly important to include provisions for better capturing of the pertinent content. In this work, we have introduced the concept of a Self-Aware Composition Transformer (SACT) that can generate Multinomial Attention (MultAtt), a way of generating distributions over various combinations of frames. Multi-head attention transformers work on the principle of combining all possible content for attention, which is good for natural language classification but has limitations for video captioning: video content contains repetitions and requires parsing of the important content for better composition. We therefore use SACT for more selective attention and combine it across different attention heads to better capture the content that is usable for any application. To address the problem of diversification and encourage selective utilization, we propose the Self-Aware Composition Transformer model for dense video captioning and apply it to two benchmark datasets, ActivityNet and YouCookII.
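    For reference, the standard scaled dot-product attention underlying the multi-head transformers that the abstract contrasts with SACT assigns every frame a strictly positive softmax weight, so repeated or irrelevant frames always contribute to the output. The sketch below shows only this baseline over a handful of hypothetical frame features; it is not an implementation of SACT or MultAtt, whose details are not given here.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for queries Q (n_q, d), keys K (n, d), values V (n, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n_q, n_frames)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # each row is a distribution over frames
    return weights @ V, weights

rng = np.random.default_rng(2)
frames = rng.standard_normal((5, 8))      # 5 hypothetical 8-d frame features
frames[3] = frames[2]                     # a repeated frame, common in video
query = rng.standard_normal((1, 8))

context, weights = scaled_dot_product_attention(query, frames, frames)
print(np.round(weights, 3))               # every frame, including the duplicate, gets weight > 0
```

    The duplicated frame receives exactly the same weight as its original, so repetition is counted twice rather than filtered out; this is the behaviour that selective schemes such as SACT aim to change.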

    ReLGAN: Generalization of Consistency for GAN with Disjoint Constraints and Relative Learning of Generative Processes for Multiple Transformation Learning

    Image-to-image transformation has gained popularity across different research communities due to its enormous impact on various applications, including medical ones. In this work, we have introduced a generalized consistency scheme for GAN architectures with two new concepts, Transformation Learning (TL) and Relative Learning (ReL), for enhanced learning of image transformations. Consistency-based GAN architectures have suffered from inadequate constraints and fail to learn multiple and multi-modal transformations, which are required for many medical applications. Their main drawback is that they focus on creating an intermediate, workable hybrid, which is not acceptable for medical applications that depend on minute details. Another drawback is the weak interrelation between the two learning phases; TL and ReL introduce improved coordination between them. We have demonstrated the capability of the novel network framework on public datasets, and we emphasize that our architecture produces improved neural image transformations that are more acceptable to the medical community. Experiments and results demonstrate the effectiveness of our framework, with enhancements compared to previous works.
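    Assuming that "consistency for GAN" refers to the cycle-consistency constraint popularized by CycleGAN, the baseline term being generalized penalizes a round trip through both generators: L_cyc = E_x[‖F(G(x)) − x‖_1] + E_y[‖G(F(y)) − y‖_1], where G maps domain X to Y and F maps Y back to X. The sketch uses toy functions in place of generator networks and averages the absolute error over elements; it does not implement Transformation Learning or Relative Learning.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle-consistency: mean |F(G(x)) - x| + mean |G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x)) - x))    # X -> Y -> X round trip
    backward = np.mean(np.abs(G(F(y)) - y))   # Y -> X -> Y round trip
    return forward + backward

def G(a):
    """Toy stand-in for the X -> Y generator (hypothetical)."""
    return 2.0 * a + 1.0

def F_inverse(b):
    """Exact inverse of G, so both round trips are lossless."""
    return (b - 1.0) / 2.0

def F_lossy(b):
    """Imperfect inverse of G, so the round trips drift."""
    return b / 2.0

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 32, 32))   # hypothetical batch of domain-X images
y = rng.standard_normal((4, 32, 32))   # hypothetical batch of domain-Y images

print(cycle_consistency_loss(x, y, G, F_inverse))
print(cycle_consistency_loss(x, y, G, F_lossy))
```

    A zero cycle loss only says that G and F invert each other; it places no further constraint on what the transformation looks like, which is consistent with the abstract's point that this constraint alone is inadequate for multiple and multi-modal transformations.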

    Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering

    The attention mechanism has gained huge popularity due to its effectiveness in achieving high accuracy in different domains. But attention is opportunistic and is not justified by the content or its usability: transformer-like structures create all possible attentions. We define segregating strategies that prioritize the content relevant to the application in order to enhance performance. We define two strategies, the Self-Segregating Transformer (SST) and the Coordinated-Segregating Transformer (CST), and use them to solve the visual question answering task. The self-segregation strategy for attention contributes to better understanding and filtering of the information that is most helpful for answering the question, and creates diversity in visual reasoning for attention. This work can easily be used in many other applications that involve repetition and multiple frames of features, and it would greatly reduce the commonality of the attentions. Visual Question Answering (VQA) requires understanding and coordination of both images and textual interpretations. Experiments demonstrate that segregation strategies for cascaded multi-head transformer attention outperform many previous works and achieve considerable improvement on the VQA-v2 dataset benchmark.