Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images
Progress in image captioning is becoming increasingly complex as researchers try
to generalize models and define the relationship between visual features
and natural language. This work defines such a relationship in the form of a
representation called Tensor Product Representation (TPR), which generalizes
the scheme of language modeling and structures the linguistic attributes
(related to the grammar and parts of speech of the language) to produce
better-structured and more grammatically correct sentences. TPR enables a
better, unique representation and structuring of the feature space and thereby
better sentence composition from these representations. Many of the different
ways of defining and improving TPRs are discussed, and their performance
relative to traditional procedures and feature representations is evaluated
for image captioning. The new models achieved considerable improvement over
the corresponding previous architectures. Comment: 7 page
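As a rough illustration of the general TPR idea referred to above (not the models evaluated in the paper), the sketch below binds filler vectors to role vectors with outer products, sums them into one tensor, and recovers a filler by contracting with its role. The role and filler names and their toy values are assumptions made up for this example; unbinding is exact here only because the role vectors are orthonormal.

```python
# Illustrative sketch of Tensor Product Representation (TPR) binding.
# All vectors below are hypothetical toy values, not taken from the paper.

def outer(filler, role):
    """Outer product filler (x) role as a nested list (rows index the filler)."""
    return [[f * r for r in role] for f in filler]

def bind(pairs):
    """Sum the outer products of (filler, role) pairs into one tensor."""
    n_filler = len(pairs[0][0])
    n_role = len(pairs[0][1])
    tensor = [[0.0] * n_role for _ in range(n_filler)]
    for filler, role in pairs:
        product = outer(filler, role)
        for i in range(n_filler):
            for j in range(n_role):
                tensor[i][j] += product[i][j]
    return tensor

def unbind(tensor, role):
    """Contract the tensor with a role vector to recover its filler.

    This is exact only when the role vectors are orthonormal.
    """
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Hypothetical orthonormal roles (grammatical positions) and toy fillers (words).
ROLES = {
    "subject": [1.0, 0.0, 0.0],
    "verb": [0.0, 1.0, 0.0],
    "object": [0.0, 0.0, 1.0],
}
FILLERS = {
    "dog": [1.0, 0.0],
    "chases": [0.0, 1.0],
    "ball": [1.0, 1.0],
}

sentence = bind([
    (FILLERS["dog"], ROLES["subject"]),
    (FILLERS["chases"], ROLES["verb"]),
    (FILLERS["ball"], ROLES["object"]),
])

recovered_object = unbind(sentence, ROLES["object"])
```

With these toy values, `recovered_object` equals the filler bound to the object role. With non-orthogonal roles the recovery would only be approximate.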
Angular Correlation in Double Photoionization of Atoms and the Role of the Observer
The problem of angular correlation in the double photoionization (DPI) of
rare gas atoms is considered in some depth. We refer particularly to the
efficiency operator for the detection of an electron by a detector having the
shape of a right circular cylinder. The different factors in the efficiency
operator are discussed in detail keeping in mind the fundamental
epistemological question of the role of the observer (or his equipment) in such
experiments. Comment: 4 pages, late
An angular correlation theory for double photoionization in a rare gas atom
We consider the process of double photoionization (DPI) in a rare gas atom as
a two-step process, namely (i) photoionization in an inner shell followed by
(ii) the emission of an Auger electron from an outer shell. An angular
correlation function for the two emitted electrons is defined by analogy with
the theory of angular correlation in nuclear physics. An expression is obtained
for this angular correlation function by a statistical method which makes use
of the density and efficiency operators. The latter takes care of the
attenuation of the probability of detection of an electron due to the
geometrical properties of the detector. Theoretical values of the angular
correlation function are obtained for DPI in xenon and these are shown to be in
good agreement with the experimental results given by Kämmerling and
Schmidt
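For orientation, the nuclear-physics angular correlation function that the abstract takes as its analogy is conventionally written as a Legendre expansion with detector attenuation factors. The form below is that standard textbook expression, given as an assumption about the general shape of such a function; it is not the expression derived in the paper, and the symbols $A_k$ and $Q_k$ are the usual nuclear-physics notation rather than the authors'.

```latex
% Standard form from nuclear angular correlation theory (assumed analogy only).
% \theta : angle between the two detected particles
% A_k    : correlation coefficients fixed by the angular momenta involved
% Q_k    : attenuation factors from the finite size and geometry of the detectors
% P_k    : Legendre polynomials
W(\theta) = \sum_{k\,\mathrm{even}} A_k \, Q_k \, P_k(\cos\theta),
\qquad Q_k = Q_k^{(1)} \, Q_k^{(2)}
```

For ideal point-like detectors every $Q_k$ equals 1, and the finite detector geometry reduces the magnitude of the higher-order terms.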
Green Heron Swarm Optimization Algorithm - State-of-the-Art of a New Nature Inspired Discrete Meta-Heuristics
Many real-world problems are NP-hard, and a very large part of them
can be represented as graph-based problems. This makes graph theory a very
important and prevalent field of study. In this work, a new bio-inspired
meta-heuristic called the Green Heron Swarm Optimization (GHOSA) algorithm is
introduced, inspired by the fishing skills of the bird. The algorithm is
basically suited to graph-based problems such as combinatorial
optimization. However, the introduction of an adaptive mathematical variation
operator called Location Based Neighbour Influenced Variation (LBNIV) makes it
suitable for high-dimensional continuous-domain problems. The new algorithm is
applied to the traditional benchmark equations, and the results are
compared with Genetic Algorithm and Particle Swarm Optimization. The algorithm
is also applied to Travelling Salesman Problem, Quadratic Assignment Problem,
and Knapsack Problem datasets. The procedure for applying the algorithm to the
Resource Constraint Shortest Path problem and road network optimization is also
discussed. The results clearly mark GHOSA as an efficient algorithm,
especially considering that the number of algorithms for discrete
optimization is very low and that robust, more explorative algorithms are
required in this age of social networking and mostly graph-based problem
scenarios. Comment: 20 pages, Pre-print copy, submitted to a peer reviewed journa
Angular Correlation in Double photoionization of Atoms and the Role of the Detection Process
The problem of angular correlation in the double photoionization (DPI) of
rare gas atoms is considered in some depth. We refer particularly to the
efficiency operator for the detection of an electron by a detector having
cylindrical symmetry. The different factors in the efficiency operator are
discussed in detail keeping in mind the fundamental epistemological question of
the role of the detection process in such experiments. Comment: 5 pages, Revte
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
In this work, we introduce Gaussian Smoothen Semantic Features (GSSF)
for better semantic selection in Indian regional language-based image
captioning, along with a procedure that uses existing translations
and English crowd-sourced sentences for training. We show that this
architecture is a promising alternative where resources are scarce.
The main contribution of this work is the development of deep
learning architectures for the Bengali language (the fifth most widely spoken
language in the world), which has a completely different grammar and language
attributes. We show that these architectures work well for complex applications
such as language generation from image contexts and can diversify the
representation by introducing constraints, more extensive features, and
unique feature spaces. We also establish that absolute precision and
diversity can be achieved when a smoothened semantic tensor is used with the
traditional LSTM and feature decomposition networks. With a better learning
architecture, we succeeded in establishing an automated algorithm and
assessment procedure that can help in the evaluation of competent applications
without requiring expertise or human intervention.
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
Image captioning can be improved if the structure of the graphical
representations can be formulated with conceptual positional binding. In this
work, we introduce a novel technique for caption generation using a
neural-symbolic encoding of the scene graphs derived from regional visual
information of the images, which we call Tensor Product Scene-Graph-Triplet
Representation (TPsgtR). While most previous works concentrated on
identifying the object features in images, we introduce a neuro-symbolic
embedding that can embed identified relationships among different regions of
the image into concrete forms, instead of relying on the model to compose
any/all combinations. This neural-symbolic representation helps to better
define the neural-symbolic space for neuro-symbolic attention and can be
transformed into better captions. With this approach, we introduce two novel
architectures (TPsgtR-TDBU and TPsgtR-sTDBU) for comparison, and
experimental results demonstrate that our approaches outperform the other
models and that the generated captions are more comprehensive and natural.
SACT: Self-Aware Multi-Space Feature Composition Transformer for Multinomial Attention for Video Captioning
Video captioning rests on two fundamental concepts: feature detection and
feature composition. While modern transformers are beneficial in composing
features, they do not address the fundamental problems of selecting and
understanding the contents. As the feature length increases, it becomes
increasingly important to include provisions for better capturing of the
pertinent contents. In this work, we introduce a new concept, the Self-Aware
Composition Transformer (SACT), which is capable of generating Multinomial
Attention (MultAtt), a way of generating distributions over various
combinations of frames. Moreover, the multi-head attention transformer works on
the principle of combining all possible contents for attention, which is good
for natural language classification but has limitations for video captioning.
Video contents have repetitions and require parsing of the important contents
for better content composition. We therefore use SACT for more
selective attention and combine it across different attention heads to better
capture the usable contents for any application. To address the problem
of diversification and encourage selective utilization, we propose the
Self-Aware Composition Transformer model for dense video captioning and apply
the technique to two benchmark datasets, ActivityNet and YouCookII.
ReLGAN: Generalization of Consistency for GAN with Disjoint Constraints and Relative Learning of Generative Processes for Multiple Transformation Learning
Image-to-image transformation has gained popularity in different research
communities due to its enormous impact on different applications, including
medical ones. In this work, we introduce a generalized scheme for consistency
in GAN architectures with two new concepts, Transformation Learning (TL) and
Relative Learning (ReL), for enhanced learning of image transformations.
Consistency for GAN architectures has suffered from inadequate constraints and
has failed to learn multiple and multi-modal transformations, which are
indispensable for many medical applications. The main drawback is that it
focused on creating an intermediate and workable hybrid, which is not
permissible for medical applications that focus on minute details. Another
drawback is the weak interrelation between the two learning phases; TL and ReL
introduce improved coordination between them. We demonstrate the capability of
the novel network framework on public datasets. We emphasize that our novel
architecture produces an improved neural transformation of the
image, which is more acceptable to the medical community. Experiments and
results demonstrate the effectiveness of our framework, with enhancement
compared to previous works.
Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering
The attention mechanism has gained huge popularity due to its effectiveness in
achieving high accuracy in different domains. But attention is opportunistic
and is not justified by the content or the usability of the content.
Transformer-like structures create all/any possible attention(s). We define
segregating strategies that can prioritize the contents for the applications
to enhance performance. We define two strategies, the Self-Segregating
Transformer (SST) and the Coordinated-Segregating Transformer (CST), and use
them to solve the visual question answering task. The self-segregation
strategy for attention contributes to better understanding and filtering of
the information that can be most helpful for answering the question and
creates diversity of visual reasoning for attention. This work can easily be
used in many other applications that involve repetition and multiple frames of
features, and it would reduce the commonality of the attentions to a great
extent. Visual Question Answering (VQA) requires understanding and
coordination of both images and textual interpretations. Experiments
demonstrate that segregation strategies for cascaded multi-head transformer
attention outperform many previous works and achieve considerable improvement
on the VQA-v2 benchmark dataset.