305 research outputs found
Mass Displacement Networks
Despite the large improvements in performance attained by using deep learning
in computer vision, one can often further improve results with some additional
post-processing that exploits the geometric nature of the underlying task. This
commonly involves displacing the posterior distribution of a CNN in a way that
makes it more appropriate for the task at hand, e.g. better aligned with local
image features, or more compact. In this work we integrate this geometric
post-processing within a deep architecture, introducing a differentiable and
probabilistically sound counterpart to the common geometric voting technique
used for evidence accumulation in vision. We refer to the resulting neural
models as Mass Displacement Networks (MDNs), and apply them to human pose
estimation in two distinct setups: (a) landmark localization, where we collapse
a distribution to a point, allowing for precise localization of body keypoints
and (b) communication across body parts, where we transfer evidence from one
part to the other, allowing for a globally consistent pose estimate. We
evaluate on large-scale pose estimation benchmarks, such as MPII Human Pose and
COCO datasets, and report systematic improvements when compared to strong
baselines.
Comment: 12 pages, 4 figures
The concept of inter-cultural communication in globalized world
Recently, globalization has become one of the most widely used notions across different fields. As globalization proceeds, our outlook on the modern world and on reality is changing completely. Countries need to develop in unison with one another and to conduct global policy in line with modern technologies. As a result, all traditional areas are undergoing modern changes, and global, information, and mass communities are emerging.
CentralNet: a Multilayer Approach for Multimodal Fusion
This paper proposes a novel multimodal fusion approach, aiming to produce
the best possible decisions by integrating information coming from multiple media.
While most of the past multimodal approaches either work by projecting the
features of different modalities into the same space, or by coordinating the
representations of each modality through the use of constraints, our approach
borrows from both visions. More specifically, assuming each modality can be
processed by a separate deep convolutional network, allowing decisions to be taken
independently from each modality, we introduce a central network linking the
modality specific networks. This central network not only provides a common
feature embedding but also regularizes the modality specific networks through
the use of multi-task learning. The proposed approach is validated on 4
different computer vision tasks on which it consistently improves the accuracy
of existing multimodal fusion approaches.
Encyclopedia of Ukraine
The «Encyclopedia of Ukraine» is one of the most important and significant works of the twentieth century. Its goal is to acquaint readers with the history of ancient and modern Ukraine, which underwent a large number of changes over its long formation. The encyclopedia was created under the auspices of the Shevchenko Scientific Society in Europe and consists of two parts: a 3-volume general part and a 10-volume dictionary part.
Predicting Deeper into the Future of Semantic Segmentation
The ability to predict and therefore to anticipate the future is an important
attribute of intelligence. It is also of utmost importance in real-time
systems, e.g. in robotics or autonomous driving, which depend on visual scene
understanding for decision making. While prediction of the raw RGB pixel values
in future video frames has been studied in previous work, here we introduce the
novel task of predicting semantic segmentations of future frames. Given a
sequence of video frames, our goal is to predict segmentation maps of not yet
observed video frames that lie up to a second or further in the future. We
develop an autoregressive convolutional neural network that learns to
iteratively generate multiple frames. Our results on the Cityscapes dataset
show that directly predicting future segmentations is substantially better than
predicting and then segmenting future RGB frames. Prediction results up to half
a second in the future are visually convincing and are much more accurate than
those of a baseline based on warping semantic segmentations using optical flow.
Comment: Accepted to ICCV 2017. Supplementary material available on the
authors' webpage.
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels to produce meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
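The random channel dropping described in the ModDrop abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `moddrop`, the dictionary-of-lists input layout, and the `drop_prob` value are assumptions made purely for illustration, and the abstract does not state how the drop probability is chosen.

```python
import random


def moddrop(modalities, drop_prob=0.2, rng=None):
    """Randomly zero out whole modality inputs for one training sample.

    modalities: dict mapping a modality name to a list of feature values
                (hypothetical layout, chosen for this sketch).
    drop_prob:  per-modality probability of being dropped (assumed value).
    rng:        optional random.Random instance, for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    dropped = {}
    for name, features in modalities.items():
        if rng.random() < drop_prob:
            # The whole channel is replaced by zeros, so the fused model
            # has to rely on the remaining modalities for this sample.
            dropped[name] = [0.0] * len(features)
        else:
            dropped[name] = list(features)
    return dropped


if __name__ == "__main__":
    sample = {"video": [0.5, 0.1], "depth": [0.9, 0.4], "audio": [0.3]}
    print(moddrop(sample, drop_prob=0.5, rng=random.Random(0)))
```

Dropping entire channels, rather than individual units, is what distinguishes this kind of scheme from ordinary dropout: it exposes the fused classifier to missing modalities during training.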
An exclusive phonological method of teaching a foreign language with a professionally oriented approach
The relevance of the topic of the article is due to the reassessment of the goals and objectives of professionally oriented language education at the undergraduate level in the light of UNESCO initiatives. The hypothesis: use of the proposed methodology in the course of teaching English with the indicated approach at a technical university can significantly improve pronunciation. The purpose is to determine the effectiveness of the phonological methodology in teaching a foreign language based on a professionally oriented approach.
Methodology: pilot experiment, comparison, observation, description, Wilcoxon's T-test, induction. The experiment in 2019-2020 involved 60 students of the Moscow Aviation Institute (National Research University).
Most relevant results: repeated control testing revealed an increase in the level of foreign language proficiency. Wilcoxon's T-test confirmed the significance of the differences in the results: T_emp = 0 < T_cr = 1 (n = 4; p ≤ 0.01). Practical testing of this study showed that the vocalism technique provides effective opportunities for implementing an individual learning path.
Thus, future research suggested by the results will continue to improve the phonology of pronunciation.
The novelty of the study lies in the fact that this experiment was conducted for the first time at an aerospace non-linguistic university.
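The empirical statistic reported with Wilcoxon's T-test in the abstract above can be computed along the following lines. This is a sketch under stated assumptions: it takes T to be the smaller of the positive and negative rank sums of the paired differences, the function name `wilcoxon_t` is hypothetical, and the scores in the usage example are invented rather than taken from the study. The critical value T_cr is read from a statistical table and is not computed here.

```python
def wilcoxon_t(before, after):
    """Return the Wilcoxon signed-rank statistic T for paired samples.

    T is taken here as the smaller of the two signed rank sums.
    Pairs with a zero difference are discarded, and tied absolute
    differences receive the average of the ranks they span.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative)


if __name__ == "__main__":
    # Invented scores: every participant improves, so no rank is negative.
    before = [3, 4, 2, 5]
    after = [4, 6, 5, 9]
    print(wilcoxon_t(before, after))
```

When every shift goes in the same direction, as in the invented data, one of the two rank sums is empty and the statistic takes its minimum possible value.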