292 research outputs found

    Mass Displacement Networks

    Despite the large improvements in performance attained by using deep learning in computer vision, one can often further improve results with some additional post-processing that exploits the geometric nature of the underlying task. This commonly involves displacing the posterior distribution of a CNN in a way that makes it more appropriate for the task at hand, e.g. better aligned with local image features, or more compact. In this work we integrate this geometric post-processing within a deep architecture, introducing a differentiable and probabilistically sound counterpart to the common geometric voting technique used for evidence accumulation in vision. We refer to the resulting neural models as Mass Displacement Networks (MDNs), and apply them to human pose estimation in two distinct setups: (a) landmark localization, where we collapse a distribution to a point, allowing for precise localization of body keypoints, and (b) communication across body parts, where we transfer evidence from one part to the other, allowing for a globally consistent pose estimate. We evaluate on large-scale pose estimation benchmarks, such as the MPII Human Pose and COCO datasets, and report systematic improvements when compared to strong baselines. Comment: 12 pages, 4 figures
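    The abstract's first setup, collapsing a spatial distribution to a point in a differentiable way, is closely related to the widely used soft-argmax layer. The sketch below (PyTorch, with hypothetical names and shapes chosen for illustration, not the paper's actual MDN operator) shows how a keypoint heatmap can be collapsed to coordinates while keeping gradients flowing:

    import torch

    def soft_argmax_2d(heatmap, temperature=1.0):
        """Differentiably collapse 2D heatmaps into (x, y) coordinates.

        heatmap: tensor of shape (B, K, H, W), one map per keypoint.
        Returns: tensor of shape (B, K, 2) with expected pixel coordinates.
        """
        b, k, h, w = heatmap.shape
        # Turn each map into a probability distribution over pixels.
        probs = torch.softmax(heatmap.reshape(b, k, -1) / temperature, dim=-1)
        probs = probs.reshape(b, k, h, w)
        ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
        xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
        # Expected coordinates under the distribution (fully differentiable).
        exp_y = (probs.sum(dim=3) * ys).sum(dim=2)  # (B, K)
        exp_x = (probs.sum(dim=2) * xs).sum(dim=2)  # (B, K)
        return torch.stack([exp_x, exp_y], dim=-1)

    # Usage: collapse CNN keypoint heatmaps into coordinates for a regression loss.
    heatmaps = torch.randn(2, 17, 64, 48, requires_grad=True)
    coords = soft_argmax_2d(heatmaps)  # (2, 17, 2)
    coords.sum().backward()            # gradients flow back into the heatmaps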

    The concept of inter-cultural communication in a globalized world

    Recently, globalization has become one of the most widely used notions in many different areas. As the process of globalization unfolds, our outlook on the modern world and on reality is changing completely. There is a need to develop in unison with other countries and to conduct global policy in line with modern technologies. In this way, all traditional areas undergo modern changes, and a global information community and a mass community emerge.

    CentralNet: a Multilayer Approach for Multimodal Fusion

    This paper proposes a novel multimodal fusion approach, aiming to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches either work by projecting the features of different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from both views. More specifically, assuming each modality can be processed by a separate deep convolutional network, allowing decisions to be taken independently for each modality, we introduce a central network linking the modality-specific networks. This central network not only provides a common feature embedding but also regularizes the modality-specific networks through the use of multi-task learning. The proposed approach is validated on 4 different computer vision tasks, on which it consistently improves the accuracy of existing multimodal fusion approaches.
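    To make the central-network idea more concrete, here is a hedged PyTorch sketch of a two-modality version: each modality keeps its own branch and auxiliary classifier, and a central branch combines the hidden layers with learned weights. The layer sizes, the weighted-sum fusion, and the head names are assumptions made for this example, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class TwoModalityCentralNet(nn.Module):
        """Toy fusion model: modality-specific branches plus a central branch."""

        def __init__(self, dim_a, dim_b, hidden=128, n_classes=10):
            super().__init__()
            self.branch_a = nn.ModuleList([nn.Linear(dim_a, hidden), nn.Linear(hidden, hidden)])
            self.branch_b = nn.ModuleList([nn.Linear(dim_b, hidden), nn.Linear(hidden, hidden)])
            self.central  = nn.ModuleList([nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)])
            # Learnable fusion weights (central, modality A, modality B) per layer.
            self.alphas = nn.Parameter(torch.ones(2, 3))
            self.head_a = nn.Linear(hidden, n_classes)  # auxiliary task on modality A
            self.head_b = nn.Linear(hidden, n_classes)  # auxiliary task on modality B
            self.head_c = nn.Linear(hidden, n_classes)  # main multimodal task

        def forward(self, xa, xb):
            ha, hb, hc = xa, xb, 0.0
            for i, (la, lb, lc) in enumerate(zip(self.branch_a, self.branch_b, self.central)):
                ha, hb = torch.relu(la(ha)), torch.relu(lb(hb))
                w = torch.softmax(self.alphas[i], dim=0)
                # Central state: weighted sum of its previous output and both branches.
                hc = torch.relu(lc(w[0] * hc + w[1] * ha + w[2] * hb))
            return self.head_c(hc), self.head_a(ha), self.head_b(hb)

    # Training would sum a main loss on the fused head with auxiliary losses on the two
    # modality-specific heads, which is what provides the multi-task regularization.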

    Encyclopedia of Ukraine

    «Encyclopedia of Ukraine» is one of the most important and significant works of the twentieth century. Its goal is to acquaint us with the history of ancient and modern Ukraine, which experienced a great number of changes during its long journey of formation. The encyclopedia was created under the auspices of the Shevchenko Scientific Society in Europe and consists of two parts: a 3-volume general part and a 10-volume dictionary part.

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, producing meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figures
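    The "random dropping of separate channels" mentioned above can be sketched roughly as a modality-level dropout layer. The PyTorch snippet below is an illustrative approximation (the drop probability, tensor shapes, and names are assumptions), not the paper's exact training procedure.

    import torch
    import torch.nn as nn

    class ModalityDropout(nn.Module):
        """Randomly zero whole modality streams during training, so downstream
        fusion layers learn to cope with missing channels at test time."""

        def __init__(self, p_drop=0.2):
            super().__init__()
            self.p_drop = p_drop

        def forward(self, modality_features):
            # modality_features: list of tensors, one per modality, each of shape (B, D_m).
            if not self.training:
                return modality_features
            out = []
            for feat in modality_features:
                # Per-sample Bernoulli mask: 1 keeps the modality, 0 drops it entirely.
                keep = (torch.rand(feat.shape[0], 1, device=feat.device) > self.p_drop).float()
                out.append(feat * keep)
            return out

    # Usage: apply to per-modality features before fusion.
    moddrop = ModalityDropout(p_drop=0.3)
    video, depth, audio = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 128)
    fused_inputs = moddrop([video, depth, audio])  # some streams zeroed per sample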

    Predicting Deeper into the Future of Semantic Segmentation

    The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous work, here we introduce the novel task of predicting semantic segmentations of future frames. Given a sequence of video frames, our goal is to predict segmentation maps of not yet observed video frames that lie up to a second or further in the future. We develop an autoregressive convolutional neural network that learns to iteratively generate multiple frames. Our results on the Cityscapes dataset show that directly predicting future segmentations is substantially better than predicting and then segmenting future RGB frames. Prediction results up to half a second in the future are visually convincing and are much more accurate than those of a baseline based on warping semantic segmentations using optical flow. Comment: Accepted to ICCV 2017. Supplementary material available on the authors' webpage.
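    A minimal sketch of the autoregressive rollout described above, assuming a toy convolutional predictor that maps the last few soft segmentation maps to the next one and is then applied iteratively to its own output; the class count, shapes, and network body are placeholders, not the authors' architecture.

    import torch
    import torch.nn as nn

    class SegPredictor(nn.Module):
        """Predict the next per-class segmentation map from `context` previous maps."""

        def __init__(self, n_classes=19, context=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(n_classes * context, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, n_classes, 3, padding=1),
            )

        def forward(self, past_maps):
            # past_maps: (B, context, C, H, W) soft segmentations of observed frames.
            b, t, c, h, w = past_maps.shape
            logits = self.net(past_maps.reshape(b, t * c, h, w))
            return torch.softmax(logits, dim=1)  # (B, C, H, W)

    def predict_future(model, past_maps, steps):
        """Autoregressively roll out `steps` future segmentation maps."""
        preds = []
        for _ in range(steps):
            nxt = model(past_maps)
            preds.append(nxt)
            # Slide the context window: drop the oldest map, append the prediction.
            past_maps = torch.cat([past_maps[:, 1:], nxt.unsqueeze(1)], dim=1)
        return preds

    model = SegPredictor()
    past = torch.softmax(torch.randn(1, 4, 19, 64, 128), dim=2)  # toy observed maps
    future = predict_future(model, past, steps=3)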

    An exclusive phonological method of teaching a foreign language with a professionally oriented approach

    The relevance of the article's topic stems from the reassessment of the goals and objectives of professionally oriented language education at the undergraduate level in light of UNESCO initiatives. The hypothesis: using the proposed methodology while teaching English with the indicated approach at a technical university can significantly improve pronunciation. The purpose is to determine the effectiveness of the phonological methodology in teaching a foreign language based on a professionally oriented approach. Methodology: pilot experiment, comparison, observation, description, Wilcoxon's T-test, induction. The experiment, conducted in 2019-2020, involved 60 students of the Moscow Aviation Institute (National Research University). Most relevant results: repeated control testing revealed an increase in the level of foreign-language proficiency. The Wilcoxon T-test confirmed the significance of the differences in the results: T_emp = 0 < T_cr (n = 4; p ≤ 0.01) = 1. Practical testing showed that the vocalism technique provides effective opportunities for implementing an individual learning path. Thus, future research suggested by these results will continue to improve the phonological side of pronunciation training. The novelty of the study lies in the fact that this experiment was conducted for the first time at a non-linguistic aerospace university.
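    For readers unfamiliar with the Wilcoxon signed-rank test used above, the snippet below shows how such a paired pre-/post-test comparison is typically computed with SciPy; the scores are invented placeholders, not the study's data.

    from scipy.stats import wilcoxon

    # Hypothetical pre- and post-test pronunciation scores for the same students
    # (placeholder numbers only, not the study's actual measurements).
    pre  = [52, 61, 48, 70, 55, 63, 58, 66]
    post = [60, 68, 55, 74, 62, 70, 63, 71]

    # Paired, non-parametric test of whether post-test scores differ from pre-test.
    stat, p_value = wilcoxon(pre, post)
    print(f"T = {stat:.1f}, p = {p_value:.4f}")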