273 research outputs found

    Temporal Relational Reasoning in Videos

    Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets (Something-Something, Jester, and Charades) which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Through only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos. Comment: camera-ready version for ECCV'1
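
    The idea of reasoning over frame relations at multiple time scales can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the exhaustive enumeration of ordered frame tuples (rather than sparse sampling), and the single shared `relation_fn` are all assumptions made for illustration.

```python
from itertools import combinations


def temporal_relation(frame_features, scale, relation_fn):
    """Sum relation_fn over every time-ordered tuple of `scale` frames.

    frame_features: list of per-frame feature vectors (lists of floats).
    relation_fn: hypothetical callable mapping an ordered list of frame
                 features to a list of floats (a stand-in for a learned MLP).
    """
    total = None
    # combinations() yields index tuples in increasing order, so each
    # tuple of frames is visited in temporal order.
    for indices in combinations(range(len(frame_features)), scale):
        ordered = [frame_features[i] for i in indices]
        out = relation_fn(ordered)
        total = out if total is None else [a + b for a, b in zip(total, out)]
    return total


def multiscale_relation(frame_features, scales, relation_fn):
    """Add up the relation outputs computed at several time scales."""
    per_scale = [temporal_relation(frame_features, s, relation_fn) for s in scales]
    return [sum(values) for values in zip(*per_scale)]
```

    With a toy relation such as "last frame minus first frame", the 2-frame scale captures pairwise changes and the 3-frame scale captures the change across the whole clip; the multi-scale output is their sum.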

    Knowledge Distillation for Multi-task Learning

    Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), which leads to the imbalance problem in multi-task learning. To address the imbalance problem, we propose a knowledge distillation based method in this work. We first learn a task-specific model for each task. We then train the multi-task model both to minimize the task-specific losses and to produce the same features as the task-specific models. As each task-specific network encodes different features, we introduce small task-specific adaptors to project multi-task features to the task-specific features. In this way, the adaptors align the task-specific feature and the multi-task feature, which enables a balanced parameter sharing across tasks. Extensive experimental results demonstrate that our method can optimize a multi-task learning model in a more balanced way and achieve better overall performance. Comment: We propose a knowledge distillation method for addressing the imbalance problem in multi-task learning.
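
    The combination of task losses with an adaptor-based feature alignment term can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the linear adaptor, the L2 normalization of features, the squared-distance penalty, and the `weight` parameter are all hypothetical choices.

```python
import math


def apply_adaptor(adaptor, feature):
    """Project a shared (multi-task) feature with a linear adaptor matrix."""
    return [sum(a * x for a, x in zip(row, feature)) for row in adaptor]


def l2_normalize(vector):
    """Scale a vector to unit length (assumed here so magnitudes are comparable)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def alignment_loss(shared_feature, teacher_feature, adaptor):
    """Squared distance between the adapted shared feature and the feature
    of the frozen task-specific (teacher) model."""
    projected = l2_normalize(apply_adaptor(adaptor, shared_feature))
    target = l2_normalize(teacher_feature)
    return sum((p - t) ** 2 for p, t in zip(projected, target))


def total_loss(task_losses, alignment_losses, weight=1.0):
    """Sum of per-task losses plus the weighted feature alignment terms."""
    return sum(task_losses) + weight * sum(alignment_losses)
```

    Because the alignment term compares normalized features, its scale is the same for every task regardless of how large that task's own loss is, which is one plausible way such a term could help balance the tasks.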

    Adding New Tasks to a Single Network with Weight Transformations using Binary Masks

    Visual recognition algorithms are required today to exhibit adaptive abilities. Given a deep model trained on a specific task, it would be highly desirable to be able to adapt it incrementally to new tasks, preserving scalability as the number of new tasks increases, while at the same time avoiding catastrophic forgetting. Recent work has shown that masking the internal weights of a given original conv-net through learned binary variables is a promising strategy. We build upon this intuition and take into account more elaborate affine transformations of the convolutional weights that include learned binary masks. We show that with our generalization it is possible to achieve significantly higher levels of adaptation to new tasks, enabling the approach to compete with fine-tuning strategies while requiring slightly more than 1 bit per network parameter per additional task. Experiments on two popular benchmarks showcase the power of our approach, which achieves the new state of the art on the Visual Decathlon Challenge.
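
    As a rough illustration of an affine weight transformation that includes a binary mask, consider the sketch below. The specific form `k0*w + k1 + k2*m*w`, the coefficient names, and the thresholding rule are assumptions chosen for illustration; the abstract does not give the exact parameterization used by the authors.

```python
def binarize(real_mask, threshold=0.0):
    """Turn real-valued mask parameters into a 0/1 mask by thresholding."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in real_mask]


def transform_weights(weights, mask, k0=1.0, k1=0.0, k2=1.0):
    """Apply a hypothetical affine transformation to frozen base weights.

    Each new task stores only the binary mask (1 bit per weight) plus the
    scalars k0, k1, k2; the base weights themselves are never modified.
    """
    return [
        [k0 * w + k1 + k2 * m * w for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]
```

    Setting `k0=0`, `k1=0`, `k2=1` reduces the transformation to plain binary masking of the base weights, so the affine form is a strict generalization of it. The few extra scalars per layer are consistent with a storage cost of slightly more than 1 bit per parameter.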

    Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification

    Popular approaches for few-shot classification consist of first learning a generic data representation based on a large annotated dataset, before adapting the representation to new classes given only a few labeled samples. In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches. First, we obtain a multi-domain representation by training a set of semantically different feature extractors. Then, given a few-shot learning task, we use our multi-domain feature bank to automatically select the most relevant representations. We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training, which leads to state-of-the-art results on MetaDataset and improved accuracy on mini-ImageNet. Comment: ECCV'2

    Using AI to Enable Design for Diversity: A Perspective

    Inclusive design focuses on diversity. The contextualized, user-sensitive design framework of an interaction system needs to analyze and deal with complex diversity factors, which challenges traditional design processes, tools, and methods. Therefore, new technological progress is needed to provide more innovation potential. The authors point out that the design process of smart products is evolving in response to uncertainty. In the future, diversity-oriented design will tend to allocate design resources and values algorithmically rather than through a single compromise solution. This paper analyzes the limitations and potential of applying AI technology, represented by deep learning, in diversity-oriented design practice and design research, puts forward the goal and direction of further research, and discusses the critical links of AI-enabled diversity design in an interdisciplinary research environment.

    BézierSketch: A Generative Model for Scalable Vector Sketches

    The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and a failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark. Comment: Accepted as poster at ECCV 202
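
    To show why a stroke stored as Bézier control points is resolution-independent, the sketch below evaluates a curve with the standard de Casteljau algorithm. It covers only curve evaluation, not the paper's encoder or generator; the function names and the tuple-based point representation are assumptions.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve of any degree at parameter t in [0, 1]
    by repeated linear interpolation between neighbouring points."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def sample_curve(control_points, n_samples):
    """Return n_samples (>= 2) evenly spaced points along the curve."""
    return [
        de_casteljau(control_points, i / (n_samples - 1))
        for i in range(n_samples)
    ]
```

    A stroke described by a handful of control points can be sampled with as many points as the output resolution requires, whereas a waypoint sequence is fixed at the density at which it was recorded.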

    “Are Machines Better Than Humans in Image Tagging?” - A User Study Adds to the Puzzle

    “Do machines perform better than humans in visual recognition tasks?” Not so long ago, this question would have been considered somewhat provocative and the answer would have been clear: “No”. In this paper, we present a comparison of human and machine performance with respect to annotation for multimedia retrieval tasks. Going beyond recent crowdsourcing studies in this respect, we also report results of two extensive user studies. In total, 23 participants were asked to annotate more than 1000 images of a benchmark dataset, which is the most comprehensive study in the field so far. Krippendorff’s α is used to measure inter-coder agreement among several coders and the results are compared with the best machine results. The study is preceded by a summary of studies which compared human and machine performance in different visual and auditory recognition tasks. We discuss the results and derive a methodology for comparing machine performance in multimedia annotation tasks with human-level performance. This allows us to formally answer the question of whether a recognition problem can be considered solved. Finally, we answer the initial question.
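
    For readers unfamiliar with Krippendorff’s α, a minimal sketch for nominal labels follows. It computes α = 1 − D_o/D_e from a coincidence matrix. The function name and input layout are assumptions, and the abstract does not say which variant or metric of α the study used.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that the coders
           assigned to one item (missing codings are simply left out).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items coded by fewer than two coders carry no pairs
        # Every ordered pair of codings within an item contributes 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one item coded by two or more coders")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected
```

    A value of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate systematic disagreement.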

    Observation of Kuznetsov-Ma soliton dynamics in optical fibre

    The nonlinear Schrödinger equation (NLSE) is a central model of nonlinear science, applying to hydrodynamics, plasma physics, molecular biology and optics. The NLSE admits only a few elementary analytic solutions, but one in particular, describing a localized soliton on a finite background, is of intense current interest in the context of understanding the physics of extreme waves. However, although the first solution of this type was the Kuznetsov-Ma (KM) soliton derived in 1977, there have in fact been no quantitative experiments confirming its validity. We report here novel experiments in optical fibre that confirm the KM soliton theory, completing an important series of experiments that have now observed a complete family of soliton-on-background solutions to the NLSE. Our results also show that KM dynamics appear more universally than for the specific conditions originally considered, and can be interpreted as an analytic description of Fermi-Pasta-Ulam recurrence in NLSE propagation.
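
    For reference, one common dimensionless form of the focusing NLSE is written below. The normalization and the symbols ψ (field envelope), ξ (propagation distance) and τ (co-moving time) are assumptions; the abstract does not state which form the authors use.

```latex
i\,\frac{\partial \psi}{\partial \xi}
  + \frac{1}{2}\,\frac{\partial^{2} \psi}{\partial \tau^{2}}
  + |\psi|^{2}\,\psi = 0
```

    In this normalization the plane wave ψ = e^{iξ} is an exact solution (the first term gives −ψ and the last gives +ψ), and it is the kind of finite background on which soliton-on-background solutions such as the KM soliton are localized.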