
    Deep Learning based 3D Segmentation: A Survey

    3D object segmentation is a fundamental and challenging problem in computer vision, with applications in autonomous driving, robotics, augmented reality and medical image analysis. It has received significant attention from the computer vision, graphics and machine learning communities. Traditionally, 3D segmentation was performed with hand-crafted features and engineered methods, which failed to achieve acceptable accuracy and could not generalize to large-scale data. Driven by their great success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks as well. This has led to an influx of methods in the literature that have been evaluated on different benchmark datasets. This paper provides a comprehensive survey of recent progress in deep learning based 3D segmentation, covering over 150 papers. It summarizes the most commonly used pipelines, discusses their highlights and shortcomings, and analyzes the competitive results of these segmentation methods. Based on this analysis, it also suggests promising research directions for the future.
    Comment: Under review at ACM Computing Surveys; 36 pages, 10 tables, 9 figures.

    Salient Object Detection via Integrity Learning

    Although current salient object detection (SOD) methods have achieved impressive progress, they fall short when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both the micro and macro levels. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object, while at the macro level, the model needs to discover all salient objects in the given image scene. To facilitate integrity learning for salient object detection, we design a novel Integrity Cognition Network (ICON), which explores three important components to learn strong integrity features. 1) Unlike existing models that focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase the feature diversity. Such diversity is the foundation for mining the integral salient objects. 2) Based on the DFA features, we introduce the integrity channel enhancement (ICE) component, with the goal of enhancing feature channels that highlight the integral salient objects at the macro level, while suppressing the other, distracting ones. 3) After extracting the enhanced features, the part-whole verification (PWV) method is employed to determine whether the part and whole object features have strong agreement. Such part-whole agreement can further improve the micro-level integrity of each salient object. To demonstrate the effectiveness of ICON, comprehensive experiments are conducted on seven challenging benchmarks, where promising results are achieved.
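
    To make the diverse feature aggregation idea concrete, below is a minimal sketch of a module that combines convolution branches with different kernel shapes and dilations before a 1x1 fusion. It is an illustration only, not the authors' ICON code; the branch choices and the DiverseFeatureAggregation name are assumptions.

    # Hypothetical sketch of diverse feature aggregation: branches with
    # different receptive fields (square, asymmetric, dilated) are fused.
    import torch
    import torch.nn as nn

    class DiverseFeatureAggregation(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.square = nn.Conv2d(channels, channels, 3, padding=1)
            self.wide = nn.Conv2d(channels, channels, (1, 5), padding=(0, 2))
            self.tall = nn.Conv2d(channels, channels, (5, 1), padding=(2, 0))
            self.dilated = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
            self.fuse = nn.Conv2d(4 * channels, channels, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Concatenate the diverse responses, then fuse back to `channels`.
            branches = [self.square(x), self.wide(x), self.tall(x), self.dilated(x)]
            return self.fuse(torch.cat(branches, dim=1))

    x = torch.randn(1, 64, 32, 32)
    print(DiverseFeatureAggregation(64)(x).shape)  # torch.Size([1, 64, 32, 32])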

    Graph-based Facial Affect Analysis: A Review of Methods, Applications and Challenges

    Facial affect analysis (FAA) using visual signals is important in human-computer interaction. Early methods focus on extracting appearance and geometry features associated with human affects, while ignoring the latent semantic information among individual facial changes, leading to limited performance and generalization. Recent work attempts to establish a graph-based representation to model these semantic relationships and to develop frameworks that leverage them for various FAA tasks. In this paper, we provide a comprehensive review of graph-based FAA, including the evolution of algorithms and their applications. First, the FAA background knowledge is introduced, with an emphasis on the role of the graph. We then discuss approaches that are widely used for graph-based affective representation in the literature and identify trends in graph construction. For the relational reasoning in graph-based FAA, existing studies are categorized according to their use of traditional methods or deep models, with special emphasis on the latest graph neural networks. Performance comparisons of the state-of-the-art graph-based FAA methods are also summarized. Finally, we discuss the challenges and potential directions. As far as we know, this is the first survey of graph-based FAA methods. Our findings can serve as a reference for future research in this field.
    Comment: 20 pages, 12 figures, 5 tables.
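
    To ground the graph-based representation, here is a minimal sketch of a single graph-reasoning step over facial landmarks. The 68-node setup, the placeholder adjacency, and the gcn_layer function are illustrative assumptions, not taken from any specific published FAA model.

    # One graph-convolution step over landmark nodes: aggregate neighbour
    # features via the adjacency matrix, then apply a linear projection.
    import torch

    def gcn_layer(x: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # node degrees
        return torch.relu((adj @ x / deg) @ weight)         # mean-aggregate + project

    num_nodes, in_dim, out_dim = 68, 2, 16
    x = torch.randn(num_nodes, in_dim)   # landmark coordinates as node features
    adj = torch.eye(num_nodes)           # placeholder adjacency (self-loops only)
    w = torch.randn(in_dim, out_dim)
    print(gcn_layer(x, adj, w).shape)    # torch.Size([68, 16])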

    Deep Learning for Face Anti-Spoofing: A Survey

    Face anti-spoofing (FAS) has lately attracted increasing attention due to its vital role in securing face recognition systems against presentation attacks (PAs). As more and more realistic PAs of novel types spring up, traditional FAS methods based on handcrafted features have become unreliable due to their limited representation capacity. With the emergence of large-scale academic datasets in the recent decade, deep learning based FAS has achieved remarkable performance and dominates this area. However, existing reviews in this field mainly focus on handcrafted features, which are outdated and uninspiring for the progress of the FAS community. In this paper, to stimulate future research, we present the first comprehensive review of recent advances in deep learning based FAS. It covers several novel and insightful components: 1) besides supervision with binary labels (e.g., '0' for bonafide vs. '1' for PAs), we also investigate recent methods with pixel-wise supervision (e.g., pseudo depth maps); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB cameras, we summarize deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. We conclude this survey by emphasizing current open issues and highlighting potential prospects.
    Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
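
    The two supervision styles mentioned above can be sketched in a few lines. The tensor shapes, the loss weighting, and the convention of an all-zero pseudo-depth target for attacks are illustrative assumptions rather than a specific FAS codebase.

    # Binary supervision vs. pixel-wise (pseudo-depth) supervision for FAS.
    import torch
    import torch.nn.functional as F

    logits = torch.randn(8)                      # per-image scores from a hypothetical FAS head
    labels = torch.randint(0, 2, (8,)).float()   # 0 = bonafide, 1 = presentation attack
    binary_loss = F.binary_cross_entropy_with_logits(logits, labels)

    pred_depth = torch.rand(8, 1, 32, 32)        # predicted pseudo-depth map
    gt_depth = torch.rand(8, 1, 32, 32)          # target: facial depth for bonafide, zeros for attacks
    pixel_loss = F.mse_loss(pred_depth, gt_depth)

    total = binary_loss + 0.5 * pixel_loss       # the 0.5 weighting is an arbitrary choice here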

    Capsule Networks for Video Understanding

    With the increase of videos available online, it is more important than ever to learn how to process and understand video data. Although convolutional neural networks have revolutionized representation learning from images and videos, they do not explicitly model entities within the given input. It would be useful for learned models to be able to represent part-to-whole relationships within a given image or video. To this end, a novel neural network architecture, the capsule network, has been proposed. Capsule networks add extra structure to allow for the modeling of entities and have shown great promise when applied to image data. By grouping neural activations and propagating information from one layer to the next through a routing-by-agreement procedure, capsule networks are able to learn part-to-whole relationships as well as robust object representations. In this dissertation, we explore how capsule networks can be generalized to video and used to effectively solve several video understanding problems.

    First, we generalize capsule networks from the image domain so that they can process 3-dimensional video data. Our proposed video capsule network (VideoCapsuleNet) tackles the problem of video action detection. We introduce capsule-pooling in the convolutional capsule layer to make the voting algorithm tractable in the 3-dimensional video domain. The network's routing-by-agreement inherently models the action representations, and various action characteristics are captured by the predicted capsules. We show that VideoCapsuleNet is able to successfully produce pixel-wise localizations of actions present in videos.

    While action detection only requires a coarse localization, we show that video capsule networks can also generate fine-grained segmentations. To that end, we propose a capsule-based approach for video object segmentation, CapsuleVOS, which can segment several frames at once conditioned on a reference frame and segmentation mask. This conditioning is performed through a novel routing algorithm for attention-based efficient capsule selection. We address two challenging issues in video object segmentation: segmentation of small objects and occlusion of objects across time. The first issue is addressed with a zooming module; the second is dealt with by a novel memory module based on recurrent neural networks.

    The works above show that capsule networks can effectively localize actors and objects within videos. Next, we address the integration of video and text for the task of actor and action video segmentation from a sentence. We propose a novel capsule-based approach to perform pixel-level localization based on a natural language query describing the actor of interest. We encode both the video and textual input in the form of capsules, and propose a visual-textual routing mechanism for the fusion of these capsules to successfully localize the actor and action within all frames of a video.

    The previous works are all fully supervised: they are trained on manually annotated data, which is often time-consuming and costly to acquire. Finally, we propose a novel method for self-supervised learning that does not rely on manually annotated data. We present a capsule network that jointly learns high-level concepts and their relationships across different low-level multimodal (video, audio, and text) input representations. To adapt the capsules to large-scale input data, we propose a routing by self-attention mechanism that selects relevant capsules, which are then used to generate a final joint multimodal feature representation. This allows us to learn robust representations from noisy video data and to scale up the size of the capsule network compared to traditional routing methods, while still being computationally efficient.
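
    For intuition, the routing-by-agreement procedure referred to throughout can be sketched in the style of the classic dynamic routing of Sabour et al. (2017). This simplified version is an illustration only, not the dissertation's capsule-pooling or self-attention routing.

    # Dynamic routing-by-agreement: coupling coefficients are iteratively
    # sharpened toward output capsules that agree with the input predictions.
    import torch

    def squash(v: torch.Tensor, dim: int = -1) -> torch.Tensor:
        n2 = (v ** 2).sum(dim=dim, keepdim=True)
        return (n2 / (1.0 + n2)) * v / torch.sqrt(n2 + 1e-8)

    def routing_by_agreement(u_hat: torch.Tensor, iterations: int = 3) -> torch.Tensor:
        """u_hat: [in_caps, out_caps, dim] prediction vectors from lower capsules."""
        b = torch.zeros(u_hat.shape[:2])                  # routing logits
        for _ in range(iterations):
            c = torch.softmax(b, dim=1)                   # coupling coefficients per input capsule
            s = (c.unsqueeze(-1) * u_hat).sum(dim=0)      # weighted vote -> [out_caps, dim]
            v = squash(s)                                 # output capsule vectors
            b = b + (u_hat * v.unsqueeze(0)).sum(dim=-1)  # agreement updates the logits
        return v

    print(routing_by_agreement(torch.randn(32, 10, 16)).shape)  # torch.Size([10, 16])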

    Deep Learning for Video Object Segmentation: A Review

    As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout the given video sequence. Recently, with the advancement of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated tasks. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, and their architectures, contributions and mutual differences are elaborated. Finally, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussion of future research directions. This article is expected to serve as a tutorial and source of reference for learners who intend to quickly grasp the current progress in this research area, and for practitioners interested in applying video object segmentation methods to their problems. A public website is built to collect and track related works in this field: https://github.com/gaomingqi/VOS-Review

    Deep learning for internet of underwater things and ocean data analytics

    The Internet of Underwater Things (IoUT) is an emerging technological ecosystem for connecting objects in maritime and underwater environments. IoUT technologies are empowered by a vast number of deployed sensors and actuators. In this thesis, multiple IoUT sensory data streams are augmented with machine intelligence for forecasting purposes.
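
    A minimal sketch of such a forecasting setup, assuming a window of past multichannel sensor readings and a one-step-ahead LSTM predictor; the architecture and dimensions are illustrative, not the thesis's models.

    # Predict the next reading of an underwater sensor from a window of
    # past readings with a small LSTM.
    import torch
    import torch.nn as nn

    class SensorForecaster(nn.Module):
        def __init__(self, n_features: int = 4, hidden: int = 32):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_features)

        def forward(self, window: torch.Tensor) -> torch.Tensor:
            out, _ = self.lstm(window)    # encode the window of past readings
            return self.head(out[:, -1])  # one-step-ahead prediction

    model = SensorForecaster()
    past = torch.randn(16, 24, 4)  # batch of 24-step windows, 4 sensor channels
    print(model(past).shape)       # torch.Size([16, 4])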