6,152 research outputs found

    Deformable Capsules for Object Detection

    In this study, we introduce a new family of capsule networks, deformable capsules (DeformCaps), to address a very important problem in computer vision: object detection. We propose two new algorithms associated with our DeformCaps: a novel capsule structure (SplitCaps) and a novel dynamic routing algorithm (SE-Routing). Together they balance computational efficiency with the need to model a large number of objects and classes, a balance that has never been achieved with capsule networks before. We demonstrate that the proposed methods allow capsules to scale up efficiently to large-scale computer vision tasks for the first time, and create the first-ever capsule network for object detection in the literature. Our proposed architecture is a one-stage detection framework and obtains results on MS COCO which are on par with state-of-the-art one-stage CNN-based methods, while producing fewer false positive detections and generalizing to unusual poses/viewpoints of objects.

    Polyphonic Sound Event Detection by using Capsule Neural Networks

    Artificial sound event detection (SED) aims to mimic the human ability to perceive and understand what is happening in the surroundings. Nowadays, Deep Learning offers valuable techniques for this goal, such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has recently been introduced in the image processing field with the intent to overcome some of the known limitations of CNNs, specifically regarding the limited robustness to affine transformations (i.e., perspective, size, orientation) and the detection of overlapped images. This motivated the authors to employ CapsNets to deal with the polyphonic-SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit the capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through a so-called "dynamic routing" that encourages learning part-whole relationships and improves the detection performance in a polyphonic context. This paper reports extensive evaluations carried out on three publicly available datasets, showing that the CapsNet-based algorithm not only outperforms standard CNNs but also achieves the best results with respect to state-of-the-art algorithms.

    3D Point Capsule Networks

    In this paper, we propose 3D point-capsule networks, an auto-encoder designed to process sparse 3D point clouds while preserving spatial arrangements of the input data. 3D capsule networks arise as a direct consequence of our novel unified 3D auto-encoder formulation. Their dynamic routing scheme and the peculiar 2D latent space deployed by our approach bring in improvements for several common point cloud-related tasks, such as object classification, object reconstruction and part segmentation as substantiated by our extensive evaluations. Moreover, it enables new applications such as part interpolation and replacement. Comment: As published in CVPR 2019 (camera ready version), with supplementary material

    SECaps: A Sequence Enhanced Capsule Model for Charge Prediction

    Automatic charge prediction aims to predict appropriate final charges according to the fact descriptions for a given criminal case. Automatic charge prediction plays a critical role in assisting judges and lawyers to improve the efficiency of legal decisions, and thus has received much attention. Nevertheless, most existing works on automatic charge prediction perform adequately on high-frequency charges but are not yet capable of predicting few-shot charges with limited cases. In this paper, we propose a Sequence Enhanced Capsule model, dubbed the SECaps model, to relieve this problem. Specifically, following the work on capsule networks, we propose the seq-caps layer, which considers sequence information and spatial information of legal texts simultaneously. Then we design an attention residual unit, which provides auxiliary information for charge prediction. In addition, our SECaps model introduces focal loss, which relieves the problem of imbalanced charges. Compared with the state-of-the-art methods, our SECaps model obtains absolute improvements of 4.5% and 6.4% in Macro F1 on Criminal-S and Criminal-L respectively. The experimental results consistently demonstrate the superiority and competitiveness of our proposed model. Comment: 13 pages, 3 figures, 5 tables
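    The abstract above names focal loss as its remedy for imbalanced charges but does not spell it out. As a rough illustration only, the sketch below uses the commonly cited form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the function names and the defaults gamma=2.0 and alpha=1.0 are assumptions for this example, not values taken from the SECaps paper.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for a single example.

    probs  : list of predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed default; gamma=0 gives cross-entropy)
    alpha  : scalar weight (assumed default)
    """
    # Clamp the true-class probability to avoid log(0).
    p_t = min(max(probs[target], 1e-12), 1.0)
    # (1 - p_t)**gamma shrinks the loss of well-classified examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, target):
    """Plain cross-entropy, expressed as focal loss with gamma = 0."""
    return focal_loss(probs, target, gamma=0.0)


if __name__ == "__main__":
    easy = [0.9, 0.1]   # confident, correct prediction
    hard = [0.1, 0.9]   # true class 0 receives low probability
    for name, p in (("easy", easy), ("hard", hard)):
        print(name, cross_entropy(p, 0), focal_loss(p, 0))
```

    With gamma = 2, the confidently correct example keeps about 1% of its cross-entropy loss while the misclassified one keeps about 81%, which is how the loss shifts weight toward rare, hard cases.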

    Capsule Routing for Sound Event Detection

    The detection of acoustic scenes is a challenging problem in which environmental sound events must be detected from a given audio signal. This includes classifying the events as well as estimating their onset and offset times. We approach this problem with a neural network architecture that uses the recently proposed capsule routing mechanism. A capsule is a group of activation units representing a set of properties for an entity of interest, and the purpose of routing is to identify part-whole relationships between capsules. That is, a capsule in one layer is assumed to belong to a capsule in the layer above in terms of the entity being represented. Using capsule routing, we wish to train a network that can learn global coherence implicitly, thereby improving generalization performance. Our proposed method is evaluated on Task 4 of the DCASE 2017 challenge. Results show that classification performance is state-of-the-art, achieving an F-score of 58.6%. In addition, overfitting is reduced considerably compared to other architectures. Comment: Paper accepted for 26th European Signal Processing Conference (EUSIPCO 2018)
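    The routing idea described above, in which lower-layer capsules are assigned to the upper-layer capsules they agree with, can be sketched in a few lines. This is a minimal pure-Python version of the widely known routing-by-agreement procedure, not necessarily the variant used in any paper listed here; the names `squash`, `dynamic_routing`, `u_hat` and the default of three iterations are assumptions for illustration.

```python
import math


def squash(s):
    """Scale vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Route predictions u_hat[i][j] (vector from input capsule i for output j).

    Returns (v, c): output capsule vectors and the coupling coefficients
    that were used to compute them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero
    v, c = [], []
    for _ in range(num_iters):
        # Each input capsule distributes its vote across output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement (dot product) between prediction and output raises the logit.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


if __name__ == "__main__":
    # Three input capsules agree on output 0 and disagree on output 1.
    u_hat = [
        [[1.0, 0.0], [1.0, 0.0]],
        [[1.0, 0.0], [-1.0, 0.0]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    v, c = dynamic_routing(u_hat)
    print("output vectors:", v)
    print("coupling coefficients:", c)
```

    In the toy input, all three lower capsules make the same prediction for output 0, so their coupling coefficients drift toward it over the iterations, which is the part-whole assignment the abstract refers to.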