Weakly- and Semi-Supervised Panoptic Segmentation
We present a weakly supervised model that jointly performs both semantic- and
instance-segmentation -- a particularly relevant problem given the substantial
cost of obtaining pixel-perfect annotation for these tasks. In contrast to many
popular instance segmentation approaches based on object detectors, our method
does not predict any overlapping instances. Moreover, we are able to segment
both "thing" and "stuff" classes, and thus explain all the pixels in the image.
"Thing" classes are weakly-supervised with bounding boxes, and "stuff" with
image-level tags. We obtain state-of-the-art results on Pascal VOC, for both
full and weak supervision (which achieves about 95% of fully-supervised
performance). Furthermore, we present the first weakly-supervised results on
Cityscapes for both semantic- and instance-segmentation. Finally, we use our
weakly supervised framework to analyse the relationship between annotation
quality and predictive performance, which is of interest to dataset creators.
Comment: ECCV 2018. The first two authors contributed equally.
An Overview of Multimodal Techniques for the Characterization of Sport Programmes
Content characterization of sports videos is of great interest because sports video appeals to large audiences, and its efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper we analyze several techniques proposed in the literature for content characterization of sports videos. We organize this analysis by the type of signal (audio, video, text captions, ...) from which the low-level features are extracted. First we consider the techniques based on visual information, then the methods based on audio information, and finally the algorithms based on audio-visual cues, used in a multi-modal fashion. This analysis shows that each type of signal carries its own distinctive information, and that the multi-modal approach can fully exploit the multimedia information associated with the sports video. Moreover, we observe that the characterization is performed either by considering what happens in a specific time segment, thereby observing the features in a "static" way, or by trying to capture their "dynamic" evolution in time. The effectiveness of each approach depends mainly on the kind of sport it relates to and the type of highlights we are focusing on.
LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning
Current high-performance semantic segmentation models are purely data-driven
sub-symbolic approaches and blind to the structured nature of the visual world.
This is in stark contrast to human cognition which abstracts visual perceptions
at multiple levels and conducts symbolic reasoning with such structured
abstraction. To fill these fundamental gaps, we devise LOGICSEG, a holistic
visual semantic parser that integrates neural inductive learning and logic
reasoning with both rich data and symbolic knowledge. In particular, the
semantic concepts of interest are structured as a hierarchy, from which a set
of constraints are derived for describing the symbolic relations and formalized
as first-order logic rules. After fuzzy logic-based continuous relaxation,
logical formulae are grounded onto data and neural computational graphs, hence
enabling logic-induced network training. During inference, logical constraints
are packaged into an iterative process and injected into the network in a form
of several matrix multiplications, so as to achieve hierarchy-coherent
prediction with logic reasoning. These designs together make LOGICSEG a general
and compact neural-logic machine that is readily integrated into existing
segmentation models. Extensive experiments over four datasets with various
segmentation models and backbones verify the effectiveness and generality of
LOGICSEG. We believe this study opens a new avenue for visual semantic parsing.
Comment: ICCV 2023 (Oral). Code: https://github.com/lingorX/LogicSeg
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing