Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection
Co-saliency detection aims to discover the common and salient foregrounds
from a group of relevant images. For this task, we present a novel adaptive
graph convolutional network with attention graph clustering (GCAGC). Three
major contributions have been made, and are experimentally shown to have
substantial practical merits. First, we propose a graph convolutional network
design to extract information cues to characterize the intra- and inter-image
correspondence. Second, we develop an attention graph clustering algorithm to
discriminate the common objects from all the salient foreground objects in an
unsupervised fashion. Third, we present a unified framework with
encoder-decoder structure to jointly train and optimize the graph convolutional
network, attention graph cluster, and co-saliency detection decoder in an
end-to-end manner. We evaluate our proposed GCAGC method on three co-saliency
detection benchmark datasets (iCoseg, Cosal2015 and COCO-SEG). Our GCAGC method
obtains significant improvements over the state of the art on most of them.
Comment: CVPR202
Understanding Optical Music Recognition
For over 50 years, researchers have been trying to teach computers to read music notation, referred to as Optical Music Recognition (OMR). However, this field is still difficult to access for new researchers, especially those without a significant musical background: Few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, with most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords
Can human association norm evaluate latent semantic analysis?
This paper compares a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations
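As an illustration of how a corpus-based association list can be built, the sketch below ranks the neighbours of each word by pointwise mutual information, the association ratio usually attributed to Church and Hanks. The function name `association_lists`, the window size, and the simplified probability estimates (all counts divided by the corpus length) are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def association_lists(tokens, window=2, top_k=3):
    """Rank the neighbours of each word by pointwise mutual information.

    PMI(w, v) = log2( P(w, v) / (P(w) * P(v)) ), with every probability
    estimated as a raw count divided by the corpus length (a simplification).
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()

    # Count symmetric co-occurrences inside a sliding window.
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                pair_counts[(w, v)] += 1
                pair_counts[(v, w)] += 1

    # Collect (neighbour, PMI) pairs per word.
    scored = {}
    for (w, v), c in pair_counts.items():
        pmi = math.log2(c * n / (word_counts[w] * word_counts[v]))
        scored.setdefault(w, []).append((v, pmi))

    # Highest PMI first; ties broken alphabetically for determinism.
    return {
        w: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k]
        for w, pairs in scored.items()
    }


corpus = "doctor nurse doctor nurse bread butter bread butter".split()
lists = association_lists(corpus, window=1)
print(lists["doctor"])
```

Lists produced this way could then be compared against the responses that human participants give to the same stimulus words.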
Efficient resource allocation for automotive active vision systems
Individual mobility on roads has a noticeable impact upon people's lives, including
traffic accidents resulting in severe or even lethal injuries. Therefore the main goal when
operating a vehicle is to safely participate in road-traffic while minimising the adverse
effects on our environment. This goal is pursued by road safety measures ranging from
safety-oriented road design to driver assistance systems. The latter require exteroceptive
sensors to acquire information about the vehicle's current environment.
In this thesis an efficient resource allocation for automotive vision systems is proposed.
The notion of allocating resources implies the presence of processes that observe the whole
environment and that are able to efficiently direct attentive processes. Directing attention
constitutes a decision making process dependent upon the environment it operates in, the
goal it pursues, and the sensor resources and computational resources it allocates. The
sensor resources considered in this thesis are a subset of the multi-modal sensor system on
a test vehicle provided by Audi AG, which is also used to evaluate our proposed resource
allocation system.
This thesis presents an original contribution in three respects. First, a system architecture
designed to efficiently allocate both high-resolution sensor resources and computationally
expensive processes based upon low-resolution sensor data is proposed. Second,
a novel method to estimate 3-D range motion, efficient scan-patterns for spin image based
classifiers, and an evaluation of track-to-track fusion algorithms present contributions in
the field of data processing methods. Third, a Pareto efficient multi-objective resource
allocation method is formalised, implemented, and evaluated using road traffic test sequences
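To make the notion of Pareto efficiency concrete, the following sketch filters a set of candidate allocations down to the non-dominated ones. It is only an illustration: the thesis's actual objectives and allocation method are not given in this abstract, so the function name `pareto_front`, the two-objective tuples, and the convention that every objective is minimised are all assumptions.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a tuple of objective values, all to be minimised
    (for example: sensor time used, expected tracking error).
    """

    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere.
        no_worse = all(x <= y for x, y in zip(a, b))
        better = any(x < y for x, y in zip(a, b))
        return no_worse and better

    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (cost, error) pairs for four allocation options.
options = [(1, 5), (2, 3), (3, 4), (4, 1)]
print(pareto_front(options))
```

Here `(3, 4)` is dropped because `(2, 3)` is better on both objectives; the remaining options each represent a different trade-off.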
Soft Biometric Analysis: MultiPerson and RealTime Pedestrian Attribute Recognition in Crowded Urban Environments
Traditionally, recognition systems were based only on human hard biometrics. However,
the ubiquitous CCTV cameras have raised the desire to analyze human biometrics from
far distances, without the attendance of people in the acquisition process. High-resolution
face close-shots are rarely available at far distances, such that face-based systems cannot
provide reliable results in surveillance applications. Human soft biometrics such as body
and clothing attributes are believed to be more effective in analyzing human data collected
by security cameras.
This thesis contributes to the human soft biometric analysis in uncontrolled environments
and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person
re-identification (re-id). We first review the literature of both tasks and highlight the
history of advancements, recent developments, and the existing benchmarks. PAR and person
re-id difficulties are due to significant distances between intra-class samples, which
originate from variations in several factors such as body pose, illumination, background,
occlusion, and data resolution. Recent state-of-the-art approaches present end-to-end
models that can extract discriminative and comprehensive feature representations from
people. The correlation between different regions of the body and dealing with limited
learning data is also the objective of many recent works. Moreover, class imbalance and
correlation between human attributes are specific challenges associated with the PAR problem.
We collect a large surveillance dataset to train a novel gender recognition model suitable
for uncontrolled environments. We propose a deep residual network that extracts several
pose-wise patches from samples and obtains a comprehensive feature representation. In
the next step, we develop a model for multiple attribute recognition at once. Considering
the correlation between human semantic attributes and class imbalance, we respectively
use a multi-task model and a weighted loss function. We also propose a multiplication
layer on top of the backbone feature extraction layers to exclude the background features
from the final representation of samples and draw the attention of the model to the
foreground area.
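One common way to weight a loss against class imbalance is sketched below: a binary cross-entropy in which each attribute's positive and negative terms are scaled by how rare that label is in the training set. The abstract does not state which weighting the thesis uses, so the scheme here (weight `1 - r` for positives and `r` for negatives, where `r` is the attribute's positive ratio) and the name `weighted_bce` are assumptions for illustration only.

```python
import math


def weighted_bce(probs, labels, pos_ratio, eps=1e-7):
    """Mean weighted binary cross-entropy over a list of attributes.

    probs     -- predicted probability that each attribute is present
    labels    -- ground-truth 0/1 label per attribute
    pos_ratio -- fraction of training samples where each attribute is positive
    """
    total = 0.0
    for p, y, r in zip(probs, labels, pos_ratio):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        # Rare positives (small r) get a large weight, and vice versa.
        total += -((1.0 - r) * y * math.log(p) + r * (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Missing a rare attribute costs more than missing a common one.
print(weighted_bce([0.1], [1], [0.05]))
print(weighted_bce([0.1], [1], [0.50]))
```

With this weighting, an attribute that is present in only a few training samples contributes more to the loss when it is missed, which counteracts the tendency to always predict the majority label.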
We address the problem of person re-id by implicitly defining the receptive fields of
deep learning classification frameworks. The receptive fields of deep learning models
determine the most significant regions of the input data for providing correct decisions.
Therefore, we synthesize a set of learning data in which the destructive regions (e.g.,
background) in each pair of instances are interchanged. A segmentation module
determines destructive and useful regions in each sample, and the labels of synthesized
instances are inherited from the sample that shared the useful regions in the synthesized
image. The synthesized learning data are then used in the learning phase and help
the model rapidly learn that the identity and background regions are not correlated.
Meanwhile, the proposed solution could be seen as a data augmentation approach that
fully preserves the label information and is compatible with other data augmentation
techniques.
When re-id methods are learned in scenarios where the target person appears with identical
garments in the gallery, the visual appearance of clothes is given the most
importance in the final feature representation. Cloth-based representations are not
reliable in long-term re-id settings, as people may change their clothes. Therefore,
solutions that ignore clothing cues and focus on identity-relevant features are
in demand. We transform the original data such that the identity-relevant information of
people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e.,
color and texture of clothes) remain unchanged. A model learned on the synthesized
dataset predicts the identity-unrelated cues (short-term features). Therefore, we train a
second model, coupled with the first model, that learns the embeddings of the original data
such that the similarity between the embeddings of the original and synthesized data is
minimized. This way, the second model predicts based on the identity-related (long-term)
representation of people.
To evaluate the performance of the proposed models, we use PAR and person re-id
datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT,
and compare our experimental results with state-of-the-art methods in the field.
In conclusion, the data collected from surveillance cameras have low resolution, such
that the extraction of hard biometric features is not possible, and face-based approaches
produce poor results. In contrast, soft biometrics are robust to variations in data quality.
So, we propose approaches both for PAR and person re-id to learn discriminative features
from each instance and evaluate our proposed solutions on several publicly available
benchmarks.
This thesis was prepared at the University of Beira Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session
Information structure and the referential status of linguistic expression : workshop as part of the 23th annual meetings of the Deutsche Gesellschaft für Sprachwissenschaft in Leipzig, Leipzig, February 28 - March 2, 2001
This volume comprises papers that were given at the workshop Information Structure and the Referential Status of Linguistic Expressions, which we organized during the Deutsche Gesellschaft für Sprachwissenschaft (DGfS) Conference in Leipzig in February 2001. At this workshop we discussed the connection between information structure and the referential interpretation of linguistic expressions, a topic mostly neglected in current linguistics research. One common aim of the papers is to find out to what extent the focus-background as well as the topic-comment structuring determine the referential interpretation of simple arguments like definite and indefinite NPs on the one hand and sentences on the other
Early aspects: aspect-oriented requirements engineering and architecture design
This paper reports on the third Early Aspects: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, which was held in Lancaster, UK, on March 21, 2004. The workshop included a presentation session and working sessions in which particular topics on early aspects were discussed. The primary goal of the workshop was to focus on the challenges of defining methodical software development processes for aspects from early on in the software life cycle and to explore the potential of proposed methods and techniques to scale up to industrial applications