9,015 research outputs found
Deep Siamese Networks toward Robust Visual Tracking
Recently, Siamese neural networks have been widely used in visual object tracking to leverage the template matching mechanism. Siamese network architecture contains two parallel streams to estimate the similarity between two inputs and has the ability to learn their discriminative features. Various deep Siamese-based tracking frameworks have been proposed to estimate the similarity between the target and the search region. In this chapter, we categorize deep Siamese networks into three categories by the position of the merging layers as late merge, intermediate merge and early merge architectures. In the late merge architecture, inputs are processed as two separate streams and merged at the end of the network, while in the intermediate merge architecture, inputs are initially processed separately and merged intermediate well before the final layer. Whereas in the early merge architecture, inputs are combined at the start of the network and a unified data stream is processed by a single convolutional neural network. We evaluate the performance of deep Siamese trackers based on the merge architectures and their output such as similarity score, response map, and bounding box in various tracking challenges. This chapter will give an overview of the recent development in deep Siamese trackers and provide insights for the new developments in the tracking field
Parameterizing animal sounds and motion with animal-attached tags to study acoustic communication
Funding: Dolphin Quest, Inc.; School of Biology, University of St Andrews; Scottish Universities Life Sciences Alliance; Office of Naval Research; Marine Alliance for Science and Technology for Scotland; Horizon H2020.Stemming from the traditional use of field observers to score states and events, the study of animal behaviour often relies on analyses of discrete behavioural categories. Many studies of acoustic communication record sequences of animal sounds, classify vocalizations, and then examine how call categories are used relative to behavioural states and events. However, acoustic parameters can also convey information independent of call type, offering complementary study approaches to call classifications. Animal-attached tags can continuously sample high-resolution behavioural data on sounds and movements, which enables testing how acoustic parameters of signals relate to parameters of animal motion. Here, we present this approach through case studies on wild common bottlenose dolphins (Tursiops truncatus). Using data from sound-and-movement recording tags deployed in Sarasota (FL), we parameterized dolphin vocalizations and motion to investigate how senders and receivers modified movement parameters (including vectorial dynamic body acceleration, “VeDBA”, a proxy for activity intensity) as a function of signal parameters. We show that (1) VeDBA of one female during consortships had a negative relationship with centroid frequency of male calls, matching predictions about agonistic interactions based on motivation-structural rules; (2) VeDBA of four males had a positive relationship with modulation rate of their pulsed vocalizations, confirming predictions that click-repetition rate of these calls increases with agonism intensity. Tags offer opportunities to study animal behaviour through analyses of continuously sampled quantitative parameters, which can complement traditional methods and facilitate research replication. Our case studies illustrate the value of this approach to investigate communicative roles of acoustic parameter changes.Publisher PDFPeer reviewe
Autonomous aerial robot for high-speed search and intercept applications
In recent years, high-speed navigation and environment interaction in the context of
aerial robotics has become a field of interest for several academic and industrial research studies. In
particular, Search and Intercept (SaI) applications for aerial robots pose a compelling research
area due to their potential usability in several environments. Nevertheless, SaI tasks involve a
challenging development regarding sensory weight, onboard computation resources, actuation design,
and algorithms for perception and control, among others. In this work, a fully autonomous aerial
robot for high-speed object grasping has been proposed. As an additional subtask, our system is able
to autonomously pierce balloons located in poles close to the surface. Our first contribution is the
design of the aerial robot at an actuation and sensory level consisting of a novel gripper design with
additional sensors enabling the robot to grasp objects at high speeds. The second contribution is
a complete software framework consisting of perception, state estimation, motion planning, motion
control, and mission control in order to rapidly and robustly perform the autonomous grasping
mission. Our approach has been validated in a challenging international competition and has shown
outstanding results, being able to autonomously search, follow, and grasp a moving object at 6 m/s
in an outdoor environment.Agencia Estatal de InvestigaciónKhalifa Universit
Correlation Filters for Unmanned Aerial Vehicle-Based Aerial Tracking: A Review and Experimental Evaluation
Aerial tracking, which has exhibited its omnipresent dedication and splendid
performance, is one of the most active applications in the remote sensing
field. Especially, unmanned aerial vehicle (UAV)-based remote sensing system,
equipped with a visual tracking approach, has been widely used in aviation,
navigation, agriculture,transportation, and public security, etc. As is
mentioned above, the UAV-based aerial tracking platform has been gradually
developed from research to practical application stage, reaching one of the
main aerial remote sensing technologies in the future. However, due to the
real-world onerous situations, e.g., harsh external challenges, the vibration
of the UAV mechanical structure (especially under strong wind conditions), the
maneuvering flight in complex environment, and the limited computation
resources onboard, accuracy, robustness, and high efficiency are all crucial
for the onboard tracking methods. Recently, the discriminative correlation
filter (DCF)-based trackers have stood out for their high computational
efficiency and appealing robustness on a single CPU, and have flourished in the
UAV visual tracking community. In this work, the basic framework of the
DCF-based trackers is firstly generalized, based on which, 23 state-of-the-art
DCF-based trackers are orderly summarized according to their innovations for
solving various issues. Besides, exhaustive and quantitative experiments have
been extended on various prevailing UAV tracking benchmarks, i.e., UAV123,
UAV123@10fps, UAV20L, UAVDT, DTB70, and VisDrone2019-SOT, which contain 371,903
frames in total. The experiments show the performance, verify the feasibility,
and demonstrate the current challenges of DCF-based trackers onboard UAV
tracking.Comment: 28 pages, 10 figures, submitted to GRS
Non-contact measures to monitor hand movement of people with rheumatoid arthritis using a monocular RGB camera
Hand movements play an essential role in a person’s ability to interact with the environment. In hand biomechanics, the range of joint motion is a crucial metric to quantify changes due to degenerative pathologies, such as rheumatoid arthritis (RA). RA is a chronic condition where the immune system mistakenly attacks the joints, particularly those in the hands. Optoelectronic motion capture systems are gold-standard tools to quantify changes but are challenging to adopt outside laboratory settings. Deep learning executed on standard video data can capture RA participants in their natural environments, potentially supporting objectivity in remote consultation.
The three main research aims in this thesis were 1) to assess the extent to which current deep learning architectures, which have been validated for quantifying motion of other body segments, can be applied to hand kinematics using monocular RGB cameras, 2) to localise where in videos the hand motions of interest are to be found, 3) to assess the validity of 1) and 2) to determine disease status in RA.
First, hand kinematics for twelve healthy participants, captured with OpenPose were benchmarked against those captured using an optoelectronic system, showing acceptable instrument errors below 10°. Then, a gesture classifier was tested to segment video recordings of twenty-two healthy participants, achieving an accuracy of 93.5%. Finally, OpenPose and the classifier were applied to videos of RA participants performing hand exercises to determine disease status. The inferred disease activity exhibited agreement with the in-person ground truth in nine out of ten instances, outperforming virtual consultations, which agreed only six times out of ten.
These results demonstrate that this approach is more effective than estimated disease activity performed by human experts during video consultations. The end goal sets the foundation for a tool that RA participants can use to observe their disease activity from their home.Open Acces
Deep Active Learning Explored Across Diverse Label Spaces
abstract: Deep learning architectures have been widely explored in computer vision and have
depicted commendable performance in a variety of applications. A fundamental challenge
in training deep networks is the requirement of large amounts of labeled training
data. While gathering large quantities of unlabeled data is cheap and easy, annotating
the data is an expensive process in terms of time, labor and human expertise.
Thus, developing algorithms that minimize the human effort in training deep models
is of immense practical importance. Active learning algorithms automatically identify
salient and exemplar samples from large amounts of unlabeled data and can augment
maximal information to supervised learning models, thereby reducing the human annotation
effort in training machine learning models. The goal of this dissertation is to
fuse ideas from deep learning and active learning and design novel deep active learning
algorithms. The proposed learning methodologies explore diverse label spaces to
solve different computer vision applications. Three major contributions have emerged
from this work; (i) a deep active framework for multi-class image classication, (ii)
a deep active model with and without label correlation for multi-label image classi-
cation and (iii) a deep active paradigm for regression. Extensive empirical studies
on a variety of multi-class, multi-label and regression vision datasets corroborate the
potential of the proposed methods for real-world applications. Additional contributions
include: (i) a multimodal emotion database consisting of recordings of facial
expressions, body gestures, vocal expressions and physiological signals of actors enacting
various emotions, (ii) four multimodal deep belief network models and (iii)
an in-depth analysis of the effect of transfer of multimodal emotion features between
source and target networks on classification accuracy and training time. These related
contributions help comprehend the challenges involved in training deep learning
models and motivate the main goal of this dissertation.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
- …