76 research outputs found
BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision
Single object tracking (SOT) is a fundamental problem in computer vision,
with a wide range of applications, including autonomous driving, augmented
reality, and robot navigation. The robustness of SOT faces two main challenges:
tiny targets and fast motion. These challenges are especially pronounced in
videos captured by unmanned aerial vehicles (UAV), where the target is usually
far away from the camera and often with significant motion relative to the
camera. To evaluate the robustness of SOT methods, we propose BioDrone -- the
first bionic drone-based visual benchmark for SOT. Unlike existing UAV
datasets, BioDrone features videos captured from a flapping-wing UAV system
with major camera shake due to its aerodynamics. BioDrone hence highlights
the tracking of tiny targets with drastic changes between consecutive frames,
providing a new robust vision benchmark for SOT. To date, BioDrone offers the
largest UAV-based SOT benchmark with high-quality fine-grained manual
annotations and automatically generated frame-level labels, designed for robust
vision analyses. Leveraging our proposed BioDrone, we conduct a systematic
evaluation of existing SOT methods, comparing the performance of 20
representative models and studying novel means of optimizing a SOTA method
(KeepTrack) for robust SOT. Our evaluation leads to new baselines and
insights for robust SOT. Moving forward, we hope that BioDrone will not only
serve as a high-quality benchmark for robust SOT, but also invite future
research into robust computer vision. The database, toolkits, evaluation
server, and baseline results are available at http://biodrone.aitestunion.com.
Comment: This paper is published in IJCV (refer to DOI). Please cite the
published IJC
Robust Audio-Codebooks for Large-Scale Event Detection in Consumer Videos
In this paper we present our audio-based system for detecting "events" within consumer videos (e.g. YouTube) and report our experiments on the TRECVID Multimedia Event Detection (MED) task and development data. Codebook or bag-of-words models have been widely used in text, visual and audio domains and form the state of the art in MED tasks. The overall effectiveness of these models on such datasets depends critically on the choice of low-level features, clustering approach, sampling method, codebook size, weighting schemes and choice of classifier. In this work we empirically evaluate several approaches to model expressive and robust audio codebooks for the task of MED while ensuring compactness. First, we introduce the Large Scale Pooling Features (LSPF) and Stacked Cepstral Features for encoding local temporal information in audio codebooks. Second, we discuss several design decisions for generating and representing expressive audio codebooks and show how they scale to large datasets. Third, we apply text-based techniques like Latent Dirichlet Allocation (LDA) to learn acoustic topics as a means of providing compact representation while maintaining performance. By aggregating these decisions into our model, we obtained 11% relative improvement over our baseline audio systems.
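The codebook (bag-of-words) idea named in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the random arrays stand in for real frame-level audio features, the plain k-means clustering, the codebook size of 16, and the names `learn_codebook` and `encode_clip` are all assumptions made for illustration, and the LSPF, stacked cepstral, and LDA components are not modelled.

```python
import numpy as np

def learn_codebook(frames, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over frame-level feature vectors."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords from k distinct training frames.
    centers = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every codeword.
        dists = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old codeword if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def encode_clip(frames, centers):
    """Hard-assign each frame to a codeword; return an L1-normalised histogram."""
    dists = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(centers))
    return counts / counts.sum()

# Hypothetical data: 13-dimensional vectors standing in for cepstral features.
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(500, 13))
clip_frames = rng.normal(size=(120, 13))

codebook = learn_codebook(train_frames, k=16)
clip_vector = encode_clip(clip_frames, codebook)  # fixed-length clip descriptor
```

The fixed-length `clip_vector` is what a downstream classifier would consume; the abstract's design choices (features, clustering, sampling, codebook size, weighting) each correspond to one replaceable piece of this pipeline.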
Demonstration of Robust and Efficient Quantum Property Learning with Shallow Shadows
Extracting information efficiently from quantum systems is a major component
of quantum information processing tasks. Randomized measurements, or classical
shadows, enable predicting many properties of arbitrary quantum states using
few measurements. While random single qubit measurements are experimentally
friendly and suitable for learning low-weight Pauli observables, they perform
poorly for nonlocal observables. Prepending a shallow random quantum circuit
before measurements maintains this experimental friendliness, but also has
favorable sample complexities for observables beyond low-weight Paulis,
including high-weight Paulis and global low-rank properties such as fidelity.
However, in realistic scenarios, quantum noise accumulated with each additional
layer of the shallow circuit biases the results. To address these challenges,
we propose the robust shallow shadows protocol. Our protocol uses Bayesian
inference to learn the experimentally relevant noise model and mitigate it in
postprocessing. This mitigation introduces a bias-variance trade-off:
correcting for noise-induced bias comes at the cost of a larger estimator
variance. Despite this increased variance, as we demonstrate on a
superconducting quantum processor, our protocol correctly recovers state
properties such as expectation values, fidelity, and entanglement entropy,
while maintaining a lower sample complexity compared to the random single qubit
measurement scheme. We also theoretically analyze the effects of noise on
sample complexity and show how the optimal choice of the shallow shadow depth
varies with noise strength. This combined theoretical and experimental analysis
positions the robust shallow shadow protocol as a scalable, robust, and
sample-efficient protocol for characterizing quantum states on current quantum
computing platforms.
Comment: 12 pages, 5 figures
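For orientation, the random single-qubit measurement scheme that this abstract uses as its point of comparison can be sketched as follows. This is a noiseless toy simulation of standard Pauli-basis classical shadows on a two-qubit Bell state, not the robust shallow shadows protocol itself; the function names, the shot count, and the choice of test state are assumptions for illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S_DAG = np.array([[1, 0], [0, -1j]], dtype=complex)
# Unitaries mapping the X, Y, Z eigenbases onto the computational basis.
ROTATIONS = [H, H @ S_DAG, np.eye(2, dtype=complex)]
LETTERS = "XYZ"

def collect_shadow(state, n_qubits, shots, rng):
    """Measure each qubit in a uniformly random Pauli basis, once per shot."""
    records = []
    for _ in range(shots):
        bases = rng.integers(0, 3, size=n_qubits)
        unitary = np.array([[1.0 + 0j]])
        for b in bases:  # qubit 0 is the most significant bit
            unitary = np.kron(unitary, ROTATIONS[b])
        probs = np.abs(unitary @ state) ** 2
        probs /= probs.sum()  # guard against floating-point drift
        idx = rng.choice(len(probs), p=probs)
        bits = [(idx >> (n_qubits - 1 - q)) & 1 for q in range(n_qubits)]
        outcomes = [1 - 2 * bit for bit in bits]  # bit 0 -> +1, bit 1 -> -1
        records.append(([LETTERS[b] for b in bases], outcomes))
    return records

def estimate_pauli(records, pauli):
    """Estimate <P> for a Pauli string given as {qubit index: 'X'|'Y'|'Z'}."""
    total = 0.0
    for bases, outcomes in records:
        value = 1.0
        for qubit, letter in pauli.items():
            # Inverting the measurement channel gives a factor of 3 per qubit
            # when the basis matches, and zero otherwise.
            value *= 3 * outcomes[qubit] if bases[qubit] == letter else 0.0
        total += value
    return total / len(records)

rng = np.random.default_rng(1)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
shadow = collect_shadow(bell, n_qubits=2, shots=4000, rng=rng)
zz = estimate_pauli(shadow, {0: "Z", 1: "Z"})  # exact value is +1
yy = estimate_pauli(shadow, {0: "Y", 1: "Y"})  # exact value is -1
```

The factor of 3 per qubit is why the variance of this estimator grows exponentially with the weight of the Pauli observable, which is the limitation for nonlocal observables that the abstract attributes to single-qubit measurements.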
Gland Instance Segmentation by Deep Multichannel Side Supervision
In this paper, we propose a new image instance segmentation method that segments individual glands (instances) in colon histology images. This is a task called instance segmentation that has recently become increasingly important. The problem is challenging since not only do the glands need to be segmented from the complex background, they are also required to be individually identified. Here we leverage the idea of image-to-image prediction in recent deep learning by building a framework that automatically exploits and fuses complex multichannel information, regional and boundary patterns, with side supervision (deep supervision on side responses) in gland histology images. Our proposed system, deep multichannel side supervision (DMCS), alleviates heavy feature design due to the use of convolutional neural networks guided by side supervision. Compared to methods reported in the 2015 MICCAI Gland Segmentation Challenge, we observe state-of-the-art results based on a number of evaluation metrics.
Facile synthesis of graphene sheets intercalated by carbon spheres for high-performance supercapacitor electrodes
Graphene oxides (GOs) and carbon spheres (CSs), the latter hydrothermally derived from an aqueous glucose solution and having an average diameter of 200 nm, were mechanically mixed, and the resulting GO/CS composites (GCSs) were thermally treated at high temperatures in the range of 700–900 °C. In the GCS composites, the CSs act as spacers between the GO sheets and prevent the aggregation and restacking of graphene sheets. The GCS composites (GO/CS = 1) treated at 800 °C (GCS@800) have high specific capacitances of 272.8 and 197.5 F g−1 in a three-electrode cell at current densities of 0.2 and 10 A g−1, respectively, in 6 M KOH aqueous solution, and demonstrate high rate capability and good cycling stability. The excellent electrochemical performance of the GCS@800 electrode is attributed to its hierarchical porous structure, which consists predominantly of micropores together with a few macropores. This work provides an effective and simple technique for integrating CSs and graphene sheets into composite structures for high-performance energy storage devices.
Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registration
The prowess that makes few-shot learning desirable in medical image analysis
is the efficient use of the support image data, which are labelled to classify
or segment new classes, a task that otherwise requires substantially more
training images and expert annotations. This work describes a fully 3D
prototypical few-shot segmentation algorithm, such that the trained networks
can be effectively adapted to clinically interesting structures that are absent
in training, using only a few labelled images from a different institute.
First, to compensate for the widely recognised spatial variability between
institutions in episodic adaptation of novel classes, a novel spatial
registration mechanism is integrated into prototypical learning, consisting of
a segmentation head and a spatial alignment module. Second, to assist the
training with observed imperfect alignment, a support mask conditioning module is
proposed to further utilise the annotation available from the support images.
Extensive experiments are presented in an application of segmenting eight
anatomical structures important for interventional planning, using a data set
of 589 pelvic T2-weighted MR images, acquired at seven institutes. The results
demonstrate the efficacy in each of the 3D formulation, the spatial
registration, and the support mask conditioning, all of which made positive
contributions independently or collectively. Compared with the previously
proposed 2D alternatives, the few-shot segmentation performance was improved
with statistical significance, regardless of whether the support data come from
the same or different institutes.
Comment: accepted by Medical Image Analysis
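The prototypical step that this abstract builds on can be sketched generically: pool support features under the support mask into a class prototype, then label each query voxel by its similarity to that prototype. The sketch below is an assumption-laden illustration, not the paper's algorithm: masked average pooling and cosine similarity are common choices the abstract does not confirm, the hand-made two-channel features replace a trained 3D network, and the spatial registration and support mask conditioning modules are omitted.

```python
import numpy as np

def masked_average_prototype(support_feats, support_mask):
    """Average (C, D, H, W) features over the voxels where the mask is 1."""
    weights = support_mask.astype(float)
    total = weights.sum()
    if total == 0:
        raise ValueError("support mask is empty")
    return (support_feats * weights[None]).sum(axis=(1, 2, 3)) / total

def cosine_similarity_map(query_feats, prototype, eps=1e-8):
    """Cosine similarity between every query voxel feature and a prototype."""
    dots = np.tensordot(prototype, query_feats, axes=([0], [0]))  # (D, H, W)
    norms = np.linalg.norm(query_feats, axis=0) * np.linalg.norm(prototype)
    return dots / (norms + eps)

def segment_query(query_feats, fg_proto, bg_proto):
    """Label a voxel foreground when it is closer to the foreground prototype."""
    fg_sim = cosine_similarity_map(query_feats, fg_proto)
    bg_sim = cosine_similarity_map(query_feats, bg_proto)
    return fg_sim > bg_sim

def toy_features(mask):
    # Hypothetical feature extractor: channel 0 fires inside the structure,
    # channel 1 outside. A trained 3D network would take this role.
    return np.stack([mask, 1 - mask]).astype(float)

support_mask = np.zeros((4, 4, 4), dtype=int)
support_mask[1:3, 1:3, 1:3] = 1
query_mask = np.zeros((4, 4, 4), dtype=int)
query_mask[0:2, 2:4, 0:2] = 1  # same structure, different location

support_feats = toy_features(support_mask)
fg_proto = masked_average_prototype(support_feats, support_mask)
bg_proto = masked_average_prototype(support_feats, 1 - support_mask)
prediction = segment_query(toy_features(query_mask), fg_proto, bg_proto)
```

Because the prototype discards spatial layout, this toy works even though the query structure sits elsewhere in the volume; with real features the layout differences between institutes matter, which is the gap the abstract's registration mechanism is meant to address.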