Blood vessel enhancement via multi-dictionary and sparse coding: Application to retinal vessel enhancing
Blood vessel images provide considerable information about many diseases and are widely used by ophthalmologists for disease diagnosis and surgical planning. In this paper, we propose a novel method for blood Vessel Enhancement via Multi-dictionary and Sparse Coding (VE-MSC). In the proposed method, two dictionaries are utilized to capture vascular structures and details: the Representation Dictionary (RD), generated from the original vascular images, and the Enhancement Dictionary (ED), extracted from the corresponding label images. Sparse coding is used to represent the original target vessel image over RD. The enhanced target vessel image is then reconstructed from the obtained sparse coefficients and ED. The proposed method has been evaluated for retinal vessel enhancement on the DRIVE and STARE databases. Experimental results indicate that the proposed method not only effectively improves image contrast but also enhances retinal vascular structures and details.
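To make the paired-dictionary reconstruction step concrete, here is a minimal patch-wise sketch (an illustration, not the authors' implementation; the dictionary size, patch dimension, and sparsity level are placeholder assumptions, and learning RD/ED from paired image/label patches is assumed to have been done beforehand):

```python
# Sketch of the VE-MSC idea: sparse-code patches of the target vessel image
# over the Representation Dictionary (RD), then rebuild enhanced patches from
# the same coefficients and the Enhancement Dictionary (ED).
import numpy as np
from sklearn.decomposition import sparse_encode

def enhance_patches(patches, rd, ed, n_nonzero=8):
    """patches: (n, d) flattened image patches; rd, ed: (k, d) paired dictionaries."""
    # Sparse coefficients of each patch over the representation dictionary (OMP)
    codes = sparse_encode(patches, rd, algorithm="omp",
                          n_nonzero_coefs=n_nonzero)   # (n, k)
    # Reconstruct enhanced patches from the same codes and the enhancement dictionary
    return codes @ ed                                  # (n, d)

# Toy usage with random unit-norm dictionaries standing in for learned RD/ED
rng = np.random.default_rng(0)
rd = rng.standard_normal((256, 64)); rd /= np.linalg.norm(rd, axis=1, keepdims=True)
ed = rng.standard_normal((256, 64)); ed /= np.linalg.norm(ed, axis=1, keepdims=True)
patches = rng.standard_normal((100, 64))
print(enhance_patches(patches, rd, ed).shape)  # (100, 64)
```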
A bioinspired flexible optical sensor for force and orientation sensing
Flexible optical sensors have become an emerging paradigm for applications in robotics, healthcare, and human-machine interfaces due to their high sensitivity, fast response, and immunity to electromagnetic interference. Recently, Marques et al. reported a bioinspired multifunctional flexible optical sensor (BioMFOS), achieving a force sensitivity of 13.28 μN and a spatial resolution of 0.02 mm. The BioMFOS has small dimensions (around 2 cm) and a light weight (0.8 g), making it suitable for wearable applications and clothing integration. As proof-of-concept demonstrations, monitoring of finger position, trunk movement, and respiration rate is realized, indicating prominent applications in remote healthcare, intelligent robots, teleoperation of assistance devices, and human-machine interfaces.
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Relation modeling between actors and scene context advances video action detection, where the correlation of multiple actors makes their action recognition challenging. Existing studies model each actor-scene relation to improve action recognition. However, scene variations and background interference limit the effectiveness of this relation modeling. In this paper, we propose to select actor-related scene context, rather than directly leveraging the raw video scenario, to improve relation modeling. We develop a Cycle Actor-Context Relation network (CycleACR), in which a symmetric graph models actor-context relations in a bidirectional form. Our CycleACR consists of the Actor-to-Context Reorganization (A2C-R) module, which collects actor features for context feature reorganization, and the Context-to-Actor Enhancement (C2A-E) module, which dynamically utilizes the reorganized context features for actor feature enhancement. Compared to existing designs that focus on C2A-E, our CycleACR introduces A2C-R for more effective relation modeling. This modeling enables CycleACR to achieve state-of-the-art performance on two popular action detection datasets (i.e., AVA and UCF101-24). We also provide ablation studies and visualizations to show how our cycle actor-context relation modeling improves video action detection. Code is available at https://github.com/MCG-NJU/CycleACR.
Comment: technical report
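As a rough illustration of the bidirectional A2C-R/C2A-E cycle, the following PyTorch sketch realizes it with two cross-attention passes; the attention modules, feature dimensions, and residual/normalization choices are assumptions for illustration, not the released CycleACR design:

```python
# Cycle actor-context relation sketch: A2C-R reorganizes context conditioned on
# actors, then C2A-E enhances actors with the reorganized context.
import torch
import torch.nn as nn

class CycleRelationBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2c = nn.MultiheadAttention(dim, heads, batch_first=True)  # actor -> context
        self.c2a = nn.MultiheadAttention(dim, heads, batch_first=True)  # context -> actor
        self.norm_ctx = nn.LayerNorm(dim)
        self.norm_act = nn.LayerNorm(dim)

    def forward(self, actors, context):
        """actors: (B, Na, C) actor features; context: (B, Nc, C) scene context tokens."""
        # A2C-R: context tokens attend to actors, yielding actor-related context
        ctx_reorg, _ = self.a2c(query=context, key=actors, value=actors)
        ctx_reorg = self.norm_ctx(context + ctx_reorg)
        # C2A-E: actors attend to the reorganized context for feature enhancement
        act_enh, _ = self.c2a(query=actors, key=ctx_reorg, value=ctx_reorg)
        return self.norm_act(actors + act_enh)

block = CycleRelationBlock()
out = block(torch.randn(2, 5, 256), torch.randn(2, 49, 256))
print(out.shape)  # torch.Size([2, 5, 256])
```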
Memory-and-Anticipation Transformer for Online Action Understanding
Most existing forecasting systems are memory-based methods, which attempt to mimic the human forecasting ability by employing various memory mechanisms, and have progressed in temporal modeling for memory dependency. Nevertheless, an obvious weakness of this paradigm is that it can only model limited historical dependence and cannot transcend the past. In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm to model the entire temporal structure, including the past, present, and future. Based on this idea, we present the Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks. Moreover, owing to its inherent design, MAT can process online action detection and anticipation in a unified manner. The proposed MAT model is tested on four challenging benchmarks, TVSeries, THUMOS'14, HDD, and EPIC-Kitchens-100, for online action detection and anticipation, and it significantly outperforms all existing methods. Code is available at https://github.com/Echo0125/Memory-and-Anticipation-Transformer.
Comment: ICCV 2023 Camera Ready
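A simplified sketch of how past memory and learnable anticipation tokens can be modeled jointly by one transformer (an illustrative layout only, not the MAT architecture; the token counts, dimensions, and single shared head below are assumptions):

```python
# Memory-anticipation-style layout: past-frame features plus learnable future
# tokens are encoded together, then decoded into present and future action scores.
import torch
import torch.nn as nn

class MemoryAnticipationSketch(nn.Module):
    def __init__(self, dim=512, n_future=8, n_classes=22, heads=8, layers=2):
        super().__init__()
        self.future_tokens = nn.Parameter(torch.zeros(1, n_future, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, memory):
        """memory: (B, T, C) features of past frames up to the present."""
        b, t = memory.size(0), memory.size(1)
        tokens = torch.cat([memory, self.future_tokens.expand(b, -1, -1)], dim=1)
        tokens = self.encoder(tokens)             # joint past/present/future modeling
        present = self.head(tokens[:, t - 1])     # online detection for the current frame
        future = self.head(tokens[:, t:])         # anticipation for future steps
        return present, future

model = MemoryAnticipationSketch()
det, ant = model(torch.randn(2, 32, 512))
print(det.shape, ant.shape)  # torch.Size([2, 22]) torch.Size([2, 8, 22])
```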
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Temporal action detection (TAD) is extensively studied in the video understanding community, generally following the object detection pipeline for images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-know baseline given the current state of complex designs and low detection efficiency in TAD. In our simple baseline (termed BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We extensively investigate the existing techniques in each component for this baseline and, more importantly, perform end-to-end training over the entire pipeline thanks to the simplicity of its design. As a result, this simple BasicTAD yields an astounding, real-time, RGB-only baseline very close to the state-of-the-art methods with two-stream inputs. In addition, we further improve BasicTAD by preserving more temporal and spatial information in the network representation (termed PlusTAD). Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms previous methods on the THUMOS14 and FineAction datasets. Meanwhile, we also perform in-depth visualization and error analysis on our proposed method to provide more insights into the TAD problem. Our approach can serve as a strong baseline for future TAD research.
The code and model will be released at https://github.com/MCG-NJU/BasicTAD.
Comment: Accepted by CVIU
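A hedged sketch of the data sampling -> backbone -> neck -> detection head decomposition described above, with simplified placeholder modules (the temporal pyramid neck and anchor-free head below are illustrative choices, not the released BasicTAD components):

```python
# Simplified TAD pipeline sketch: clip-level backbone features -> temporal
# feature pyramid neck -> per-location classification and boundary regression.
import torch
import torch.nn as nn

class TemporalNeck(nn.Module):
    """Builds a small temporal feature pyramid by strided 1D convolutions."""
    def __init__(self, in_ch=256, out_ch=256, levels=3):
        super().__init__()
        self.levels = nn.ModuleList(
            nn.Conv1d(in_ch if i == 0 else out_ch, out_ch, 3, stride=2, padding=1)
            for i in range(levels))

    def forward(self, x):                          # x: (B, C, T) backbone features
        feats = []
        for conv in self.levels:
            x = torch.relu(conv(x))
            feats.append(x)
        return feats

class DetectionHead(nn.Module):
    """Anchor-free head: per-location class scores and (start, end) offsets."""
    def __init__(self, ch=256, n_classes=20):
        super().__init__()
        self.cls = nn.Conv1d(ch, n_classes, 3, padding=1)
        self.reg = nn.Conv1d(ch, 2, 3, padding=1)

    def forward(self, feats):
        return [(self.cls(f), self.reg(f)) for f in feats]

# Usage: assume a backbone already produced (B, 256, T) features from sampled RGB frames
neck, head = TemporalNeck(), DetectionHead()
for cls, reg in head(neck(torch.randn(2, 256, 96))):
    print(cls.shape, reg.shape)
```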