Gold nanorods as molecular contrast agents in photoacoustic imaging: the promises and the caveats
Rod-shaped gold nanoparticles exhibit intense and narrow absorption peaks for light in the far-red and near-infrared wavelength regions, owing to the excitation of longitudinal plasmons. Light absorption is followed predominantly by non-radiative de-excitation, and the released heat and subsequent temperature rise produce strong photoacoustic (optoacoustic) signals. This feature, combined with the relative inertness of gold and its favorable surface chemistry, which permits the coupling of affinity biomolecules, has seen gold nanorods (AuNRs) attract much attention as contrast agents and molecular probes for photoacoustic imaging. In this article we provide a short overview of the current status of the use of AuNRs in molecular imaging using photoacoustics. We further examine the state of the art in various chemical, physical and biochemical phenomena that have implications for future photoacoustic applications of these particles. We cover the fine-tuning of AuNR synthetic procedures, toxicity reduction by appropriate coatings, in vitro cellular interactions of AuNRs, attachment of targeting antibodies, the in vivo fate of the particles, and the effects of certain light interactions with the AuNRs.
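As background for the signal-generation claim above: the initial photoacoustic
pressure is commonly written p0 = Gamma * mu_a * F (Grüneisen parameter times
optical absorption coefficient times local fluence). A minimal Python sketch
with illustrative, assumed numbers (the relation is textbook; the values are
not from the article):

```python
# Back-of-envelope sketch of the standard photoacoustic generation relation
# p0 = Gamma * mu_a * F. The relation is textbook; the numbers below are
# illustrative assumptions, not values from the article.
grueneisen = 0.20  # Grueneisen parameter, dimensionless (typical soft tissue)
mu_a = 50.0        # absorption coefficient, 1/cm (assumed AuNR-laden voxel
                   # near the longitudinal plasmon peak)
fluence = 0.01     # local fluence, J/cm^2 (10 mJ/cm^2, a common exposure level)

p0 = grueneisen * mu_a * fluence  # J/cm^3; 1 J/cm^3 corresponds to 1 MPa
print(f"initial photoacoustic pressure ~ {p0 * 1000:.0f} kPa")  # ~100 kPa
```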
Artistic-style text detector and a new Movie-Poster dataset
Although current text detection algorithms demonstrate effectiveness in
general scenarios, their performance declines when confronted with
artistic-style text featuring complex structures. This paper proposes a method
that utilizes Criss-Cross Attention and residual dense blocks to address the
incomplete detection and misidentification of artistic-style text by current
algorithms. Specifically, our method mainly consists of a feature extraction
backbone, a feature enhancement network, a multi-scale feature fusion module,
and a boundary discrimination module. The feature enhancement network
significantly enhances the model's perceptual capabilities in complex
environments by fusing horizontal and vertical contextual information, allowing
it to capture detailed features overlooked in artistic-style text. We
incorporate residual dense blocks into the Feature Pyramid Network to suppress
the effect of background noise during feature fusion. To avoid complex
post-processing, we explore a boundary discrimination module that
guides the correct generation of boundary proposals. Furthermore, given that
movie poster titles often use stylized art fonts, we collected a Movie-Poster
dataset to address the scarcity of artistic-style text data. Extensive
experiments demonstrate that our proposed method performs superiorly on the
Movie-Poster dataset and produces excellent results on multiple benchmark
datasets. The code and the Movie-Poster dataset will be available at:
https://github.com/biedaxiaohua/Artistic-style-text-detectio
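The horizontal-and-vertical context fusion described above is what criss-cross
attention computes: each pixel attends to all pixels in its own row and column.
Below is a minimal PyTorch sketch of that mechanism (simplified from CCNet; it
omits the original's diagonal masking and recurrent application, and is not
the paper's exact module):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Simplified criss-cross attention: each pixel attends jointly over its
    row and column neighbours (the centre pixel is counted in both branches)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Affinities along each pixel's row: (b, h, w, w)
        q_row = q.permute(0, 2, 3, 1).reshape(b * h, w, -1)
        k_row = k.permute(0, 2, 1, 3).reshape(b * h, -1, w)
        e_row = torch.bmm(q_row, k_row).view(b, h, w, w)

        # Affinities along each pixel's column: (b, h, w, h)
        q_col = q.permute(0, 3, 2, 1).reshape(b * w, h, -1)
        k_col = k.permute(0, 3, 1, 2).reshape(b * w, -1, h)
        e_col = torch.bmm(q_col, k_col).view(b, w, h, h).permute(0, 2, 1, 3)

        # Joint softmax over the w row neighbours and h column neighbours
        attn = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :w], attn[..., w:]

        # Aggregate values along rows and columns, then add residually
        v_row = v.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out_row = torch.bmm(a_row.reshape(b * h, w, w), v_row).view(b, h, w, c)
        v_col = v.permute(0, 3, 2, 1).reshape(b * w, h, c)
        a_col = a_col.permute(0, 2, 1, 3).reshape(b * w, h, h)
        out_col = torch.bmm(a_col, v_col).view(b, w, h, c).permute(0, 2, 1, 3)
        return x + self.gamma * (out_row + out_col).permute(0, 3, 1, 2)
```

For example, CrissCrossAttention(256)(torch.randn(1, 256, 32, 48)) returns a
tensor of the same shape.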
BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection
Arbitrary shape scene text detection is of great importance in scene
understanding tasks. Due to the complexity and diversity of text in natural
scenes, existing scene text algorithms have limited accuracy for detecting
arbitrary-shape text. In this paper, we propose a novel arbitrary-shape scene
text detector based on boundary points dynamic optimization (BPDO). The
proposed model consists of a text-aware module (TAM) and a boundary point
dynamic optimization module (DOM). Specifically, the segmentation-based
text-aware module extracts a priori information about the text region to
obtain boundary points that describe the central region of the text. Then,
drawing on the idea of deformable attention, the dynamic optimization module
gradually refines the position of each boundary point using information from
the region adjacent to that point. Experiments on CTW-1500, Total-Text, and
MSRA-TD500 datasets
show that the model proposed in this paper achieves a performance that is
better than or comparable to the state-of-the-art algorithm, proving the
effectiveness of the model.

Comment: Accepted to ICASSP 202
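The dynamic optimization idea above, moving each boundary point using features
from its neighbourhood, can be sketched as iterative sample-and-offset
refinement. The following is a hypothetical PyTorch illustration (plain
bilinear sampling plus an MLP offset head stands in for the paper's
deformable-attention module; all names are invented):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPointRefiner(nn.Module):
    """Hypothetical sketch: sample features at each boundary point, predict a
    2-D offset, move the point, and repeat for a few iterations."""
    def __init__(self, channels: int, iters: int = 3):
        super().__init__()
        self.iters = iters
        self.offset_head = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(),
            nn.Linear(channels, 2))  # per-point (dx, dy) in normalized coords

    def forward(self, feats: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # feats: (b, c, h, w); points: (b, n, 2), coordinates in [-1, 1]
        for _ in range(self.iters):
            grid = points.unsqueeze(2)                           # (b, n, 1, 2)
            s = F.grid_sample(feats, grid, align_corners=False)  # (b, c, n, 1)
            s = s.squeeze(-1).permute(0, 2, 1)                   # (b, n, c)
            points = points + self.offset_head(s)                # residual update
        return points
```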
What's Wrong with the Bottom-up Methods in Arbitrary-shape Scene Text Detection
The latest trend in the bottom-up perspective for arbitrary-shape scene text
detection is to reason the links between text segments using Graph
Convolutional Network (GCN). Notwithstanding, the performance of the best
performing bottom-up method is still inferior to that of the best performing
top-down method even with the help of GCN. We argue that this is not mainly
caused by the limited feature capturing ability of the text proposal backbone
or GCN, but by their failure to make full use of visual-relational features
for suppressing false detections, as well as the sub-optimal route-finding
mechanism used for grouping text segments. In this paper, we revitalize the
classic text detection frameworks by aggregating the visual-relational features
of text with two effective false positive/negative suppression mechanisms.
First, dense overlapping text segments depicting the 'characterness' and
'streamline' of text are generated for further relational reasoning and weakly
supervised segment classification. Here, relational graph features are used for
suppressing false positives/negatives. Then, to fuse the relational features
with visual features, a Location-Aware Transfer (LAT) module is designed to
transfer text's relational features into visual compatible features with a Fuse
Decoding (FD) module to enhance the representation of text regions for the
second step suppression. Finally, a novel multiple-text-map-aware
contour-approximation strategy is developed, instead of the widely-used
route-finding process. Experiments conducted on five benchmark datasets, i.e.,
CTW1500, Total-Text, ICDAR2015, MSRA-TD500, and MLT2017 demonstrate that our
method surpasses state-of-the-art performance when embedded in a classic text
detection framework, revitalizing the superb strength of the bottom-up
methods.

Comment: Accepted by Trans. on Multimedia
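For context, the GCN-based link reasoning this abstract critiques boils down
to graph convolutions over text-segment nodes: segment embeddings are node
features and the adjacency encodes candidate links. A minimal, generic
Kipf-Welling-style layer (not the paper's exact network) looks like:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal graph convolution with mean aggregation over neighbours."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n, in_dim) segment features; adj: (n, n) adjacency w/ self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.proj(adj @ h / deg))  # mean-aggregate, project
```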
Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links
Detection and recognition of scene texts of arbitrary shapes remain a grand
challenge due to the rich variation of text shapes in line orientation,
length, curvature, etc. This paper presents a mask-guided multi-task network
that detects and rectifies scene texts of arbitrary shapes reliably. Three
types of keypoints are detected, which accurately specify the centre line,
and hence the shape, of each text instance. In addition, four types of
keypoint links are detected, of which the horizontal links associate the
detected keypoints of each text instance, and the vertical links predict a
pair of landmark points (for each keypoint) along the upper and lower text
boundaries, respectively. Scene
texts can be located and rectified by linking up the associated landmark points
(giving localization polygon boxes) and transforming the polygon boxes via thin
plate spline, respectively. Extensive experiments over several public datasets
show that the use of text keypoints is tolerant to the variation in text
orientations, lengths, and curvatures, and it achieves superior scene text
detection and rectification performance as compared with state-of-the-art
methods.
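The "linking up the associated landmark points" step reduces to concatenating
the paired upper- and lower-boundary landmarks into a closed ring; a tiny
illustrative helper (assumed names, not from the paper):

```python
import numpy as np

def polygon_from_landmarks(upper: np.ndarray, lower: np.ndarray) -> np.ndarray:
    """upper, lower: (n, 2) landmark points, both ordered along the centre
    line. Traversing upper left-to-right then lower right-to-left yields a
    closed (2n, 2) localization polygon for a well-behaved text line."""
    return np.concatenate([upper, lower[::-1]], axis=0)
```

Rectification then warps this polygon onto a horizontal rectangle with a thin
plate spline, e.g. OpenCV's cv2.createThinPlateSplineShapeTransformer.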
CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer
Contour-based scene text detection methods have developed rapidly in recent years,
but still suffer from inaccurate frontend contour initialization, multi-stage
error accumulation, or deficient local information aggregation. To tackle these
limitations, we propose a novel arbitrary-shaped scene text detection framework
named CT-Net by progressive contour regression with contour transformers.
Specifically, we first employ a contour initialization module that generates
coarse text contours without any post-processing. Then, we adopt contour
refinement modules to adaptively refine text contours in an iterative manner,
which are beneficial for context information capturing and progressive global
contour deformation. Besides, we propose an adaptive training strategy to
enable the contour transformers to learn more potential deformation paths, and
introduce a re-score mechanism that can effectively suppress false positives.
Extensive experiments are conducted on four challenging datasets, which
demonstrate the accuracy and efficiency of our CT-Net over state-of-the-art
methods. In particular, CT-Net achieves an F-measure of 86.1 at 11.2 frames
per second (FPS) on CTW1500 and an F-measure of 87.8 at 10.1 FPS on Total-Text.

Comment: This paper has been accepted by IEEE Transactions on Circuits and
Systems for Video Technology
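Among the components above, the re-score mechanism is the easiest to make
concrete: sample the confidence map along each predicted contour and discard
contours whose mean confidence is low. A hedged Python sketch (the abstract
does not give the exact rule; the sampling scheme and threshold are
assumptions):

```python
import torch
import torch.nn.functional as F

def rescore_contours(score_map: torch.Tensor, contours: torch.Tensor,
                     thresh: float = 0.6):
    """Average a confidence map along each contour; keep high-scoring ones.
    score_map: (1, 1, h, w) confidences; contours: (k, n, 2) in [-1, 1]."""
    k, n, _ = contours.shape
    grid = contours.reshape(1, k * n, 1, 2)
    samples = F.grid_sample(score_map, grid, align_corners=False)  # (1,1,k*n,1)
    scores = samples.view(k, n).mean(dim=1)   # mean confidence per contour
    return contours[scores > thresh], scores
```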
Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning
Due to the flexible representation of arbitrary-shaped scene text and simple
pipeline, bottom-up segmentation-based methods begin to be mainstream in
real-time scene text detection. Despite great progress, these methods show
deficiencies in robustness and still suffer from false positives and instance
adhesion. Unlike existing methods, which integrate multi-granularity features
or multiple outputs, we approach the problem from the perspective of
representation learning, using auxiliary tasks to help the encoder learn
robust features jointly with the main per-pixel classification task during
optimization. For semantic representation learning, we propose global-dense
semantic contrast (GDSC), in which a vector is extracted for global semantic
representation, then used to perform element-wise contrast with the dense grid
features. To learn instance-aware representation, we propose to combine
top-down modeling (TDM) with the bottom-up framework to provide implicit
instance-level clues for the encoder. With the proposed GDSC and TDM, the
encoder network learns stronger representations without introducing any extra
parameters or computation during inference. Equipped with a very light
decoder, the detector can achieve more robust real-time scene text detection.
Experimental results on four public datasets show that the proposed method can
outperform or be comparable to the state-of-the-art on both accuracy and speed.
Specifically, the proposed method achieves 87.2% F-measure with 48.2 FPS on
Total-Text and 89.6% F-measure with 36.9 FPS on MSRA-TD500 on a single GeForce
RTX 2080 Ti GPU.

Comment: Accepted by ACM MM 202
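A loose reading of GDSC, extracting a global semantic vector and contrasting
it element-wise with dense grid features, can be sketched as masked global
pooling followed by a per-pixel similarity objective. This is an
assumption-laden illustration, not the paper's actual loss:

```python
import torch
import torch.nn.functional as F

def gdsc_loss(feats: torch.Tensor, text_mask: torch.Tensor,
              temperature: float = 10.0) -> torch.Tensor:
    """Pool a global text vector over ground-truth text pixels, then push each
    pixel's similarity to it toward the text/non-text mask (assumed form).
    feats: (b, c, h, w) encoder features; text_mask: (b, 1, h, w) float {0,1}."""
    denom = text_mask.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
    g = (feats * text_mask).sum(dim=(2, 3), keepdim=True) / denom  # (b,c,1,1)
    sim = F.cosine_similarity(feats, g, dim=1).unsqueeze(1)        # (b,1,h,w)
    return F.binary_cross_entropy_with_logits(sim * temperature, text_mask)
```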
