Morphological Classification of Radio Galaxies using Semi-Supervised Group Equivariant CNNs
Out of the estimated few trillion galaxies, only around a million have been
detected through radio frequencies, and only a tiny fraction, approximately a
thousand, have been manually classified. We have addressed this disparity
between labeled and unlabeled images of radio galaxies by employing a
semi-supervised learning approach to classify them into the known
Fanaroff-Riley Type I (FRI) and Type II (FRII) categories. A Group Equivariant
Convolutional Neural Network (G-CNN) was used as the encoder for the
state-of-the-art self-supervised methods SimCLR (A Simple Framework for
Contrastive Learning of Visual Representations) and BYOL (Bootstrap Your Own
Latent). The G-CNN preserves the equivariance for the Euclidean Group E(2),
enabling it to effectively learn the representation of globally oriented
feature maps. After representation learning, we trained a fully-connected
classifier and fine-tuned the trained encoder with labeled data. Our findings
demonstrate that our semi-supervised approach outperforms existing
state-of-the-art methods across several metrics, including cluster quality,
convergence rate, accuracy, precision, recall, and the F1-score. Moreover,
statistical significance testing via a t-test revealed that our method
surpasses the performance of a fully supervised G-CNN. This study emphasizes
the importance of semi-supervised learning in radio galaxy classification,
where labeled data are still scarce, but the prospects for discovery are
immense.
Comment: 9 pages, 6 figures, accepted in INNS Deep Learning Innovations and
Applications (INNS DLIA 2023) workshop, IJCNN 2023, to be published in
Procedia Computer Science
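The abstract above names SimCLR as one of the self-supervised objectives but gives no formula. As a rough illustration only, the following is a minimal sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss commonly associated with SimCLR, written in plain Python. The function names `cosine` and `nt_xent`, the temperature value, and the toy embeddings are assumptions made for this sketch; they are not taken from the paper, and the G-CNN encoder that would produce the embeddings is not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss for two lists of embeddings (hypothetical helper).

    view_a[i] and view_b[i] are assumed to be embeddings of two
    augmentations of the same image; every other embedding in the
    batch is treated as a negative.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Logits against every embedding except the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)

# Toy embeddings (made up for illustration, not data from the paper).
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # each row close to its partner in view_a
swapped = [[0.1, 1.0], [1.0, 0.1]]   # partners deliberately mismatched

loss_aligned = nt_xent(view_a, aligned)
loss_swapped = nt_xent(view_a, swapped)
```

In this toy setup the loss is lower when the two views of each image are close to each other than when the pairing is mismatched, which is the behavior the contrastive objective is meant to reward.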
OFAR: A Multimodal Evidence Retrieval Framework for Illegal Live-streaming Identification
Illegal live-streaming identification, which aims to help live-streaming
platforms immediately recognize illegal behavior in live streams,
such as the sale of precious and endangered animals, plays a crucial role in
purifying the network environment. Traditionally, a live-streaming platform
must employ professionals to manually identify potentially illegal
live streams. Specifically, a professional must search a large-scale
knowledge database for related evidence to evaluate whether a given
live-streaming clip contains illegal behavior, which is time-consuming and
laborious. To address this issue, in this work, we propose a multimodal
evidence retrieval system, named OFAR, to facilitate the illegal live-streaming
identification. OFAR consists of three modules: Query Encoder, Document
Encoder, and MaxSim-based Contrastive Late Intersection. Both the query encoder
and the document encoder are implemented with the advanced OFA encoder, which is
pretrained on a large-scale multimodal dataset. In the last module, we
introduce contrastive learning on the basis of the MaxSim-based late
intersection to enhance the model's query-document matching ability. The
proposed framework achieves significant improvement on our industrial dataset
TaoLive, demonstrating the advances of our scheme.
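The abstract above does not spell out how the MaxSim-based module scores a query against a document. One common reading, assumed here and not confirmed by the source, is a ColBERT-style score: each query token embedding is matched to its most similar document token embedding, and the maxima are summed. The sketch below pairs that score with a softmax-style contrastive loss over candidate documents. The names `maxsim_score` and `contrastive_loss`, the temperature, and the toy vectors are all hypothetical, and the OFA encoders that would produce the token embeddings are not modeled.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def maxsim_score(query_tokens, doc_tokens):
    """Sum over query tokens of the maximum cosine similarity to any
    document token (assumed ColBERT-style MaxSim)."""
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total

def contrastive_loss(query_tokens, docs, positive_index, temperature=0.1):
    """Cross-entropy over MaxSim scores: the positive document should
    outscore the other candidates (hypothetical training objective)."""
    scores = [maxsim_score(query_tokens, doc) / temperature for doc in docs]
    # Numerically stable log-sum-exp over all candidate scores.
    m = max(scores)
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - scores[positive_index]

# Toy token embeddings (made up for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_match = [[1.0, 0.0], [0.0, 1.0]]   # contains a close match for each query token
doc_other = [[1.0, 1.0]]               # single token, only partly similar

score_match = maxsim_score(query, doc_match)
score_other = maxsim_score(query, doc_other)
loss_correct = contrastive_loss(query, [doc_match, doc_other], positive_index=0)
loss_wrong = contrastive_loss(query, [doc_match, doc_other], positive_index=1)
```

Scoring at the token level keeps query and document encoding independent, so document embeddings could be precomputed for a large knowledge database and only the cheap max-and-sum step would run at query time.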
Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment
Self-supervised contrastive learning has demonstrated great potential in
learning visual representations. Despite its success on various downstream
tasks such as image classification and object detection, self-supervised
pre-training for fine-grained scenarios is not fully explored. In this paper,
we first point out that current contrastive methods are prone to memorizing
background/foreground texture and are therefore limited in localizing the
foreground object. Analysis suggests that learning to extract discriminative
texture information and learning to localize are equally crucial for self-supervised
pre-training in fine-grained scenarios. Based on our findings, we introduce
cross-view saliency alignment (CVSA), a contrastive learning framework that
first crops and swaps saliency regions of images as a novel form of view
generation and then guides the model to localize the foreground object via a cross-view
alignment loss. Extensive experiments on four popular fine-grained
classification benchmarks show that CVSA significantly improves the learned
representation.
Comment: The second version of CVSA. 10 pages, 4 figures
One-shot domain adaptation in video-based assessment of surgical skills
Deep Learning (DL) has achieved automatic and objective assessment of
surgical skills. However, DL models are data-hungry and restricted to their
training domain. This prevents them from transitioning to new tasks where data
is limited. Hence, domain adaptation is crucial to implement DL in real life.
Here, we propose a meta-learning model, A-VBANet, that can deliver
domain-agnostic surgical skill classification via one-shot learning. We develop
the A-VBANet on five laparoscopic and robotic surgical simulators.
Additionally, we test it on operating room (OR) videos of laparoscopic
cholecystectomy. Our model successfully adapts with accuracies up to 99.5% in
one-shot and 99.9% in few-shot settings for simulated tasks and 89.7% for
laparoscopic cholecystectomy. For the first time, we provide a domain-agnostic
procedure for video-based assessment of surgical skills. A significant
implication of this approach is that it allows the use of data from surgical
simulators to assess performance in the operating room.
Comment: 12 pages (+9 pages of Supplementary Materials), 4 figures (+2
Supplementary Figures), 2 tables (+5 Supplementary Tables)