Not just Birds and Cars: Generic, Scalable and Explainable Models for Professional Visual Recognition
Some visual recognition tasks are more challenging than general ones, as they require identifying professional categories of images. Previous efforts, such as fine-grained visual classification, primarily introduced models tailored to specific tasks, such as identifying bird species or car brands, with limited scalability and generalizability. This paper aims to design a scalable and explainable model that solves Professional Visual Recognition tasks from a generic standpoint. We introduce a biologically inspired structure named Pro-NeXt and reveal that it exhibits substantial generalizability across diverse professional fields such as fashion, medicine, and art, areas previously considered disparate. Our basic-sized Pro-NeXt-B surpasses all preceding task-specific models across 12 distinct datasets within 5 diverse domains. Furthermore, Pro-NeXt exhibits a favorable scaling property: increasing its depth and width (and thus its GFLOPs) consistently enhances accuracy. Beyond scalability and adaptability, the intermediate features of Pro-NeXt achieve reliable object detection and segmentation performance without extra training, highlighting its solid explainability. We will release the code to foster further research in this area.
Comment: 20 pages including references. arXiv admin note: text overlap with arXiv:2211.1567
Keep Your Eye on the Best: Contrastive Regression Transformer for Skill Assessment in Robotic Surgery
This letter proposes a novel video-based contrastive regression architecture, Contra-Sformer, for automated surgical skill assessment in robot-assisted surgery. The proposed framework is structured to capture differences in surgical performance between a test video and a reference video that represents optimal surgical execution. A feature extractor combining a spatial component (ResNet-18), supervised at frame level with gesture labels, and a temporal component (TCN) generates spatio-temporal feature matrices of the test and reference videos. These are then fed into an action-aware Transformer with multi-head attention that produces inter-video contrastive features at frame level, representative of the skill similarity/deviation between the two videos. Moments of sub-optimal performance can be identified and temporally localized in the obtained feature vectors, which are ultimately used to regress the manually assigned skill scores. Validated on the JIGSAWS dataset, Contra-Sformer achieves competitive performance (Spearman correlation 0.65-0.89), with a normalized mean absolute error of 5.8%-13.4% across all tasks and validation setups. Source code and models are available at https://github.com/anastadimi/Contra-Sformer.git
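To make the contrastive-regression idea above concrete, here is a minimal PyTorch sketch: test-video features cross-attend to reference-video features, and the resulting frame-level deviation is pooled and regressed to a skill score. The layer sizes, the subtraction-based contrast, and the mean pooling are illustrative assumptions, not the exact Contra-Sformer architecture.

import torch
import torch.nn as nn

class ContrastiveRegressor(nn.Module):
    """Hypothetical sketch: regress a skill score from the deviation of a
    test video's features from a reference (optimal) execution."""
    def __init__(self, feat_dim=512, num_heads=8):
        super().__init__()
        # Queries come from the test video, keys/values from the reference.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.score_head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, test_feats, ref_feats):
        # Inputs: (batch, frames, feat_dim) spatio-temporal features, e.g.
        # from a ResNet-18 + TCN backbone as described in the abstract.
        attended, _ = self.cross_attn(test_feats, ref_feats, ref_feats)
        # Frame-level contrastive features: how the test video deviates
        # from what the reference can "explain"; pool over time, regress.
        contrast = test_feats - attended
        return self.score_head(contrast.mean(dim=1)).squeeze(-1)

model = ContrastiveRegressor()
scores = model(torch.randn(2, 100, 512), torch.randn(2, 100, 512))
print(scores.shape)  # torch.Size([2]) -- one predicted skill score per clip

In such a setup, large values of the per-frame contrast signal would correspond to the temporally localized moments of sub-optimal performance mentioned above.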
Expression and function analysis of wheat expansin genes TaEXPA2 and TaEXPB1
Expansins are a group of plant cell wall loosening proteins that play important roles in plant growth and development. In this study, we performed the first molecular characterization, transcriptional expression, and functional analysis of two wheat expansin genes, TaEXPA2 and TaEXPB1. The results indicated that TaEXPA2 and TaEXPB1 have the typical structural features of the plant expansin gene family. As an alpha-expansin, TaEXPA2 is closely related to rice OsEXPA17, while the beta-expansin TaEXPB1 is phylogenetically closest to rice OsEXPB4. Genetic transformation into Arabidopsis showed that both TaEXPA2 and TaEXPB1 localize to the cell wall and are highly expressed in roots, leaves, and seeds. Overexpression of TaEXPA2 and TaEXPB1 produced similar phenotypes: rapid root elongation, early bolting, and increases in leaf number, rosette diameter, and stem length. These results demonstrate that the wheat expansin genes TaEXPA2 and TaEXPB1 can enhance plant growth and development.
Obfuscation-resilient Android Malware Analysis Based on Contrastive Learning
Due to its open-source nature, Android operating system has been the main
target of attackers to exploit. Malware creators always perform different code
obfuscations on their apps to hide malicious activities. Features extracted
from these obfuscated samples through program analysis contain many useless and
disguised features, which leads to many false negatives. To address the issue,
in this paper, we demonstrate that obfuscation-resilient malware analysis can
be achieved through contrastive learning. We take the Android malware
classification as an example to demonstrate our analysis. The key insight
behind our analysis is that contrastive learning can be used to reduce the
difference introduced by obfuscation while amplifying the difference between
malware and benign apps (or other types of malware).
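As a rough illustration of this insight, the sketch below uses a generic InfoNCE-style contrastive loss in PyTorch, treating an app and its obfuscated variant as a positive pair and other apps in the batch as negatives; this is an assumed formulation for illustration, not necessarily the paper's exact objective.

import torch
import torch.nn.functional as F

def contrastive_loss(z_orig, z_obf, temperature=0.1):
    # z_orig, z_obf: (batch, dim) embeddings of apps and of their obfuscated
    # counterparts; row i of each tensor corresponds to the same app.
    z_orig = F.normalize(z_orig, dim=1)
    z_obf = F.normalize(z_obf, dim=1)
    logits = z_orig @ z_obf.t() / temperature  # pairwise cosine similarities
    # The diagonal (same app, with vs. without obfuscation) is the positive
    # pair pulled together; off-diagonal apps are negatives pushed apart.
    targets = torch.arange(z_orig.size(0))
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))

Minimizing such a loss makes an encoder's representation of an app largely invariant to obfuscation while keeping different apps separable, which is exactly the property the analysis relies on.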
Based on the proposed analysis, we design IFDroid, a system that achieves robust and interpretable classification of Android malware. To achieve robust classification, we perform contrastive learning on malware samples to learn an encoder that automatically extracts robust features. To achieve interpretable classification, we transform the function call graph of a sample into an image by centrality analysis and then obtain the corresponding heatmaps with visualization techniques. These heatmaps help users understand why a sample is classified into a given family. We implement IFDroid and perform extensive evaluations on two widely used datasets. Experimental results show that IFDroid is superior to state-of-the-art Android malware familial classification systems. Moreover, IFDroid maintains a 98.2% true positive rate when classifying 8,112 obfuscated malware samples.
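For illustration, here is a hypothetical sketch of the centrality-based graph-to-image step using networkx and NumPy; the specific centrality measures, node ordering, and image size are assumptions, since the abstract does not fix them.

import networkx as nx
import numpy as np

def callgraph_to_image(g, side=32):
    # One image channel per centrality measure; nodes are ranked by degree
    # centrality so important functions occupy a stable region of the image.
    degree = nx.degree_centrality(g)
    closeness = nx.closeness_centrality(g)
    betweenness = nx.betweenness_centrality(g)
    nodes = sorted(g.nodes(), key=degree.get, reverse=True)[: side * side]
    img = np.zeros((3, side * side), dtype=np.float32)
    for i, n in enumerate(nodes):
        img[0, i] = degree[n]
        img[1, i] = closeness[n]
        img[2, i] = betweenness[n]
    return img.reshape(3, side, side)  # CNN-ready (channels, height, width)

g = nx.gnm_random_graph(200, 600)  # stand-in for a real function call graph
print(callgraph_to_image(g).shape)  # (3, 32, 32)

A CNN classifier over such images, combined with standard saliency-style visualization, would then produce heatmaps highlighting which regions (and hence which functions) drive a family prediction.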