
    Not just Birds and Cars: Generic, Scalable and Explainable Models for Professional Visual Recognition

    Some visual recognition tasks are more challenging than general ones because they require professional categories of images. Previous efforts, such as fine-grained visual classification, primarily introduced models tailored to specific tasks, like identifying bird species or car brands, with limited scalability and generalizability. This paper aims to design a scalable and explainable model that solves Professional Visual Recognition tasks from a generic standpoint. We introduce a biologically inspired architecture named Pro-NeXt and show that it generalizes well across diverse professional fields such as fashion, medicine, and art, areas previously considered disparate. Our basic-sized Pro-NeXt-B surpasses all preceding task-specific models across 12 distinct datasets within 5 diverse domains. Furthermore, Pro-NeXt scales well: increasing its depth and width (and thus its GFLOPs) consistently improves accuracy. Beyond scalability and adaptability, the intermediate features of Pro-NeXt achieve reliable object detection and segmentation performance without extra training, highlighting its solid explainability. We will release the code to foster further research in this area.

    Comment: 20 pages including references. arXiv admin note: text overlap with arXiv:2211.1567
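The abstract does not describe Pro-NeXt's internals, so the sketch below only illustrates the general depth/width scaling recipe it alludes to; the block design, stage counts, and widths are hypothetical assumptions, not the paper's actual configuration.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Generic residual conv block; a stand-in for the unpublished Pro-NeXt block."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.GELU()
        )

    def forward(self, x):
        return x + self.body(x)

def build_backbone(depths, widths, in_ch=3):
    """Stack stages; the paper reports accuracy rising with depth/width (GFLOPs)."""
    layers = []
    for d, w in zip(depths, widths):
        layers.append(nn.Conv2d(in_ch, w, 2, stride=2))  # downsample and widen
        layers += [ConvBlock(w) for _ in range(d)]       # d blocks at width w
        in_ch = w
    return nn.Sequential(*layers)

# Hypothetical size variants, scaled jointly in depth and width:
variants = {
    "B": dict(depths=[3, 3, 9, 3],  widths=[96, 192, 384, 768]),
    "L": dict(depths=[3, 3, 27, 3], widths=[128, 256, 512, 1024]),
}
model = build_backbone(**variants["B"])
```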

    Keep Your Eye on the Best: Contrastive Regression Transformer for Skill Assessment in Robotic Surgery

    This letter proposes a novel video-based contrastive regression architecture, Contra-Sformer, for automated surgical skill assessment in robot-assisted surgery. The proposed framework is structured to capture the differences in surgical performance between a test video and a reference video that represents optimal surgical execution. A feature extractor combining a spatial component (ResNet-18), supervised at frame level with gesture labels, and a temporal component (TCN) generates spatio-temporal feature matrices of the test and reference videos. These are then fed into an action-aware Transformer with multi-head attention that produces inter-video contrastive features at frame level, representative of the skill similarity/deviation between the two videos. Moments of sub-optimal performance can be identified and temporally localized in the obtained feature vectors, which are ultimately used to regress the manually assigned skill scores. Validated on the JIGSAWS dataset, Contra-Sformer achieves competitive performance (Spearman 0.65-0.89), with a normalized mean absolute error of 5.8%-13.4% on all tasks and across validation setups. Source code and models are available at https://github.com/anastadimi/Contra-Sformer.git.
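A rough sketch of this contrastive-regression pipeline is given below. It is not the authors' released code (see the repository above for that); the TCN stand-in, the pooling, and the regression head are simplifying assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ContrastiveRegressor(nn.Module):
    """Contra-Sformer-style sketch: contrast spatio-temporal features of a test
    video against a reference video, then regress a skill score."""
    def __init__(self, feat_dim=512, heads=8):
        super().__init__()
        backbone = resnet18(weights=None)
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])  # per-frame features
        self.temporal = nn.Conv1d(feat_dim, feat_dim, 3, padding=1)    # stand-in for a TCN
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)

    def encode(self, video):                    # video: (T, 3, H, W)
        f = self.spatial(video).flatten(1)      # (T, feat_dim)
        return self.temporal(f.t()[None]).squeeze(0).t()  # temporal context per frame

    def forward(self, test, reference):
        q, k = self.encode(test)[None], self.encode(reference)[None]
        contrast, _ = self.attn(q, k, k)        # frame-level inter-video features
        return self.score(contrast.mean(dim=1)).squeeze(-1)  # regressed skill score

model = ContrastiveRegressor()
score = model(torch.randn(16, 3, 224, 224), torch.randn(16, 3, 224, 224))
```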

    Expression and function analysis of wheat expansin genes TaEXPA2 and TaEXPB1

    Expansins are a group of plant cell wall-loosening proteins that play important roles in plant growth and development. In this study, we performed the first analysis of the molecular characterization, transcriptional expression, and functional properties of two wheat expansin genes, TaEXPA2 and TaEXPB1. The results indicated that the TaEXPA2 and TaEXPB1 genes have the typical structural features of the plant expansin gene family. As an alpha-expansin, TaEXPA2 is closely related to rice OsEXPA17, while the beta-expansin TaEXPB1 is phylogenetically closest to rice OsEXPB4. Genetic transformation of Arabidopsis showed that both TaEXPA2 and TaEXPB1 localize to the cell wall and are highly expressed in roots, leaves, and seeds. Overexpression of TaEXPA2 and TaEXPB1 produced similar phenotypes: faster root elongation, earlier bolting, and increases in leaf number, rosette diameter, and stem length. These results demonstrate that the wheat expansin genes TaEXPA2 and TaEXPB1 can enhance plant growth and development.

    Obfuscation-resilient Android Malware Analysis Based on Contrastive Learning

    Due to its open-source nature, the Android operating system has been a main target for attackers. Malware creators routinely apply code obfuscations to their apps to hide malicious activities. Features extracted from such obfuscated samples through program analysis contain many useless and disguised features, which leads to many false negatives. To address this issue, we demonstrate in this paper that obfuscation-resilient malware analysis can be achieved through contrastive learning, taking Android malware classification as an example. The key insight behind our analysis is that contrastive learning can reduce the differences introduced by obfuscation while amplifying the differences between malware and benign apps (or between types of malware). Based on the proposed analysis, we design a system, IFDroid, that achieves robust and interpretable classification of Android malware. To achieve robust classification, we perform contrastive learning on malware samples to learn an encoder that automatically extracts robust features. To achieve interpretable classification, we transform the function call graph of a sample into an image by centrality analysis; the corresponding heatmaps, obtained with visualization techniques, help users understand why a sample is classified into a given family. We implement IFDroid and perform extensive evaluations on two widely used datasets. Experimental results show that IFDroid is superior to state-of-the-art Android malware familial classification systems. Moreover, IFDroid maintains a 98.2% true positive rate when classifying 8,112 obfuscated malware samples.
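The graph-to-image step can be illustrated with a generic centrality-based encoding. The particular centrality measures, image size, and pixel layout below are assumptions for illustration, not IFDroid's published choices.

```python
import networkx as nx
import numpy as np

def callgraph_to_image(G: nx.DiGraph, side: int = 32) -> np.ndarray:
    """Encode a function call graph as a fixed-size multi-channel image: each
    channel holds one centrality measure per function, so a CNN (and heatmap
    tools such as Grad-CAM) can operate on graph structure."""
    measures = [
        nx.degree_centrality(G),     # hypothetical choice of measures
        nx.closeness_centrality(G),
        nx.pagerank(G),
    ]
    nodes = sorted(G.nodes())[: side * side]  # fixed-size grid; extra nodes dropped
    img = np.zeros((len(measures), side, side), dtype=np.float32)
    for c, m in enumerate(measures):
        for i, n in enumerate(nodes):
            img[c, i // side, i % side] = m[n]
    return img

# Toy call graph: main -> {parse, send}, send -> encrypt
G = nx.DiGraph([("main", "parse"), ("main", "send"), ("send", "encrypt")])
image = callgraph_to_image(G)  # shape (3, 32, 32), ready for a CNN classifier
```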