18,755 research outputs found

    Spiking Neural Network for Ultra-low-latency and High-accurate Object Detection

    Full text link
    Spiking Neural Networks (SNNs) have garnered widespread interest for their energy efficiency and brain-inspired event-driven properties. While recent methods like Spiking-YOLO have expanded the SNNs to more challenging object detection tasks, they often suffer from high latency and low detection accuracy, making them difficult to deploy on latency sensitive mobile platforms. Furthermore, the conversion method from Artificial Neural Networks (ANNs) to SNNs is hard to maintain the complete structure of the ANNs, resulting in poor feature representation and high conversion errors. To address these challenges, we propose two methods: timesteps compression and spike-time-dependent integrated (STDI) coding. The former reduces the timesteps required in ANN-SNN conversion by compressing information, while the latter sets a time-varying threshold to expand the information holding capacity. We also present a SNN-based ultra-low latency and high accurate object detection model (SUHD) that achieves state-of-the-art performance on nontrivial datasets like PASCAL VOC and MS COCO, with about remarkable 750x fewer timesteps and 30% mean average precision (mAP) improvement, compared to the Spiking-YOLO on MS COCO datasets. To the best of our knowledge, SUHD is the deepest spike-based object detection model to date that achieves ultra low timesteps to complete the lossless conversion.Comment: 14 pages, 10 figure

    Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects

    Full text link
    Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby, increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings

    Advancing Adversarial Training by Injecting Booster Signal

    Full text link
    Recent works have demonstrated that deep neural networks (DNNs) are highly vulnerable to adversarial attacks. To defend against adversarial attacks, many defense strategies have been proposed, among which adversarial training has been demonstrated to be the most effective strategy. However, it has been known that adversarial training sometimes hurts natural accuracy. Then, many works focus on optimizing model parameters to handle the problem. Different from the previous approaches, in this paper, we propose a new approach to improve the adversarial robustness by using an external signal rather than model parameters. In the proposed method, a well-optimized universal external signal called a booster signal is injected into the outside of the image which does not overlap with the original content. Then, it boosts both adversarial robustness and natural accuracy. The booster signal is optimized in parallel to model parameters step by step collaboratively. Experimental results show that the booster signal can improve both the natural and robust accuracies over the recent state-of-the-art adversarial training methods. Also, optimizing the booster signal is general and flexible enough to be adopted on any existing adversarial training methods.Comment: Accepted at IEEE Transactions on Neural Networks and Learning System

    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

    Full text link
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance student's OOD generalization: (1) by better imitating teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate their techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_oo

    A case of pure apraxia of speech after left hemisphere stroke: behavioral findings and neural correlates

    Get PDF
    IntroductionApraxia of speech (AOS) is a motor speech disorder impairing the coordination of complex articulatory movements needed to produce speech. AOS typically co-occurs with a non-fluent aphasia, or language disorder, making it challenging to determine the specific brain structures that cause AOS. Cases of pure AOS without aphasia are rare but offer the best window into the neural correlates that support articulatory planning. The goal of the current study was to explore patterns of apraxic speech errors and their underlying neural correlates in a case of pure AOS.MethodsA 67-year-old right-handed man presented with severe AOS resulting from a fronto-insular lesion caused by an ischemic stroke. The participant’s speech and language were evaluated at 1-, 3- and 12-months post-onset. High resolution structural MRI, including diffusion weighted imaging, was acquired at 12 months post-onset.ResultsAt the first assessment, the participant made minor errors on the Comprehensive Aphasia Test, demonstrating mild deficits in writing, auditory comprehension, and repetition. By the second assessment, he no longer had aphasia. On the Motor Speech Evaluation, the severity of his AOS was initially rated as 5 (out of 7) and improved to a score of 4 by the second visit, likely due to training by his SLP at the time to slow his speech. Structural MRI data showed a fronto-insular lesion encompassing the superior precentral gyrus of the insula and portions of the inferior and middle frontal gyri and precentral gyrus. Tractography derived from diffusion MRI showed partial damage to the frontal aslant tract and arcuate fasciculus along the white matter projections to the insula.DiscussionThis pure case of severe AOS without aphasia affords a unique window into the behavioral and neural mechanisms of this motor speech disorder. The current findings support previous observations that AOS and aphasia are dissociable and confirm a role for the precentral gyrus of the insula and BA44, as well as underlying white matter in supporting the coordination of complex articulatory movements. Additionally, other regions including the precentral gyrus, Broca’s area, and Area 55b are discussed regarding their potential role in successful speech production

    Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition

    Full text link
    Most research on facial expression recognition (FER) is conducted in highly controlled environments, but its performance is often unacceptable when applied to real-world situations. This is because when unexpected objects occlude the face, the FER network faces difficulties extracting facial features and accurately predicting facial expressions. Therefore, occluded FER (OFER) is a challenging problem. Previous studies on occlusion-aware FER have typically required fully annotated facial images for training. However, collecting facial images with various occlusions and expression annotations is time-consuming and expensive. Latent-OFER, the proposed method, can detect occlusions, restore occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. This approach involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches using the support vector data description algorithm. Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN). Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map. This mechanism has a significant advantage in preventing performance degradation from occlusion by unseen objects. The experimental results on several databases demonstrate the superiority of the proposed method over state-of-the-art methods.Comment: 11 pages, 8 figure

    RoboChop: Autonomous Framework for Fruit and Vegetable Chopping Leveraging Foundational Models

    Full text link
    With the goal of developing fully autonomous cooking robots, developing robust systems that can chop a wide variety of objects is important. Existing approaches focus primarily on the low-level dynamics of the cutting action, which overlooks some of the practical real-world challenges of implementing autonomous cutting systems. In this work we propose an autonomous framework to sequence together action primitives for the purpose of chopping fruits and vegetables on a cluttered cutting board. We present a novel technique to leverage vision foundational models SAM and YOLO to accurately detect, segment, and track fruits and vegetables as they visually change through the sequences of chops, finetuning YOLO on a novel dataset of whole and chopped fruits and vegetables. In our experiments, we demonstrate that our simple pipeline is able to reliably chop a variety of fruits and vegetables ranging in size, appearance, and texture, meeting a variety of chopping specifications, including fruit type, number of slices, and types of slices

    Segmentation of Pathology Images: A Deep Learning Strategy with Annotated Data

    Get PDF
    Cancer has significantly threatened human life and health for many years. In the clinic, histopathology image segmentation is the golden stand for evaluating the prediction of patient prognosis and treatment outcome. Generally, manually labelling tumour regions in hundreds of high-resolution histopathological images is time-consuming and expensive for pathologists. Recently, the advancements in hardware and computer vision have allowed deep-learning-based methods to become mainstream to segment tumours automatically, significantly reducing the workload of pathologists. However, most current methods rely on large-scale labelled histopathological images. Therefore, this research studies label-effective tumour segmentation methods using deep-learning paradigms to relieve the annotation limitations. Chapter 3 proposes an ensemble framework for fully-supervised tumour segmentation. Usually, the performance of an individual-trained network is limited by significant morphological variances in histopathological images. We propose a fully-supervised learning ensemble fusion model that uses both shallow and deep U-Nets, trained with images of different resolutions and subsets of images, for robust predictions of tumour regions. Noise elimination is achieved with Convolutional Conditional Random Fields. Two open datasets are used to evaluate the proposed method: the ACDC@LungHP challenge at ISBI2019 and the DigestPath challenge at MICCAI2019. With a dice coefficient of 79.7 %, the proposed method takes third place in ACDC@LungHP. In DigestPath 2019, the proposed method achieves a dice coefficient 77.3 %. Well-annotated images are an indispensable part of training fully-supervised segmentation strategies. However, large-scale histopathology images are hardly annotated finely in clinical practice. It is common for labels to be of poor quality or for only a few images to be manually marked by experts. Consequently, fully-supervised methods cannot perform well in these cases. Chapter 4 proposes a self-supervised contrast learning for tumour segmentation. A self-supervised cancer segmentation framework is proposed to reduce label dependency. An innovative contrastive learning scheme is developed to represent tumour features based on unlabelled images. Unlike a normal U-Net, the backbone is a patch-based segmentation network. Additionally, data augmentation and contrastive losses are applied to improve the discriminability of tumour features. A convolutional Conditional Random Field is used to smooth and eliminate noise. Three labelled, and fourteen unlabelled images are collected from a private skin cancer dataset called BSS. Experimental results show that the proposed method achieves better tumour segmentation performance than other popular self-supervised methods. However, by evaluated on the same public dataset as chapter 3, the proposed self-supervised method is hard to handle fine-grained segmentation around tumour boundaries compared to the supervised method we proposed. Chapter 5 proposes a sketch-based weakly-supervised tumour segmentation method. To segment tumour regions precisely with coarse annotations, a sketch-supervised method is proposed, containing a dual CNN-Transformer network and a global normalised class activation map. CNN-Transformer networks simultaneously model global and local tumour features. With the global normalised class activation map, a gradient-based tumour representation can be obtained from the dual network predictions. We invited experts to mark fine and coarse annotations in the private BSS and the public PAIP2019 datasets to facilitate reproducible performance comparisons. Using the BSS dataset, the proposed method achieves 76.686 % IOU and 86.6 % Dice scores, outperforming state-of-the-art methods. Additionally, the proposed method achieves a Dice gain of 8.372 % compared with U-Net on the PAIP2019 dataset. The thesis presents three approaches to segmenting cancers from histology images: fully-supervised, unsupervised, and weakly supervised methods. This research effectively segments tumour regions based on histopathological annotations and well-designed modules. Our studies comprehensively demonstrate label-effective automatic histopathological image segmentation. Experimental results prove that our works achieve state-of-the-art segmentation performances on private and public datasets. In the future, we plan to integrate more tumour feature representation technologies with other medical modalities and apply them to clinical research

    Research progress on deep learning in magnetic resonance imaging–based diagnosis and treatment of prostate cancer: a review on the current status and perspectives

    Get PDF
    Multiparametric magnetic resonance imaging (mpMRI) has emerged as a first-line screening and diagnostic tool for prostate cancer, aiding in treatment selection and noninvasive radiotherapy guidance. However, the manual interpretation of MRI data is challenging and time-consuming, which may impact sensitivity and specificity. With recent technological advances, artificial intelligence (AI) in the form of computer-aided diagnosis (CAD) based on MRI data has been applied to prostate cancer diagnosis and treatment. Among AI techniques, deep learning involving convolutional neural networks contributes to detection, segmentation, scoring, grading, and prognostic evaluation of prostate cancer. CAD systems have automatic operation, rapid processing, and accuracy, incorporating multiple sequences of multiparametric MRI data of the prostate gland into the deep learning model. Thus, they have become a research direction of great interest, especially in smart healthcare. This review highlights the current progress of deep learning technology in MRI-based diagnosis and treatment of prostate cancer. The key elements of deep learning-based MRI image processing in CAD systems and radiotherapy of prostate cancer are briefly described, making it understandable not only for radiologists but also for general physicians without specialized imaging interpretation training. Deep learning technology enables lesion identification, detection, and segmentation, grading and scoring of prostate cancer, and prediction of postoperative recurrence and prognostic outcomes. The diagnostic accuracy of deep learning can be improved by optimizing models and algorithms, expanding medical database resources, and combining multi-omics data and comprehensive analysis of various morphological data. Deep learning has the potential to become the key diagnostic method in prostate cancer diagnosis and treatment in the future
    • …
    corecore