120 research outputs found

    Region-Enhanced Feature Learning for Scene Semantic Segmentation

    Semantic segmentation in complex scenes relies not only on local object appearance but also on object locations and the surrounding environment. Nonetheless, it is difficult to model long-range context in the form of pairwise point correlations because of the huge computational cost for large-scale point clouds. In this paper, we propose to use regions as the intermediate representation of point clouds instead of fine-grained points or voxels to reduce the computational burden. We introduce a novel Region-Enhanced Feature Learning network (REFL-Net) that leverages region correlations to enhance the features of ambiguous points. We design a Region-based Feature Enhancement (RFE) module, which consists of a Semantic-Spatial Region Extraction (SSRE) stage and a Region Dependency Modeling (RDM) stage. In the SSRE stage, we group the input points into a set of regions according to the point distances in both semantic and spatial space. In the RDM stage, we explore region-wise semantic and spatial relationships via a self-attention block on region features and fuse point features with the region features to obtain more discriminative representations. Our proposed RFE module is a plug-and-play module that can be integrated with common semantic segmentation backbones. We conduct extensive experiments on the ScanNetv2 and S3DIS datasets, and evaluate our RFE module with different segmentation backbones. Our REFL-Net achieves a 1.8% mIoU gain on ScanNetv2 and a 1.0% mIoU gain on S3DIS, with negligible computational cost compared to the backbone networks. Both quantitative and qualitative results show the powerful long-range context modeling ability and strong generalization ability of our REFL-Net.

    Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing

    The current neuron reconstruction pipeline for electron microscopy (EM) data usually includes automatic image segmentation followed by extensive human expert proofreading. In this work, we aim to reduce human workload by predicting connectivity between over-segmented neuron pieces, taking both microscopy image and 3D morphology features into account, similar to the human proofreading workflow. To this end, we first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments spanning the whole fly brain, which is three orders of magnitude larger than existing datasets for neuron segment connection. To learn sophisticated biological imaging features from the connectivity annotations, we propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embeddings. The learned embeddings can be easily incorporated with any point- or voxel-based morphological representations for automatic neuron tracing. Extensive comparisons of different combination schemes of image and morphological representation in identifying split errors across the whole fly brain demonstrate the superiority of the proposed approach, especially for locations that contain severe imaging artifacts, such as missing sections and misalignment. The dataset and code are available at https://github.com/Levishery/Flywire-Neuron-Tracing. Comment: 9 pages, 6 figures, AAAI 2024 accepted

    Slot-VLM: SlowFast Slots for Video-Language Modeling

    Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a novel framework designed to generate semantically decomposed video tokens, in terms of object-wise and event-wise visual representations, to facilitate LLM inference. In particular, we design a SlowFast Slots module, i.e., SF-Slots, that adaptively aggregates the dense video tokens from the CLIP vision encoder into a set of representative slots. To take into account both the spatial object details and the varied temporal dynamics, SF-Slots is built with a dual-branch structure. The Slow-Slots branch focuses on extracting object-centric slots from features at high spatial resolution but low (slow) frame sample rate, emphasizing detailed object information. Conversely, the Fast-Slots branch is engineered to learn event-centric slots from features at high temporal sample rate but low spatial resolution. These complementary slots are combined to form the vision context, serving as the input to the LLM for efficient question answering. Our experimental results demonstrate the effectiveness of Slot-VLM, which achieves state-of-the-art performance on video question-answering. Comment: 16 pages, 10 figures

    The Optimization and Mathematical Modeling of Quality Attributes of Parboiled Rice Using a Response Surface Method

    Response surface methodology was used to optimize the hydrothermal processing conditions based on the rice quality parameters of the Rong Youhua Zhan rice variety (Indica). The effects of soaking temperature (29.77, 40, 55, 70, and 80.23°C), soaking time (67.55, 90, 120, 150, and 170.45 min), and steaming time (1.59, 5, 10, 15, and 18.41 min), each tested at five levels, on percentage of head rice yield (HRY), hardness, cooking time, lightness, and color were determined, with R² values of 0.96, 0.94, 0.90, 0.88, and 0.94, respectively. HRY, hardness, cooking time, and color increased with process severity while lightness decreased, although HRY decreased after reaching a maximum. The predicted optimum soaking temperature, soaking time, and steaming time were 69.88°C, 150 min, and 6.73 min, respectively, and the predicted HRY, hardness, cooking time, lightness, and color under these conditions were 73.43%, 29.95 N, 32.14 min, 83.03, and 12.24, respectively, with a composite desirability of 0.9658. The parboiling industry could use the findings of the current study to obtain the desired quality of parboiled rice. This manuscript will be helpful for researchers working on commercializing parboiled rice processes in China as well as in other countries.
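The five levels listed for each factor look like those of a rotatable central composite design for three factors, where the axial distance is α = (2^3)^(1/4) ≈ 1.682. The abstract does not name the design, so this is an assumption, as are the centre points and step sizes used below (55°C ± 15, 120 min ± 30, 10 min ± 5) and the helper name `ccd_levels`. A minimal sketch that recomputes the levels under those assumptions:

```python
# Assumption: rotatable central composite design with 3 factors,
# so the axial (star) points sit at +/- alpha in coded units.
ALPHA = (2 ** 3) ** 0.25  # about 1.682


def ccd_levels(center, step):
    """Return the five actual factor levels for coded levels
    -alpha, -1, 0, +1, +alpha, rounded to two decimals."""
    coded = [-ALPHA, -1.0, 0.0, 1.0, ALPHA]
    return [round(center + c * step, 2) for c in coded]


# Centre points and step sizes are inferred from the listed levels,
# not stated in the abstract.
soak_temp = ccd_levels(55, 15)    # degrees C
soak_time = ccd_levels(120, 30)   # minutes
steam_time = ccd_levels(10, 5)    # minutes

print(soak_temp)
print(soak_time)
print(steam_time)
```

Under these assumptions the recomputed levels agree with the abstract for soaking temperature and steaming time. For soaking time the lowest level comes out as 69.55 min rather than the listed 67.55 min, while the other four levels match.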

    A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels

    Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models in different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing their inferred features using the same input. To achieve this, the attention models are designed based on the propagation analysis of noisy gradients at different layers. This allows the attention models to effectively discover incorrect labels and weaken their influence during the parameter updating process. By exchanging multi-level features within the two-stream architecture, the effects of noisy labels in each sub-network are reduced by decreasing the updating gradients. Furthermore, a hierarchical distillation is developed to provide more reliable pseudo labels for unlabeled data, which further boosts the performance of our retrained TSMAN. The experiments using both the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses the state-of-the-art fully-supervised results.