Region-Enhanced Feature Learning for Scene Semantic Segmentation
Semantic segmentation in complex scenes not only relies on local object
appearance but also on object locations and the surrounding environment.
Nonetheless, it is difficult to model long-range context in the form of
pairwise point correlations due to the huge computational cost for large-scale
point clouds. In this paper, we propose to use regions as the intermediate
representation of point clouds instead of fine-grained points or voxels to
reduce the computational burden. We introduce a novel Region-Enhanced Feature
Learning network (REFL-Net) that leverages region correlations to enhance the
features of ambiguous points. We design a Region-based Feature Enhancement
module (RFE) which consists of a Semantic-Spatial Region Extraction (SSRE)
stage and a Region Dependency Modeling (RDM) stage. In the SSRE stage, we group
the input points into a set of regions according to the point distances in both
semantic and spatial space. In the RDM part, we explore region-wise semantic
and spatial relationships via a self-attention block on region features and
fuse point features with the region features to obtain more discriminative
representations. Our proposed RFE module is a plug-and-play module that can be
integrated with common semantic segmentation backbones. We conduct extensive
experiments on ScanNetv2 and S3DIS datasets, and evaluate our RFE module with
different segmentation backbones. Our REFL-Net achieves mIoU gains of 1.8% on
ScanNetv2 and 1.0% on S3DIS with negligible additional computational cost
compared to the backbone networks. Both quantitative and
qualitative results show the powerful long-range context modeling ability and
strong generalization ability of our REFL-Net.
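The region-level idea described above can be illustrated with a minimal sketch. It is not the REFL-Net implementation: the region assignment is taken as given rather than produced by an SSRE stage, the attention uses identity query/key/value projections instead of learned ones, and the fusion is a plain sum. The function names (`region_means`, `self_attention`, `enhance_points`) are hypothetical.

```python
import math


def region_means(point_feats, region_ids, num_regions):
    """Average the features of the points assigned to each region."""
    dim = len(point_feats[0])
    sums = [[0.0] * dim for _ in range(num_regions)]
    counts = [0] * num_regions
    for feat, rid in zip(point_feats, region_ids):
        counts[rid] += 1
        for j, value in enumerate(feat):
            sums[rid][j] += value
    # max(c, 1) avoids division by zero for empty regions.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Scaled dot-product self-attention with identity Q/K/V (assumption)."""
    dim = len(feats[0])
    out = []
    for query in feats:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in feats
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, feats)) for j in range(dim)]
        )
    return out


def enhance_points(point_feats, region_ids, num_regions):
    """Fuse each point feature with its attended region feature (plain sum)."""
    regions = self_attention(region_means(point_feats, region_ids, num_regions))
    return [
        [p + r for p, r in zip(feat, regions[rid])]
        for feat, rid in zip(point_feats, region_ids)
    ]
```

Because attention runs over regions rather than points, its cost grows with the square of the number of regions, which is the source of the computational saving the abstract refers to.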
Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing
The current neuron reconstruction pipeline for electron microscopy (EM) data
usually includes automatic image segmentation followed by extensive human
expert proofreading. In this work, we aim to reduce human workload by
predicting connectivity between over-segmented neuron pieces, taking both
microscopy image and 3D morphology features into account, similar to the human
proofreading workflow. To this end, we first construct a dataset, named
FlyTracing, that contains millions of pairwise connections of segments
spanning the whole fly brain, which is three orders of magnitude larger than
existing datasets for neuron segment connection. To learn sophisticated
biological imaging features from the connectivity annotations, we propose a
novel connectivity-aware contrastive learning method to generate dense
volumetric EM image embedding. The learned embeddings can be easily
incorporated with any point or voxel-based morphological representations for
automatic neuron tracing. Extensive comparisons of different combination
schemes of image and morphological representation in identifying split errors
across the whole fly brain demonstrate the superiority of the proposed
approach, especially for the locations that contain severe imaging artifacts,
such as missing sections and misalignment. The dataset and code are available at
https://github.com/Levishery/Flywire-Neuron-Tracing.
Comment: 9 pages, 6 figures, AAAI 2024 accepted
Slot-VLM: SlowFast Slots for Video-Language Modeling
Video-Language Models (VLMs), powered by the advancements in Large Language
Models (LLMs), are charting new frontiers in video understanding. A pivotal
challenge is the development of an efficient method to encapsulate video
content into a set of representative tokens to align with LLMs. In this work,
we introduce Slot-VLM, a novel framework designed to generate semantically
decomposed video tokens, in terms of object-wise and event-wise visual
representations, to facilitate LLM inference. Particularly, we design a
SlowFast Slots module, i.e., SF-Slots, that adaptively aggregates the dense
video tokens from the CLIP vision encoder to a set of representative slots. In
order to take into account both the spatial object details and the varied
temporal dynamics, SF-Slots is built with a dual-branch structure. The
Slow-Slots branch focuses on extracting object-centric slots from features at
high spatial resolution but low (slow) frame sample rate, emphasizing detailed
object information. Conversely, the Fast-Slots branch is engineered to learn
event-centric slots from high temporal sample rate but low spatial resolution
features. These complementary slots are combined to form the vision context,
serving as the input to the LLM for efficient question answering. Our
experimental results demonstrate the effectiveness of our Slot-VLM, which
achieves state-of-the-art performance on video question-answering.
Comment: 16 pages, 10 figures
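The dual-branch sampling described above (few frames at full spatial resolution versus all frames at reduced spatial resolution) can be sketched as follows. This covers only the sampling step, not the slot aggregation, and it is not the authors' code: each token is reduced to a single float for brevity, and `slow_branch`, `fast_branch`, `stride`, and `pool` are hypothetical names.

```python
def slow_branch(frames, stride):
    """Keep every `stride`-th frame at full spatial resolution."""
    return frames[::stride]


def fast_branch(frames, pool):
    """Keep every frame, average-pooling each pool x pool spatial block."""
    pooled = []
    for frame in frames:
        height, width = len(frame), len(frame[0])
        rows = []
        for i in range(0, height - pool + 1, pool):
            row = []
            for j in range(0, width - pool + 1, pool):
                block = [
                    frame[i + di][j + dj]
                    for di in range(pool)
                    for dj in range(pool)
                ]
                row.append(sum(block) / len(block))
            rows.append(row)
        pooled.append(rows)
    return pooled


# Toy video: 8 frames of 4 x 4 tokens, each token holding its frame index.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(8)]
slow_tokens = slow_branch(video, stride=4)   # 2 frames, each 4 x 4
fast_tokens = fast_branch(video, pool=2)     # 8 frames, each 2 x 2
```

In this toy setting both branches hold 32 tokens, but the slow one preserves spatial detail while the fast one preserves temporal detail.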
The Optimization and Mathematical Modeling of Quality Attributes of Parboiled Rice Using a Response Surface Method
The response surface methodology was used to optimize the hydrothermal processing conditions based on the rice quality parameters of the Rong Youhua Zhan rice variety (Indica). The effects of soaking temperature (29.77, 40, 55, 70, and 80.23°C), soaking time (67.55, 90, 120, 150, and 170.45 min), and steaming time (1.59, 5, 10, 15, and 18.41 min), each tested at five levels, on the percentage of head rice yield (HRY), hardness, cooking time, lightness, and color were determined, with R² values of 0.96, 0.94, 0.90, 0.88, and 0.94, respectively. HRY, hardness, cooking time, and color increased with process severity while lightness decreased, although HRY decreased after reaching a maximum. The predicted optimum soaking temperature, soaking time, and steaming time were 69.88°C, 150 min, and 6.73 min, respectively, and the predicted HRY, hardness, cooking time, lightness, and color under these conditions were 73.43%, 29.95 N, 32.14 min, 83.03, and 12.24, respectively, with a composite desirability of 0.9658. The parboiling industry could use these findings to obtain the desired quality of parboiled rice, and the study should help researchers working on commercializing parboiled rice processes in China and other countries.
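The composite desirability mentioned above is commonly computed as the geometric mean of per-response desirabilities, each scaled to the range 0 to 1. The sketch below assumes that common, unweighted, linear form. The abstract does not state which responses were maximized or minimized or what bounds were used, so the function names and any bounds passed to them are hypothetical, and the sketch is not expected to reproduce the reported 0.9658.

```python
def desirability_max(y, low, high):
    """Larger-is-better: 0 at or below `low`, 1 at or above `high`."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def desirability_min(y, low, high):
    """Smaller-is-better: 1 at or below `low`, 0 at or above `high`."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return (high - y) / (high - low)


def composite_desirability(individual):
    """Unweighted geometric mean of the individual desirabilities."""
    product = 1.0
    for d in individual:
        product *= d
    return product ** (1.0 / len(individual))
```

Because the composite is a geometric mean, a single response with zero desirability drives the whole score to zero, so an optimum cannot trade one unacceptable attribute against the others.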
A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels
Learning-based methods suffer from a deficiency of clean
annotations, especially in biomedical segmentation. Although many
semi-supervised methods have been proposed to provide extra training data,
automatically generated labels are usually too noisy to retrain models
effectively. In this paper, we propose a Two-Stream Mutual Attention Network
(TSMAN) that weakens the influence of back-propagated gradients caused by
incorrect labels, thereby rendering the network robust to unclean data. The
proposed TSMAN consists of two sub-networks that are connected by three types
of attention models in different layers. The target of each attention model is
to indicate potentially incorrect gradients in a certain layer for both
sub-networks by analyzing their inferred features using the same input. In
order to achieve this purpose, the attention models are designed based on the
propagation analysis of noisy gradients at different layers. This allows the
attention models to effectively discover incorrect labels and weaken their
influence during the parameter updating process. By exchanging multi-level
features within the two-stream architecture, the effects of noisy labels in
each sub-network are reduced by decreasing the updating gradients. Furthermore,
a hierarchical distillation is developed to provide more reliable pseudo labels
for unlabeled data, which further boosts the performance of our retrained
TSMAN. The experiments using both the HVSMR 2016 and BRATS 2015 benchmarks
demonstrate that our semi-supervised learning framework surpasses the
state-of-the-art fully-supervised results.