NearbyPatchCL: Leveraging Nearby Patches for Self-Supervised Patch-Level Multi-Class Classification in Whole-Slide Images
Whole-slide image (WSI) analysis plays a crucial role in cancer diagnosis and
treatment. In addressing the demands of this critical task, self-supervised
learning (SSL) methods have emerged as a valuable resource because they avoid
the need for the large number of annotations that supervised methods require,
which are both costly and time-consuming to obtain.
Nevertheless, patch-wise representation may exhibit instability in performance,
primarily due to class imbalances stemming from patch selection within WSIs. In
this paper, we introduce Nearby Patch Contrastive Learning (NearbyPatchCL), a
novel self-supervised learning method that leverages nearby patches as positive
samples and a decoupled contrastive loss for robust representation learning.
Our method demonstrates a tangible enhancement in performance for downstream
tasks involving patch-level multi-class classification. Additionally, we curate
a new dataset derived from WSIs sourced from the Canine Cutaneous Cancer
Histology, thus establishing a benchmark for the rigorous evaluation of
patch-level multi-class classification methodologies. Intensive experiments
show that our method significantly outperforms the supervised baseline and
state-of-the-art SSL methods with top-1 classification accuracy of 87.56%. Our
method also achieves comparable results while utilizing a mere 1% of labeled
data, a stark contrast to the 100% labeled data requirement of other
approaches. Source code: https://github.com/nvtien457/NearbyPatchCL
Comment: MMM 202
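The idea of treating nearby patches as positives under a decoupled contrastive loss can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the temperature value, and the use of plain Python lists for embeddings are assumptions, and the loss follows the generic decoupled form in which the positive pair is left out of the denominator.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def decoupled_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Hypothetical sketch: mean over positives of
    -sim(anchor, positive) / t + log(sum over negatives of exp(sim(anchor, negative) / t)).
    """
    a = normalize(anchor)
    neg_logits = [dot(a, normalize(n)) / temperature for n in negatives]
    # Log-sum-exp over negatives only; excluding the positive pair from
    # the denominator is what "decoupled" refers to in this sketch.
    m = max(neg_logits)
    log_neg = m + math.log(sum(math.exp(x - m) for x in neg_logits))
    # Each nearby patch embedding is treated as one positive for the anchor.
    losses = [-dot(a, normalize(p)) / temperature + log_neg for p in positives]
    return sum(losses) / len(losses)

# Toy 2-D embeddings (made up for illustration).
anchor = [1.0, 0.0]
nearby = [[1.0, 0.1], [0.9, 0.0]]    # embeddings of spatially nearby patches
others = [[0.0, 1.0], [-1.0, 0.0]]   # embeddings of unrelated patches
loss = decoupled_contrastive_loss(anchor, nearby, others)
```

The loss is lower when the nearby patches lie close to the anchor in embedding space, which is the behavior the training objective rewards.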
Multi-Branch Network for Imagery Emotion Prediction
For a long time, images have proved perfect at both storing and conveying
rich semantics, especially human emotions. A lot of research has been conducted
to provide machines with the ability to recognize emotions in photos of people.
Previous methods mostly focus on facial expressions but fail to consider the
scene context, even though scene context plays an important role in predicting
emotions and leads to more accurate results. In addition,
Valence-Arousal-Dominance (VAD) values offer a more precise quantitative
understanding of continuous emotions, yet there has been less emphasis on
predicting them compared to discrete emotional categories. In this paper, we
present a novel Multi-Branch Network (MBN), which utilizes various source
information, including faces, bodies, and scene contexts to predict both
discrete and continuous emotions in an image. Experimental results on the EMOTIC
dataset, which contains large-scale images of people in unconstrained
situations labeled with 26 discrete categories of emotions and VAD values, show
that our proposed method significantly outperforms state-of-the-art methods
with 28.4% in mAP and 0.93 in MAE. The results highlight the importance of
utilizing multiple sources of contextual information in emotion prediction and
illustrate the potential of our proposed method in a wide range of applications,
such as affective computing, human-computer interaction, and social robotics.
Source code:
https://github.com/BaoNinh2808/Multi-Branch-Network-for-Imagery-Emotion-Prediction
Comment: SOICT 202
Masked Face Analysis via Multi-Task Deep Learning
Face recognition with wearable items has been a challenging task in computer vision and involves the problem of identifying humans wearing a face mask. Masked face analysis via multi-task learning could effectively improve performance in many fields of face analysis. In this paper, we propose a unified framework for predicting the age, gender, and emotions of people wearing face masks. We first construct FGNET-MASK, a masked face dataset for the problem. Then, we propose a multi-task deep learning model to tackle the problem. In particular, the multi-task deep learning model takes the data as input and shares weights across tasks to yield predictions of age, expression, and gender for the masked face. Through extensive experiments, the proposed framework has been found to perform better than other existing methods.
GUNNEL: Guided Mixup Augmentation and Multi-View Fusion for Aquatic Animal Segmentation
Recent years have witnessed great advances in object segmentation research.
In addition to generic objects, aquatic animals have attracted research
attention. Deep learning-based methods are widely used for aquatic animal
segmentation and have achieved promising performance. However, there is a lack
of challenging datasets for benchmarking. In this work, we build a new dataset
dubbed "Aquatic Animal Species." We also devise a novel GUided mixup
augmeNtatioN and multi-viEw fusion for aquatic animaL segmentation (GUNNEL)
that leverages the advantages of multiple view segmentation models to
effectively segment aquatic animals and improves the training performance by
synthesizing hard samples. Extensive experiments demonstrated the superiority
of our proposed framework over existing state-of-the-art instance segmentation
methods.
iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer
Creating thematic collections in industries demands innovative designs and
cohesive concepts. Designers may face challenges in maintaining thematic
consistency when drawing inspiration from existing objects, landscapes, or
artifacts. While AI-powered graphic design tools offer help, they often fail to
generate cohesive sets based on specific thematic concepts. In response, we
introduce iCONTRA, an interactive CONcept TRAnsfer system. With a user-friendly
interface, iCONTRA enables both experienced designers and novices to
effortlessly explore creative design concepts and efficiently generate thematic
collections. We also propose a zero-shot image editing algorithm, eliminating
the need for fine-tuning models, which gradually integrates information from
initial objects, ensuring consistency in the generation process without
influencing the background. A pilot study suggests iCONTRA's potential to
reduce designers' efforts. Experimental results demonstrate its effectiveness
in producing consistent and high-quality object concept transfers. iCONTRA
stands as a promising tool for innovation and creative exploration in thematic
collection design. The source code will be available at:
https://github.com/vdkhoi20/iCONTRA
Comment: CHI 202
MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation
Few-shot instance segmentation extends the few-shot learning paradigm to the
instance segmentation task, which tries to segment instance objects from a
query image with a few annotated examples of novel categories. Conventional
approaches have attempted to address the task via prototype learning, known as
point estimation. However, this mechanism depends on prototypes (e.g., the mean
of shots) for prediction, leading to performance instability. To overcome the
disadvantage of the point estimation mechanism, we propose a novel approach,
dubbed MaskDiff, which models the underlying conditional distribution of a
binary mask, which is conditioned on an object region and shot information.
Inspired by augmentation approaches that perturb data with Gaussian noise for
populating low data density regions, we model the mask distribution with a
diffusion probabilistic model. We also propose to utilize classifier-free
guided mask sampling to integrate category information into the binary mask
generation process. Without bells and whistles, our proposed method
consistently outperforms state-of-the-art methods on both base and novel
classes of the COCO dataset while simultaneously being more stable than
existing methods. The source code is available at:
https://github.com/minhquanlecs/MaskDiff
Comment: Accepted at AAAI 2024 (oral presentation)
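To make the idea of perturbing a binary mask with Gaussian noise more concrete, here is a minimal sketch of the standard forward (noising) step of a diffusion probabilistic model applied to a small mask. It is a generic DDPM-style illustration, not MaskDiff's code: the linear beta schedule, the mapping of mask values from {0, 1} to {-1, 1}, and all function names are assumptions.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: noise variance grows linearly over the steps.
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    # alpha_bar_t is the running product of (1 - beta_s) for s <= t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noise_mask(mask, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    # Binary pixels are rescaled to {-1, 1} before noising (an assumption).
    return [[signal * (2.0 * px - 1.0) + noise * rng.gauss(0.0, 1.0) for px in row]
            for row in mask]

alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
mask = [[0, 1], [1, 1]]                      # toy 2x2 binary mask
rng = random.Random(0)
slightly_noisy = noise_mask(mask, 0, alpha_bars, rng)
mostly_noise = noise_mask(mask, 999, alpha_bars, rng)
```

At early steps the sample stays close to the original mask, while at the final step it is almost pure Gaussian noise; a learned reverse process would then denoise step by step, conditioned on the object region and shot information described in the abstract.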
Finding optimal reactive power dispatch solutions by using a novel improved stochastic fractal search optimization algorithm
In this paper, a novel improved Stochastic Fractal Search optimization algorithm (ISFSOA) is proposed for finding effective solutions to a complex optimal reactive power dispatch (ORPD) problem with consideration of all constraints in the transmission power network. Three different objectives, consisting of total power loss (TPL), total voltage deviation (TVD), and voltage stabilization enhancement index, are independently optimized by running the proposed ISFSOA and the standard Stochastic Fractal Search optimization algorithm (SFSOA). The search potential of the proposed ISFSOA can be greatly improved because the diffusion process of SFSOA is modified. Compared to SFSOA, the proposed method can explore large search zones and exploit local search zones effectively, based on the comparison of solution quality. One standard IEEE 30-bus system with three study cases is employed for testing the proposed method and comparing it to other previously applied methods. For each study case, the proposed method and SFSOA are each run fifty times, and three main results, consisting of the best, mean, and standard deviation of the fitness function, are compared. The results indicate that the proposed method can find more promising solutions for the three cases and that its search ability is always more stable than that of SFSOA. The comparison with other methods gives the same evaluation: the proposed method can be superior to almost all compared methods. As a result, it can be concluded that the proposed modification is appropriate for SFSOA in dealing with the ORPD problem and that the method can be used for other engineering optimization problems.
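The evaluation protocol described above (fifty independent runs per study case, summarized by best, mean, and standard deviation of the fitness) can be sketched as follows. The optimizer here is a hypothetical random-search stand-in on a toy function, not ISFSOA or SFSOA, and the use of the sample standard deviation and of minimization is an assumption, since the abstract does not specify either.

```python
import random
import statistics

def summarize_runs(optimizer, n_runs=50, seed=0):
    """Run a stochastic optimizer n_runs times and summarize its final fitness values."""
    # Each run gets its own seeded generator so the runs are independent
    # and the whole experiment is reproducible.
    results = [optimizer(random.Random(seed + i)) for i in range(n_runs)]
    return {
        "best": min(results),              # assumes a minimization objective
        "mean": statistics.mean(results),
        "std": statistics.stdev(results),  # sample standard deviation (assumption)
    }

def toy_random_search(rng, n_samples=200):
    # Hypothetical stand-in optimizer: minimize f(x) = x^2 on [-1, 1].
    return min(rng.uniform(-1.0, 1.0) ** 2 for _ in range(n_samples))

summary = summarize_runs(toy_random_search)
```

A smaller standard deviation across runs is what the abstract refers to as a more stable search ability.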