134 research outputs found

    NearbyPatchCL: Leveraging Nearby Patches for Self-Supervised Patch-Level Multi-Class Classification in Whole-Slide Images

    Full text link
    Whole-slide image (WSI) analysis plays a crucial role in cancer diagnosis and treatment. In addressing the demands of this critical task, self-supervised learning (SSL) methods have emerged as a valuable resource, leveraging their efficiency in circumventing the need for a large number of annotations, which can be both costly and time-consuming to deploy supervised methods. Nevertheless, patch-wise representation may exhibit instability in performance, primarily due to class imbalances stemming from patch selection within WSIs. In this paper, we introduce Nearby Patch Contrastive Learning (NearbyPatchCL), a novel self-supervised learning method that leverages nearby patches as positive samples and a decoupled contrastive loss for robust representation learning. Our method demonstrates a tangible enhancement in performance for downstream tasks involving patch-level multi-class classification. Additionally, we curate a new dataset derived from WSIs sourced from the Canine Cutaneous Cancer Histology, thus establishing a benchmark for the rigorous evaluation of patch-level multi-class classification methodologies. Intensive experiments show that our method significantly outperforms the supervised baseline and state-of-the-art SSL methods with top-1 classification accuracy of 87.56%. Our method also achieves comparable results while utilizing a mere 1% of labeled data, a stark contrast to the 100% labeled data requirement of other approaches. Source code: https://github.com/nvtien457/NearbyPatchCLComment: MMM 202

    Masked Face Analysis via Multi-Task Deep Learning

    Get PDF
    Face recognition with wearable items has been a challenging task in computer vision and involves the problem of identifying humans wearing a face mask. Masked face analysis via multi-task learning could effectively improve performance in many fields of face analysis. In this paper, we propose a unified framework for predicting the age, gender, and emotions of people wearing face masks. We first construct FGNET-MASK, a masked face dataset for the problem. Then, we propose a multi-task deep learning model to tackle the problem. In particular, the multi-task deep learning model takes the data as inputs and shares their weight to yield predictions of age, expression, and gender for the masked face. Through extensive experiments, the proposed framework has been found to provide a better performance than other existing methods

    Multi-Branch Network for Imagery Emotion Prediction

    Full text link
    For a long time, images have proved perfect at both storing and conveying rich semantics, especially human emotions. A lot of research has been conducted to provide machines with the ability to recognize emotions in photos of people. Previous methods mostly focus on facial expressions but fail to consider the scene context, meanwhile scene context plays an important role in predicting emotions, leading to more accurate results. In addition, Valence-Arousal-Dominance (VAD) values offer a more precise quantitative understanding of continuous emotions, yet there has been less emphasis on predicting them compared to discrete emotional categories. In this paper, we present a novel Multi-Branch Network (MBN), which utilizes various source information, including faces, bodies, and scene contexts to predict both discrete and continuous emotions in an image. Experimental results on EMOTIC dataset, which contains large-scale images of people in unconstrained situations labeled with 26 discrete categories of emotions and VAD values, show that our proposed method significantly outperforms state-of-the-art methods with 28.4% in mAP and 0.93 in MAE. The results highlight the importance of utilizing multiple contextual information in emotion prediction and illustrate the potential of our proposed method in a wide range of applications, such as effective computing, human-computer interaction, and social robotics. Source code: https://github.com/BaoNinh2808/Multi-Branch-Network-for-Imagery-Emotion-PredictionComment: SOICT 202

    GUNNEL: Guided Mixup Augmentation and Multi-View Fusion for Aquatic Animal Segmentation

    Full text link
    Recent years have witnessed great advances in object segmentation research. In addition to generic objects, aquatic animals have attracted research attention. Deep learning-based methods are widely used for aquatic animal segmentation and have achieved promising performance. However, there is a lack of challenging datasets for benchmarking. In this work, we build a new dataset dubbed "Aquatic Animal Species." We also devise a novel GUided mixup augmeNtatioN and multi-viEw fusion for aquatic animaL segmentation (GUNNEL) that leverages the advantages of multiple view segmentation models to effectively segment aquatic animals and improves the training performance by synthesizing hard samples. Extensive experiments demonstrated the superiority of our proposed framework over existing state-of-the-art instance segmentation methods

    iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer

    Full text link
    Creating thematic collections in industries demands innovative designs and cohesive concepts. Designers may face challenges in maintaining thematic consistency when drawing inspiration from existing objects, landscapes, or artifacts. While AI-powered graphic design tools offer help, they often fail to generate cohesive sets based on specific thematic concepts. In response, we introduce iCONTRA, an interactive CONcept TRAnsfer system. With a user-friendly interface, iCONTRA enables both experienced designers and novices to effortlessly explore creative design concepts and efficiently generate thematic collections. We also propose a zero-shot image editing algorithm, eliminating the need for fine-tuning models, which gradually integrates information from initial objects, ensuring consistency in the generation process without influencing the background. A pilot study suggests iCONTRA's potential to reduce designers' efforts. Experimental results demonstrate its effectiveness in producing consistent and high-quality object concept transfers. iCONTRA stands as a promising tool for innovation and creative exploration in thematic collection design. The source code will be available at: https://github.com/vdkhoi20/iCONTRA.Comment: CHI 202

    MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation

    Full text link
    Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task, which tries to segment instance objects from a query image with a few annotated examples of novel categories. Conventional approaches have attempted to address the task via prototype learning, known as point estimation. However, this mechanism depends on prototypes (\eg mean of K−K-shot) for prediction, leading to performance instability. To overcome the disadvantage of the point estimation mechanism, we propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask, which is conditioned on an object region and K−K-shot information. Inspired by augmentation approaches that perturb data with Gaussian noise for populating low data density regions, we model the mask distribution with a diffusion probabilistic model. We also propose to utilize classifier-free guided mask sampling to integrate category information into the binary mask generation process. Without bells and whistles, our proposed method consistently outperforms state-of-the-art methods on both base and novel classes of the COCO dataset while simultaneously being more stable than existing methods. The source code is available at: https://github.com/minhquanlecs/MaskDiff.Comment: Accepted at AAAI 2024 (oral presentation

    Finding optimal reactive power dispatch solutions by using a novel improved stochastic fractal search optimization algorithm

    Get PDF
    In this paper, a novel improved Stochastic Fractal Search optimization algorithm (ISFSOA) is proposed for finding effective solutions of a complex optimal reactive power dispatch (ORPD) problem with consideration of all constraints in transmission power network. Three different objectives consisting of total power loss (TPL), total voltage deviation (TVD) and voltage stabilization enhancement index are independently optimized by running the proposed ISFSOA and standard Stochastic Fractal Search optimization algorithm (SFSOA). The potential search of the proposed ISFSOA can be highly improved since diffusion process of SFSOA is modified. Compared to SFSOA, the proposed method can explore large search zones and exploit local search zones effectively based on the comparison of solution quality. One standard IEEE 30-bus system with three study cases is employed for testing the proposed method and compared to other so far applied methods. For each study case, the proposed method together with SFSOA are run fifty run and three main results consisting of the best, mean and standard deviation fitness function are compared. The indication is that the proposed method can find more promising solutions for the three cases and its search ability is always more stable than those of SFSOA. The comparison with other methods also give the same evaluation that the proposed method can be superior to almost all compared methods. As a result, it can conclude that the proposed modification is really appropriate for SFSOA in dealing with ORPD problem and the method can be used for other engineering optimization problems
    • …
    corecore