26 research outputs found

    Scalable Language Model with Generalized Continual Learning

    Full text link
    Continual learning has gained increasing importance as it facilitates the acquisition and refinement of scalable knowledge and skills in language models. However, existing methods typically face strict limitations in real-world scenarios, such as reliance on experience replay, optimization constraints, and the need for a task ID at inference. In this study, we introduce the Scalable Language Model (SLM) to overcome these limitations within a more challenging and generalized setting, representing a significant advancement toward practical continual learning. Specifically, we propose Joint Adaptive Re-Parameterization (JARe), integrated with Dynamic Task-related Knowledge Retrieval (DTKR), to enable adaptive adjustment of language models based on specific downstream tasks. This approach leverages the task distribution within the vector space, aiming to achieve a smooth and effortless continual learning process. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting. Moreover, while prior research has primarily focused on a single task type such as classification, our study goes further, using the large language model LLaMA-2 to explore the effects across diverse domains and task types, so that a single language model can scale gracefully to broader applications. Comment: The Twelfth International Conference on Learning Representations
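The retrieval step can be pictured with a minimal sketch. Assuming each previously learned task keeps a centroid in a shared embedding space (the task names, vectors, and nearest-centroid routing rule below are illustrative assumptions, not the paper's implementation), an incoming input is routed to the closest task before the model is adaptively re-parameterized for it:

```python
import numpy as np

# Hypothetical task centroids in a shared embedding space; in DTKR the
# retrieval would operate over learned task representations.
task_centroids = {
    "sentiment": np.array([1.0, 0.0, 0.0]),
    "nli":       np.array([0.0, 1.0, 0.0]),
    "qa":        np.array([0.0, 0.0, 1.0]),
}

def retrieve_task(x_embed):
    """Return the task whose centroid has the highest cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(task_centroids, key=lambda t: cos(task_centroids[t], x_embed))

print(retrieve_task(np.array([0.9, 0.1, 0.0])))  # -> sentiment
```

Routing by task distribution in vector space like this is what lets the method skip an explicit inference-time task ID.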

    Hierarchical Dense Correlation Distillation for Few-Shot Segmentation-Extended Abstract

    Full text link
    Few-shot semantic segmentation (FSS) aims to build class-agnostic models that segment unseen classes with only a handful of annotations. Previous methods, limited to semantic features and prototype representations, suffer from coarse segmentation granularity and train-set overfitting. In this work, we design the Hierarchically Decoupled Matching Network (HDMNet), which mines pixel-level support correlation based on the transformer architecture. Self-attention modules are used to help establish hierarchical dense features, as a means to accomplish cascade matching between query and support features. Moreover, we propose a matching module to reduce train-set overfitting and introduce correlation distillation, leveraging semantic correspondence from coarse resolution to boost fine-grained segmentation. Our method performs well in experiments, achieving 50.0% mIoU on the COCO dataset in the one-shot setting and 56.0% in the five-shot setting. The code will be available on the project website. We hope our work can benefit broader industrial applications where novel classes with limited annotations must be reliably identified. Comment: Accepted to CVPR 2023 VISION Workshop, Oral. The extended abstract of Hierarchical Dense Correlation Distillation for Few-Shot Segmentation. arXiv admin note: substantial text overlap with arXiv:2303.1465
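As a rough illustration of pixel-level support correlation (a simplified stand-in for the matching idea, not the paper's hierarchical transformer), one can compute cosine similarity between every query location and the mask-selected support locations and keep the best match per query pixel:

```python
import numpy as np

def dense_correlation(query_feats, support_feats, support_mask):
    """Best masked-support cosine match per query pixel.

    query_feats: (Nq, D), support_feats: (Ns, D), support_mask: (Ns,)
    with 1 for foreground support pixels. Shapes are assumptions.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    corr = q @ s.T                        # (Nq, Ns) dense correlation map
    corr = corr * support_mask[None, :]   # keep only foreground support pixels
    return corr.max(axis=1)               # strongest match per query pixel
```

HDMNet builds such correlations hierarchically and distills the coarse-resolution correspondence into the fine-grained levels.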

    GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

    Full text link
    Self-supervised 3D representation learning aims to learn effective representations from large-scale unlabeled point clouds. Most existing approaches adopt point discrimination as the pretext task, which assigns matched points in two distinct views as positive pairs and unmatched points as negative pairs. However, this approach often results in semantically identical points having dissimilar representations, leading to a high number of false negatives and introducing a "semantic conflict" problem. To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning. Segment grouping partitions points into semantically meaningful regions, which enhances semantic coherence and provides semantic guidance for the subsequent contrastive representation learning. Semantic-aware contrastive learning augments the semantic information extracted from segment grouping and helps alleviate the "semantic conflict" issue. We conducted extensive experiments on multiple 3D scene understanding tasks. The results demonstrate that GroupContrast learns semantically meaningful representations and achieves promising transfer learning performance. Comment: CVPR 202
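The segment-grouped objective can be sketched as an InfoNCE-style loss whose positive set is every point sharing a segment id, so semantically identical points are no longer pushed apart. This is a minimal numpy illustration of the idea, not the authors' code; the feature shapes and temperature are assumptions:

```python
import numpy as np

def group_infonce(feats, groups, tau=0.1):
    """Contrastive loss where all points sharing a segment id are positives.

    feats: (N, D) l2-normalized point features; groups: (N,) segment ids.
    """
    sim = feats @ feats.T / tau
    np.fill_diagonal(sim, -np.inf)            # exclude self-similarity
    pos = groups[:, None] == groups[None, :]  # same segment -> positive pair
    np.fill_diagonal(pos, False)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.where(pos, log_prob, 0.0).sum() / pos.sum()
```

With plain point discrimination, two points of the same segment in different views would count as negatives; here they lower the loss only when their features agree.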

    Collaboration of Pre-trained Models Makes Better Few-shot Learner

    Full text link
    Few-shot classification requires deep neural networks to learn generalized representations from only limited training images, which is challenging but significant in low-data regimes. Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training. Motivated by this, we ask whether large-scale pre-training can alleviate the few-shot data deficiency and also assist representation learning through pre-learned knowledge. In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning. CoMo includes CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, and DALL-E's language-generative knowledge. Specifically, CoMo works in two aspects: few-shot data expansion and diverse knowledge ensemble. For the former, we generate synthetic images via zero-shot DALL-E to enrich the few-shot training data without any manual effort. For the latter, we introduce a learnable Multi-Knowledge Adapter (MK-Adapter) to adaptively blend the predictions from CLIP and DINO. Through such collaboration, CoMo can fully unleash the potential of different pre-training methods and unify them to achieve state-of-the-art few-shot classification. We conduct extensive experiments on 11 datasets to demonstrate the superiority and generalization ability of our approach. Comment: 10 pages, 6 figures
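The knowledge-ensemble side can be sketched as a gated blend of the two classifiers' logits. In the paper the MK-Adapter learns the blend; here a single fixed gate parameter stands in for illustration:

```python
import numpy as np

def blend_predictions(clip_logits, dino_logits, alpha=0.0):
    """Blend two prediction vectors with a sigmoid-gated weight.

    Simplified stand-in for the MK-Adapter: in practice `alpha` would be
    learned (and could be per-class) rather than a fixed scalar.
    """
    w = 1.0 / (1.0 + np.exp(-alpha))   # gate in (0, 1)
    return w * np.asarray(clip_logits) + (1.0 - w) * np.asarray(dino_logits)
```

With `alpha = 0` the gate is 0.5 and the two models are averaged; a large positive `alpha` trusts CLIP almost exclusively.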

    LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model

    Full text link
    While LISA effectively bridges the gap between segmentation and large language models to enable reasoning segmentation, it has certain limitations: it cannot distinguish different instances of the target region and is constrained by pre-defined textual response formats. In this work, we introduce LISA++, an update to the existing LISA model that improves core functionalities while keeping the base architecture intact. The main enhancements in LISA++ include: \textbf{1) Enhanced Segmentation}: instance segmentation ability has been added, providing more detailed scene analysis alongside the existing multi-region semantic segmentation. \textbf{2) More Natural Conversation}: improved capability for multi-turn dialogue, with the ability to incorporate segmentation results directly into text responses, i.e., Segmentation in Dialogue (SiD). These improvements are achieved by curating existing samples from generic segmentation datasets, specifically aimed at enhancing segmentation and conversational skills without structural changes or additional data sources. Comparative analysis with the original LISA model shows significant advancements in these areas, positioning LISA++ as a notable upgrade in visual understanding and interaction. LISA++'s adaptability and improved features highlight the versatility of the mask-as-embedding paradigm proposed by LISA and its potential as a foundational model for diverse applications. Comment: Typo fixed

    Nlrp2, a Maternal Effect Gene Required for Early Embryonic Development in the Mouse

    Get PDF
    Maternal effect genes encode proteins that are produced during oogenesis and play an essential role during early embryogenesis. Genetic ablation of such genes in oocytes can result in female subfertility or infertility. Here we report a newly identified maternal effect gene, Nlrp2, which plays a role in early embryogenesis in the mouse. Nlrp2 mRNAs and their proteins (∼118 kDa) are expressed in oocytes and granulosa cells during folliculogenesis. The transcripts show a striking decline in early preimplantation embryos before zygotic genome activation, but the proteins remain present through to the blastocyst stage. Immunogold electron microscopy revealed that the NLRP2 protein is located in the cytoplasm, the nucleus, and close to nuclear pores in oocytes, as well as in the surrounding granulosa cells. Using RNA interference, we knocked down Nlrp2 transcripts specifically in mouse germinal vesicle oocytes. The knockdown oocytes could progress through metaphase of meiosis I and emit the first polar body. However, parthenogenetic embryos derived from Nlrp2 knockdown oocytes mostly arrested at the 2-cell stage, and maternal depletion of Nlrp2 in zygotes led to early embryonic arrest. In addition, overexpression of Nlrp2 in zygotes appears to permit normal development but increases blastomere apoptosis in blastocysts. These results provide the first evidence that Nlrp2 is a mammalian maternal effect gene required for early embryonic development in the mouse.

    SCMA Codebook Design Based on Decomposition of the Superposed Constellation for AWGN Channel

    No full text
    In this study, we propose a method named decomposition of the superposed constellation (DCSC) to design sparse code multiple access (SCMA) codebooks for the additive white Gaussian noise (AWGN) channel. We prove that the power of the user symbols (USs) is exactly determined by the power of the superposed constellation (SC). Thus, we select quadrature amplitude modulation (QAM) constellations as the SC and decompose the SC into several groups of USs with power diversity. The minimum Euclidean distance (MED) between superposed symbols (SS-MED) at the receiver is determined by the selected QAM constellation, and the MED between multi-dimensional codewords (CW-MED) is optimized by matching the symbols across dimensions. We also propose a simplified DCSC (S-DCSC) that modifies the factor graph and avoids transmitting low-power USs, greatly reducing the complexity of the message passing algorithm (MPA). Simulations show that the SS-MEDs of DCSC and S-DCSC are larger than those of previously published codebooks and that the BER performance of the proposed codebooks is better.
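The decomposition idea can be checked numerically for a single resource element: a 16-QAM superposed constellation splits into a high-power and a low-power QPSK user-symbol group, and the SS-MED falls out of the QAM spacing. The particular split below is an illustrative example, not a codebook from the paper:

```python
import numpy as np
from itertools import product

# Decompose a 16-QAM SC into two US groups with power diversity:
# every superposed symbol is the sum of one coarse and one fine symbol.
coarse = np.array([2 + 2j, 2 - 2j, -2 + 2j, -2 - 2j])  # high-power US group
fine   = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])  # low-power US group
superposed = np.array([a + b for a, b in product(coarse, fine)])

def min_euclid_dist(points):
    """Minimum Euclidean distance (SS-MED) over all distinct symbol pairs."""
    d = np.abs(points[:, None] - points[None, :])
    return d[d > 0].min()

# The 16 sums reproduce the standard {±1, ±3} x {±1, ±3} 16-QAM grid,
# so the SS-MED equals the 16-QAM spacing.
print(min_euclid_dist(superposed))  # -> 2.0
```

Because the sums tile the QAM grid exactly, choosing the QAM constellation fixes the SS-MED, which is the property DCSC exploits when optimizing the codewords dimension by dimension.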

    A New Hybrid Prediction Method of Ultra-Short-Term Wind Power Forecasting Based on EEMD-PE and LSSVM Optimized by the GSA

    Get PDF
    Wind power time series data always exhibit nonlinear and non-stationary features, making accurate prediction very difficult. In this paper, a novel hybrid wind power time series prediction model, based on ensemble empirical mode decomposition-permutation entropy (EEMD-PE), the least squares support vector machine (LSSVM), and the gravitational search algorithm (GSA), is proposed to improve the accuracy of ultra-short-term wind power forecasting. To process the data, the original wind power series was decomposed by the EEMD-PE technique into a number of subsequences with obvious complexity differences. Then, the heuristic GSA was utilized to optimize the parameters of the LSSVM, and the optimized model was developed for wind power forecasting with improved regression accuracy. The proposed model was validated with practical wind power generation data from Hebei province, China. A comprehensive error metric analysis was carried out to compare the performance of our method with other approaches. The results showed that the proposed model enhanced forecasting performance compared to the benchmark models.
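The PE step that grades subsequence complexity after EEMD is easy to sketch. This is the standard normalized permutation entropy; the order and delay values are common defaults, not necessarily those used in the paper:

```python
import numpy as np
from math import factorial

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D series, in [0, 1].

    Counts ordinal patterns of length `order`; 0 means fully predictable
    ordering (e.g. a monotone trend), values near 1 mean noise-like data.
    """
    x = np.asarray(x, float)
    n = len(x) - (order - 1) * delay
    counts = {}
    for i in range(n):
        window = x[i : i + order * delay : delay]
        pattern = tuple(np.argsort(window))   # ordinal pattern of the window
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), float) / n
    return -(p * np.log(p)).sum() / np.log(factorial(order))
```

Subsequences from EEMD with similar PE values can then be merged before fitting one LSSVM per group, which is what gives the hybrid model its "complexity-aware" structure.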

    OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

    Full text link
    The booming of 3D recognition in the 2020s began with the introduction of point cloud transformers. They quickly overwhelmed sparse CNNs and became state-of-the-art models, especially in 3D semantic segmentation. However, sparse CNNs remain valuable networks due to their efficiency and ease of application. In this work, we reexamine the design distinctions and test the limits of what a sparse CNN can achieve. We find that the key factor behind the performance difference is adaptivity. Specifically, we propose two key components, spatially adaptive receptive fields and adaptive relation, to bridge the gap. This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module to greatly enhance the adaptivity of sparse CNNs at minimal computational cost. Without any self-attention modules, OA-CNNs favorably surpass point transformers in accuracy in both indoor and outdoor scenes, with much lower latency and memory cost. Notably, they achieve 76.1%, 78.9%, and 70.6% mIoU on the ScanNet v2, nuScenes, and SemanticKITTI validation benchmarks, respectively, while running up to 5x faster than transformer counterparts. This revelation highlights the potential of pure sparse CNNs to outperform transformer-based networks. Comment: CVPR 202
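The adaptive-receptive-field idea can be caricatured as a per-point gate mixing features aggregated at two neighborhood scales, so each point effectively chooses its own receptive field. The two-scale setup, shapes, and sigmoid gate below are assumptions for illustration, not the OA-CNNs module:

```python
import numpy as np

def adaptive_receptive_mix(feats_small, feats_large, gate_logit):
    """Per-point mix of small- and large-neighborhood aggregated features.

    feats_small, feats_large: (N, D) features pooled over a small / large
    receptive field; gate_logit: (N,) learned per-point gate logits.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logit))            # gate in (0, 1)
    return g[:, None] * feats_small + (1.0 - g)[:, None] * feats_large
```

A lightweight gate like this adds almost no compute, which is how adaptivity can be retrofitted onto a sparse CNN without self-attention.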

    Hierarchical model predictive control strategy based on dynamic active power dispatch for wind power cluster integration

    No full text
    Large-scale wind power clusters with distributed wind farms create active power dispatch and control problems in the power system. In this paper, a novel hierarchical model predictive control (HMPC) strategy based on dynamic active power dispatch is proposed to improve wind power scheduling and increase wind power accommodation. The strategy consists of four layers with refined time scales: intra-day dispatch, real-time dispatch, cluster optimization, and wind farm modulation. A dynamic grouping strategy is developed to allocate the schedule among wind farms in the cluster optimization layer. To maximize wind power output, downward spinning reserve and transmission pathway utilization are incorporated in the wind farm modulation layer. Meanwhile, a stratification analysis approach for ultra-short-term wind power forecasting error is presented as feedback correction to increase forecasting accuracy. The proposed strategy is evaluated in a case study on an IEEE test network with wind power cluster integration. Results show that wind power accommodation is enhanced by the proposed HMPC strategy compared with conventional dispatch and allocation methods.
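One simplified way to picture the cluster-optimization layer's allocation step is a proportional rule capped by each farm's ultra-short-term forecast. The real strategy uses dynamic grouping inside an MPC loop, so this is only a stand-in with assumed names and shapes:

```python
import numpy as np

def allocate_schedule(cluster_target, farm_forecasts):
    """Split a cluster-level dispatch target across wind farms.

    Each farm receives a share proportional to its forecast, capped at
    the forecast itself (a farm cannot exceed its available power).
    """
    f = np.asarray(farm_forecasts, float)
    share = cluster_target * f / f.sum()   # proportional allocation
    return np.minimum(share, f)            # cap at forecast availability
```

When a cap binds, the residual target would be re-dispatched to the remaining farms in a further iteration, which is where a dynamic grouping strategy earns its keep.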