7 research outputs found

    xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein

    Full text link
    Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited to either autoencoding or autoregressive pre-training objectives, which makes it hard for them to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines on 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to an advanced 3D structural prediction model that surpasses existing language-model-based tools. 2) xTrimoPGLM can not only generate de novo protein sequences that follow the principles of natural ones, but also perform programmable generation after supervised fine-tuning (SFT) on curated sequences. These results highlight the substantial capability and versatility of xTrimoPGLM in understanding and generating protein sequences, contributing to the evolving landscape of foundation models in protein science.
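
    The abstract describes combining an autoencoding (understanding) objective and an autoregressive (generation) objective on one shared backbone. The sketch below is a heavily simplified illustration of that idea, not the actual xTrimoPGLM/GLM training code: a tiny shared transformer gets a masked-residue loss and a next-token loss, mixed by a hypothetical weight `alpha`.

```python
# Illustrative sketch only: one shared transformer trained with both a
# masked-span (understanding) loss and an autoregressive (generation) loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyProteinLM(nn.Module):
    def __init__(self, vocab_size=33, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, causal=False):
        x = self.embed(tokens)
        mask = None
        if causal:  # autoregressive mode: attend only to earlier positions
            L = tokens.size(1)
            mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        return self.lm_head(self.backbone(x, mask=mask))

def joint_loss(model, tokens, mask_positions, alpha=0.5):
    # Autoencoding loss: predict residues at randomly masked positions
    # (token id 0 is assumed to be the [MASK] symbol here).
    masked = tokens.clone()
    masked[mask_positions] = 0
    ae_logits = model(masked, causal=False)
    ae_loss = F.cross_entropy(ae_logits[mask_positions], tokens[mask_positions])
    # Autoregressive loss: next-token prediction over the full sequence.
    ar_logits = model(tokens[:, :-1], causal=True)
    ar_loss = F.cross_entropy(ar_logits.reshape(-1, ar_logits.size(-1)),
                              tokens[:, 1:].reshape(-1))
    return alpha * ae_loss + (1 - alpha) * ar_loss
```

    The `alpha` mixing weight and the masking scheme are placeholders; the paper's contribution is precisely the study of how such objectives can be made compatible at scale.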

    Towards Plastic and Stable Exemplar-Free Incremental Learning: A Dual-Learner Framework with Cumulative Parameter Averaging

    Full text link
    The dilemma between plasticity and stability presents a significant challenge in Incremental Learning (IL), especially in the exemplar-free scenario where access to old-task samples is strictly prohibited while learning a new task. A straightforward solution to this issue is to learn and store an independent model for each task, known as Single Task Learning (STL). Despite the linear growth of model storage with the number of tasks in STL, we empirically discover that averaging these model parameters can potentially preserve knowledge across all tasks. Inspired by this observation, we propose a Dual-Learner framework with Cumulative Parameter Averaging (DLCPA). DLCPA employs a dual-learner design: a plastic learner focused on acquiring new-task knowledge and a stable learner responsible for accumulating all learned knowledge. The knowledge from the plastic learner is transferred to the stable learner via cumulative parameter averaging. Additionally, several task-specific classifiers work in cooperation with the stable learner to yield the final prediction. Specifically, when learning a new task, these modules are updated in a cyclic manner: i) the plastic learner is first optimized with a self-supervised loss in addition to the supervised loss to enhance the robustness of feature extraction; ii) the stable learner is then updated with respect to the plastic learner via cumulative parameter averaging to maintain its task-wise generalization; iii) the task-specific classifier is optimized accordingly to align with the stable learner. Experimental results on CIFAR-100 and Tiny-ImageNet show that DLCPA outperforms several state-of-the-art exemplar-free baselines in both Task-IL and Class-IL settings.
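
    A minimal sketch of the cumulative parameter averaging step described above, assuming the stable learner keeps a running mean of the plastic learner's weights after each task; function and variable names are illustrative, not taken from the paper.

```python
# Cumulative parameter averaging sketch: theta_s <- ((t-1)*theta_s + theta_p) / t
import torch

@torch.no_grad()
def cumulative_average(stable_model, plastic_model, task_count):
    """Fold the plastic learner's weights into a running mean over tasks."""
    stable_state = stable_model.state_dict()
    plastic_state = plastic_model.state_dict()
    for name, stable_param in stable_state.items():
        if not torch.is_floating_point(stable_param):
            continue  # skip integer buffers, e.g. batch-norm counters
        stable_param.mul_(task_count - 1).add_(plastic_state[name]).div_(task_count)
    stable_model.load_state_dict(stable_state)

# Usage sketch (train_on_task is hypothetical): after finishing task t,
#   plastic = train_on_task(plastic, task_t)
#   cumulative_average(stable, plastic, task_count=t)
```

    The running mean gives every task equal weight regardless of how many tasks follow it, which is the property the abstract credits for preserving knowledge across all tasks.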

    AirNet: A Calibration Model for Low-Cost Air Monitoring Sensors Using Dual Sequence Encoder Networks

    No full text
    Air pollution monitoring has attracted much attention in recent years. However, accurate and high-resolution monitoring of atmospheric pollution remains challenging. There are two types of devices for air pollution monitoring, i.e., static stations and mobile stations. Static stations provide accurate pollution measurements, but their spatial distribution is sparse because of their high cost. In contrast, mobile stations offer an effective solution for dense placement by utilizing low-cost air monitoring sensors, whereas their measurements are less accurate. In this work, we propose a data-driven model based on deep neural networks, referred to as AirNet, for calibrating low-cost air monitoring sensors. Unlike traditional methods, which treat the calibration task as a point-to-point regression problem, we model it as a sequence-to-point mapping problem by introducing historical data sequences from both a mobile station (to be calibrated) and the reference static station. Specifically, AirNet first extracts an observation trend feature of the mobile station and a reference trend feature of the static station via dual encoder neural networks. Then, a social-based guidance mechanism is designed to select periodic and adjacent features. Finally, the features are fused and fed into a decoder to obtain a calibrated measurement. We evaluate the proposed method on two real-world datasets and compare it with six baselines. The experimental results demonstrate that our method yields the best performance.
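
    A rough sketch of the dual sequence-encoder, sequence-to-point idea described above. Layer types, sizes, and the fusion step are assumptions, and the paper's social-based guidance mechanism is omitted for brevity.

```python
# Dual-encoder calibration sketch: encode the mobile and static-station
# histories separately, fuse the trend features, and decode one calibrated value.
import torch
import torch.nn as nn

class DualEncoderCalibrator(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.mobile_enc = nn.GRU(n_features, hidden, batch_first=True)
        self.static_enc = nn.GRU(n_features, hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, mobile_seq, static_seq):
        # mobile_seq, static_seq: (batch, time, n_features)
        _, h_mobile = self.mobile_enc(mobile_seq)   # observation trend feature
        _, h_static = self.static_enc(static_seq)   # reference trend feature
        fused = torch.cat([h_mobile[-1], h_static[-1]], dim=-1)
        return self.decoder(fused).squeeze(-1)      # calibrated measurement

# Usage sketch with random data:
#   model = DualEncoderCalibrator(n_features=5)
#   y = model(torch.randn(8, 24, 5), torch.randn(8, 24, 5))  # -> shape (8,)
```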

    Exemplar-free Class Incremental Learning via Discriminative and Comparable One-class Classifiers

    Full text link
    Exemplar-free class incremental learning requires classification models to learn new class knowledge incrementally without retaining any old samples. Recently, the framework based on parallel one-class classifiers (POC), which trains a one-class classifier (OCC) independently for each category, has attracted extensive attention, since it naturally avoids catastrophic forgetting. POC, however, suffers from weak discriminability and comparability due to its independent training strategy for different OCCs. To meet this challenge, we propose a new framework, named Discriminative and Comparable One-class classifiers for Incremental Learning (DisCOIL). DisCOIL follows the basic principle of POC, but adopts variational auto-encoders (VAEs) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks. With this advantage, DisCOIL trains a new-class VAE in contrast with the old-class VAEs, forcing the new-class VAE to reconstruct new-class samples better but old-class pseudo samples worse, thus enhancing comparability. Furthermore, DisCOIL introduces a hinge reconstruction loss to ensure discriminability. We evaluate our method extensively on MNIST, CIFAR10, and Tiny-ImageNet. The experimental results show that DisCOIL achieves state-of-the-art performance.
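
    A hedged sketch of a hinge-style reconstruction loss in the spirit of the contrastive training described above: push reconstruction error low on new-class samples and above a margin on pseudo samples from old-class VAEs. The VAE interface and the margin value are assumptions, not the paper's exact formulation.

```python
# Hinge reconstruction loss sketch for a new-class VAE trained against
# pseudo samples generated by previously learned old-class VAEs.
import torch
import torch.nn.functional as F

def hinge_recon_loss(vae, new_x, old_pseudo_x, margin=1.0):
    # per-sample reconstruction error on the new class (should be small)
    new_recon, _, _ = vae(new_x)                 # assumed (recon, mu, logvar)
    err_new = F.mse_loss(new_recon, new_x, reduction='none').flatten(1).mean(1)
    # per-sample reconstruction error on old-class pseudo samples (should be large)
    old_recon, _, _ = vae(old_pseudo_x)
    err_old = F.mse_loss(old_recon, old_pseudo_x, reduction='none').flatten(1).mean(1)
    # hinge term: only penalize old-class errors that fall below the margin,
    # so reconstruction scores stay comparable and separable across classes
    return err_new.mean() + F.relu(margin - err_old).mean()
```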

    A Coarse-to-Fine Model for Rail Surface Defect Detection

    No full text

    A Power-Angle-Spectrum Based Clustering and Tracking Algorithm for Time-Varying Radio Channels

    No full text
    Radio channel modeling has been an important research topic, since the performance of any communication system depends on channel characteristics. So far, most existing clustering algorithms operate on multipath components (MPCs) extracted with a high-resolution parameter estimation approach, e.g., SAGE or MUSIC. However, most of these estimation approaches require prior information to extract MPCs. Moreover, high-resolution estimation usually entails relatively high complexity, and thus the clusters can only be identified offline after the measurements. Therefore, a power-angle-spectrum (PAS) based clustering and tracking algorithm (PASCT) is proposed in this paper. First, a PAS is extracted from measurement data by using a Bartlett beamformer. For each PAS, potential targets are separated from the background and grouped into clusters by using image processing approaches. The recognized clusters are characterized by three attributes: size, position, and shape feature, where an orientation histogram is developed to describe the shape feature of the clusters. Moreover, a cost-minimizing tracking approach based on the Kuhn-Munkres method is proposed to accurately identify the clusters in non-stationary channels. The proposed PASCT algorithm is validated on both simulations and measurements. It is found that the dominating clusters in both line-of-sight and non-line-of-sight environments can be well recognized and tracked with the proposed algorithm. The dynamic changes of the clusters in real-time channel measurements, e.g., their number, birth-death process, and size, can also be well observed. The experiments show that the proposed algorithm achieves good accuracy in cluster identification with lower complexity than the conventional solution.
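
    A minimal sketch of the Kuhn-Munkres (Hungarian) association step used for tracking clusters across consecutive snapshots, as described above. The cost function, its weights, and the gating threshold are illustrative assumptions rather than the paper's exact definitions.

```python
# Associate clusters between two consecutive snapshots with minimum total cost;
# unmatched clusters correspond to deaths (previous snapshot) or births (current).
import numpy as np
from scipy.optimize import linear_sum_assignment

def track_clusters(prev, curr, w_pos=1.0, w_size=0.1, max_cost=5.0):
    """prev/curr: lists of dicts with 'position' (2-vector) and 'size' (scalar).
    Returns (prev_index, curr_index) matches whose cost is below max_cost."""
    cost = np.zeros((len(prev), len(curr)))
    for i, p in enumerate(prev):
        for j, c in enumerate(curr):
            cost[i, j] = (w_pos * np.linalg.norm(np.asarray(p['position'])
                                                 - np.asarray(c['position']))
                          + w_size * abs(p['size'] - c['size']))
    rows, cols = linear_sum_assignment(cost)   # Kuhn-Munkres matching
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
```

    A shape-feature term (e.g., a distance between orientation histograms) could be added to the cost in the same way; it is left out here because its exact form is not given in the abstract.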