xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Protein language models have shown remarkable success in learning biological
information from protein sequences. However, most existing models are limited
by either autoencoding or autoregressive pre-training objectives, which makes
them struggle to handle protein understanding and generation tasks
concurrently. We propose a unified protein language model, xTrimoPGLM, to
address these two types of tasks simultaneously through an innovative
pre-training framework. Our key technical contribution is an exploration of the
compatibility and the potential for joint optimization of the two types of
objectives, which has led to a strategy for training xTrimoPGLM at an
unprecedented scale of 100 billion parameters and 1 trillion training tokens.
Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms
other advanced baselines in 18 protein understanding benchmarks across four
categories. The model also facilitates an atomic-resolution view of protein
structures, leading to an advanced 3D structural prediction model that
surpasses existing language model-based tools. 2) xTrimoPGLM not only can
generate de novo protein sequences following the principles of natural ones,
but also can perform programmable generation after supervised fine-tuning (SFT)
on curated sequences. These results highlight the substantial capability and
versatility of xTrimoPGLM in understanding and generating protein sequences,
contributing to the evolving landscape of foundation models in protein science.
Towards Plastic and Stable Exemplar-Free Incremental Learning: A Dual-Learner Framework with Cumulative Parameter Averaging
The dilemma between plasticity and stability presents a significant challenge
in Incremental Learning (IL), especially in the exemplar-free scenario where
accessing old-task samples is strictly prohibited during the learning of a new
task. A straightforward solution to this issue is learning and storing an
independent model for each task, known as Single Task Learning (STL). Although
model storage in STL grows linearly with the number of tasks, we empirically
find that averaging the parameters of these models can potentially
preserve knowledge across all tasks. Inspired by this observation, we propose a
Dual-Learner framework with Cumulative Parameter Averaging (DLCPA). DLCPA
employs a dual-learner design: a plastic learner focused on acquiring new-task
knowledge and a stable learner responsible for accumulating all learned
knowledge. The knowledge from the plastic learner is transferred to the stable
learner via cumulative parameter averaging. Additionally, several task-specific
classifiers work in cooperation with the stable learner to yield the final
prediction. Specifically, when learning a new task, these modules are updated
in a cyclic manner: i) the plastic learner is first optimized with a
self-supervised loss in addition to the supervised loss to make feature
extraction more robust; ii) the stable learner is then updated with respect to
the plastic learner in a cumulative parameter averaging manner to maintain its
task-wise generalization; iii) the task-specific classifier is accordingly
optimized to align with the stable learner. Experimental results on CIFAR-100
and Tiny-ImageNet show that DLCPA outperforms several state-of-the-art
exemplar-free baselines in both Task-IL and Class-IL settings.
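The cumulative parameter averaging step described in this abstract can be illustrated with a small sketch. It assumes the stable learner keeps a running mean of the plastic learner's weights over tasks; that update rule, the function name `cumulative_average`, and the dictionary-of-lists parameter layout are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def cumulative_average(stable: Params, plastic: Params, task_index: int) -> Params:
    """Fold the plastic learner's weights into the stable learner's running mean.

    task_index is 1-based. Under the assumed rule, after task t the stable
    weights equal the mean of the plastic weights from tasks 1..t.
    """
    if task_index < 1:
        raise ValueError("task_index must be >= 1")
    updated: Params = {}
    for name, new_values in plastic.items():
        # Before the first task the stable learner has no weights yet.
        old_values = stable.get(name, [0.0] * len(new_values))
        # Incremental mean: old + (new - old) / t.
        updated[name] = [
            old + (new - old) / task_index
            for old, new in zip(old_values, new_values)
        ]
    return updated


# Toy run over three tasks with made-up plastic-learner weights.
plastic_per_task = [
    {"w": [1.0, 2.0]},
    {"w": [3.0, 4.0]},
    {"w": [5.0, 0.0]},
]
stable: Params = {}
for t, plastic in enumerate(plastic_per_task, start=1):
    stable = cumulative_average(stable, plastic, t)
print(stable)
```

The incremental form stores only one set of stable weights, so storage stays constant as tasks accumulate, unlike keeping one model per task as in STL.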
AirNet: A Calibration Model for Low-Cost Air Monitoring Sensors Using Dual Sequence Encoder Networks
Air pollution monitoring has attracted much attention in recent years. However, accurate and high-resolution monitoring of atmospheric pollution remains challenging. There are two types of devices for air pollution monitoring: static stations and mobile stations. Static stations provide accurate pollution measurements, but their spatial distribution is sparse because of their high cost. In contrast, mobile stations allow dense placement by using low-cost air monitoring sensors, but their measurements are less accurate. In this work, we propose a data-driven model based on deep neural networks, referred to as AirNet, for calibrating low-cost air monitoring sensors. Unlike traditional methods, which treat calibration as a point-to-point regression problem, we model it as a sequence-to-point mapping problem by introducing historical data sequences from both the mobile station (to be calibrated) and the reference static station. Specifically, AirNet first extracts an observation trend feature of the mobile station and a reference trend feature of the static station via dual encoder neural networks. Then, a social-based guidance mechanism is designed to select periodic and adjacent features. Finally, the features are fused and fed into a decoder to obtain a calibrated measurement. We evaluate the proposed method on two real-world datasets and compare it with six baselines. The experimental results demonstrate that our method yields the best performance.
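To make the sequence-to-point framing concrete, the sketch below builds training samples in which each calibrated target is paired with a window of past readings from both stations. The window length, the `target` series (assumed here to be trusted readings aligned in time with the mobile sensor), and the function name are hypothetical; the abstract does not specify them, and the encoder and decoder networks are not modeled.

```python
from typing import List, Tuple

# One sample: (mobile history window, static history window, target value).
Sample = Tuple[List[float], List[float], float]


def make_sequence_to_point_samples(
    mobile: List[float],
    static: List[float],
    target: List[float],
    window: int,
) -> List[Sample]:
    """Pair each target at time t with the last `window` readings of both stations."""
    if not (len(mobile) == len(static) == len(target)):
        raise ValueError("all series must have the same length")
    if window < 1:
        raise ValueError("window must be >= 1")
    samples: List[Sample] = []
    for t in range(window - 1, len(mobile)):
        start = t - window + 1
        samples.append((mobile[start:t + 1], static[start:t + 1], target[t]))
    return samples


# Toy series with made-up values.
mobile = [10.0, 12.0, 11.0, 15.0, 14.0]
static = [9.0, 10.0, 10.0, 12.0, 12.0]
target = [9.5, 11.0, 10.5, 13.0, 13.0]

samples = make_sequence_to_point_samples(mobile, static, target, window=3)
for mobile_window, static_window, y in samples:
    print(mobile_window, static_window, "->", y)
```

Setting `window=1` reduces each sample to a single pair of readings, which corresponds to the point-to-point regression setup that the abstract contrasts against.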
Exemplar-free Class Incremental Learning via Discriminative and Comparable One-class Classifiers
Exemplar-free class incremental learning requires classification models
to learn new class knowledge incrementally without retaining any old samples.
Recently, the framework based on parallel one-class classifiers (POC), which
trains a one-class classifier (OCC) independently for each category, has
attracted extensive attention, since it can naturally avoid catastrophic
forgetting. POC, however, suffers from weak discriminability and comparability
due to its independent training strategy for different OCCs. To meet this
challenge, we propose a new framework, named Discriminative and Comparable
One-class classifiers for Incremental Learning (DisCOIL). DisCOIL follows the
basic principle of POC, but it adopts variational auto-encoders (VAE) instead
of other well-established one-class classifiers (e.g. deep SVDD), because a
trained VAE can not only identify the probability of an input sample belonging
to a class but also generate pseudo samples of the class to assist in learning
new tasks. With this advantage, DisCOIL trains a new-class VAE in contrast with
the old-class VAEs, which forces the new-class VAE to reconstruct better for
new-class samples but worse for the old-class pseudo samples, thus enhancing
the comparability. Furthermore, DisCOIL introduces a hinge reconstruction loss
to ensure the discriminability. We evaluate our method extensively on MNIST,
CIFAR10, and Tiny-ImageNet. The experimental results show that DisCOIL achieves
state-of-the-art performance.
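One way to picture the contrastive objective described above is a loss that lowers reconstruction error on new-class samples while pushing the error on old-class pseudo samples above a margin. The exact form of DisCOIL's hinge reconstruction loss is not given in the abstract, so the formula, the margin value, and the function names below are assumptions for illustration only.

```python
from typing import List


def reconstruction_error(x: List[float], x_hat: List[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def hinge_contrastive_loss(
    new_errors: List[float],
    pseudo_errors: List[float],
    margin: float,
) -> float:
    """Assumed loss: mean error on new-class samples plus a hinge penalty on
    old-class pseudo samples whose error is still below `margin`."""
    pull = sum(new_errors) / len(new_errors)
    push = sum(max(0.0, margin - e) for e in pseudo_errors) / len(pseudo_errors)
    return pull + push


# Made-up per-sample reconstruction errors from a hypothetical new-class VAE.
new_errors = [0.1, 0.3]      # new-class samples: should be small
pseudo_errors = [0.5, 2.0]   # old-class pseudo samples: should exceed the margin
loss = hinge_contrastive_loss(new_errors, pseudo_errors, margin=1.0)
print(round(loss, 6))
```

In this sketch a pseudo sample whose error already exceeds the margin contributes nothing, so training effort goes only to old-class samples that the new-class VAE still reconstructs too well.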
A Power-Angle-Spectrum Based Clustering and Tracking Algorithm for Time-Varying Radio Channels
Radio channel modeling has been an important research topic, since the performance of any communication system depends on channel characteristics. So far, most existing clustering algorithms operate on the multipath components (MPCs) extracted by a high-resolution parameter estimation approach, e.g., SAGE or MUSIC. However, most of these estimation approaches require prior information to extract MPCs. Moreover, high-resolution estimation approaches usually have relatively high complexity, so the clusters can only be identified offline after the measurements. Therefore, a power-angle-spectrum (PAS) based clustering and tracking algorithm (PASCT) is proposed in this paper. First, a PAS is extracted from measurement data by using a Bartlett beamformer. For each PAS, the potential targets are selected from the background and separated into clusters by using image processing approaches. The recognized clusters are characterized by three attributes: size, position, and shape feature, where an orientation histogram is developed to describe the shape feature of the clusters. Moreover, a cost-minimizing tracking approach based on the Kuhn-Munkres method is proposed to accurately identify the clusters in non-stationary channels. The proposed PASCT algorithm is validated based on both simulations and measurements. It is found that the dominating clusters in both line-of-sight and non-line-of-sight environments can be well recognized and tracked with the proposed algorithm. By using the proposed algorithm, the dynamic changes of the clusters in real-time channel measurements, e.g., the number, birth-death process, and size of the clusters, can be well observed. In the experiments, the proposed algorithm achieves fairly good cluster identification accuracy with lower complexity than the conventional solution.
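The cost-minimizing tracking step can be sketched as a one-to-one assignment between clusters in consecutive PAS snapshots. The sketch below is not the Kuhn-Munkres method itself: it finds the same minimum-cost assignment by brute force over permutations, which is only practical for a handful of clusters. The cost function (position distance plus a weighted size difference), the `size_weight` value, and the cluster fields are hypothetical, and the shape feature and cluster birth-death handling are left out.

```python
from itertools import permutations
from math import hypot, inf
from typing import Dict, List, Tuple

# Hypothetical cluster record: position (x, y) in the PAS image and a size.
Cluster = Dict[str, float]


def match_cost(a: Cluster, b: Cluster, size_weight: float = 0.5) -> float:
    """Assumed cost: Euclidean position distance plus weighted size difference."""
    distance = hypot(a["x"] - b["x"], a["y"] - b["y"])
    return distance + size_weight * abs(a["size"] - b["size"])


def track_clusters(previous: List[Cluster], current: List[Cluster]) -> Tuple[List[int], float]:
    """Return (assignment, cost), where assignment[i] is the index in `current`
    matched to previous[i]. Brute-force stand-in for Kuhn-Munkres."""
    if len(previous) != len(current):
        raise ValueError("this sketch only handles equal cluster counts")
    best_cost = inf
    best_assignment: Tuple[int, ...] = ()
    for perm in permutations(range(len(current))):
        cost = sum(match_cost(previous[i], current[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_assignment = perm
    return list(best_assignment), best_cost


# Two snapshots with made-up clusters; the order changes between snapshots.
previous = [
    {"x": 10.0, "y": 20.0, "size": 30.0},
    {"x": 50.0, "y": 60.0, "size": 12.0},
]
current = [
    {"x": 52.0, "y": 61.0, "size": 11.0},
    {"x": 11.0, "y": 21.0, "size": 28.0},
]
assignment, total_cost = track_clusters(previous, current)
print(assignment)
```

For larger cluster counts, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` solves the same assignment problem on a precomputed cost matrix.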