Association for the Advancement of Artificial Intelligence: AAAI Publications
Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency
The rapid evolution of multimedia technology has revolutionized human perception, paving the way for multi-view learning. However, traditional multi-view learning approaches are tailored for scenarios with fixed data views, falling short of emulating the intricate cognitive processes of the human brain, which handles signals sequentially. Our cerebral architecture seamlessly integrates sequential data through intricate feed-forward and feedback mechanisms. In stark contrast, traditional methods struggle to generalize effectively when confronted with data spanning diverse domains, highlighting the need for innovative strategies that can mimic the brain's adaptability and dynamic integration capabilities. In this paper, we propose a bio-neurologically inspired multi-view incremental framework named MVIL, aimed at emulating the brain's fine-grained fusion of sequentially arriving views. At the core of MVIL lie two fundamental modules: structured Hebbian plasticity and synaptic partition learning. Structured Hebbian plasticity reshapes the weight structure to express the high correlation between view representations, facilitating fine-grained fusion of view representations. Synaptic partition learning, meanwhile, alleviates drastic weight changes and retains old knowledge by inhibiting a subset of synapses. These modules bionically play a central role in reinforcing crucial associations between newly acquired information and the existing knowledge repository, thereby enhancing the network's capacity for generalization. Experimental results on six benchmark datasets show MVIL's effectiveness over state-of-the-art methods.
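To make the two modules concrete, here is a minimal sketch of what a structured Hebbian update and a partitioned (partially inhibited) gradient step could look like; the function names, shapes, and learning rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hebbian_update(W, x_new, h_old, lr=0.01):
    # Hebbian rule sketch (hypothetical form): strengthen weights in
    # proportion to the co-activation of the incoming view representation
    # x_new and the stored fused representation h_old.
    return W + lr * np.outer(h_old, x_new)

def partitioned_step(W, grad, mask, lr=0.01):
    # Synaptic partition learning sketch: only synapses with mask == 1 are
    # updated; the inhibited partition (mask == 0) keeps its old values,
    # protecting knowledge learned from earlier views.
    return W - lr * grad * mask
```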
A Thorough Comparison Between Independent Cascade and Susceptible-Infected-Recovered Models
We study cascades in social networks with the independent cascade (IC) model and the Susceptible-Infected-Recovered (SIR) model. The well-studied IC model fails to capture node recovery, and the SIR model is a variant of the IC model that adds this feature. In the SIR model, an equivalence to the IC model can be established by computing the probability that a node successfully infects another before its recovery and viewing this probability as the corresponding IC parameter; the difference is that the infection events along different out-going edges of a node are dependent in the SIR model, whereas they are independent in the IC model. In this paper, we thoroughly compare the two models and examine the effect of this extra dependency in the SIR model. By a carefully designed coupling argument, we show that the seeds in the IC model have a stronger influence spread than their counterparts in the SIR model, and sometimes significantly stronger. Specifically, we prove that, given the same network, the same seed sets, and parameters set according to the above-mentioned equivalence, the expected number of infected nodes at the end of the cascade is weakly larger for the IC model than for the SIR model, and there are instances where this dominance is significant.
We also study the influence maximization problem (the optimization problem of selecting a set of nodes as initial seeds in a social network to maximize their influence) under the SIR model. We show that the above-mentioned difference between the two models yields different seed-selection strategies, which motivates the design of influence maximization algorithms specifically for the SIR model. We design efficient approximation algorithms with theoretical guarantees by adapting the reverse-reachable-set-based algorithms, commonly used for the IC model, to the SIR model.
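A small Monte Carlo comparison makes the dependency concrete. The sketch below simulates both cascades on a directed graph; under one common timing convention (an infected node attempts infection each round and then recovers with probability gamma), the per-edge success probability before recovery works out to beta / (beta + gamma - beta * gamma), which is the matched IC parameter the abstract refers to. The graph encoding and parameter choices are illustrative.

```python
import random

def ic_spread(out_edges, p, seeds, trials=2000):
    # Independent cascade: each infected node gets one independent chance
    # per out-edge to infect the neighbor with probability p.
    total = 0
    for _ in range(trials):
        infected, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in out_edges.get(u, []):
                    if v not in infected and random.random() < p:
                        infected.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(infected)
    return total / trials

def sir_spread(out_edges, beta, gamma, seeds, trials=2000):
    # Discrete SIR: an infected node tries each susceptible out-neighbor
    # with probability beta every round until it recovers (probability
    # gamma per round); attempts along different out-edges share the
    # node's random infectious period -- the extra dependency.
    total = 0
    for _ in range(trials):
        infected, active = set(seeds), set(seeds)
        while active:
            newly, recovered = set(), set()
            for u in active:
                for v in out_edges.get(u, []):
                    if v not in infected and random.random() < beta:
                        infected.add(v)
                        newly.add(v)
                if random.random() < gamma:
                    recovered.add(u)
            active = (active - recovered) | newly
        total += len(infected)
    return total / trials

# Matched parameters: per-edge IC probability equal to the chance an SIR
# node transmits along an edge before recovering.
beta, gamma = 0.3, 0.5
p = beta / (beta + gamma - beta * gamma)
edges = {0: [1, 2], 1: [3], 2: [3]}
print(ic_spread(edges, p, {0}), sir_spread(edges, beta, gamma, {0}))
```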
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
Recent advancements in text-to-speech and speech conversion technologies have enabled the creation of highly convincing synthetic speech. While these innovations offer numerous practical benefits, they also pose significant security challenges when maliciously misused. Therefore, there is an urgent need to detect these synthetic speech signals. Phoneme features provide a powerful speech representation for deepfake detection. However, previous phoneme-based detection approaches typically focused on specific phonemes, overlooking temporal inconsistencies across the entire phoneme sequence. In this paper, we develop a new mechanism for detecting speech deepfakes by identifying inconsistencies in phoneme-level speech features. We design an adaptive phoneme pooling technique that extracts sample-specific phoneme-level features from frame-level speech data. By applying this technique to features extracted by pre-trained audio models on previously unseen deepfake datasets, we demonstrate that deepfake samples often exhibit phoneme-level inconsistencies when compared to genuine speech. To further enhance detection accuracy, we propose a deepfake detector that uses a graph attention network to model the temporal dependencies of phoneme-level features. Additionally, we introduce a random phoneme substitution augmentation technique to increase feature diversity during training. Extensive experiments on four benchmark datasets demonstrate the superior performance of our method over existing state-of-the-art detection methods.
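As a rough illustration of the pooling idea (the paper's exact operator is not reproduced here): given frame-level features from a pre-trained audio model and a per-frame phoneme alignment, one can average frames over each contiguous phoneme segment to obtain sample-specific phoneme-level features, which a graph attention network could then consume.

```python
import torch

def phoneme_pool(frames, phoneme_ids):
    # frames:      (T, D) frame-level features from a pre-trained audio model
    # phoneme_ids: length-T list of phoneme labels, one per frame
    # Returns one averaged feature vector per contiguous phoneme segment.
    segments, start = [], 0
    for t in range(1, len(phoneme_ids) + 1):
        if t == len(phoneme_ids) or phoneme_ids[t] != phoneme_ids[start]:
            segments.append(frames[start:t].mean(dim=0))
            start = t
    return torch.stack(segments)  # (num_segments, D)
```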
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality. Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCoder, the first training-free LLM agent for designing analog circuits through Python code generation. Firstly, AnalogCoder incorporates a feedback-enhanced flow with tailored domain-specific prompts, enabling the automated and self-correcting design of analog circuits with a high success rate. Secondly, it builds a circuit tool library that archives successful designs as reusable modular sub-circuits, simplifying composite circuit creation. Thirdly, extensive experiments on a benchmark designed to cover a wide range of analog circuit tasks show that AnalogCoder outperforms other LLM-based methods, successfully designing 20 circuits, 5 more than standard GPT-4o. We believe AnalogCoder can significantly streamline the labor-intensive chip design process, enabling non-experts to design analog circuits efficiently.
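The feedback-enhanced flow can be pictured as a simple generate-simulate-repair loop; `llm` and `simulate` below are hypothetical stand-ins, not AnalogCoder's actual interfaces.

```python
def design_circuit(llm, simulate, task_prompt, max_rounds=5):
    # Feedback-enhanced flow sketch: ask the LLM for circuit-building
    # Python code, run it through a simulator, and feed any error report
    # back into the next prompt until the design passes its checks.
    prompt = task_prompt
    for _ in range(max_rounds):
        code = llm(prompt)           # LLM returns candidate circuit code
        ok, report = simulate(code)  # e.g. a SPICE-based functional check
        if ok:
            return code
        prompt = task_prompt + "\nPrevious attempt failed:\n" + report
    return None
```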
Deep Reinforcement Learning with Time-Scale Invariant Memory
The ability to estimate temporal relationships is critical for both animals and artificial agents. Cognitive science and neuroscience provide remarkable insights into the behavioral and neural aspects of temporal credit assignment. In particular, scale invariance of learning dynamics, observed in behavior and supported by neural data, is one of the key principles that govern animal perception: proportional rescaling of temporal relationships does not alter the overall learning efficiency. Here we integrate a computational neuroscience model of scale-invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTMs. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning.
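The abstract does not spell out the memory model, but scale-invariant memories in the cognitive-science literature are often built as banks of leaky integrators with geometrically spaced time constants; the sketch below shows that flavor of construction under that assumption, with all names and constants illustrative. Because the time constants are geometrically spaced, rescaling the input's time axis roughly shifts activity along the bank rather than destroying it.

```python
import numpy as np

class LogSpacedMemory:
    # Assumed construction: a bank of leaky integrators whose time
    # constants are geometrically spaced, in the spirit of log-compressed
    # timeline models; not necessarily the paper's exact architecture.

    def __init__(self, dim, n_scales=16, tau_min=1.0, tau_max=1000.0):
        self.taus = np.geomspace(tau_min, tau_max, n_scales)  # time constants
        self.state = np.zeros((n_scales, dim))

    def step(self, x):
        decay = np.exp(-1.0 / self.taus)[:, None]  # per-scale leak per step
        self.state = decay * self.state + (1 - decay) * x[None, :]
        return self.state.reshape(-1)              # flat feature for the agent
```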
Controllable Protein Sequence Generation with LLM Preference Optimization
Designing proteins with specific attributes offers an important route to addressing biomedical challenges. Pre-trained protein large language models (LLMs) have shown promising results on protein sequence generation. However, when controlling sequence generation for specific attributes, existing methods still produce sequences with poor functionality and structural stability. In this paper, we propose a novel controllable protein design method called CtrlProt. We finetune a protein LLM with a new multi-listwise preference optimization strategy to improve generation quality and support multi-attribute controllable generation. Experiments demonstrate that CtrlProt meets functionality and structural-stability requirements effectively, achieving state-of-the-art performance in both single-attribute and multi-attribute protein sequence generation.
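For intuition only: a generic listwise preference objective under the Plackett-Luce model looks like the sketch below; the paper's multi-listwise strategy (handling several attributes at once) is more elaborate and is not specified here.

```python
import torch

def listwise_preference_loss(logps, ranks):
    # logps: (N,) model log-likelihoods of N candidate sequences.
    # ranks: LongTensor ordering the candidates best-to-worst.
    # Plackett-Luce negative log-likelihood of the given ranking
    # (a generic stand-in, not CtrlProt's exact objective).
    ordered = logps[ranks]
    loss = 0.0
    for i in range(len(ordered) - 1):
        # probability that the i-th best beats everything ranked below it
        loss = loss - (ordered[i] - torch.logsumexp(ordered[i:], dim=0))
    return loss
```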
Enhancing Generalizability in Molecular Conformation Generation with Metrization-Informed Geometric Diffusion Pretraining
Diffusion-based generative models have recently excelled at generating molecular conformations but struggle with generalization: models trained on one dataset may produce meaningless conformations for out-of-distribution molecules. Distance geometry, on the other hand, serves as a generalizable tool in traditional computational chemistry for molecular conformation; it is predicated on the assumption that the set of all potential conformations of any non-rigid molecular system can be adequately defined using purely geometric constraints. In this work, we explicitly incorporate, for the first time, distance geometry constraints into the pretraining phase of diffusion-based molecular generation models to improve generalizability. Inspired by the classical solution to the molecular distance geometry problem, we propose MiGDiff, a Metrization-Informed Geometric Diffusion framework. MiGDiff injects distance geometry constraints by pretraining the deep geometric diffusion backbone with the metrization sampling approach, yielding a "Metrization-driven pretraining + Data-driven finetuning" paradigm. Experimental results demonstrate that MiGDiff outperforms state-of-the-art methods and possesses strong generalization capabilities, particularly when generating previously unseen molecules, revealing the vast untapped potential of combining traditional computational methods with deep generative models for 3D molecular generation.
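For readers unfamiliar with metrization: in classical distance geometry, one samples a consistent distance matrix by fixing distances one pair at a time within their current bounds and re-tightening the remaining bounds via the triangle inequality. The sketch below is a simplified single-pass version (full metrization re-runs complete bound smoothing after each choice).

```python
import random

def metrize(lower, upper):
    # lower/upper: symmetric n x n bound matrices on interatomic distances.
    # Fix each distance uniformly within its current bounds, then shrink
    # the bounds of untouched pairs with the triangle inequality.
    n = len(lower)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    random.shuffle(pairs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        d[i][j] = d[j][i] = random.uniform(lower[i][j], upper[i][j])
        lower[i][j] = lower[j][i] = d[i][j]
        upper[i][j] = upper[j][i] = d[i][j]
        for k in range(n):  # triangle-inequality tightening
            if k in (i, j):
                continue
            upper[i][k] = upper[k][i] = min(upper[i][k], d[i][j] + upper[j][k])
            lower[i][k] = lower[k][i] = max(lower[i][k], d[i][j] - upper[j][k])
    return d
```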
Revisiting Tampered Scene Text Detection in the Era of Generative AI
The rapid advancements of generative AI have fueled the potential of generative text image editing, while escalating the threat of misinformation. However, existing forensics methods struggle to detect unseen forgery types that they have not been trained on, underscoring the need for a model capable of generalized detection of tampered scene text. To tackle this, we propose a novel task: open-set tampered scene text detection, which evaluates forensics models on their ability to identify both seen and previously unseen forgery types. We have curated a comprehensive, high-quality dataset featuring texts tampered with by eight text-editing models, to thoroughly assess open-set generalization capabilities. Further, we introduce a novel and effective pre-training paradigm that subtly alters the texture of selected texts within an image and trains the model to identify these regions. This approach not only mitigates the scarcity of high-quality training data but also enhances models' fine-grained perception and open-set generalization abilities. Additionally, we present DAF, a novel framework that improves open-set generalization by distinguishing between the features of authentic and tampered text, rather than focusing solely on the tampered text's features. Our extensive experiments validate the remarkable efficacy of our methods; for example, our zero-shot performance can even beat the previous state-of-the-art full-shot model by a large margin.
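The pre-training recipe can be pictured roughly as follows; the perturbation type, strength, and selection rate are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def make_pretraining_pair(image, text_boxes, rng=None):
    # image: (H, W, C) uint8 array; text_boxes: list of (x0, y0, x1, y1).
    # Subtly perturb the texture inside a random subset of text regions and
    # return the altered image plus the pixel mask the model must predict.
    rng = rng or np.random.default_rng()
    out = image.astype(np.float32).copy()
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for x0, y0, x1, y1 in text_boxes:
        if rng.random() < 0.5:
            continue                                   # leave some regions authentic
        patch = out[y0:y1, x0:x1]
        noise = rng.normal(0.0, 2.0, patch.shape)      # barely visible texture shift
        out[y0:y1, x0:x1] = np.clip(patch + noise, 0, 255)
        mask[y0:y1, x0:x1] = 1
    return out.astype(np.uint8), mask
```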
CG-TGAN: Conditional Generative Adversarial Networks with Graph Neural Networks for Tabular Data Synthesizing
Data sharing is necessary for AI to be widely used, but sharing sensitive data carries privacy risks. To address this, realistic synthetic tabular data is needed. In many cases, tabular data contains a mixture of continuous, mixed, and categorical columns; moreover, columns of the same type may have multimodal distributions or be highly imbalanced. These issues make tabular data challenging to synthesize. Synthetic tabular data should also reflect the relational meaning between columns, so modeling the probability distribution of tabular data is a non-trivial task. Traditional tabular data synthesizing models are based on GANs or diffusion models and are built with fully connected or convolutional layers. However, fully connected layers have low inductive bias, and convolutional layers are not invariant to the column order of tabular data. We therefore hypothesize that converting tabular data into graph-structured data and using a graph neural network produces better synthetic data than fully connected or convolutional layers. Our study aims to show that GANs constructed with graph neural networks can outperform existing GAN models that use fully connected or convolutional layers. We propose CG-TGAN, a conditional GAN built with graph neural networks. To learn how to synthesize realistic data, the graph neural networks in the discriminator and generator learn graph-level and node-level tasks together. The discriminator of CG-TGAN learns a graph-level task to distinguish between real and synthetic data and node-level tasks to predict the value of a target node. CG-TGAN's generator learns a graph-level task to synthesize an overall graph similar to real data and node-level tasks to learn how to synthesize a fake graph with the proper relations between nodes. In this paper, we show that CG-TGAN outperforms GAN-based models and is comparable to diffusion-based models.
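To make the row-to-graph idea concrete, here is one plausible encoding (an assumption; the paper's construction may differ): each column becomes a node and all nodes are connected, so the GNN sees inter-column dependencies without any notion of column order.

```python
import torch

def row_to_graph(row_features):
    # row_features: (n, d) tensor, one d-dimensional node per table column.
    # Fully connect the nodes (no self-loops) so a GNN can model
    # inter-column dependencies in a column-order-invariant way.
    n = row_features.shape[0]
    src = torch.arange(n).repeat_interleave(n)
    dst = torch.arange(n).repeat(n)
    keep = src != dst
    edge_index = torch.stack([src[keep], dst[keep]])  # (2, n*(n-1))
    return row_features, edge_index                   # node features + edges
```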
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Pretrained Language Models (PLMs) have become the de facto starting point for fine-tuning on downstream tasks. However, as model sizes continue to increase, traditional fine-tuning of all parameters becomes challenging. To address this, parameter-efficient fine-tuning (PEFT) methods have gained popularity as a means to adapt PLMs effectively. In parallel, recent studies have revealed the presence of activation sparsity within the intermediate outputs of the multilayer perceptron (MLP) blocks in transformers. Low activation density enables efficient model inference on sparsity-aware hardware. Building upon this insight, in this work, we propose a novel density loss that encourages higher activation sparsity (equivalently, lower activation density) in pre-trained models. We demonstrate the effectiveness of our approach by utilizing mainstream PEFT techniques, including QLoRA, LoRA, Adapter, and Prompt/Prefix Tuning, to facilitate efficient model adaptation across diverse downstream tasks. Experiments show that our proposed method, DEFT (Density-Efficient Fine-Tuning), can consistently reduce activation density by up to 44.94% on RoBERTa (Large) and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5-XXL (11B) compared to PEFT, using the GLUE and QA (SQuAD) benchmarks respectively, while maintaining competitive performance on downstream tasks. We also introduce ADA-DEFT, an adaptive variant of our DEFT approach, which achieves significant memory and runtime savings during inference for large models. For instance, ADA-DEFT reduces runtime by 8.75% and memory usage by 16.78% on Flan-T5-XL, and by 2.79% and 2.54%, respectively, on Flan-T5-XXL. Additionally, we show that DEFT works complementarily with quantized and pruned models.
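A minimal form of such a density loss, assuming the intermediate MLP activations are collected with forward hooks (the exact objective in the paper may differ), is sketched below.

```python
import torch

def deft_loss(task_loss, mlp_activations, alpha=1e-4):
    # Penalize the mean absolute value of each MLP block's intermediate
    # activations, pushing more units toward exact zero so sparsity-aware
    # hardware can skip them; only the PEFT parameters receive gradients
    # because the base model stays frozen.
    density = sum(a.abs().mean() for a in mlp_activations)
    return task_loss + alpha * density
```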