
    Margin Maximization in Attention Mechanism

    Full text link
    The attention mechanism is a central component of the transformer architecture, which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are tunable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently on $\boldsymbol{W}$, converges in direction to a max-margin solution that separates locally-optimal tokens from non-optimal ones. This clearly formalizes attention as a token-separation mechanism. Remarkably, our results are applicable to general data and precisely characterize the optimality of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and the problem geometry. We also provide a broader regularization-path analysis that establishes the margin-maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions, where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support-vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights.
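
    As a concrete reference, here is a minimal numerical sketch of the single-head softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$ studied above (plain numpy; the shapes and random data are illustrative only).

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_score(X, v, W, p):
    """Softmax-attention model f(X) = <Xv, softmax(XWp)>.

    X : (T, d) token sequence, v : (d,) value weights,
    W : (d, d) key-query matrix, p : (d,) attention parameter.
    """
    values = X @ v               # per-token value scores, shape (T,)
    attn = softmax(X @ W @ p)    # attention weights over the T tokens
    return values @ attn         # convex combination of the value scores

# Toy usage with random data.
rng = np.random.default_rng(0)
T, d = 5, 3
X = rng.normal(size=(T, d))
v, p = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))
print(attention_score(X, v, W, p))
```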

    Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs

    Full text link
    Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study, yet general, family of compositional functions: multi-layer perceptrons (MLPs). In this setting, we reveal that the success of CoT can be attributed to breaking down in-context learning of a compositional function into two distinct phases: focusing on the data related to each step of the composition and in-context learning the single-step composition function. Through both experimental and theoretical evidence, we demonstrate how CoT significantly reduces the sample complexity of in-context learning (ICL) and facilitates the learning of complex functions that non-CoT methods struggle with. Furthermore, we illustrate how transformers can transition from vanilla in-context learning to mastering a compositional function with CoT by simply incorporating an additional layer that performs the necessary filtering for CoT via the attention mechanism. In addition to these test-time benefits, we highlight how CoT accelerates pretraining by learning shortcuts to represent complex functions and how filtering plays an important role in pretraining. These findings collectively provide insights into the mechanics of CoT, inviting further investigation of its role in complex reasoning tasks.
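
    To make the setup concrete, the sketch below (plain numpy; the prompt construction is illustrative, not the paper's exact protocol) contrasts the data a transformer sees without CoT, where each in-context example is only an (input, output) pair, with CoT, where the intermediate activation of a 2-layer MLP is exposed as an extra step, splitting one compositional task into two single-step ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_examples = 4, 3, 8

# A random 2-layer MLP: y = W2 @ relu(W1 @ x).
W1 = rng.normal(size=(k, d))
W2 = rng.normal(size=(1, k))
relu = lambda z: np.maximum(z, 0)

def make_prompts():
    plain, cot = [], []
    for _ in range(n_examples):
        x = rng.normal(size=d)
        h = relu(W1 @ x)          # intermediate step of the composition
        y = W2 @ h
        plain.append((x, y))      # non-CoT: only (input, output) pairs
        cot.append((x, h, y))     # CoT: the hidden step is spelled out
    return plain, cot

plain_prompt, cot_prompt = make_prompts()
# With CoT the model can treat x -> h and h -> y as two single-step
# subproblems; without CoT it must learn the full composition at once.
print(len(plain_prompt), len(cot_prompt))
```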

    Addressing Variable Dependency in GNN-based SAT Solving

    Full text link
    The Boolean satisfiability problem (SAT) is fundamental to many applications. Existing works have used graph neural networks (GNNs) for (approximate) SAT solving. Typical GNN-based end-to-end SAT solvers predict SAT solutions concurrently. We show that for a group of symmetric SAT problems, concurrent prediction is guaranteed to produce a wrong answer because it neglects the dependency among Boolean variables in SAT problems. We propose AsymSAT, a GNN-based architecture which integrates recurrent neural networks to generate dependent predictions for variable assignments. The experimental results show that dependent variable prediction extends the solving capability of the GNN-based method, as it increases the number of solved SAT instances on large test sets.
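
    A hedged sketch of the general idea of dependent (sequential) assignment prediction, in PyTorch; the class and its wiring are illustrative and not AsymSAT's actual architecture: each variable's prediction is conditioned on the assignments already decoded, unlike concurrent per-variable prediction.

```python
import torch
import torch.nn as nn

class SequentialAssignmentHead(nn.Module):
    """Predict variable assignments one at a time, conditioning each
    prediction on the assignments already made (illustrative sketch only)."""
    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        # Input = variable embedding + previous assignment (as a scalar).
        self.rnn = nn.GRUCell(emb_dim + 1, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, var_embs):                    # var_embs: (num_vars, emb_dim)
        h = var_embs.new_zeros(1, self.rnn.hidden_size)
        prev = var_embs.new_zeros(1, 1)
        assignments = []
        for emb in var_embs:                        # decode variables in order
            inp = torch.cat([emb.unsqueeze(0), prev], dim=1)
            h = self.rnn(inp, h)
            p = torch.sigmoid(self.out(h))          # P(variable = True)
            prev = (p > 0.5).float()                # feed the decision back in
            assignments.append(prev.squeeze())
        return torch.stack(assignments)

# Toy usage: embeddings as produced by some GNN over the SAT formula.
gnn_embeddings = torch.randn(5, 16)
print(SequentialAssignmentHead(emb_dim=16)(gnn_embeddings))
```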

    Mechanics of Next Token Prediction with Self-Attention

    Full text link
    Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success is the self-attention mechanism. In this work, we ask: What does a single self-attention layer learn from next-token prediction? We show that training self-attention with gradient descent learns an automaton which generates the next token in two distinct steps: (1) Hard retrieval: Given the input sequence, self-attention precisely selects the high-priority input tokens associated with the last input token. (2) Soft composition: It then creates a convex combination of the high-priority tokens from which the next token can be sampled. Under suitable conditions, we rigorously characterize these mechanics through a directed graph over tokens extracted from the training data. We prove that gradient descent implicitly discovers the strongly-connected components (SCCs) of this graph and that self-attention learns to retrieve the tokens belonging to the highest-priority SCC available in the context window. Our theory relies on decomposing the model weights into a directional component and a finite component that correspond to the hard-retrieval and soft-composition steps, respectively. This also formalizes a related implicit-bias formula conjectured in [Tarzanagh et al. 2023]. We hope that these findings shed light on how self-attention processes sequential data and pave the path toward demystifying more complex architectures. Comment: Accepted to AISTATS 2024.
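
    The graph construction below is an illustrative sketch (using networkx; the edge definition is an assumption, not necessarily the paper's exact one) of the kind of analysis involved: build a directed graph over tokens from consecutive pairs in the training sequences and compute its strongly connected components.

```python
import networkx as nx

# Toy training corpus of token sequences.
sequences = [
    ["a", "b", "c", "a"],
    ["b", "c", "a", "b"],
    ["c", "d", "e"],
]

# Illustrative construction: add a directed edge from each token to the
# token that follows it in a training sequence.
G = nx.DiGraph()
for seq in sequences:
    for cur, nxt in zip(seq, seq[1:]):
        G.add_edge(cur, nxt)

# Strongly connected components partition the token graph; the theory says
# attention ends up retrieving tokens from the highest-priority SCC
# available in the context window.
sccs = list(nx.strongly_connected_components(G))
print(sccs)   # e.g. [{'a', 'b', 'c'}, {'d'}, {'e'}]
```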

    LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

    Full text link
    Recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While these methods have shown promise, they often fall short of rendering detailed, high-quality 3D models. The problem is especially prevalent because many methods build on Score Distillation Sampling (SDS). This paper identifies a notable deficiency of SDS: it produces inconsistent and low-quality updating directions for the 3D model, causing an over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state of the art in quality and training efficiency. Comment: The first two authors contributed equally to this work. Our code will be available at: https://github.com/EnVision-Research/LucidDreamer
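
    For context, here is a hedged pseudocode-style sketch of the standard SDS update direction whose noisiness the paper critiques (render, denoiser, and the weighting are placeholders; this is not the paper's ISM).

```python
import torch

def sds_update_direction(theta, render, denoiser, text_emb, alphas, sigmas):
    """Sketch of a Score Distillation Sampling step (placeholders throughout;
    this illustrates the update direction the paper critiques, not ISM)."""
    x = render(theta)                               # differentiable render of the 3D model
    t = torch.randint(1, len(alphas), (1,)).item()  # random diffusion timestep
    eps = torch.randn_like(x)                       # fresh noise each step -> inconsistent targets
    x_t = alphas[t] * x + sigmas[t] * eps           # noised rendering
    with torch.no_grad():
        eps_hat = denoiser(x_t, t, text_emb)        # text-conditioned noise prediction
    w = sigmas[t] ** 2                              # an illustrative weighting choice
    # SDS pushes (eps_hat - eps) back through the renderer as the gradient on theta.
    x.backward(gradient=w * (eps_hat - eps))
    return theta.grad
```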

    Analysis on the response of the dip slope with weak layer to earthquake

    Get PDF
    Taking the dip slope with weak strata in the south of the Fushun west open-pit mine as the reference prototype, numerical simulations of the dip slope with weak strata were conducted in the FLAC3D software, covering the simulation of actual ground motion and ground-motion input, the boundary conditions of the slope model, rock-mass parameters, and grid-model division. The seismic response of the dip slope with weak strata was investigated by analyzing the acceleration and velocity at the monitoring points. The results revealed that: (1) The thickness of the weak layer is a critical factor affecting the response characteristics of a slope with a single weak layer under an earthquake, and it has a greater impact on slope stability under earthquake load than the dip angle of a single weak layer. (2) Based on the horizontal velocity of monitoring point 2# at the intersection of the weak layer and the slope surface, the layer thickness has a significant influence on the velocity in the X direction. (3) For the slope with two weak layers, the failure response at the intersection of the weak layers and the slope surface changes as the slope height increases; the changes in acceleration amplitude and velocity at monitoring point 3#, with two weak layers, are more noticeable than at monitoring point 2#. The seismic response of the dip slope is related to the dip angle, thickness, number, and location of the weak layers. Therefore, the coupled effect of earthquake and weak-layer characteristics on slope stability should be thoroughly considered in slope treatment and protection.

    Mechanistic study of visible light-driven CdS or g-C3N4-catalyzed C–H direct trifluoromethylation of (hetero)arenes using CF3SO2Na as the trifluoromethyl source

    Get PDF
    Mild and sustainable methods for the C–H direct trifluoromethylation of (hetero)arenes without any base or strong oxidant are in extremely high demand. Here, we report that the photo-generated electron-hole pairs of classical semiconductors (CdS or g-C3N4) under visible-light excitation are effective in driving the C–H trifluoromethylation of (hetero)arenes with stable and inexpensive CF3SO2Na as the trifluoromethyl (TFM) source via a radical pathway. Either the CdS- or the g-C3N4-propagated reaction can efficiently transform CF3SO2Na into the •CF3 radical and further afford the desired benzotrifluoride derivatives in moderate to good yields. After the visible-light-initiated photocatalytic process, the key elements (such as F, S and C) derived from the starting TFM source, CF3SO2Na, exhibited different chemical forms compared to those in other oxidative reactions. The photogenerated electron was trapped by chemisorbed O2 on the photocatalysts to form the superoxide radical anion (O2•−), which further attacks the •CF3 radical to generate the inorganic products F− and CO2. This resulted in a low utilization efficiency of •CF3 (<50%). When nitroaromatic compounds and CF3SO2Na served as the starting materials in an inert atmosphere, the photoexcited electrons could be directed to reduce the nitro group to an amino group rather than being trapped by O2. Meanwhile, the photogenerated holes oxidize CF3SO2− into •CF3. The photogenerated electrons and holes were thus engaged in reductive and oxidative paths, respectively. The desired product, a trifluoromethylated aniline, was obtained successfully via one-pot free-radical synthesis.

    Establishment of a viable cell detection system for microorganisms in wine based on ethidium monoazide and quantitative PCR

    Get PDF
    Fermentability and the contamination level of wine can be assessed through the detection of viable fermentation-related and spoilage-related microorganisms. Ethidium monoazide in combination with quantitative PCR (EMA-qPCR) has been considered a promising method to enumerate viable cells. Milling for 80 s with Ø 500 μm glass beads is demonstrated to be optimal for extracting DNA from yeasts, lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in wine for use as a PCR template. EMA-qPCR results from experiments using DNA extracted by this method correlate well with the results of a plating assay (R² > 0.99), and a PCR efficiency between 96% and 105% was obtained. Moreover, for all of these microorganisms, EMA treatment of pure cultures at a low concentration (10 μg/mL) with 20 min of photoactivation resulted in effective differentiation between viable and non-viable cells and had no effect on viable cells. Due to sublethal injury to some cells, underestimation of cell counts was found in most of the wine samples tested using the EMA-qPCR method, and a 40-min incubation in recovery medium could completely offset this error. Our results suggest an optimal glass-bead DNA extraction method and EMA treatment suitable for all of the main microorganisms in wine. The EMA-qPCR method was successfully applied to quantify yeasts, Saccharomyces cerevisiae (S. cerevisiae), LAB, non-Oenococcus oeni LAB (non-O. oeni LAB), and AAB in wine samples.
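
    As a brief aside on how such efficiency figures are commonly derived (the standard-curve approach; the numbers below are hypothetical and not from this study), PCR efficiency is typically computed from the slope of Cq versus log10 template amount as E = 10^(-1/slope) - 1.

```python
import numpy as np

# Hypothetical standard-curve data: 10-fold serial dilutions and the
# quantification cycles (Cq) measured for each (illustrative values only).
log10_copies = np.array([6, 5, 4, 3, 2])
cq = np.array([15.1, 18.5, 21.8, 25.2, 28.6])

# Linear fit of Cq against log10(copies); the slope gives the efficiency.
slope, intercept = np.polyfit(log10_copies, cq, 1)
efficiency = 10 ** (-1 / slope) - 1   # 100% corresponds to a slope of about -3.32

print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```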