67 research outputs found

    Margin Maximization in Attention Mechanism

    The attention mechanism is a central component of the transformer architecture which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are tunable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently $\boldsymbol{W}$, converges in direction to a max-margin solution that separates locally-optimal tokens from non-optimal ones. This clearly formalizes attention as a token separation mechanism. Remarkably, our results are applicable to general data and precisely characterize optimality of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and problem geometry. We also provide a broader regularization path analysis that establishes the margin maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights
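
    The model in the abstract above, f(X) = ⟨Xv, softmax(XWp)⟩, can be evaluated directly on a toy input. The following is a minimal sketch in plain Python, not the authors' code: the helper names (`softmax`, `matvec`, `attention_output`) and the two-token example data are assumptions chosen for illustration. It only shows how scaling p along a fixed direction concentrates the softmax weights on a single token, which is the informal picture behind "converges in direction".

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def attention_output(X, v, W, p):
    # f(X) = <Xv, softmax(XWp)> for a T x d token matrix X.
    values = matvec(X, v)              # Xv: one scalar value per token
    scores = matvec(X, matvec(W, p))   # XWp: one attention score per token
    weights = softmax(scores)
    return sum(a * b for a, b in zip(values, weights))

# Hypothetical toy data: two tokens in two dimensions.
X = [[1.0, 0.0], [0.0, 1.0]]
v = [2.0, -1.0]                        # token values Xv = [2, -1]
W = [[1.0, 0.0], [0.0, 1.0]]

# p = 0 gives uniform attention, so f is the mean of the token values.
f_uniform = attention_output(X, v, W, [0.0, 0.0])

# A large-norm p pointing at token 1 makes the softmax nearly one-hot.
f_peaked = attention_output(X, v, W, [50.0, 0.0])

print(f_uniform, f_peaked)
```

    With p = 0 the output is the average of the two token values (0.5); with the large-norm p it is numerically indistinguishable from the value of the selected token (2.0).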

    Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs

    Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study, yet general family of compositional functions: multi-layer perceptrons (MLPs). In this setting, we reveal that the success of CoT can be attributed to breaking down in-context learning of a compositional function into two distinct phases: focusing on data related to each step of the composition and in-context learning the single-step composition function. Through both experimental and theoretical evidence, we demonstrate how CoT significantly reduces the sample complexity of in-context learning (ICL) and facilitates the learning of complex functions that non-CoT methods struggle with. Furthermore, we illustrate how transformers can transition from vanilla in-context learning to mastering a compositional function with CoT by simply incorporating an additional layer that performs the necessary filtering for CoT via the attention mechanism. In addition to these test-time benefits, we highlight how CoT accelerates pretraining by learning shortcuts to represent complex functions and how filtering plays an important role in pretraining. These findings collectively provide insights into the mechanics of CoT, inviting further investigation of its role in complex reasoning tasks

    Addressing Variable Dependency in GNN-based SAT Solving

    The Boolean satisfiability problem (SAT) is fundamental to many applications. Existing works have used graph neural networks (GNNs) for (approximate) SAT solving. Typical GNN-based end-to-end SAT solvers predict SAT solutions concurrently. We show that for a group of symmetric SAT problems, the concurrent prediction is guaranteed to produce a wrong answer because it neglects the dependency among Boolean variables in SAT problems. We propose AsymSAT, a GNN-based architecture which integrates recurrent neural networks to generate dependent predictions for variable assignments. The experimental results show that dependent variable prediction extends the solving capability of the GNN-based method, as it improves the number of solved SAT instances on large test sets

    Mechanistic study of visible light-driven CdS or g-C3N4-catalyzed C–H direct trifluoromethylation of (hetero)arenes using CF3SO2Na as the trifluoromethyl source

    Mild and sustainable methods for C–H direct trifluoromethylation of (hetero)arenes without any base or strong oxidants are in extremely high demand. Here, we report that the photo-generated electron-hole pairs of classical semiconductors (CdS or g-C3N4) under visible light excitation are effective in driving C–H trifluoromethylation of (hetero)arenes with stable and inexpensive CF3SO2Na as the trifluoromethyl (TFM) source via a radical pathway. Either the CdS- or the g-C3N4-propagated reaction can efficiently transform CF3SO2Na into the •CF3 radical and further afford the desired benzotrifluoride derivatives in moderate to good yields. After the visible light-initiated photocatalytic process, the key elements (such as F, S and C) derived from the starting TFM source CF3SO2Na exhibited different chemical forms compared to those in other oxidative reactions. The photogenerated electron was trapped by chemisorbed O2 on the photocatalysts to form the superoxide radical anion (O2•−), which further attacks the •CF3 radical, generating the inorganic products F− and CO2. This resulted in a low utilization efficiency of •CF3 (<50%). When nitro aromatic compounds and CF3SO2Na served as the starting materials in an inert atmosphere, the photoexcited electrons can be directed to reduce the nitro group to an amino group rather than being trapped by O2. Meanwhile, the photogenerated holes oxidize SO2CF3− into •CF3. The photogenerated electrons and holes were thus engaged in reductive and oxidative paths, respectively. The desired product, trifluoromethylated aniline, was obtained successfully via one-pot free-radical synthesis.

    Establishment of a viable cell detection system for microorganisms in wine based on ethidium monoazide and quantitative PCR

    Fermentability and contamination level of wine can be assessed through the detection of viable fermentation-related and spoilage-related microorganisms. Ethidium monoazide in combination with quantitative PCR (EMA-qPCR) has been considered a promising method to enumerate viable cells. Milling for 80 s with Ø 500-μm glass beads is demonstrated to be optimal for DNA extraction from yeasts, lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in wine to be used as a template for PCR. EMA-qPCR results from experiments using DNA extracted by this method correlate well with the results of a plating assay (R² > 0.99), and a PCR efficiency between 96% and 105% was obtained. Moreover, for all of these microorganisms, EMA treatment of pure cultures at a low concentration (10 μg/mL) with 20 min of photoactivation resulted in effective differentiation between viable and non-viable cells and had no effect on viable cells. Due to sublethal injury to some cells, underestimation of cell counts was found in most of the wine samples tested using the EMA-qPCR method, and a 40-min incubation in recovery medium could completely offset this error. Our results suggest an optimal glass-bead DNA extraction method and EMA treatment suitable for all of the main microorganisms in wine. The EMA-qPCR method was successfully applied to quantify yeasts, Saccharomyces cerevisiae (S. cerevisiae), LAB, non-Oenococcus oeni LAB (non-O. oeni LAB) and AAB in wine samples. © 2012 Elsevier Ltd. All rights reserved

    Intermittent PI3Kδ inhibition sustains anti-tumour immunity and curbs irAEs

    Phosphoinositide 3-kinase δ (PI3Kδ) has a key role in lymphocytes, and inhibitors that target this PI3K have been approved for treatment of B cell malignancies1-3. Although studies in mouse models of solid tumours have demonstrated that PI3Kδ inhibitors (PI3Kδi) can induce anti-tumour immunity4,5, their effect on solid tumours in humans remains unclear. Here we assessed the effects of the PI3Kδi AMG319 in human patients with head and neck cancer in a neoadjuvant, double-blind, placebo-controlled randomized phase II trial (EudraCT no. 2014-004388-20). PI3Kδ inhibition decreased the number of tumour-infiltrating regulatory T (Treg) cells and enhanced the cytotoxic potential of tumour-infiltrating T cells. At the tested doses of AMG319, immune-related adverse events (irAEs) required treatment to be discontinued in 12 out of 21 patients treated with AMG319, suggestive of systemic effects on Treg cells. Accordingly, in mouse models, PI3Kδi decreased the number of Treg cells systemically and caused colitis. Single-cell RNA-sequencing analysis revealed a PI3Kδi-driven loss of tissue-resident colonic ST2 Treg cells, accompanied by expansion of pathogenic T helper 17 (TH17) and type 17 CD8+ T (TC17) cells, which probably contributed to toxicity; this points towards a specific mode of action for the emergence of irAEs. A modified treatment regimen with intermittent dosing of PI3Kδi in mouse models led to a significant decrease in tumour growth without inducing pathogenic T cells in colonic tissue, indicating that alternative dosing regimens might limit toxicity