67 research outputs found
Margin Maximization in Attention Mechanism
The attention mechanism is a central component of the transformer architecture
which led to the phenomenal success of large language models. However, the
theoretical principles underlying the attention mechanism are poorly
understood, especially its nonconvex optimization dynamics. In this work, we
explore the seminal softmax-attention model
$f(X) = \langle Xv, \mathrm{softmax}(XWp) \rangle$, where $X$ is the token
sequence and $(v, W, p)$ are tunable parameters. We prove that running gradient
descent on $p$, or equivalently $W$, converges in direction to a max-margin
solution that separates locally-optimal tokens from non-optimal ones. This
clearly formalizes attention as a token separation mechanism. Remarkably, our
results are applicable to general data and precisely characterize optimality of
tokens in terms of the value embeddings $Xv$ and problem geometry. We also
provide a broader regularization path analysis that establishes the
margin-maximizing nature of attention even for nonlinear prediction heads. When
optimizing $v$ and $p$ simultaneously with logistic loss, we identify conditions
under which the regularization paths directionally converge to their respective
hard-margin SVM solutions, where $v$ separates the input features based on their
labels. Interestingly, the SVM formulation of $p$ is influenced by the support
vector geometry of $v$. Finally, we verify our theoretical findings via
numerical experiments and provide insights
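As a rough illustration of the token-separation idea in the abstract above, the following minimal sketch assumes the model has the form f(X) = <Xv, softmax(XWp)> and shows how scaling p (a stand-in for the norm growth that directional convergence implies) concentrates the attention weights on a single token. The function name `attention_output`, the random data, and the choice W = I are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def attention_output(X, v, W, p):
    """Assumed model: f(X) = <X v, softmax(X W p)> for one sequence X of shape (T, d)."""
    scores = X @ W @ p         # one attention score per token, shape (T,)
    weights = softmax(scores)  # attention weights, shape (T,)
    values = X @ v             # value embedding of each token, shape (T,)
    return float(values @ weights), weights

# Hypothetical random data: T tokens of dimension d.
rng = np.random.default_rng(0)
T, d = 5, 4
X = rng.normal(size=(T, d))
v = rng.normal(size=d)
W = np.eye(d)
p = rng.normal(size=d)

# Growing the norm of p along a fixed direction sharpens the softmax,
# so the output approaches the value of the top-scoring token.
for scale in (1.0, 10.0, 100.0):
    out, weights = attention_output(X, v, W, scale * p)
    print(scale, np.round(weights, 3), round(out, 3))
```

The sketch only evaluates the forward model; it does not run gradient descent or reproduce the max-margin analysis.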
Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs
Chain-of-thought (CoT) is a method that enables language models to handle
complex reasoning tasks by decomposing them into simpler steps. Despite its
success, the underlying mechanics of CoT are not yet fully understood. In an
attempt to shed light on this, our study investigates the impact of CoT on the
ability of transformers to in-context learn a simple-to-study yet general
family of compositional functions: multi-layer perceptrons (MLPs). In this
setting, we reveal that the success of CoT can be attributed to breaking down
in-context learning of a compositional function into two distinct phases:
focusing on data related to each step of the composition and in-context
learning the single-step composition function. Through both experimental and
theoretical evidence, we demonstrate how CoT significantly reduces the sample
complexity of in-context learning (ICL) and facilitates the learning of complex
functions that non-CoT methods struggle with. Furthermore, we illustrate how
transformers can transition from vanilla in-context learning to mastering a
compositional function with CoT by simply incorporating an additional layer
that performs the necessary filtering for CoT via the attention mechanism. In
addition to these test-time benefits, we highlight how CoT accelerates
pretraining by learning shortcuts to represent complex functions and how
filtering plays an important role in pretraining. These findings collectively
provide insights into the mechanics of CoT, inviting further investigation of
its role in complex reasoning tasks
Addressing Variable Dependency in GNN-based SAT Solving
The Boolean satisfiability problem (SAT) is fundamental to many applications.
Existing works have used graph neural networks (GNNs) for (approximate) SAT
solving. Typical GNN-based end-to-end SAT solvers predict SAT solutions
concurrently. We show that for a group of symmetric SAT problems, the
concurrent prediction is guaranteed to produce a wrong answer because it
neglects the dependency among Boolean variables in SAT problems. We propose
AsymSAT, a GNN-based architecture which integrates recurrent neural networks to
generate dependent predictions for variable assignments. The experimental results
show that dependent variable prediction extends the solving capability of the
GNN-based method as it improves the number of solved SAT instances on large
test sets
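To make the symmetry argument in the abstract above concrete, here is a small sketch on an assumed toy instance, (x1 OR x2) AND (NOT x1 OR NOT x2), which is unchanged when x1 and x2 are swapped. A predictor that treats the two variables identically must give them the same value, and no such assignment satisfies the formula, whereas fixing variables one at a time succeeds. The brute-force extension check is a hypothetical stand-in for a learned sequential predictor; it is not the AsymSAT architecture.

```python
from itertools import product

# Toy CNF: (x1 OR x2) AND (NOT x1 OR NOT x2).
# Literals are signed 1-based variable indices.
CNF = [(1, 2), (-1, -2)]

def satisfies(assignment, cnf=CNF):
    """True if the tuple of booleans satisfies every clause."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in cnf
    )

# Symmetric ("concurrent") predictions: both variables get the same value.
symmetric = [a for a in product([False, True], repeat=2) if a[0] == a[1]]
print([satisfies(a) for a in symmetric])

def dependent_assign(cnf=CNF, n=2):
    """Assign variables one by one, each conditioned on the earlier choices.

    The exhaustive extension check stands in for a learned sequential model.
    """
    assignment = []
    for _ in range(n):
        for value in (True, False):
            candidate = assignment + [value]
            remaining = n - len(candidate)
            if any(
                satisfies(tuple(candidate) + rest, cnf)
                for rest in product([False, True], repeat=remaining)
            ):
                assignment = candidate
                break
    return tuple(assignment)

print(dependent_assign(), satisfies(dependent_assign()))
```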
Mechanistic study of visible light-driven CdS or g-C3N4-catalyzed C–H direct trifluoromethylation of (hetero)arenes using CF3SO2Na as the trifluoromethyl source
Mild and sustainable methods for C–H direct trifluoromethylation of (hetero)arenes without any base or strong oxidants are in extremely high demand. Here, we report that the photo-generated electron-hole pairs of classical semiconductors (CdS or g-C3N4) under visible light excitation can effectively drive C–H trifluoromethylation of (hetero)arenes with stable and inexpensive CF3SO2Na as the trifluoromethyl (TFM) source via a radical pathway. Either the CdS- or g-C3N4-propagated reaction can efficiently transform CF3SO2Na into the •CF3 radical and further afford the desired benzotrifluoride derivatives in moderate to good yields. After the visible-light-initiated photocatalytic process, the key elements (such as F, S and C) derived from the starting TFM source CF3SO2Na exhibited different chemical forms compared to those in other oxidative reactions. The photogenerated electron was trapped by chemisorbed O2 on the photocatalysts to form the superoxide radical anion (O2•−), which further attacks the •CF3 radical, generating the inorganic products F− and CO2. This resulted in a low utilization efficiency of •CF3 (<50%). When nitro aromatic compounds and CF3SO2Na served as the starting materials in an inert atmosphere, the photoexcited electrons can be directed to reduce the nitro group to an amino group rather than being trapped by O2. Meanwhile, the photogenerated holes oxidize SO2CF3− into •CF3. Both the photogenerated electrons and holes were engaged, in reductive and oxidative paths, respectively. The desired product, trifluoromethylated aniline, was obtained successfully via one-pot free-radical synthesis.
Establishment of a viable cell detection system for microorganisms in wine based on ethidium monoazide and quantitative PCR
The fermentability and contamination level of wine can be assessed through the detection of viable fermentation-related and spoilage-related microorganisms. Ethidium monoazide in combination with quantitative PCR (EMA-qPCR) has been considered a promising method to enumerate viable cells. Milling for 80 s with Ø 500-μm glass beads is demonstrated to be optimal for DNA extraction from yeasts, lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in wine to be used as a template for PCR. EMA-qPCR results from experiments using DNA extracted by this method correlate well with the results of a plating assay (R² > 0.99), and a PCR efficiency between 96% and 105% was obtained. Moreover, for all of these microorganisms, EMA treatment of pure cultures at a low concentration (10 μg/mL) with 20 min of photoactivation resulted in effective differentiation between viable and non-viable cells and had no effect on viable cells. Due to sublethal injury to some cells, underestimation of cell counts was found in most of the wine samples tested using the EMA-qPCR method, and a 40-min incubation in recovery medium could completely offset this error. Our results suggest an optimal glass-bead DNA extraction method and EMA treatment suitable for all of the main microorganisms in wine. The EMA-qPCR method was successfully applied to quantify yeasts, Saccharomyces cerevisiae (S. cerevisiae), LAB, non-Oenococcus oeni LAB (non-O. oeni LAB) and AAB in wine samples. © 2012 Elsevier Ltd. All rights reserved
Intermittent PI3Kδ inhibition sustains anti-tumour immunity and curbs irAEs
Phosphoinositide 3-kinase δ (PI3Kδ) has a key role in lymphocytes, and inhibitors that target this PI3K have been approved for treatment of B cell malignancies [1-3]. Although studies in mouse models of solid tumours have demonstrated that PI3Kδ inhibitors (PI3Kδi) can induce anti-tumour immunity [4,5], its effect on solid tumours in humans remains unclear. Here we assessed the effects of the PI3Kδi AMG319 in human patients with head and neck cancer in a neoadjuvant, double-blind, placebo-controlled randomized phase II trial (EudraCT no. 2014-004388-20). PI3Kδ inhibition decreased the number of tumour-infiltrating regulatory T (Treg) cells and enhanced the cytotoxic potential of tumour-infiltrating T cells. At the tested doses of AMG319, immune-related adverse events (irAEs) required treatment to be discontinued in 12 out of 21 patients treated with AMG319, suggestive of systemic effects on Treg cells. Accordingly, in mouse models, PI3Kδi decreased the number of Treg cells systemically and caused colitis. Single-cell RNA-sequencing analysis revealed a PI3Kδi-driven loss of tissue-resident colonic ST2 Treg cells, accompanied by expansion of pathogenic T helper 17 (TH17) and type 17 CD8+ T (TC17) cells, which probably contributed to toxicity; this points towards a specific mode of action for the emergence of irAEs. A modified treatment regimen with intermittent dosing of PI3Kδi in mouse models led to a significant decrease in tumour growth without inducing pathogenic T cells in colonic tissue, indicating that alternative dosing regimens might limit toxicity
Multi-layered study of T cells in inflammatory bowel disease pathogenesis
Inflammatory bowel disease (IBD) is a complicated disease characterized by inflammation of the gastrointestinal (GI) tract, but the mechanism remains unknown. Among all the immune cells, T cells showed a strong association with IBD pathogenesis. In this thesis, we studied how T cells contribute to IBD through genetics and pathogenic transcriptomic programs. Genome-wide association studies (GWAS) identify a site near the metabolism gene laccase domain containing 1 (LACC1) as a risk locus for Crohn's disease (CD). We previously found in populations that the Crohn's disease risk allele correlates with decreased LACC1 expression in T lymphocytes. Despite this, the mechanism by which T cell gene expression is affected, and a link to T cell function and inflammatory disease, remained unknown. Here we identified sites in the promoter region in a haploblock that influenced LACC1 gene expression. Direct association of disease-risk variants with lower LACC1 mRNA was confirmed by comparing transcript quantity of the alleles in LACC1 heterozygous human CD4+ T cells. Using gene editing, we validated the role of this LACC1 region in gene expression in T cells. Human CD4+ T cells with LACC1 gene knockdown showed altered metabolism and reduced regulatory T cell differentiation. Overall, our study connects a disease GWAS hit by linking promoter region alterations specifically to changes in T cell metabolism and function. In the other part of the thesis, to identify the pathogenic T cell subsets, we compared T cells from the inflamed and non-involved tissues of active UC patients with T cells from healthy donors and remission patients whose UC symptoms were temporarily suppressed by medications. Single-cell RNA-seq analysis indicated that CD4 and CD8 T cells from inflamed tissues both showed increased IL17A-expressing cells (TH17/TC17) and TCF7-expressing stem-like T cells, which all showed more activation and pro-inflammatory features.
RNA velocity and TCR analyses implied that the pathogenic TH17/TC17 cell subsets were derived from TCF7-expressing stem-like T cells; thus we hypothesized that a stemness program is critical for disease development. To validate the idea, we adoptively transferred Bcl6-deficient T cells, whose stemness program was impaired, into Rag1-/- mice to induce colitis. Compared to WT T cells, Bcl6-deficient T cells induced much less severe colitis with lower expansion and lower pathogenic TH17/TC17 populations, which showed high TFH gene signatures. In summary, our study unveiled the novel stem-like T cells in colitis, which could lead to new therapies for the disease