Text Assisted Insight Ranking Using Context-Aware Memory Network
Extracting valuable facts or informative summaries from multi-dimensional
tables, i.e. insight mining, is an important task in data analysis and business
intelligence. However, ranking the importance of insights remains a challenging
and largely unexplored task. The main challenge is that explicitly scoring an
insight or assigning it a rank requires a thorough understanding of the tables
and a great deal of manual effort, which leads to a lack of available training
data for the insight ranking problem. In this paper, we propose an insight
ranking model that consists of two parts: a neural ranking model that explores
the data characteristics, such as the header semantics and the statistical
features of the data, and a memory network model that introduces table
structure and context information into the ranking process. We also build a
dataset with text assistance. Experimental results show that our approach
substantially improves ranking precision across multiple evaluation metrics.
Comment: Accepted to AAAI 201
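The abstract describes the two components only at a high level. As a rough,
hypothetical sketch of a scorer with that overall shape (invented dimensions,
feature names, and a simple attention-style memory read; not the authors'
architecture), one might write:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: an insight is described by a feature vector combining
# header-semantics and data-statistics features; its table context is a set of
# memory slots (e.g. embeddings of sibling insights or table cells).
# d_feat == d_mem is assumed so the features can act directly as the query.
d_feat, d_mem, n_slots, d_hid = 32, 32, 10, 64

# Randomly initialized parameters; training (e.g. with a pairwise ranking loss)
# is omitted in this sketch.
W1 = rng.normal(size=(d_hid, d_feat + d_mem)) * 0.1
w2 = rng.normal(size=d_hid) * 0.1

def score_insight(features, memory):
    """Score one insight.

    features : (d_feat,) data-characteristic features of the insight
               (header semantics, statistical features, ...).
    memory   : (n_slots, d_mem) context memory describing table structure.
    The memory-network part attends over the context slots with the insight
    features as the query; the neural ranking part scores the concatenation.
    """
    logits = memory @ features                     # (n_slots,) attention logits
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                             # softmax over memory slots
    context = attn @ memory                        # (d_mem,) context read-out
    h = np.tanh(W1 @ np.concatenate([features, context]))
    return float(w2 @ h)                           # higher score = more important

# Rank a few toy insights that share one table context.
memory = rng.normal(size=(n_slots, d_mem))
insights = rng.normal(size=(5, d_feat))
ranking = np.argsort([-score_insight(f, memory) for f in insights])
print("ranked insight indices:", ranking)
```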
Tight Lower Bounds for Multiplicative Weights Algorithmic Families
We study the fundamental problem of prediction with expert advice and develop
regret lower bounds for a large family of algorithms for this problem. We
develop simple adversarial primitives that lend themselves to various
combinations leading to sharp lower bounds for many algorithmic families. We
use these primitives to show that the classic Multiplicative Weights Algorithm
(MWA) has a regret of $\sqrt{\frac{T \ln k}{2}}$ over $T$ rounds with $k$
experts, thereby completely closing the gap between upper and lower bounds. We
further show a regret lower bound of $\frac{2}{3}\sqrt{\frac{T \ln k}{2}}$ for
a much more general family of algorithms than MWA, where the learning rate can
be arbitrarily varied over time, or even picked from arbitrary distributions
over time. We also use our
primitives to construct adversaries in the geometric horizon setting for MWA,
precisely characterizing its regret for the case of two experts and proving a
regret lower bound for the case of an arbitrary number of experts.
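For reference, a minimal sketch of the classic Multiplicative Weights Algorithm
itself (not of the paper's adversarial constructions) is below; the fixed
learning rate, the i.i.d. toy losses, and the particular tuning
$\eta = \sqrt{2\ln k / T}$ are assumptions for illustration.

```python
import numpy as np

def multiplicative_weights(losses, eta):
    """Classic Multiplicative Weights / exponential weights for prediction
    with expert advice.

    losses : (T, k) array of per-round losses in [0, 1] for k experts.
    eta    : learning rate, held fixed here (the paper also studies
             time-varying and randomly drawn learning rates).
    Returns (algorithm's expected loss, regret against the best expert).
    """
    T, k = losses.shape
    weights = np.ones(k)
    alg_loss = 0.0
    for t in range(T):
        probs = weights / weights.sum()            # play experts in proportion to weights
        alg_loss += probs @ losses[t]              # expected loss this round
        weights *= np.exp(-eta * losses[t])        # multiplicative update
    best_expert_loss = losses.sum(axis=0).min()    # best single expert in hindsight
    return alg_loss, alg_loss - best_expert_loss

# Toy usage with i.i.d. random losses (an illustration, not an adversary).
rng = np.random.default_rng(0)
T, k = 10_000, 8
losses = rng.random((T, k))
eta = np.sqrt(2.0 * np.log(k) / T)                 # one standard learning-rate tuning
print(multiplicative_weights(losses, eta))
```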
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Attention layers -- which map a sequence of inputs to a sequence of outputs
-- are core building blocks of the Transformer architecture, which has achieved
significant breakthroughs in modern artificial intelligence. This paper
presents a rigorous theoretical study on the learning and generalization of a
single multi-head attention layer, with a sequence of key vectors and a
separate query vector as input. We consider the random feature setting where
the attention layer has a large number of heads, with randomly sampled frozen
query and key matrices, and trainable value matrices. We show that such a
random-feature attention layer can express a broad class of target functions
that are permutation invariant to the key vectors. We further provide
quantitative excess risk bounds for learning these target functions from finite
samples, using random-feature attention with finitely many heads.
Our results feature several implications unique to the attention structure
compared with existing random features theory for neural networks, such as (1)
Advantages in the sample complexity over standard two-layer random-feature
networks; (2) Concrete and natural classes of functions that can be learned
efficiently by a random-feature attention layer; and (3) The effect of the
sampling distribution of the query-key weight matrix (the product of the query
and key matrix), where Gaussian random weights with a non-zero mean result in
better sample complexities than the zero-mean counterpart for learning certain
natural target functions. Experiments on simulated data corroborate our
theoretical findings and further illustrate the interplay between the sample
size and the complexity of the target function.
Comment: 41 pages, 5 figures
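To make the random-feature setting concrete, here is a small hypothetical
sketch: frozen, randomly sampled query-key weights define per-head attention
features that are permutation invariant to the keys, and only a value-side
linear readout is trained (here by ridge regression). The shapes, the toy
target function, and the ridge training step are assumptions for illustration,
not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d-dimensional tokens, N keys per sequence, M random heads.
d, N, M, n_samples = 8, 16, 128, 2000

# Frozen, randomly sampled query-key weight matrices (one d x d matrix per head);
# only the value-side readout `theta` below is trained.
W_qk = rng.normal(size=(M, d, d)) / np.sqrt(d)

def rf_attention_features(q, K):
    """Per-head attention pooling of the keys with frozen random query-key weights.

    q : (d,) query vector, K : (N, d) key vectors.
    Returns an (M * d,) feature vector; averaging over keys inside the softmax
    makes the output permutation invariant to the ordering of the keys.
    """
    feats = []
    for W in W_qk:
        scores = K @ (W @ q)                       # (N,) attention logits
        attn = np.exp(scores - scores.max())
        attn /= attn.sum()                         # softmax over keys
        feats.append(attn @ K)                     # attention-weighted average of keys
    return np.concatenate(feats) / np.sqrt(M)

# Toy permutation-invariant target: mean over keys of <q, k_n>^2 (an assumption,
# standing in for the natural target functions studied in the paper).
def target(q, K):
    return np.mean((K @ q) ** 2)

X, y = [], []
for _ in range(n_samples):
    q, K = rng.normal(size=d), rng.normal(size=(N, d))
    X.append(rf_attention_features(q, K))
    y.append(target(q, K))
X, y = np.stack(X), np.array(y)

# Train only the value-side linear readout by ridge regression.
lam = 1e-3
theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print("train MSE:", np.mean((X @ theta - y) ** 2))
```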