22 research outputs found
Bivariate Beta-LSTM
Long Short-Term Memory (LSTM) infers the long term dependency through a cell
state maintained by the input and the forget gate structures, which models a
gate output as a value in [0,1] through a sigmoid function. However, due to the
graduality of the sigmoid function, the sigmoid gate is not flexible in
representing multi-modality or skewness. Besides, the previous models lack
modeling on the correlation between the gates, which would be a new method to
adopt inductive bias for a relationship between previous and current input.
This paper proposes a new gate structure with the bivariate Beta distribution.
The proposed gate structure enables probabilistic modeling on the gates within
the LSTM cell so that the modelers can customize the cell state flow with
priors and distributions. Moreover, we theoretically show the higher upper
bound of the gradient compared to the sigmoid function, and we empirically
observed that the bivariate Beta distribution gate structure provides higher
gradient values in training. We demonstrate the effectiveness of bivariate Beta
gate structure on the sentence classification, image classification, polyphonic
music modeling, and image caption generation.Comment: AAAI 202
METRICC: Harnessing Comparable Corpora for Multilingual Lexicon Development
International audienceResearch on comparable corpora has grown in recent years bringing about the possibility of developing multilingual lexicons through the exploitation of comparable corpora to create corpus-driven multilingual dictionaries. To date, this issue has not been widely addressed. This paper focuses on the use of the mechanism of collocational networks proposed by Williams (1998) for exploiting comparable corpora. The paper first provides a description of the METRICC project, which is aimed at the automatically creation of comparable corpora and describes one of the crawlers developed for comparable corpora building, and then discusses the power of collocational networks for multilingual corpus-driven dictionary development
Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding
Contrastive learning has become a new paradigm for unsupervised sentence
embeddings. Previous studies focus on instance-wise contrastive learning,
attempting to construct positive pairs with textual data augmentation. In this
paper, we propose a novel Contrastive learning method with Prompt-derived
Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts,
we construct virtual semantic prototypes to each instance, and derive negative
prototypes by using the negative form of the prompts. Using a prototypical
contrastive loss, we enforce the anchor sentence embedding to be close to its
corresponding semantic prototypes, and far apart from the negative prototypes
as well as the prototypes of other sentences. Extensive experimental results on
semantic textual similarity, transfer, and clustering tasks demonstrate the
effectiveness of our proposed model compared to strong baselines. Code is
available at https://github.com/lemon0830/promptCSE.Comment: Findings of EMNLP 202
Creation and evaluation of large keyphrase extraction collections with multiple opinions
While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remained of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, (iii) and experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and is annotated with multiple annotations per article by a large annotator panel. Our annotator study shows that for a given document there seems to be a large disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows a superior effectiveness of supervised models, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work to improve reliable evaluation of new keyphrase extractors
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models
The rapid development of Large Language Models (LLMs) has led to great
strides in model capabilities like reasoning and long-context understanding.
However, as LLMs are able to process longer contexts, it becomes more
challenging to evaluate whether they have acquired certain capabilities, since
the length of text (e.g., 100K tokens) they can process far exceeds what humans
can reliably assess in a reasonable duration. In this paper, we propose using
complex synthetic tasks as a proxy evaluation method, and present S3Eval, a
Synthetic, Scalable, Systematic evaluation suite for LLMs evaluation. As a
synthetic benchmark, S3Eval enables the creation of any number of evaluation
examples that are theoretically invisible to LLMs, mitigating the test set
contamination issue. The synthetic nature of S3Eval provides users full control
over the dataset, allowing them to systematically probe LLM capabilities by
scaling text length and varying task difficulty across diverse scenarios. The
strong correlation between S3Eval performance and scores of real-world
benchmarks like Big-Bench Hard (BBH) demonstrates the soundness of using S3Eval
for evaluation of LLMs. The in-depth analysis also uncover additional insights,
including performance drop when the answer is sparsely distributed or located
in the middle context, as well as some counter-intuitive trends of model
performance.Comment: Work in progres
Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability
[...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...
Grammatical theory
This book introduces formal grammar theories that play a role in current linguistic theorizing (Phrase Structure Grammar, Transformational Grammar/Government & Binding, Generalized Phrase Structure Grammar, Lexical Functional Grammar, Categorial Grammar, Head-Driven Phrase Structure Grammar, Construction Grammar, Tree Adjoining Grammar). The key assumptions are explained and it is shown how the respective theory treats arguments and adjuncts, the active/passive alternation, local reorderings, verb placement, and fronting of constituents over long distances. The analyses are explained with German as the object language.
The second part of the book compares these approaches with respect to their predictions regarding language acquisition and psycholinguistic plausibility. The nativism hypothesis, which assumes that humans posses genetically determined innate language-specific knowledge, is critically examined and alternative models of language acquisition are discussed. The second part then addresses controversial issues of current theory building such as the question of flat or binary branching structures being more appropriate, the question whether constructions should be treated on the phrasal or the lexical level, and the question whether abstract, non-visible entities should play a role in syntactic analyses. It is shown that the analyses suggested in the respective frameworks are often translatable into each other. The book closes with a chapter showing how properties common to all languages or to certain classes of languages can be captured.
The book is a translation of the German book Grammatiktheorie, which was published by Stauffenburg in 2010. The following quotes are taken from reviews:
With this critical yet fair reflection on various grammatical theories, Müller fills what was a major gap in the literature. Karen Lehmann, Zeitschrift für Rezensionen zur germanistischen Sprachwissenschaft, 2012
Stefan Müller’s recent introductory textbook, Grammatiktheorie, is an astonishingly comprehensive and insightful survey for beginning students of the present state of syntactic theory. Wolfgang Sternefeld und Frank Richter, Zeitschrift für Sprachwissenschaft, 2012
This is the kind of work that has been sought after for a while [...] The impartial and objective discussion offered by the author is particularly refreshing. Werner Abraham, Germanistik, 201