
    On simple matrix languages versus scattered context languages


    An approach to computing downward closures

    The downward closure of a word language is the set of all (not necessarily contiguous) subwords of its members. It is well known that the downward closure of any language is regular. While the downward closure appears to be a powerful abstraction, algorithms for computing a finite automaton for the downward closure of a given language have been established for only a few language classes. This work presents a simple general method for computing downward closures. For language classes that are closed under rational transductions, it is shown that the computation of downward closures can be reduced to checking a certain unboundedness property. This result is used to prove that downward closures are computable for (i) every language class that is closed under rational transductions and has effectively semilinear Parikh images, (ii) matrix languages, and (iii) indexed languages (equivalently, languages accepted by higher-order pushdown automata of order 2). Comment: Full version of contribution to ICALP 2015. Comments welcome.
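
    As a point of reference (standard notation, not part of the abstract itself), the downward closure admits a one-line formal definition in terms of the scattered-subword order, and the regularity claim is the classical consequence of Higman's lemma.

        % Formal restatement of the notion used above (standard notation, assumed
        % rather than quoted from the paper): u \preceq v means u is a scattered
        % subword of v.
        \[
          L{\downarrow} \;=\; \{\, u \in \Sigma^{*} \mid \exists v \in L :\ u \preceq v \,\}
        \]
        % Regularity for every L follows from Higman's lemma: \preceq is a
        % well-quasi-order, so the (upward-closed) complement of L\downarrow is
        % generated by finitely many minimal words and is therefore regular.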

    On the Reproducibility and Generalisation of the Linear Transformation of Word Embeddings

    A linear transformation learns a linear relationship between two word embeddings, such that words in the two different embedding spaces can be semantically related. In this paper, we examine the reproducibility and generalisation of the linear transformation of word embeddings. Linear transformation is particularly useful for translating between word embedding models trained on different languages, since it can capture the semantic relationships between the two models. We first reproduce two linear transformation approaches: a recent one using an orthogonal transformation and the original one using a simple matrix transformation. Previous findings on a machine translation task are re-examined, validating that linear transformation is indeed an effective way to transform word embedding models across languages. In particular, we show that the orthogonal transformation can better relate the different embedding models. Following the verification of previous findings, we then study the generalisation of linear transformation in a multi-language Twitter election classification task. We observe that the orthogonal transformation outperforms the matrix transformation; in particular, it significantly outperforms a random classifier by at least 10% under the F1 metric across English and Spanish datasets. In addition, we provide best practices for using linear transformation in multi-language Twitter election classification.
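
    For illustration only (not the authors' code), the two transformations compared above can be sketched in a few lines of numpy: the "simple matrix transformation" is the least-squares map between paired word vectors, and the orthogonal variant is the Procrustes solution obtained from an SVD. Matrix names, shapes, and the toy data are assumptions.

        import numpy as np

        def matrix_transform(X, Y):
            # Unconstrained linear map: W minimising ||X W - Y||_F (least squares).
            W, *_ = np.linalg.lstsq(X, Y, rcond=None)
            return W

        def orthogonal_transform(X, Y):
            # Orthogonal Procrustes: W = U V^T from the SVD of X^T Y, minimising
            # ||X W - Y||_F subject to W^T W = I (preserves distances and angles).
            U, _, Vt = np.linalg.svd(X.T @ Y)
            return U @ Vt

        # Toy usage: 1000 paired words with 300-dimensional embeddings.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((1000, 300))     # source-language vectors
        Y = rng.standard_normal((1000, 300))     # target-language vectors
        mapped = X @ orthogonal_transform(X, Y)  # source vectors in the target space

    The orthogonality constraint preserves the geometry of the source space, which is consistent with the abstract's observation that the orthogonal transformation relates the two embedding models better.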

    Matrix Languages, Register Machines, Vector Addition Systems

    We give a direct and simple proof that the Parikh images of languages generated by matrix grammars with appearance checking coincide with the sets of vectors generated by register machines. As a particular case, we get that the Parikh images of languages generated by matrix grammars without appearance checking coincide with the sets of vectors generated by partially blind register machines. Then, we consider pure matrix grammars (i.e., grammars which do not distinguish terminal and nonterminal symbols), and prove the inclusion of the family of Parikh images of languages generated by such grammars (without appearance checking) in the family of sets of vectors generated by blind register machines, as well as the inclusion of reachability sets of vector addition systems in the family of Parikh images of pure matrix languages. For pure matrix grammars with a certain restriction on the form of matrices, the converse of the latter inclusion is also obtained. Thus, in view of the result from, we obtain the semilinearity of languages generated by pure matrix grammars (without appearance checking) over alphabets with at most five letters, under the considered restrictions on the form of matrices. A pure matrix grammar with five symbols, but without restrictions on the form of matrices, is produced which generates a non-semilinear language.
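
    For readers unfamiliar with the central object here, the Parikh image (standard definition, assumed rather than quoted from the paper) maps each word to its vector of letter counts, and semilinearity is a property of that vector set.

        % Parikh image over \Sigma = \{a_1, \dots, a_k\} (standard notation):
        \[
          \Psi(w) = \bigl(|w|_{a_1}, \dots, |w|_{a_k}\bigr), \qquad
          \Psi(L) = \{\, \Psi(w) \mid w \in L \,\} \subseteq \mathbb{N}^{k}
        \]
        % L is semilinear iff \Psi(L) is a finite union of linear sets
        % \{ c + \lambda_1 p_1 + \dots + \lambda_m p_m \mid \lambda_i \in \mathbb{N} \}.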

    Computationally efficient min-max MPC

    2005 IFAC 16th Triennial World Congress, Prague, Czech Republic. Min-Max MPC (MMMPC) controllers (Campo and Morari, 1987) suffer from a great computational burden that is often circumvented by using upper bounds of the worst possible case of a performance index. These upper bounds are usually computed by means of LMI techniques. In this paper a more efficient approach is shown: a computationally efficient MMMPC control strategy in which the worst-case cost is approximated by an upper bound that can be computed easily using simple matrix operations. This implies that the algorithm can be coded easily even in non-mathematically-oriented programming languages such as those found in industrial embedded control hardware. Simulation examples are given in the paper.
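
    For context, the underlying min-max MPC problem can be stated in the standard form below (a textbook formulation, not quoted from the abstract); the strategy described above replaces the inner maximisation with a cheaply computable upper bound.

        % Generic min-max MPC problem over a bounded disturbance set \mathcal{W}
        % (standard form, assumed for illustration):
        \[
          \min_{u_0,\dots,u_{N-1}} \; \max_{w \in \mathcal{W}} \; J(x, u, w)
          \qquad \longrightarrow \qquad
          \min_{u} \; \bar{J}(x, u), \quad
          \bar{J}(x, u) \,\ge\, \max_{w \in \mathcal{W}} J(x, u, w).
        \]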

    Min-Max MPC based on a computationally efficient upper bound of the worst case cost

    Min-Max MPC (MMMPC) controllers [P.J. Campo, M. Morari, Robust model predictive control, in: Proc. American Control Conference, June 10–12, 1987, pp. 1021–1026] suffer from a great computational burden which limits their applicability in industry. Upper bounds of the worst possible case of a performance index have sometimes been used to reduce the computational burden. This paper proposes a computationally efficient MMMPC control strategy in which the worst-case cost is approximated by an upper bound based on a diagonalization scheme. The upper bound can be computed with O(n³) operations using only simple matrix operations. This implies that the algorithm can be coded easily even in non-mathematically-oriented programming languages such as those found in industrial embedded control hardware. A simulation example is given in the paper.
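
    The O(n³) figure suggests an eigendecomposition-based bound. As a hedged illustration of that general idea (not the exact bound of the paper), the sketch below over-bounds the worst case of a quadratic cost over a unit box of disturbances using one symmetric eigendecomposition and a few norms; the cost structure, variable names, and the particular bound are assumptions.

        import numpy as np

        def worst_case_upper_bound(H, f, c=0.0):
            """Upper bound on  max_{w in [-1, 1]^n}  w^T H w + 2 f^T w + c.

            Illustrative diagonalization-based bound (an assumption, not the
            paper's formula): with H = Q diag(lam) Q^T,
                w^T H w = sum_i lam_i (q_i^T w)^2 <= sum_{lam_i > 0} lam_i ||q_i||_1^2
            since |q_i^T w| <= ||q_i||_1 on the unit box, and 2 f^T w <= 2 ||f||_1.
            Only an O(n^3) eigendecomposition and simple matrix operations are used.
            """
            H = 0.5 * (H + H.T)                 # symmetrize for eigh
            lam, Q = np.linalg.eigh(H)          # H = Q diag(lam) Q^T
            col_l1 = np.abs(Q).sum(axis=0)      # ||q_i||_1 for each eigenvector
            quad = np.sum(np.clip(lam, 0.0, None) * col_l1 ** 2)
            return quad + 2.0 * np.abs(f).sum() + c

        # Toy usage with a 4-dimensional disturbance.
        rng = np.random.default_rng(1)
        A = rng.standard_normal((4, 4))
        print(worst_case_upper_bound(A @ A.T, rng.standard_normal(4)))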

    Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages

    Pre-trained multilingual language models underpin a large portion of modern NLP tools outside of English. A strong baseline for specializing these models for specific languages is Language-Adaptive Pre-Training (LAPT). However, retaining a large cross-lingual vocabulary and embedding matrix comes at considerable excess computational cost during adaptation. In this study, we propose several simple techniques to replace a cross-lingual vocabulary with a compact, language-specific one. Namely, we address strategies for re-initializing the token embedding matrix after vocabulary specialization. We then provide a systematic experimental comparison of our techniques, in addition to the recently proposed Focus method. We demonstrate that: 1) embedding-replacement techniques in the monolingual transfer literature are inadequate for adapting multilingual models; 2) replacing cross-lingual vocabularies with smaller specialized ones provides an efficient method to improve performance in low-resource languages; and 3) simple embedding re-initialization techniques based on script-wise sub-distributions rival techniques such as Focus, which rely on similarity scores obtained from an auxiliary model.
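
    As a hedged sketch of the recipe described above (not the authors' code; the helper names and the Gaussian-sampling choice are assumptions), re-initialising the embedding matrix for a specialized vocabulary can copy trained vectors for shared tokens and draw new ones from per-script statistics of the old embeddings.

        import numpy as np

        def reinit_embeddings(old_emb, old_vocab, new_vocab, token_script):
            """Build an embedding matrix for a specialized vocabulary.

            Illustrative only: shared tokens keep their trained vectors; new
            tokens are sampled from a Gaussian fitted to the old embeddings of
            the same script ("script-wise sub-distribution"). `token_script`
            is a hypothetical helper mapping a token to a script label.
            """
            rng = np.random.default_rng(0)
            new_emb = np.empty((len(new_vocab), old_emb.shape[1]), dtype=old_emb.dtype)

            # Per-script mean/std of the existing embeddings.
            stats = {}
            for script in {token_script(t) for t in old_vocab}:
                rows = old_emb[[i for i, t in enumerate(old_vocab) if token_script(t) == script]]
                stats[script] = (rows.mean(axis=0), rows.std(axis=0) + 1e-8)
            fallback = (old_emb.mean(axis=0), old_emb.std(axis=0))

            old_index = {t: i for i, t in enumerate(old_vocab)}
            for j, tok in enumerate(new_vocab):
                if tok in old_index:                      # shared token: copy
                    new_emb[j] = old_emb[old_index[tok]]
                else:                                     # new token: sample
                    mean, std = stats.get(token_script(tok), fallback)
                    new_emb[j] = rng.normal(mean, std)
            return new_emb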