562 research outputs found
Experimental Support for a Categorical Compositional Distributional Model of Meaning
Modelling compositional meaning for sentences using empirical distributional
methods has been a challenge for computational linguists. We implement the
abstract categorical model of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) using
data from the BNC and evaluate it. The implementation is based on unsupervised
learning of matrices for relational words and applying them to the vectors of
their arguments. The evaluation is based on the word disambiguation task
developed by Mitchell and Lapata (2008) for intransitive sentences, and on a
similar new experiment designed for transitive sentences. Our model matches the
results of its competitors in the first experiment, and betters them in the
second. The general improvement in results with increase in syntactic
complexity showcases the compositional power of our model.
Comment: 11 pages, to be presented at EMNLP 2011, to be published in
Proceedings of the 2011 Conference on Empirical Methods in Natural Language
Processing
Towards Phrase-based Unsupervised Machine Translation: Phrase Representations
Current unsupervised machine translation models achieve promising results but still perform worse than supervised approaches. This work takes a step towards a new research direction, phrase-based unsupervised machine translation. Since current word-based models rely on vector representations of words, phrase-based models require corresponding representations of phrases. These representations should be learned without supervision and should handle phrase-specific issues such as multiword expressions, and their embedding space has to satisfy certain requirements for unsupervised translation to work. We specify what makes phrase representations effective for unsupervised machine translation, define an unsupervised compositional modeling framework for phrases, and show how to use this framework to meet the proposed requirements and thus obtain effective phrase representations. We make the code and trained models publicly available as an open source project.
A Task-based Evaluation of French Morphological Resources and Tools
Morphology is a key component for many Language Technology applications. However, morphological relations, especially those relying on the derivation and compounding processes, are often addressed in a superficial manner. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation experiment whose goal is to evaluate the role of morphology for one task, namely Question Answering (QA). We then highlight the kind of linguistic knowledge that is necessary for this particular task and propose a qualitative analysis of morphological phenomena in order to identify the morphological processes that are most relevant. Based on this study, we perform an intrinsic evaluation of existing tools and resources for French morphology, in order to quantify their coverage. Our conclusions provide helpful insights for using and building appropriate morphological resources and tools that could have a significant impact on the application performance
Compositional Distributional Semantics with Compact Closed Categories and Frobenius Algebras
This thesis contributes to ongoing research related to the categorical
compositional model for natural language of Coecke, Sadrzadeh and Clark in
three ways: Firstly, I propose a concrete instantiation of the abstract
framework based on Frobenius algebras (joint work with Sadrzadeh). The theory
addresses shortcomings of previous proposals, extends the coverage of the
language, and is supported by experimental work that improves existing results.
The proposed framework describes a new class of compositional models that find
intuitive interpretations for a number of linguistic phenomena. Secondly, I
propose and evaluate in practice a new compositional methodology which
explicitly deals with the different levels of lexical ambiguity (joint work
with Pulman). A concrete algorithm is presented, based on the separation of
vector disambiguation from composition in an explicit prior step. Extensive
experimental work shows that the proposed methodology indeed results in more
accurate composite representations for the framework of Coecke et al. in
particular and every other class of compositional models in general. As a last
contribution, I formalize the explicit treatment of lexical ambiguity in the
context of the categorical framework by resorting to categorical quantum
mechanics (joint work with Coecke). In the proposed extension, the concept of a
distributional vector is replaced with that of a density matrix, which
compactly represents a probability distribution over the potential different
meanings of the specific word. Composition takes the form of quantum
measurements, leading to interesting analogies between quantum physics and
linguistics.
Comment: Ph.D. Dissertation, University of Oxford
Distributional Tensor Space Model of Natural Language Semantics
We propose a novel Distributional Tensor Space Model of natural language semantics that employs third-order tensors, accounts for order-dependent word contexts, and assigns characteristic matrices to words so that semantic composition can be realized in a linguistically and cognitively plausible way. The proposed model achieves state-of-the-art results on important tasks in linguistic semantics while using a relatively small text corpus and no sophisticated preprocessing.
Uvid u automatsko izlučivanje metaforičkih kolokacija (An Insight into the Automatic Extraction of Metaphorical Collocations)
Collocations have been the subject of much scientific research over the years. The focus of this research is on a subset of collocations, namely metaphorical collocations. In metaphorical collocations, a semantic shift has taken place in one of the components, i.e., one of the components takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of existing research on collocation extraction, as well as of existing methods, measures, and resources. The existing research is classified according to approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. Various association measures and existing ways of evaluating the results of automatic collocation extraction are also described. The methods, tools, and resources that may prove useful for future work are highlighted. The insights gained from existing research serve as a first step in exploring the possibility of developing a method for the automatic extraction of metaphorical collocations.
- …