Minimally supervised induction of morphology through bitexts
A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have been consequently many attempts to reduce this cost in the development of morphological systems through the development of unsupervised or minimally supervised algorithms and learning methods for acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another language (the target). This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed that of the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.
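The clustering-then-segmentation idea in this abstract can be illustrated with a toy sketch. It is not the author's implementation: the aligned pairs are hand-written, hypothetical data standing in for the output of a word aligner and an English lemmatizer, the function names are invented for illustration, and the longest-common-prefix split is a deliberately naive stand-in for the suffix induction the abstract mentions. Part-of-speech transfer is omitted.

```python
from collections import defaultdict
from os.path import commonprefix

# Hypothetical toy data: (German token, lemma of the aligned English token).
# In a real bitext these pairs would come from a word aligner plus an
# English lemmatizer; here they are hand-written for illustration.
ALIGNED_PAIRS = [
    ("spiele", "play"),
    ("spielst", "play"),
    ("spielt", "play"),
    ("spielen", "play"),
    ("sage", "say"),
    ("sagst", "say"),
    ("sagt", "say"),
]


def cluster_by_source_lemma(pairs):
    """Group target-language words that align to the same source lemma."""
    clusters = defaultdict(set)
    for target_word, source_lemma in pairs:
        clusters[source_lemma].add(target_word)
    return dict(clusters)


def induce_suffixes(cluster):
    """Split a cluster into a shared stem and the remaining suffixes.

    The stem is simply the longest common prefix of the cluster, which is
    an assumption made for this sketch, not the method of the thesis.
    """
    words = sorted(cluster)
    stem = commonprefix(words)
    return stem, sorted(word[len(stem):] for word in words)


if __name__ == "__main__":
    for lemma, words in cluster_by_source_lemma(ALIGNED_PAIRS).items():
        print(lemma, induce_suffixes(words))
```

The clustering step informs the segmentation step only through the cluster membership: a suffix is proposed only among words already believed to be inflectional variants of one another.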
References to graphical objects in interactive multimodal queries
This thesis describes a computational model for interpreting natural language expressions in an interactive multimodal query system integrating both natural language text
and graphic displays. The primary concern of the model is to interpret expressions that
might involve graphical attributes, and expressions whose referents could be objects
on the screen.
Graphical objects on the screen are used to visualise entities in the application domain
and their attributes (in short, domain entities and domain attributes). This is why
graphical objects are treated as descriptions of those domain entities/attributes in
the literature. However, graphical objects and their attributes are visible during the
interaction, and are thus known by the participants of the interaction. Therefore, they
themselves should be part of the mutual knowledge of the interaction.
This poses some interesting problems in language processing. As part of the mutual
knowledge, graphical attributes could be used in expressions, and graphical objects
could be referred to by expressions. In consequence, there could be ambiguities about
whether an attribute in an expression belongs to a graphical object or to a domain
entity. There could also be ambiguities about whether the referent of an expression is
a graphical object or a domain entity.
The main contributions of this thesis consist of analysing the above ambiguities, and designing,
implementing and testing a computational model and a demonstration system
for resolving these ambiguities. Firstly, a structure and corresponding terminology are
set up, so these ambiguities can be clarified as ambiguities derived from referring to
different databases, the screen or the application domain (source ambiguities). Secondly, a meaning representation language is designed which explicitly represents the
information about which database an attribute/entity comes from. Several linguistic
regularities inside and among referring expressions are described so that they can be
used as heuristics in the ambiguity resolution. Thirdly, a computational model based
on constraint satisfaction is constructed to resolve simultaneously some reference ambiguities and source ambiguities. Then, a demonstration system integrating natural
language text and graphics is implemented, whose core is the computational model.
This thesis ends with an evaluation of the computational model. It provides some
concrete evidence about the advantages and disadvantages of the above approach.
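As an illustration of resolving reference and source ambiguities together by constraint satisfaction, the following toy sketch enumerates every reading of a set of referring expressions and keeps those that satisfy all constraints. It is not the model of the thesis: the two small databases, the function names, and the "same source" heuristic are all hypothetical, chosen only to show how a reading can record which database (screen or application domain) each referent comes from.

```python
from itertools import product

# Hypothetical toy databases: screen objects visualise domain entities.
SCREEN = {
    "icon1": {"colour": "red", "shows": "car1"},
    "icon2": {"colour": "blue", "shows": "car2"},
}
DOMAIN = {
    "car1": {"colour": "blue"},
    "car2": {"colour": "green"},
}


def candidates(attribute, value):
    """Yield (source, referent) pairs whose attribute matches in either database."""
    for source, table in (("screen", SCREEN), ("domain", DOMAIN)):
        for name, attrs in table.items():
            if attrs.get(attribute) == value:
                yield source, name


def resolve(expressions, constraints):
    """Brute-force constraint satisfaction over all candidate readings."""
    domains = [list(candidates(attr, val)) for attr, val in expressions]
    return [
        combo
        for combo in product(*domains)
        if all(check(combo) for check in constraints)
    ]


def same_source(combo):
    """Hypothetical heuristic: expressions in one query share a database."""
    return len({source for source, _ in combo}) == 1


if __name__ == "__main__":
    # "the blue one" is ambiguous between icon2 (screen) and car1 (domain);
    # "the red one" can only be icon1 (screen).
    query = [("colour", "blue"), ("colour", "red")]
    print(resolve(query, [same_source]))
```

Here the unambiguous expression constrains the ambiguous one: because "the red one" can only be a screen object, the same-source heuristic selects the screen reading of "the blue one" as well.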
Automatic indexing of multimedia documents as a starting point to annotation process
Automatic text analysis has widened the perspective of work on document contents by opening up the study of linguistic productions. Here, we use annotation as a case study. In our approach, annotation is defined as textual, graphic or sound information attached to a source document (text, photo, audio sequence or video sequence, i.e. multimedia). Our corpus comes from the INA databases (i.e., Institut National de l'Audiovisuel, Paris). Our research task consisted of identifying the appropriate characteristics of a multimedia document, its context and information retrieval in the context of natural language processing (NLP), automatic indexing and knowledge representation. We discuss the crucial role of the annotation process in knowledge extraction and management tools as well as in the design of information retrieval systems. Our focus is more specifically on the new approach in information system design dedicated to "economic intelligence".
COGNITIVE LINGUISTICS AS A METHODOLOGICAL PARADIGM
A general direction in which cognitive linguistics is heading at the turn of the century is outlined and a revised understanding of cognitive linguistics as a methodological paradigm is suggested. The goal of cognitive linguistics is defined as understanding what language is and what language does to ensure the predominance of homo sapiens as a biological species. This makes cognitive linguistics a biologically oriented empirical science.
Domain transfer for deep natural language generation from abstract meaning representations
Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This is based on objective metrics such as BLEU and semantic error rate and a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%.
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.