LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible

Author: Kalkman Gino
Naaijer Martijn
Roorda Dirk
van Cranenburgh Andreas
Publication date: 01/01/2014

The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool for analysing LAF resources in general, with an extension for processing the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as a text database in decade-wide steps. Then we describe how LAF-Fabric may serve as an analysis tool for this corpus. Finally, we describe three analytic projects/workflows that benefit from the new LAF representation: 1) the study of linguistic variation: extracting co-occurrence data of common nouns between the books of the Bible (Martijn Naaijer); 2) the study of the grammar of Hebrew poetry in the Psalms: extracting clause typology (Gino Kalkman); 3) the construction of a parser of classical Hebrew by Data Oriented Parsing: generating tree structures from the database (Andreas van Cranenburgh).

arXiv.org e-Print Archive

MPG.PuRe

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Conversation and critique within the architectural design process: a linkograph analysis

Author: De Meyer Ronald
Derboven Jeroen
Pauwels Pieter
Strobbe Tiemen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Conversation and critique are central to architectural design practice, as they function as tools for probing and further improving design ideas. We study the kinds of design activity that take place in such conversation and critique within the architectural design process. We use linkographs to characterise the design process taking place during conversation. More precisely, we study conversations between design teachers and design students. In this article, we consider an example design process that takes place in a traditional face-to-face meeting. Using the resulting linkograph, we are able to assess the kind of design activity taking place during such sessions of conversation and critique.

Crossref

Ghent University Academic Bibliography

Generating natural language specifications from UML class diagrams

Author: A Abbott
AV Gervasi
CL Heitmeyer
E Brill
E Goldberg
Farid Meziane
G Booch
HM Harmain
K Walden
L Goldin
L Mich
MD Lubars
Nikos Athanasakis
P Martin-Löf
PPS Chen
Sophia Ananiadou
SW Ambler
W Ahrendt
WC Mann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Early phases of software development are known to be problematic and difficult to manage, and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semi-structured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution for reducing the number of errors occurring during the software development life cycle and for allowing early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram, based on the study of a large number of examples from the literature, and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet, a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent, though academic, case study.

CiteSeerX

University of Salford Institutional Repository

Crossref

The University of Manchester - Institutional Repository

Inducing grammars from linguistic universals and realistic amounts of supervision

Author: Garrette Daniel Hunter
Publication date: 20/01/2017

The best performing NLP models to date are learned from large volumes of manually-annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input — even that which can be collected in just a few hours — can provide enormous advantages if we have learning algorithms that can appropriately exploit it. This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation. Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. 
This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.

Computer Science

Texas ScholarWorks

Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources

Author: Bel N.
Calzolari N.
Choukri Khalid
LS OZ Taal en spraaktechnologie
Mariani J.
Monachini M.
Odijk J.E.J.M.
Piperidis S
Quochi V.
Soria C.
UiL OTS LLI
Publication date: 01/01/2011

Language Technologies (LT), together with their backbone, Language Resources (LR), provide essential support for the challenge of Multilingualism and ICT of the future. The main task of language technologies is to bridge language barriers and to help create a new environment where information flows smoothly across frontiers and languages, whatever the country and language of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. However, the field of Language Resources and Technology has long suffered from an excess of individuality and fragmentation, with a lack of coherence concerning the priorities for the field and the direction in which to move, not to mention a common timeframe. The context encountered by the FLaReNet project was thus that of an active field needing a coherence that can only be achieved by sharing common priorities and endeavours. FLaReNet has contributed to the creation of this coherence by gathering a wide community of experts and involving them in the definition of an exhaustive set of recommendations.

PUblication MAnagement

Utrecht University Repository

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

When computer-aided translation systems are used in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography