Search CORE

55,999 research outputs found

Generating natural language specifications from UML class diagrams

Author: A Abbott
AV Gervasi
CL Heitmeyer
E Brill
E Goldberg
Farid Meziane
G Booch
HM Harmain
K Walden
L Goldin
L Mich
MD Lubars
Nikos Athanasakis
P Martin-Löf
PPS Chen
Sophia Ananiadou
SW Ambler
W Ahrendt
WC Mann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Early phases of software development are known to be problematic and difficult to manage, and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semistructured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution to reduce the number of errors occurring during the software development life cycle and allow early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram based on the study of a large number of examples from the literature and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet, a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent though academic case study.

CiteSeerX

University of Salford Institutional Repository

Crossref

The University of Manchester - Institutional Repository

Compositional Morphology for Word Representations and Language Modelling

Author: Blunsom Phil
Botha Jan A.
Publication date: 01/01/2014

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models.

Comment: Proceedings of the 31st International Conference on Machine Learning (ICML)

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

On the resolution of ambiguities in the extraction of syntactic categories through chunking

Author: Freudenthal D
Gobet F
Pine JM
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

In recent years, several authors have investigated how co-occurrence statistics in natural language can act as a cue that children may use to extract syntactic categories for the language they are learning. While some authors have reported encouraging results, it is difficult to evaluate the quality of the syntactic categories derived. It is argued in this paper that traditional measures of accuracy are inherently flawed. A valid evaluation metric needs to consider the well-formedness of utterances generated through a production end. This paper attempts to evaluate the quality of the categories derived from co-occurrence statistics through the use of MOSAIC, a computational model of syntax acquisition that has already been used to simulate several phenomena in child language. It is shown that derived syntactic categories that may appear to be of high quality quickly give rise to errors that are not typical of child speech. A solution to this problem is suggested in the form of a chunking mechanism that serves to differentiate between alternative grammatical functions of identical word forms. Results are evaluated in terms of the error rates in utterances produced by the system as well as the quantitative fit to the phenomenon of subject omission.

CiteSeerX

Brunel University Research Archive

Coordinate noun phrase disambiguation in a generative parsing model

Author: Hogan Deirdre
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007

In this paper we present methods for improving the disambiguation of noun phrase (NP) coordination within the framework of a lexicalised history-based parsing model. As well as reducing noise in the data, we look at modelling two main sources of information for disambiguation: symmetry in conjunct structure, and the dependency between conjunct lexical heads. Our changes to the baseline model result in an increase in NP coordination dependency f-score from 69.9% to 73.8%, which represents a relative reduction in f-score error of 13%.
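
The 13% figure in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes that "f-score error" means 100 minus the f-score, which is the usual reading but is not spelled out in the abstract; the function name is hypothetical and chosen only for illustration.

```python
def relative_error_reduction(baseline_f, improved_f):
    """Fraction of the baseline error (100 - f-score) removed by the improved model.

    Assumption: "f-score error" is taken to be 100 minus the f-score.
    """
    baseline_error = 100.0 - baseline_f
    improved_error = 100.0 - improved_f
    return (baseline_error - improved_error) / baseline_error


# F-scores quoted in the abstract: 69.9% (baseline) and 73.8% (improved).
reduction = relative_error_reduction(69.9, 73.8)
print(f"{reduction:.1%}")  # prints 13.0%
```

The absolute gain is 3.9 points against a baseline error of 30.1 points, which rounds to the reported 13%.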

DCU Online Research Access Service


Resolving ambiguities in the extraction of syntactic categories through chunking.

Author: Freudenthal D
Gobet F
Pine J M
Publication venue: Proceedings of the Sixth International Conference on Cognitive Modeling
Publication date: 01/01/2004

In recent years, several authors have investigated how co-occurrence statistics in natural language can act as a cue that children may use to extract syntactic categories for the language they are learning. While some authors have reported encouraging results, it is difficult to evaluate the quality of the syntactic categories derived. It is argued in this paper that traditional measures of accuracy are inherently flawed. A valid evaluation metric needs to consider the well-formedness of utterances generated through a production end. This paper attempts to evaluate the quality of the categories derived from co-occurrence statistics through the use of MOSAIC, a computational model of syntax acquisition that has already been used to simulate several phenomena in child language. It will be shown that derived syntactic categories which may appear to be of high quality will quickly give rise to errors which are not typical of child speech. A solution to this problem is suggested in the form of a chunking mechanism which serves to differentiate between alternative grammatical functions of identical word forms. Results are evaluated in terms of the error rates in utterances produced by the system as well as the quantitative fit to the phenomenon of subject omission.

Brunel University Research Archive

Prosodic Event Recognition using Convolutional Neural Networks with Context Information

Author: Stehwien Sabrina
Vu Ngoc Thang
Publication date: 02/06/2017

This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also speaker-independent cases.

Comment: Interspeech 2017, 4 pages, 1 figure

arXiv.org e-Print Archive

Crossref

Data-driven Extraction of Intonation Contour Classes

Author: Reichel Uwe D.
Publication date: 01/01/2007

In this paper we introduce the first steps towards a new data-driven method for extraction of intonation events that does not require any prerequisite prosodic labelling. Provided with data segmented on the syllable constituent level, it derives local and global contour classes by stylisation and subsequent clustering of the stylisation parameter vectors. Local contour classes correspond to pitch movements connected to one or several syllables and determine the local f0 shape. Global classes are connected to intonation phrases and determine the f0 register. Local classes are initially derived for syllabic segments, which are then concatenated incrementally by means of statistical language modelling of co-occurrence patterns. Due to its generality, the method is in principle language independent and potentially capable of dealing with aspects of prosody other than intonation.

CiteSeerX

Open Access LMU

The impact of morphological errors in phrase-based statistical machine translation from German and English into Swedish

Author: Täckström Oscar
Publication date: 01/01/2009

We have investigated the potential for improvement in target language morphology when translating into Swedish from English and German, by measuring the errors made by a state-of-the-art phrase-based statistical machine translation system. Our results show that there is indeed a performance gap to be filled by better modelling of inflectional morphology and compounding, and that the gap is not filled by simply feeding the translation system with more training data.

Publikationer från Uppsala Universitet

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Linguistic factors affecting timing in Korean with application to speech synthesis

Author: Chung H
Huckvale M
Publication date: 01/01/2001

UCL Discovery