4,041 research outputs found

    Generating natural language specifications from UML class diagrams

    Get PDF
    Early phases of software development are known to be problematic, difficult to manage and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semistructured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution to reduce the number of errors occurring during the software development life cycle and allow early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram based on the study of a large number of examples from the literature and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet,a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent though academic case study

    Information extraction

    Get PDF
    In this paper we present a new approach to extract relevant information by knowledge graphs from natural language text. We give a multiple level model based on knowledge graphs for describing template information, and investigate the concept of partial structural parsing. Moreover, we point out that expansion of concepts plays an important role in thinking, so we study the expansion of knowledge graphs to use context information for reasoning and merging of templates

    Pattern Matching and Discourse Processing in Information Extraction from Japanese Text

    Full text link
    Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto a structured output format, complex discourse processing is essential. This paper reports on a Japanese information extraction system that merges information using a pattern matcher and discourse processor. Evaluation results show a high level of system performance which approaches human performance.Comment: See http://www.jair.org/ for any accompanying file

    Corpora and evaluation tools for multilingual named entity grammar development

    Get PDF
    We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats

    GATE -- an Environment to Support Research and Development in Natural Language Engineering

    Get PDF
    We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

    Coping with Alternate Formulations of Questions and Answers

    Get PDF
    We present in this chapter the QALC system which has participated in the four TREC QA evaluations. We focus here on the problem of linguistic variation in order to be able to relate questions and answers. We present first, variation at the term level which consists in retrieving questions terms in document sentences even if morphologic, syntactic or semantic variations alter them. Our second subject matter concerns variation at the sentence level that we handle as different partial reformulations of questions. Questions are associated with extraction patterns based on the question syntactic type and the object that is under query. We present the whole system thus allowing situating how QALC deals with variation, and different evaluations

    Neural Reranking for Named Entity Recognition

    Full text link
    We propose a neural reranking system for named entity recognition (NER). The basic idea is to leverage recurrent neural network models to learn sentence-level patterns that involve named entity mentions. In particular, given an output sentence produced by a baseline NER model, we replace all entity mentions, such as \textit{Barack Obama}, into their entity types, such as \textit{PER}. The resulting sentence patterns contain direct output information, yet is less sparse without specific named entities. For example, "PER was born in LOC" can be such a pattern. LSTM and CNN structures are utilised for learning deep representations of such sentences for reranking. Results show that our system can significantly improve the NER accuracies over two different baselines, giving the best reported results on a standard benchmark.Comment: Accepted as regular paper by RANLP 201

    Question Generation for French: Collating Parsers and Paraphrasing Questions

    Get PDF
    This article describes a question generation system for French. The transformation of declarative sentences into questions relies on two different syntactic parsers and named entity recognition tools. This makes it possible to further diversify the questions generated and to possibly alleviate the problems inherent to the analysis tools. The system also generates reformulations for the questions based on variations in the question words, inducing answers with different granularities, and nominalisations of action verbs. We evaluate the questions generated for sentences extracted from two different corpora: a corpus of newspaper articles used for the CLEF Question Answering evaluation campaign and a corpus of simplified online encyclopedia articles. The evaluation shows that the system is able to generate a majority of good and medium quality questions. We also present an original evaluation of the question generation system using the question analysis module of a question answering system
    corecore