30 research outputs found

    Handling non-compositionality in multilingual CNLs

    In this paper, we describe methods for handling multilingual non-compositional constructions in the framework of GF. We specifically look at methods to detect and extract non-compositional phrases from parallel texts and propose methods to handle such constructions in GF grammars. We expect that these methods will enrich CNLs by providing more flexibility in the design of controlled languages. We look at two specific use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions and a procedure to identify nominal compounds in German. We evaluate our procedure for multiword expressions by performing a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds in a full SMT pipeline and evaluate the impact of our method on the machine translation process. Comment: CNL workshop in COLING 201

    Advantages of Salivary DNA in Human Identification

    For two and a half decades, short tandem repeat (STR) markers have represented the “gold standard” in human identification. Haploid markers such as X-STR and Y-STR are also used to complement the autosomal markers. DNA from body fluids, especially saliva, is an important tool in human identification. The aim of this chapter is to present the importance of analyzing X-STR markers in a relatedness case between a sister and her presumptive brother, a carbonized victim, using body fluids for DNA identification. Our laboratory had to establish the relatedness between a woman and her presumptive brother (PB), who was the victim of a car accident explosion. As reference samples, we used saliva collected on swabs from the woman and a blood sample from the deceased victim. DNA was extracted with DNA IQ Casework (Promega, USA) and quantified with the PowerQuant System kit (Promega, USA). The DNA samples were then amplified with Investigator 24plex QS (Qiagen, Germany) for the STR markers and with the Investigator Argus 12-X QS kit (Qiagen, Germany) for the X-STR markers. The amplified DNA products were separated by capillary electrophoresis on a 3500 Genetic Analyzer. Full genetic profiles were obtained for the woman and her presumptive brother on both STR and X-STR markers, so we could confirm a full sibling relationship between them. Since its introduction in human identification, DNA analysis has been a useful tool for establishing sibling relationships from different biological samples.

    A hybrid system for patent translation

    This work presents an HMT system for patent translation. The system exploits the high coverage of SMT and the high precision of an RBMT system based on GF to deal with specific issues of the language. The translator is specifically developed to translate patents and is evaluated on the English-French language pair. Although the grammar does not yet tackle many issues, both manual and automatic evaluations consistently prefer the hybrid system over the two individual translators. Peer Reviewed. Postprint (published version)

    Patent translation within the MOLTO project

    MOLTO is an FP7 European project whose goal is to translate texts between multiple languages in real time with high quality. Patent translation is a case study in which research focuses on obtaining large coverage without losing translation quality. This is achieved by hybridising a grammar-based multilingual translation system, GF, with a specialised statistical machine translation system. Moreover, each individual system by itself already represents a step forward in the translation of patents in the biomedical domain, for which the systems have been trained. Peer Reviewed. Postprint (published version)

    MT techniques in a retrieval system of semantically enriched patents

    This paper focuses on how automatic translation techniques integrated in a patent retrieval system increase its capabilities and make extended features and functionalities possible. We describe 1) a novel methodology for natural language to SPARQL translation based on automated grammar-ontology interoperability and a query grammar for the patents domain; 2) a strategy for statistical-based translation of patents that allows semantic annotations to be transferred to the target language; 3) a built-in knowledge representation infrastructure that uses multilingual semantic annotations; and 4) an online application that offers a multilingual search interface over structural knowledge databases (domain ontologies) and multilingual documents (biomedical patents) that have been automatically translated. Peer Reviewed. Postprint (published version)

    Frontiers of Multilingual Grammar Development

    The thesis explores a number of ways for developing multilingual grammars written in GF (Grammatical Framework). The goal is to enhance the coverage of the grammars, in terms of both content and number of languages, and to reduce the development effort by automating a larger part of the process. The first direction in grammar development targets the creation of general language resources. These are the starting point for building domain-specific grammars for the language. Developing resource grammars gives a good overview of the effort required and provides a solid base for subsequent experiments in automation. Our work resulted in building computational grammars for Romanian and Swedish. A further development step is multilingual domain-specific grammar creation. The technique we employed is converting structured models into grammars, which preserves the original structure of the model as a backbone of the grammar and uses the general GF resources for a smooth multilingual verbalization of the model. The use cases considered are an upper-domain ontology, a business model and an ontology describing cultural heritage artefacts, each posing a different challenge and illustrating another aspect of GF grammar-ontology interoperability and its advantages. An orthogonal approach to multilingual grammar development aims at increasing the number of languages covered by a domain grammar. Our solution is an example-based prototype which partially replaces grammar programming with feedback from native informants and SMT tools (such as Google Translate). Last but not least, as an attempt not only to enhance GF grammars but also to use them in a novel way, we present a grammar-based hybrid system architecture combining GF grammars and SMT systems. This marks some of the first steps in using grammars for translating free text. As a side effect of the work, we propose a technique for building bilingual GF lexicon resources from SMT phrase tables.

    Reasoning and Language Generation in the SUMO Ontology

    We describe the representation of SUMO (Suggested Upper-Merged Ontology) in GF (Grammatical Framework). SUMO is the largest open-source ontology, describing over 10,000 concepts and the relations between them. In addition, there are axioms that specify the behaviour of relations and the connections between various concepts. The languages that are widely used for encoding ontologies do not have a type system and serve a mainly descriptive purpose. For checking the consistency of ontologies or generating natural language, other tools are used. GF is a grammar formalism with support for dependent types, and has built-in support for natural language generation and multilingual translation for 16 languages. The benefits of translating SUMO to GF are the possibility of type-checking the content of the ontology and the generation of syntactically correct natural language. The representation of SUMO uses dependent types for flexibility and better control of semantic actions. The current work provides algorithms for type inference and type checking of the translated axioms. From the concepts, relations and axioms in SUMO, we generate constructions in natural language for English, Romanian and French. The resulting GF files are furthermore translated to a first-order logic format, TPTP-FOF, and checked for consistency with an automated theorem prover. The resulting set of axioms can be used for making inferences. The representation of SUMO in GF preserves the expressivity of the original ontology, adding to this the advantages of a type system and built-in support for natural language generation.

    Typeful Ontologies with Direct Multilingual Verbalization

    We have developed a methodology for the representation of ontologies in a strictly typed language with dependent types. The methodology is supported by an experiment in which we translated SUMO (Suggested Upper-Merged Ontology) to GF (Grammatical Framework). The representation of SUMO in GF preserves the expressivity of the original ontology, adding to this the advantages of a type system and built-in support for natural language generation. SUMO is the largest open-source ontology, describing over 10,000 concepts and the relations between them, along with a number of first-order axioms, which are then used to perform automated reasoning on the ontology. GF is a type-theoretical grammar formalism mainly used for natural language applications. Through the logical framework that it incorporates, GF allows a consistent ontology representation, and thanks to its grammatical features the ontology is directly verbalized in a number of controlled natural languages.