20 research outputs found

    EBMT: Example Based Machine Translation


    T2O - Recycling thesauri into a multilingual ontology

    In this article we present T2O, a workbench to assist the process of translating heterogeneous resources into ontologies, to enrich them and add multilingual information, to help programming with them, and to support ontology publishing. T2O is an ontology algebra.

    Makefile::Parallel dependency specification language

    Some processes are not easy to program from scratch for parallel machines (clusters), but can easily be split into simple steps. Makefile::Parallel is a tool that lets users specify how processes depend on each other. The language syntax resembles the well-known Makefile format, but instead of specifying file or target dependencies, Makefile::Parallel specifies process (or job) dependencies. The scheduler submits jobs to the cluster scheduler (in our case, Rocks PBS) and waits for them to end. When a process finishes, dependencies are recalculated and the jobs that depend directly on it are submitted. The Makefile::Parallel language includes features to specify parametric rules, used to split and join process dependencies. Some tasks can be split into n smaller jobs working on different portions of files. At the end, another process can be used to join the results. Partially supported by grant POSI/PLP/43931/2001 from Fundação para a Ciência e Tecnologia (Portugal), co-financed by POSI.
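The scheduling behaviour described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Makefile::Parallel implementation: the function names `schedule` and `expand_split`, the job names, and the `run` callback (a stand-in for submitting a job to the cluster and waiting for it) are all hypothetical, and jobs run one at a time rather than in parallel.

```python
from collections import deque

def expand_split(name, n, after, join):
    """Expand a hypothetical parametric rule into n part jobs plus one join job."""
    parts = [f"{name}{i}" for i in range(1, n + 1)]
    jobs = {part: set(after) for part in parts}  # each part waits for `after`
    jobs[join] = set(parts)                      # the join waits for every part
    return jobs

def schedule(jobs, run):
    """Run jobs in dependency order; `jobs` maps a job name to its prerequisites."""
    unknown = {d for deps in jobs.values() for d in deps} - set(jobs)
    if unknown:
        raise ValueError(f"unknown prerequisites: {sorted(unknown)}")
    pending = {name: set(deps) for name, deps in jobs.items()}
    dependents = {name: [] for name in jobs}
    for name, deps in jobs.items():
        for dep in deps:
            dependents[dep].append(name)
    # Jobs with no prerequisites can be submitted immediately.
    ready = deque(sorted(name for name, deps in pending.items() if not deps))
    order = []
    while ready:
        job = ready.popleft()
        run(job)  # stand-in for "submit to the cluster and wait for it to end"
        order.append(job)
        # Recompute dependencies: release direct dependents that are now free.
        for child in sorted(dependents[job]):
            pending[child].discard(job)
            if not pending[child]:
                ready.append(child)
    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order

# One job, then a task split into three parts, then a join step.
jobs = {"fetch": set()}
jobs.update(expand_split("count", 3, after=["fetch"], join="merge"))
order = schedule(jobs, run=lambda job: None)
```

In a real cluster setting the three `count` jobs would be submitted together once `fetch` ends; the sequential loop here only shows the order in which jobs become eligible.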

    Automatic parallel corpora and bilingual terminology extraction from parallel WebSites

    Nowadays, the importance of parallel corpora is such that it needs no special introduction. Unfortunately, publicly available parallel corpora are somewhat limited in range. There are big corpora about politics or legislation, about medicine and other specific areas, but corpora for other areas are missing. Currently there is a huge investment in using the Web as a corpus. This article presents GWB, a tool that aims at the automatic construction of parallel corpora from the web. We argue that it is possible to build high-quality terminological corpora automatically, just by specifying a sensible Internet domain and using an appropriate set of seed keywords. GWB is a web spider that works in conjunction with a set of other open-source tools, defining a pipeline that includes document retrieval from the web, alignment at sentence level and its quality analysis, extraction of bilingual dictionaries and terminology, and construction of off-line dictionaries.

    An overview of Portuguese wordnets

    Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We start tackling the problem of comparing them, but only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects.

    Extração de combinações lexicais restritas pela deteção da não composionalidade de expressões pluriverbais

    This article presents an evaluation of a method for extracting restricted lexical combinations from parallel corpora by detecting the non-compositionality of multiword expressions in translation. The method presupposes that, when a sequence of words is found whose translation does not follow a simple word-for-word conversion of its component words, a collocation is probably present. Word bigrams are used.
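The idea of flagging bigrams whose translation is not word-for-word can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the method evaluated in the article: the function name `candidate_collocations`, the `min_count` threshold, the toy English-Portuguese sentence pairs, and the hand-written dictionary are all hypothetical, and a bigram counts as literally translated only if each of its words has some dictionary translation in the aligned sentence.

```python
from collections import Counter

def candidate_collocations(pairs, dictionary, min_count=2):
    """Score source bigrams by how often no word-for-word translation is found."""
    seen = Counter()
    flagged = Counter()
    for source, target in pairs:
        src = source.lower().split()
        tgt = set(target.lower().split())
        for w1, w2 in zip(src, src[1:]):
            if w1 not in dictionary or w2 not in dictionary:
                continue  # without dictionary entries there is nothing to compare
            seen[(w1, w2)] += 1
            # Literal translation: both words have a known translation in the target.
            literal = bool(dictionary[w1] & tgt) and bool(dictionary[w2] & tgt)
            if not literal:
                flagged[(w1, w2)] += 1
    return {bg: flagged[bg] / seen[bg] for bg in flagged if seen[bg] >= min_count}

# Toy data (assumed for illustration only).
dictionary = {
    "red": {"vermelho", "vermelha"},
    "tape": {"fita"},
    "car": {"carro"},
}
pairs = [
    ("too much red tape", "demasiada burocracia"),
    ("cut the red tape", "cortar a burocracia"),
    ("a red car", "um carro vermelho"),
]
scores = candidate_collocations(pairs, dictionary)
```

Here "red tape" is never translated through "vermelho" and "fita", so it is returned as a candidate, while "red car" is translated literally and is not.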

    Portuguese-English word alignment: Some experiments

    In this paper we describe some studies of Portuguese-English word alignment, focusing on (i) measuring the importance of the coupling between dictionaries and corpus; (ii) assessing the relevance of using syntactic information (POS and lemma) or just word forms; and (iii) taking into account the direction of translation. We first provide some motivation for the studies and insist on separating type from token alignment. We then briefly describe the resources employed: the EuroParl and COMPARA corpora, and the alignment tools, NATools, introducing some measures to evaluate the two kinds of dictionaries obtained. We then present the results of several experiments, comparing sizes, overlap, translation fertility and alignment density of the several bilingual resources built. We also describe preliminary data on the quality of the resulting dictionaries or alignment results. This work was done in the scope of the Linguateca project, contract no. 339/1.3/C/NAC, jointly funded by the Portuguese government and the European Union. We thank Jose Joao Dias de Almeida for relevant comments during the development of these tools.