
21 research outputs found

UIMA in the Biocuration Workflow: A coherent framework for cooperation between biologists and computational linguists

Author: Bart Mellebeek
Carlos Rodriguez-Penagos
Laura Ines Furlong
Publication date: 24/04/2009

As collaborating partners, Barcelona Media Innovation Centre and GRIB (Universitat Pompeu Fabra) seek to combine strengths from Computational Linguistics and Biomedicine to produce a robust Text Mining system to generate data that will help biocurators in their daily work. The first version of this system will focus on the discovery of relationships between genes, SNPs (Single Nucleotide Polymorphisms) and diseases from the literature.

A first challenge we faced while setting up this project is that most current tools supporting the curation workflow are complex, ad hoc applications that sometimes hinder interoperability and the sharing of results between research groups from different and unrelated fields of expertise. Biologists (even computer-savvy ones) are often hard pressed to use and adapt sophisticated Natural Language Processing systems, while computational linguists are challenged by the intricacies of biology when applying their processing pipelines to elicit knowledge from texts. The flow of knowledge needed to develop a usable, practical tool, to and from the parties involved in building such systems, is not always easy or straightforward.

The modular and versatile architecture of UIMA (Unstructured Information Management Architecture) provides a framework to address these challenges. UIMA is a component architecture and software framework implementation (including a UIMA SDK) for developing applications that analyse large volumes of unstructured information, and it has been increasingly adopted by a significant part of the BioNLP community that needs industrial-grade, robust applications to exploit the whole bibliome. Using UIMA to develop Text Mining applications for curation purposes makes it possible to combine diverse expertise that lies beyond the individual know-how of biologists, computer scientists or linguists working in isolation. Good synergy and circulation of knowledge between these experts are fundamental to the development of a successful curation tool.

Nature Precedings
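The UIMA abstract describes the modular architecture only in general terms. As a rough illustration of the idea (this is not the Apache UIMA API), the Python sketch below passes one shared document object through independent annotators, loosely analogous to a UIMA CAS flowing through analysis components. All class and function names, the tiny term dictionaries, the invented example sentence and the sentence-level co-occurrence rule for gene/SNP-disease relations are hypothetical; only the "rs" prefix of dbSNP identifiers is a real convention.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "Gene", "SNP", "Disease", "Relation"
    begin: int  # character offsets into Document.text
    end: int
    text: str


@dataclass
class Document:
    """Shared analysis structure, loosely analogous to a UIMA CAS (hypothetical)."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]


def snp_annotator(doc):
    # dbSNP reference SNP identifiers look like "rs" followed by digits.
    for m in re.finditer(r"\brs\d+\b", doc.text):
        doc.annotations.append(Annotation("SNP", m.start(), m.end(), m.group()))


def make_dictionary_annotator(kind, terms):
    # Hypothetical exact-match dictionary recognizer; real systems use curated lexicons.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")

    def annotate(doc):
        for m in pattern.finditer(doc.text):
            doc.annotations.append(Annotation(kind, m.start(), m.end(), m.group()))

    return annotate


def cooccurrence_relation_annotator(doc):
    # Naive rule: pair every Gene/SNP with every Disease in the same sentence.
    for sent in re.finditer(r"[^.]+", doc.text):
        lo, hi = sent.start(), sent.end()
        in_sent = [a for a in doc.annotations if lo <= a.begin and a.end <= hi]
        diseases = [a for a in in_sent if a.kind == "Disease"]
        for a in in_sent:
            if a.kind in ("Gene", "SNP"):
                for d in diseases:
                    # Relation text stores the pair as "entity|disease".
                    doc.annotations.append(Annotation(
                        "Relation", min(a.begin, d.begin), max(a.end, d.end),
                        f"{a.text}|{d.text}"))


def run_pipeline(text, annotators):
    doc = Document(text)
    for annotate in annotators:  # components interact only through the shared Document
        annotate(doc)
    return doc


pipeline = [
    make_dictionary_annotator("Gene", ["APOE", "BRCA1"]),
    snp_annotator,
    make_dictionary_annotator("Disease", ["Alzheimer disease"]),
    cooccurrence_relation_annotator,
]
doc = run_pipeline("APOE variant rs429358 was studied in Alzheimer disease. "
                   "BRCA1 was not discussed.", pipeline)
print([a.text for a in doc.select("Relation")])
# prints ['APOE|Alzheimer disease', 'rs429358|Alzheimer disease']
```

Because each annotator only reads and writes the shared `Document`, a biologist could maintain the dictionaries while a linguist replaces the relation rule, which is the kind of division of labour between experts that the abstract argues for.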

TransBooster: boosting the performance of wide-coverage machine translation systems

Author: Khasin Anna
Mellebeek Bart
van Genabith Josef
Way Andy
Publication date: 01/01/2005

We propose the design, implementation and evaluation of a novel and modular approach to boost the translation performance of existing, wide-coverage, freely available machine translation systems, based on reliable and fast automatic decomposition of the translation input and corresponding composition of the translation output. We provide details of our method, and experimental results compared to the MT systems SYSTRAN and Logomedia. While many avenues for further experimentation remain, to date we fall just behind the baseline systems on the full 800-sentence test set, but in certain cases our method causes the translation quality obtained via the MT systems to improve.

DCU Online Research Access Service

Multi-engine machine translation by recursive sentence decomposition

Author: Mellebeek Bart
Owczarzak Karolina
van Genabith Josef
Way Andy
Publication date: 01/01/2006

In this paper, we present a novel approach to combine the outputs of multiple MT engines into a consensus translation. In contrast to previous Multi-Engine Machine Translation (MEMT) techniques, we do not rely on word alignments of output hypotheses, but prepare the input sentence for multi-engine processing. We do this by using a recursive decomposition algorithm that produces simple chunks as input to the MT engines. A consensus translation is produced by combining the best chunk translations, selected through majority voting, a trigram language model score and a confidence score assigned to each MT engine. We report statistically significant relative improvements of up to 9% BLEU score in experiments (English→Spanish) carried out on an 800-sentence test set extracted from the Penn-II Treebank.

CiteSeerX

Irish Universities

DCU Online Research Access Service
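The multi-engine abstract names three signals for choosing among chunk translations (majority voting, a trigram language model score and a per-engine confidence score) but not how they are combined. The Python sketch below assumes majority voting decides first and that ties are broken by the sum of a trigram log-probability and the confidence of the engines proposing each candidate. The add-one-smoothed toy LM, the confidence values, the Spanish strings and all function names are hypothetical.

```python
import math
from collections import Counter


def train_trigram_lm(sentences):
    """Toy add-one-smoothed trigram LM; returns a log-probability function."""
    tri, hist, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            hist[tuple(toks[i - 2:i])] += 1
    v = len(vocab)

    def logprob(text):
        toks = ["<s>", "<s>"] + text.split() + ["</s>"]
        return sum(math.log((tri[tuple(toks[i - 2:i + 1])] + 1) /
                            (hist[tuple(toks[i - 2:i])] + v))
                   for i in range(2, len(toks)))

    return logprob


def select_chunk(candidates, engine_confidence, lm_logprob):
    """candidates maps engine name -> translation of one chunk."""
    votes = Counter(candidates.values())
    top = max(votes.values())
    tied = [t for t, n in votes.items() if n == top]  # majority-vote winners

    def tie_break(t):
        # Assumed scoring: LM log-prob plus summed confidence of supporting engines.
        support = sum(engine_confidence[e] for e, out in candidates.items() if out == t)
        return lm_logprob(t) + support

    return max(tied, key=tie_break)


def consensus(chunk_candidates, engine_confidence, lm_logprob):
    # Chunks are assumed to be in source order; the selected translations are joined.
    return " ".join(select_chunk(c, engine_confidence, lm_logprob)
                    for c in chunk_candidates)


lm = train_trigram_lm(["la casa es roja", "la casa es grande"])
confidence = {"A": 0.5, "B": 0.3, "C": 0.2}
chunks = [
    {"A": "la casa", "B": "la casa", "C": "el casa"},    # majority vote decides
    {"A": "es roja", "B": "está roja", "C": "es rojo"},  # three-way tie
]
print(consensus(chunks, confidence, lm))  # prints: la casa es roja
```

Adding a log-probability to a confidence value mixes scales; a real system would tune or learn the weighting, which is one reason this should be read only as a sketch.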

Improving online machine translation systems

Author: Khasin Anna
Mellebeek Bart
Owczarzak Karolina
van Genabith Josef
Way Andy
Publication date: 01/01/2005

In (Mellebeek et al., 2005), we proposed the design, implementation and evaluation of a novel and modular approach to boost the translation performance of existing, wide-coverage, freely available machine translation systems, based on reliable and fast automatic decomposition of the translation input and corresponding composition of translation output. Despite showing some initial promise, our method did not improve on the baseline Logomedia and Systran MT systems. In this paper, we improve on the algorithm presented in (Mellebeek et al., 2005), and on the same test data, show increased scores for a range of automatic evaluation metrics. Our algorithm now outperforms Logomedia, obtains similar results to SDL and falls tantalisingly short of the performance achieved by Systran.

CiteSeerX

Irish Universities

DCU Online Research Access Service
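The abstract above does not list its automatic metrics; BLEU, which the other TransBooster entries report, is one widely used choice. The Python sketch below computes single-reference, corpus-level BLEU with uniform weights over 1- to 4-grams, clipped n-gram counts and the brevity penalty, without smoothing. It is illustrative only and is not the scoring script used in these papers; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Single-reference corpus BLEU, uniform n-gram weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the reference are penalised.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 1.0
print(corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"]))  # about 0.819
```

In the second call every n-gram precision is 1, so the score drops only because of the brevity penalty exp(1 - 6/5).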

A syntactic skeleton for statistical machine translation

Author: Groves Declan
Mellebeek Bart
Owczarzak Karolina
van Genabith Josef
Way Andy
Publication date: 01/01/2006

We present a method for improving statistical machine translation performance by using linguistically motivated syntactic information. Our algorithm recursively decomposes source language sentences into syntactically simpler and shorter chunks, and recomposes their translation to form target language sentences. This improves both the word order and lexical selection of the translation. We report statistically significant relative improvements of 3.3% BLEU score in an experiment (English→Spanish) carried out on an 800-sentence test set extracted from the Europarl corpus.

CiteSeerX

Irish Universities

DCU Online Research Access Service
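The decompose, translate and recompose loop described in the syntactic-skeleton abstract can be sketched in Python under strong simplifying assumptions. The parse tree is a nested tuple, a node is sent to the black-box MT system as one chunk once it spans at most `max_len` words, and chunk translations are joined in source order. Joining in source order cannot by itself change word order, so this is only the recursive skeleton, not the method from the papers. `leaves`, `boost` and `max_len` are hypothetical names, and `str.upper` stands in for the MT system just to make the chunking visible.

```python
from typing import Callable, List, Optional, Tuple, Union

# A parse tree is either a word (leaf) or a tuple (label, child1, child2, ...).
Tree = Union[str, Tuple]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def boost(tree: Tree, translate: Callable[[str], str], max_len: int = 4,
          sent: Optional[List[str]] = None) -> str:
    """Recursively split until a node spans <= max_len words, translate each
    chunk with the black-box `translate`, and join results in source order."""
    words = leaves(tree)
    if isinstance(tree, str) or len(words) <= max_len:
        chunk = " ".join(words)
        if sent is not None:
            sent.append(chunk)  # record what the baseline MT system receives
        return translate(chunk)
    return " ".join(boost(child, translate, max_len, sent) for child in tree[1:])


tree = ("S",
        ("NP", "the", "old", "man"),
        ("VP", "read",
         ("NP", "a", "long", "book"),
         ("PP", "in", ("NP", "the", "garden"))))
chunks: List[str] = []
print(boost(tree, str.upper, max_len=4, sent=chunks))
# prints: THE OLD MAN READ A LONG BOOK IN THE GARDEN
print(chunks)
# prints: ['the old man', 'read', 'a long book', 'in the garden']
```

The ten-word sentence is never sent whole: the baseline system only sees chunks of at most four words, which is the complexity reduction the abstracts describe.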

Wrapper syntax for example-based machine translation

Author: Groves Declan
Mellebeek Bart
Owczarzak Karolina
van Genabith Josef
Way Andy
Publication date: 01/01/2006

TransBooster is a wrapper technology designed to improve the performance of wide-coverage machine translation systems. Using linguistically motivated syntactic information, it automatically decomposes source language sentences into shorter and syntactically simpler chunks, and recomposes their translation to form target language sentences. This generally improves both the word order and lexical selection of the translation. To date, TransBooster has been successfully applied to rule-based MT, statistical MT, and multi-engine MT. This paper presents the application of TransBooster to Example-Based Machine Translation. In an experiment conducted on test sets extracted from Europarl and the Penn II Treebank, we show that our method can raise the BLEU score by up to 3.8% relative to the EBMT baseline. We also conduct a manual evaluation, showing that TransBooster-enhanced EBMT produces better output than the baseline EBMT in terms of fluency in 55% of the cases and in terms of accuracy in 53% of the cases.

Irish Universities

DCU Online Research Access Service
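Relative BLEU gains such as the 3.8% reported above are measured against the baseline score, not in absolute BLEU points. The formula below makes this explicit; the scores in the worked example are hypothetical and chosen only to show the arithmetic.

```latex
\Delta_{\mathrm{rel}} = \frac{\mathrm{BLEU}_{\mathrm{TB}} - \mathrm{BLEU}_{\mathrm{base}}}{\mathrm{BLEU}_{\mathrm{base}}} \times 100\%,
\qquad
\text{e.g.}\quad \frac{0.2076 - 0.2000}{0.2000} \times 100\% = 3.8\%
```

In this illustration a 3.8% relative gain corresponds to only 0.76 absolute BLEU points on a 0-100 scale.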

TransBooster: black box optimisation of machine translation systems

Author: Mellebeek Bart
Publication venue: Dublin City University. School of Computing
Publication date: 01/01/2007

Machine Translation (MT) systems tend to underperform when faced with long, linguistically complex sentences. Rule-based systems often favour broad but shallow linguistic coverage over deep, fine-grained analysis, since hand-crafting rules based on detailed linguistic analyses is time-consuming, error-prone and expensive. Most data-driven systems lack the syntactic knowledge needed to deal effectively with non-local grammatical phenomena. Therefore, both rule-based and data-driven MT systems are better at handling short, simple sentences than linguistically complex ones. This thesis proposes a new and modular approach to help MT systems improve their output quality by reducing the number of complexities in the input. Instead of trying to reinvent the wheel by proposing yet another approach to MT, we build on the strengths of existing MT paradigms while trying to remedy their shortcomings as much as possible. We do this by developing TransBooster, a wrapper technology that reduces the complexity of the MT input by means of a recursive decomposition algorithm which produces simple input chunks that are spoon-fed to a baseline MT system. TransBooster is not an MT system itself: it does not perform automatic translation, but operates on top of an existing MT system, guiding it through the input and trying to help the baseline system improve the quality of its own translations through automatic complexity reduction. In this dissertation, we outline the motivation behind TransBooster, explain its development in depth and investigate its impact on the three most important paradigms in the field: Rule-based, Example-based and Statistical MT. In addition, we use the TransBooster architecture as a promising alternative to current Multi-Engine MT techniques. We evaluate TransBooster on the language pair English→Spanish with a combination of automatic and manual evaluation metrics, providing a rigorous analysis of the potential and shortcomings of our approach.

Irish Universities

DCU Online Research Access Service

Dublin City University at CLEF 2004: experiments with the ImageCLEF St Andrew's collection

Author: Groves Declan
Jones Gareth J.F.
Khasin Anna
Lam-Adesina Adenike M.
Mellebeek Bart
Way Andy
Publication date: 01/01/2004

For the CLEF 2004 ImageCLEF St Andrew's Collection task, the Dublin City University group carried out three sets of experiments: standard cross-language information retrieval (CLIR) runs using topic translation via machine translation (MT); combination of these runs with image matching results from the VIPER system; and a novel document rescoring approach based on automatic MT evaluation metrics. Our standard MT-based CLIR works well on this task. Encouragingly, combination with image matching lists is also observed to produce small positive changes in the retrieval output. However, rescoring using the MT evaluation metrics in their current form significantly reduced retrieval effectiveness.

DCU Online Research Access Service
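The CLEF abstract does not say how the text-retrieval and image-matching result lists were combined. One common data-fusion scheme, assumed here purely for illustration, min-max normalises each run's scores and interpolates them linearly. The weight `alpha`, the document IDs, the scores and the function names in this Python sketch are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} run to [0, 1]; constant runs map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(text_run, image_run, alpha=0.8):
    """Linear interpolation of normalised scores; a document missing from a run
    contributes 0 for that run. Returns doc IDs ranked best first."""
    t, i = min_max_normalize(text_run), min_max_normalize(image_run)
    docs = set(t) | set(i)
    fused = {d: alpha * t.get(d, 0.0) + (1 - alpha) * i.get(d, 0.0) for d in docs}
    # Sort by descending fused score, breaking ties by doc ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


text_run = {"d1": 12.0, "d2": 10.0, "d3": 2.0}
image_run = {"d2": 0.9, "d3": 0.5, "d4": 0.1}
print(fuse(text_run, image_run))             # ['d2', 'd1', 'd3', 'd4']
print(fuse(text_run, image_run, alpha=1.0))  # ['d1', 'd2', 'd3', 'd4']
```

A high `alpha` keeps the text run dominant, so image evidence can only nudge the ranking; with `alpha=1.0` the text-only order is recovered.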

MaTrEx: machine translation using examples

Author: Armstrong Stephen
Flanagan Marian
Graham Yvette
Groves Declan
Mellebeek Bart
Morrissey Sara
Stroppa Nicolas
Way Andy
Publication date: 01/01/2006

Irish Universities

DCU Online Research Access Service

Negotiations of meaning about aspects of proportional reasoning and professional identity in the CoP-PAEM

Author: Groves Declan
Jones Gareth J.F.
Khasin Anna
Lam-Adesina Adenike M.
Mellebeek Bart
Way Andy
Publication venue: SEMUR
Publication date: 01/01/2004

In this article we present partial results of ongoing research in the context of a community of practice, made up of researchers and basic-education mathematics teachers, which seeks to highlight the learning and the elements of this context that contribute to the development of the teacher's professional identity. Our considerations result from the analysis of an episode, part of an action carried out by the members of the community in connection with the joint enterprise of studying Proportional Reasoning, in which one of the participants proposes to the others a problem with the potential to mobilise relative reasoning, analyses the others' solution strategies, and points out and justifies evidence of this mobilisation when it occurs. The analysis of transcripts of audio recorded at the community members' weekly meetings and of the participants' written records revealed negotiations of meaning concerning the teacher's professional knowledge and the view of oneself and of the teaching profession, as well as aspects of agency and vulnerability of the participant who proposes the task by placing themselves in a more central position in the community. This evidence indicates learning by the participants, which allows us to infer that actions of this nature in the context of teacher education can contribute to the development of the teacher's professional identity.

Irish Universities

Funes