
    Phrase level segmentation and labelling of machine translation errors

    © 2016 The Authors. Published by the European Language Resources Association (ELRA). This is an open access article available under a Creative Commons licence. The published version can be accessed on the publisher's website: https://www.aclweb.org/anthology/L16-1356/

    This paper presents our work towards a novel approach for Quality Estimation (QE) of machine translation based on sequences of adjacent words, so-called phrases. This new level of QE aims to provide a natural balance between word- and sentence-level QE, which are respectively too fine-grained and too coarse for some applications. However, phrase-level QE poses an intrinsic challenge: how to segment a machine translation into sequences of words (contiguous or not) that represent an error. We discuss three possible segmentation strategies for automatically extracting erroneous phrases. We evaluate these strategies against phrase-level annotations produced by humans, using a new dataset collected for this purpose.

    The authors would like to thank all the annotators who helped to create the first version of gold-standard annotations at phrase level. This work was supported by the QT21 (H2020 No. 645452; Lucia Specia, Frédéric Blain) and EXPERT (EU FP7 Marie Curie ITN No. 317471; Varvara Logacheva) projects.
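One simple segmentation strategy in the spirit of the above (an illustrative sketch, not necessarily one of the paper's three strategies) is to merge runs of adjacent words that carry a word-level "BAD" quality tag into candidate erroneous phrases:

```python
def bad_word_runs(words, tags):
    """Group contiguous BAD-tagged words into candidate erroneous phrases."""
    phrases, current = [], []
    for word, tag in zip(words, tags):
        if tag == "BAD":
            current.append(word)
        elif current:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases

words = ["the", "cat", "sat", "on", "on", "mat", "the"]
tags = ["OK", "OK", "OK", "OK", "BAD", "BAD", "BAD"]
print(bad_word_runs(words, tags))  # [['on', 'mat', 'the']]
```

Note this only produces contiguous phrases; handling non-contiguous error spans, as the abstract mentions, would require a different grouping criterion.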

    Human Feedback in Statistical Machine Translation

    The thesis addresses the challenge of improving Statistical Machine Translation (SMT) systems via feedback given by humans on translation quality. The amount of human feedback available to systems is inherently low due to cost and time limitations. One of our goals is to simulate such information by automatically generating pseudo-human feedback. This is performed using Quality Estimation (QE) models. QE is a technique for predicting the quality of automatic translations without comparing them to oracle (human) translations, traditionally at the sentence or word level. QE models are trained on a small collection of automatic translations manually labelled for quality, and can then predict the quality of any number of unseen translations. We propose a number of improvements for QE models in order to increase the reliability of pseudo-human feedback. These include strategies to artificially generate instances for settings where QE training data is scarce. We also introduce a new level of granularity for QE: the level of phrases. This level aims to improve the quality of QE predictions by better modelling inter-dependencies among errors at the word level, and in ways that are tailored to phrase-based SMT, where the basic unit of translation is a phrase. This can thus facilitate work on incorporating human feedback during the translation process. Finally, we introduce approaches to incorporate pseudo-human feedback in the form of QE predictions in SMT systems. More specifically, we use quality predictions to select the best translation from a number of alternative suggestions produced by SMT systems, and integrate QE predictions into an SMT system decoder in order to guide the translation generation process.
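The reranking idea described at the end of the abstract (selecting the best translation among alternatives by predicted quality) can be sketched minimally as follows; `qe_score` here is a hypothetical stand-in for a trained QE model's prediction function:

```python
def select_best(candidates, qe_score):
    """Pick the candidate translation with the highest predicted quality."""
    return max(candidates, key=qe_score)

# Toy stand-in for a QE model: a lookup of precomputed quality predictions.
candidates = ["le chat est assis", "chat le assis est"]
predicted_quality = {"le chat est assis": 0.8, "chat le assis est": 0.3}
print(select_best(candidates, predicted_quality.get))  # le chat est assis
```

Integrating predictions into the decoder itself, as the thesis also proposes, would instead add the QE score as a feature during search rather than filtering a finished n-best list.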

    Generating semantically enriched diagnostics for radiological images using machine learning

    Development of Computer Aided Diagnostic (CAD) tools to aid radiologists in pathology detection and decision making relies considerably on manually annotated images. With the advancement of deep learning techniques for CAD development, these expert annotations no longer need to be hand-crafted; however, deep learning algorithms require large amounts of data in order to generalise well. One way to access large volumes of expert-annotated data is through radiological exams consisting of images and reports. Using past radiological exams obtained from hospital archiving systems has many advantages: they are expert annotations available in large quantities, covering a population-representative variety of pathologies, and they provide additional context to pathology diagnoses, such as anatomical location and severity. Learning to auto-generate such reports from images presents many challenges, such as the difficulty of representing and generating long, unstructured textual information, accounting for spelling errors and repetition or redundancy, and the inconsistency across different annotators. In this thesis, the problem of learning to automate disease detection from radiological exams is approached from three directions. Firstly, a report generation model is developed such that it is conditioned on radiological image features. Secondly, a number of approaches are explored aimed at extracting diagnostic information from free-text reports. Finally, an alternative approach to image latent space learning from the current state of the art is developed that can be applied to accelerated image acquisition.

    What's in the "pure" prosody?

    Detectors for accents and phrase boundaries have been developed which derive prosodic features from the speech signal and its fundamental frequency to support other modules of a speech understanding system in an early analysis stage, or in cases where no word hypotheses are available. The Gaussian distribution classifiers underlying the detectors were trained with 50 minutes and tested with 30 minutes of spontaneous speech, yielding recognition rates of 74% for accents and 86% for phrase boundaries. Since this material was prosodically hand-labelled, the question was which labels for phrase boundaries and accentuation were only guided by syntactic or semantic knowledge, and which ones were really prosodically marked. Therefore a small test subset was resynthesized in such a way that comprehensibility was lost, but the prosodic characteristics were kept. This subset was re-labelled by 11 listeners with nearly the same accuracy as the detectors.
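The classification scheme described above can be illustrated with a toy Gaussian classifier over a single prosodic feature (the feature, means, and variances below are invented for illustration; the actual detectors use multiple features derived from the signal and its fundamental frequency):

```python
import math

# Each class is modelled by a (mean, variance) pair estimated from labelled
# data; at test time the class with the higher Gaussian likelihood wins.
def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical feature: pause duration (seconds) before the word.
classes = {"boundary": (0.30, 0.01), "no_boundary": (0.05, 0.002)}

def classify(pause_sec):
    return max(classes, key=lambda c: gaussian_pdf(pause_sec, *classes[c]))

print(classify(0.25))  # boundary
print(classify(0.05))  # no_boundary
```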

    Discourse structure and language technology

    This publication is made freely accessible with permission of the rights owner due to an Alliance licence and a national licence (funded by the DFG, German Research Foundation), respectively.

    An increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases along which discourse is structured, along with some of their formal properties. It then lays out the current state of the art with respect to algorithms for recognizing these different structures, and how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse-structure-enabled technology.

    A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

    Traditional architectures for solving computer vision problems, and the degree of success they enjoyed, have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative: that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. Therefore, it has become important to understand what kind of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e. deep networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep networks widely used in computer vision: convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep-learning techniques for computer vision. Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm)

    PersoNER: Persian named-entity recognition

    © 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
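The general pattern of an embedding-plus-linear-classifier tagger, as used in the pipeline above, can be sketched at a toy scale (the embedding table, weights, and labels below are invented for illustration; PersoNER's actual model is a sequential max-margin classifier trained on the corpus):

```python
import numpy as np

# Hypothetical two-dimensional embedding table and per-label weight vectors.
EMB = {"tehran": np.array([1.0, 0.0]), "visited": np.array([0.0, 1.0])}
W = {"LOC": np.array([1.0, -1.0]), "O": np.array([-1.0, 1.0])}

def tag(tokens):
    """Score each token's embedding against every label; highest score wins."""
    return [max(W, key=lambda lab: W[lab] @ EMB[t]) for t in tokens]

print(tag(["visited", "tehran"]))  # ['O', 'LOC']
```

A sequential model additionally scores label transitions between adjacent tokens, which this independent per-token sketch omits.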