
    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
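    To give a concrete feel for the fuzzy-matching stage mentioned above, the sketch below scores translation-memory entries against a new source segment with a simple character-level similarity measure. This is only an illustrative baseline under assumed data; the segment pairs, the threshold, and the similarity measure are invented for the example and are not the improved matching metrics developed in SCATE.

```python
# Minimal baseline fuzzy matching against a toy translation memory.
# The TM entries, threshold and similarity measure are assumptions for
# illustration; SCATE investigates richer, improved matching metrics.
from difflib import SequenceMatcher

translation_memory = {
    "The printer is out of paper.": "De printer heeft geen papier meer.",
    "Replace the toner cartridge.": "Vervang de tonercartridge.",
}

def fuzzy_matches(source_segment, memory, threshold=0.7):
    """Return (score, TM source, TM target) tuples above the threshold."""
    matches = []
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source_segment, tm_source).ratio()
        if score >= threshold:
            matches.append((score, tm_source, tm_target))
    return sorted(matches, reverse=True)

print(fuzzy_matches("The printer is out of ink.", translation_memory))
```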

    Quantifying Brain Microstructure with Diffusion MRI: An Assessment of Microscopic Anisotropy Imaging

    Diffusion-weighted magnetic resonance imaging is routinely used for quantifying microstructural properties of brain tissue in both health and disease because it is sensitive to the displacements of water molecules at the microscopic level. Significant effort has been put into the development of methods that provide more information on tissue microstructure than conventional diffusion tensor imaging. Multidimensional diffusion encoding methods render the signal sensitive to the displacements of water molecules that occur along two or three dimensions and can resolve some degeneracies in data acquired with single diffusion encoding methods, which measure diffusion along a single dimension. The aim of this thesis is to study state-of-the-art microstructural imaging methods and to assess their robustness in estimating microscopic diffusion anisotropy, i.e., the average anisotropy of the microscopic diffusion environments irrespective of their orientation dispersion, prior to their adoption in the wider neuroscience research community and possible deployment in clinical studies. First, a massively parallel Monte Carlo random walk simulator is presented. Second, the reproducibility of three commonly used microstructural models is quantified and the shortcomings of such single diffusion encoding methods in estimating microscopic diffusion anisotropy are addressed. Third, the challenges of estimating microscopic diffusion anisotropy in the human brain using double diffusion encoding are addressed using animal imaging experiments and simulations. The results support the feasibility of double diffusion encoding in human neuroimaging but raise hitherto overlooked precision issues when measuring microscopic diffusion anisotropy. Fourth, the accuracy and precision of microscopic diffusion anisotropy estimation using q-space trajectory encoding, a multidimensional diffusion encoding method specifically developed with the limitations of clinical whole-body scanners in mind, are assessed using imaging experiments and simulations. The results suggest that although broken model assumptions and time-dependent diffusion may bias the estimates, the effect of time-dependent diffusion on the estimated microscopic diffusion anisotropy is small in human white matter.
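    The Monte Carlo random walk simulator mentioned above is a massively parallel implementation; purely as a hedged illustration of the underlying idea, the toy sketch below simulates free diffusion with Gaussian steps and recovers the input diffusivity from the mean squared displacement via the Einstein relation. All numbers (walker count, step size, diffusivity) are assumptions chosen for the example.

```python
# Toy free-diffusion Monte Carlo random walk (illustrative only; the thesis
# describes a massively parallel simulator, not this minimal NumPy version).
import numpy as np

rng = np.random.default_rng(0)
n_walkers, n_steps = 10_000, 500
D = 2.0e-9                       # m^2/s, assumed free diffusivity of water
dt = 1.0e-5                      # s, time step
step_std = np.sqrt(2 * D * dt)   # per-axis Gaussian step size

# Sum of Gaussian steps gives each walker's net 3D displacement.
displacements = rng.normal(0.0, step_std, size=(n_walkers, n_steps, 3)).sum(axis=1)

# Einstein relation per axis: <x^2> = 2 D t, so D can be recovered from the walk.
total_time = n_steps * dt
D_estimate = (displacements ** 2).mean() / (2 * total_time)
print(f"estimated D = {D_estimate:.3e} m^2/s (input {D:.1e})")
```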

    Discriminative training of continuous-space models for machine translation

    Over the past few years, neural network (NN) architectures have been successfully applied to many Natural Language Processing (NLP) applications, such as Automatic Speech Recognition (ASR) and Statistical Machine Translation (SMT). For the language modeling task, these models consider linguistic units (i.e., words and phrases) through their projections into a continuous (multi-dimensional) space, and the estimated distribution is a function of these projections. Also known as continuous-space models (CSMs), they exploit a continuous representation that can be seen as an attempt to address the sparsity issue of conventional discrete models. In the context of SMT, these techniques have been applied to neural network-based language models (NNLMs) included in SMT systems, and to continuous-space translation models (CSTMs). These models have led to significant and consistent gains in SMT performance, but are also very expensive in training and inference, especially for systems involving large vocabularies. To overcome this issue, the Structured Output Layer (SOUL) and Noise Contrastive Estimation (NCE) have been proposed; the former modifies the standard structure of the output layer over the vocabulary, while the latter approximates maximum-likelihood estimation (MLE) by a sampling method. All these approaches share the same estimation criterion, namely MLE; however, using this procedure results in an inconsistency between the objective function defined for parameter estimation and the way the models are used in the SMT application. The work presented in this dissertation aims to design new performance-oriented and global training procedures for CSMs to overcome these issues. The main contributions lie in the investigation and evaluation of efficient training methods for (large-vocabulary) CSMs which aim (a) to reduce the total training cost, and (b) to improve the efficiency of these models when used within the SMT application. On the one hand, the training and inference cost can be reduced (using the SOUL structure or the NCE algorithm), or the number of iterations can be reduced via faster convergence. This thesis provides an empirical analysis of these solutions on different large-scale SMT tasks. On the other hand, we propose a discriminative training framework which optimizes the performance of the whole system containing the CSM as a component model.
The experimental results show that this framework is effective for both training and adapting CSMs within SMT systems, opening promising research perspectives.
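    As a hedged illustration of the Noise Contrastive Estimation idea discussed above (not the systems built in this work), the sketch below evaluates the NCE objective for a single data word: the model's unnormalised score is classified against k sampled noise words, avoiding a full softmax over a large vocabulary. The vocabulary size, noise distribution, and dummy scorer are assumptions for the example.

```python
# Toy sketch of the Noise Contrastive Estimation (NCE) objective for one
# context. The scores and noise distribution are made up for illustration;
# in an NNLM the scores would come from the network's output layer.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

vocab_size, k = 10_000, 25                            # k noise samples per data word
noise_dist = np.full(vocab_size, 1.0 / vocab_size)    # noise distribution, here uniform
rng = np.random.default_rng(0)

def nce_loss(score_fn, data_word):
    """Negative NCE objective: classify the data word against k noise words."""
    noise_words = rng.choice(vocab_size, size=k, p=noise_dist)
    # Unnormalised log-score minus log(k * q(w)) for data and noise words.
    data_logit = score_fn(data_word) - np.log(k * noise_dist[data_word])
    noise_logits = score_fn(noise_words) - np.log(k * noise_dist[noise_words])
    return -(np.log(sigmoid(data_logit)) + np.log(sigmoid(-noise_logits)).sum())

# Dummy scorer standing in for the network's unnormalised output layer.
fake_scores = rng.normal(size=vocab_size)
print(nce_loss(lambda w: fake_scores[w], data_word=42))
```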

    Sensible energy accounting with abstract metering for multicore systems

    Chip multicore processors (CMPs) are the preferred processing platform across different domains such as data centers, real-time systems, and mobile devices. In all those domains, energy is arguably the most expensive resource in a computing system. Accurately quantifying energy usage in a multicore environment presents a challenge as well as an opportunity for optimization. Standard metering approaches are not capable of delivering consistent results with shared resources, since the same task with the same inputs may have different energy consumption depending on the mix of co-running tasks. However, it is reasonable for data-center operators to charge on the basis of estimated energy usage rather than time, since energy is more correlated with their actual cost. This article introduces the concept of Sensible Energy Accounting (SEA). For a task running in a multicore system, SEA accurately estimates the energy the task would have consumed running in isolation with a given fraction of the CMP shared resources. We explain the potential benefits of SEA in different domains and describe two hardware techniques to implement it for a shared last-level cache and for on-core resources in SMT processors. Moreover, with SEA, an energy-aware scheduler can find a highly efficient on-chip resource assignment, reducing the total processor energy by up to 39% for a 4-core system.
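    To make the accounting idea concrete, the toy sketch below charges a task for an estimate of the energy it would have used running alone with its allotted share of the shared resources, rather than for a slice of the measured chip energy. The event costs, counters, and static-power split are invented for illustration; the article proposes hardware mechanisms, not this software model.

```python
# Toy illustration of the accounting idea behind SEA: estimate the energy a
# task would have consumed in isolation with a given resource share.
# All per-event costs and counters are assumptions for illustration only.
PER_EVENT_ENERGY_NJ = {"instruction": 0.5, "llc_access": 2.0, "dram_access": 20.0}
STATIC_POWER_W = 4.0     # assumed whole-chip static power, split by resource share

def sea_estimate_joules(event_counts, resource_share, runtime_alone_s):
    """Energy estimate for a task as if it ran alone with `resource_share`
    of the shared resources (cache ways, memory bandwidth, ...)."""
    dynamic_nj = sum(PER_EVENT_ENERGY_NJ[e] * n for e, n in event_counts.items())
    static_j = STATIC_POWER_W * resource_share * runtime_alone_s
    return dynamic_nj * 1e-9 + static_j

task = {"instruction": 2e9, "llc_access": 5e7, "dram_access": 1e6}
print(f"{sea_estimate_joules(task, resource_share=0.25, runtime_alone_s=1.2):.3f} J")
```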

    Some Key Developments in Computational Electromagnetics and their Attribution

    Key developments in computational electromagnetics are proposed. Historical highlights are summarized, concentrating on the two main approaches of differential and integral methods. This is seen as timely, as a retrospective analysis is needed to minimize duplication and to help settle questions of attribution.

    Model-Checking-based vs. SMT-based Consistency Analysis of Industrial Embedded Systems Requirements: Application and Experience

    Industry relies predominantly on manual peer-review techniques for assessing the correctness of system specifications. However, with the ever-increasing size, complexity and intricacy of the specifications, it becomes difficult to assure their correctness with respect to certain criteria such as consistency. To cope with this challenge, a set of techniques based on formal methods, called sanity checks, have been proposed to automatically assess the quality of system specifications in a systematic and rigorous manner. The predominant way of assessing the sanity of system specifications is by model checking, which in the literature is reported to be expensive, as the analysis can take a long time to terminate. Recently, another approach for checking the consistency of a system's specification using Satisfiability Modulo Theories has been proposed in order to reduce the analysis time. In this paper, we compare the two approaches for consistency analysis by applying them to a relevant industrial use case, using the same definition of consistency and the same set of requirements. The comparison is carried out with respect to: i) the time for generating the model and the model's complexity, and ii) the consistency analysis time. Contrary to the currently available data, our preliminary results show no significant difference in analysis time when applied to the same system specification under the same definition of consistency, but show a significant difference in the time needed to create the model for analysis.
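    Purely as a hedged illustration of the SMT-based style of analysis (not the encoding or consistency definition used in the paper), the sketch below asks Z3 whether two toy requirements can hold in the same state; an unsat answer flags an inconsistency for that scenario.

```python
# Tiny illustration of SMT-based consistency checking with Z3's Python API.
# The requirements and their encoding are invented; the paper's industrial
# use case and consistency definition are far richer than this.
from z3 import Int, Bool, Solver, Implies, sat

speed = Int("speed")
brake_active = Bool("brake_active")

s = Solver()
# R1: if the brake is active, speed must be zero in the checked state.
s.add(Implies(brake_active, speed == 0))
# R2: the brake is active whenever speed exceeds 50.
s.add(Implies(speed > 50, brake_active))
# Scenario under test: a state with speed above 50.
s.add(speed > 50)

# sat -> a consistent state exists; unsat -> the requirements conflict here.
print("consistent" if s.check() == sat else "inconsistent")
```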