13,677 research outputs found

    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Get PDF
    Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic

    MPI+X: task-based parallelization and dynamic load balance of finite element assembly

    Get PDF
    The main computing tasks of a finite element code(FE) for solving partial differential equations (PDE's) are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we will describe algorithms in the FE context, a similar strategy can be straightforwardly applied to other discretization methods, like the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition to compute element matrices and right-hand sides and their assemblies in the local system to each MPI partition. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, like coloring or substructuring techniques to circumvent the race condition that appears when assembling the element system into the local system. The main drawback of the first technique is the decrease of the IPC due to bad spatial locality. The second technique avoids this issue but requires extensive changes in the implementation, which can be cumbersome when several element loops should be treated. We propose an alternative, based on the task parallelism of the element loop using some extensions to the OpenMP programming model. The taskification of the assembly solves both aforementioned problems. In addition, dynamic load balance will be applied using the DLB library, especially efficient in the presence of hybrid meshes, where the relative costs of the different elements is impossible to estimate a priori. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems up to 16k cores

    Clustering-Based Materialized View Selection in Data Warehouses

    Full text link
    Materialized view selection is a non-trivial task. Hence, its complexity must be reduced. A judicious choice of views must be cost-driven and influenced by the workload experienced by the system. In this paper, we propose a framework for materialized view selection that exploits a data mining technique (clustering), in order to determine clusters of similar queries. We also propose a view merging algorithm that builds a set of candidate views, as well as a greedy process for selecting a set of views to materialize. This selection is based on cost models that evaluate the cost of accessing data using views and the cost of storing these views. To validate our strategy, we executed a workload of decision-support queries on a test data warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when storage space is limited

    Validating plans with continuous effects

    Get PDF
    A critical element in the use of PDDL2.1, the modelling language developed for the International Planning Competition series, has been the common understanding of the semantics of the language. The fact that this has been implemented in plan validation software was vital to the progress of the competition. However, the validation of plans using actions with continuous effects presents new challenges (that precede the challenges presented by planning with those effects). In this paper we review the need for continuous effects, their semantics and the problems that arise in validation of plans that include them. We report our progress in implementing the semantics in an extended version of the plan validation software

    A generic framework for the analysis and specialization of logic programs

    Get PDF
    The relationship between abstract interpretation and partial deduction has received considerable attention and (partial) integrations have been proposed starting from both the partial deduction and abstract interpretation perspectives. In this work we present what we argĂŒe is the first fully described generic algorithm for efñcient and precise integration of abstract interpretation and partial deduction. Taking as starting point state-of-the-art algorithms for context-sensitive, polyvariant abstract interpretation and (abstract) partial deduction, we present an algorithm which combines the best of both worlds. Key ingredients include the accurate success propagation inherent to abstract interpretation and the powerful program transformations achievable by partial deduction. In our algorithm, the calis which appear in the analysis graph are not analyzed w.r.t. the original definition of the procedure but w.r.t. specialized definitions of these procedures. Such specialized definitions are obtained by applying both unfolding and abstract executability. Our framework is parametric w.r.t. different control strategies and abstract domains. Different combinations of such parameters correspond to existing algorithms for program analysis and specialization. Simultaneously, our approach opens the door to the efñcient computation of strictly more precise results than those achievable by each of the individual techniques. The algorithm is now one of the key components of the CiaoPP analysis and specialization system

    A Factoid Question Answering System for Vietnamese

    Full text link
    In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines both statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in the development of such an intelligent user interface for an isolating language like Vietnamese and show that techniques developed for inflectional languages cannot be applied "as is". Our question answering system can answer a wide range of general knowledge questions with promising accuracy on a test set.Comment: In the proceedings of the HQA'18 workshop, The Web Conference Companion, Lyon, Franc

    Text Summarization by Sentence Extraction and Syntactic Pruning

    Full text link
    Nous prĂ©sentons une mĂ©thode hybride pour le rĂ©sumĂ© de texte, en combinant l'extraction de phrases et l'Ă©lagage syntaxique des phrases extraites. L'Ă©lagage syntaxique est effectuĂ© sur la base d’une analyse complĂšte des phrases selon un parseur de dĂ©pendances, analyse rĂ©alisĂ©e par la grammaire dĂ©veloppĂ©e au sein d'un logiciel commercial de correction grammaticale, le Correcteur 101. Des sous-arbres de l'analyse syntaxique sont supprimĂ©s quand ils sont identifiĂ©s par les relations ciblĂ©es. L'analyse est rĂ©alisĂ©e sur un corpus de divers textes. Le taux de rĂ©duction des phrases extraites est d’en moyenne environ 74%, tout en conservant la grammaticalitĂ© ou la lisibilitĂ© dans une proportion de plus de 64%. Étant donnĂ© ces premiers rĂ©sultats sur un ensemble limitĂ© de relations syntaxiques, cela laisse entrevoir des possibilitĂ©s pour une application de rĂ©sumĂ© automatique de texte.CRSN
    • 

    corecore