A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics.
MPI+X: task-based parallelization and dynamic load balance of finite element assembly
The main computing tasks of a finite element (FE) code for solving partial
differential equations (PDEs) are the algebraic system assembly and the
iterative solver. This work focuses on the first task, in the context of a
hybrid MPI+X paradigm. Although we will describe algorithms in the FE context,
a similar strategy can be straightforwardly applied to other discretization
methods, like the finite volume method. The matrix assembly consists of a loop
over the elements of the MPI partition to compute element matrices and
right-hand sides and to assemble them into the system local to each MPI
partition. In an MPI+X hybrid parallelism context, X has traditionally
consisted of loop parallelism using OpenMP. Several strategies have been proposed in the
literature to implement this loop parallelism, like coloring or substructuring
techniques to circumvent the race condition that appears when assembling the
element system into the local system. The main drawback of the first technique
is the decrease of the IPC due to bad spatial locality. The second technique
avoids this issue but requires extensive changes in the implementation, which
can be cumbersome when several element loops must be treated. We propose an
alternative, based on the task parallelism of the element loop using some
extensions to the OpenMP programming model. The taskification of the assembly
solves both aforementioned problems. In addition, dynamic load balance will be
applied using the DLB library, which is especially efficient in the presence of
hybrid meshes, where the relative costs of the different elements are
impossible to estimate a priori. This paper presents the proposed methodology,
its implementation, and its validation through the solution of large
computational mechanics problems on up to 16k cores.
Clustering-Based Materialized View Selection in Data Warehouses
Materialized view selection is a non-trivial task. Hence, its complexity must
be reduced. A judicious choice of views must be cost-driven and influenced by
the workload experienced by the system. In this paper, we propose a framework
for materialized view selection that exploits a data mining technique
(clustering), in order to determine clusters of similar queries. We also
propose a view merging algorithm that builds a set of candidate views, as well
as a greedy process for selecting a set of views to materialize. This selection
is based on cost models that evaluate the cost of accessing data using views
and the cost of storing these views. To validate our strategy, we executed a
workload of decision-support queries on a test data warehouse, with and without
using our strategy. Our experimental results demonstrate its efficiency, even
when storage space is limited.
Validating plans with continuous effects
A critical element in the use of PDDL2.1, the modelling language developed for the International Planning Competition series, has been the common understanding of the semantics of the language. The fact that this has been implemented in plan validation software was vital to the progress of the competition. However, the validation of plans using actions with continuous effects presents new challenges (that precede the challenges presented by planning with those effects). In this paper we review the need for continuous effects, their semantics, and the problems that arise in validating plans that include them. We report our progress in implementing the semantics in an extended version of the plan validation software.
A generic framework for the analysis and specialization of logic programs
The relationship between abstract interpretation and partial
deduction has received considerable attention, and (partial) integrations have been proposed starting from both the partial deduction and abstract interpretation perspectives. In this work we present what we argue is the first fully described generic algorithm for efficient and precise integration of abstract interpretation and partial deduction. Taking as a starting point state-of-the-art algorithms for context-sensitive, polyvariant abstract interpretation and (abstract) partial deduction, we present
an algorithm which combines the best of both worlds. Key ingredients include the accurate success propagation inherent to abstract interpretation and the powerful program transformations achievable by partial deduction. In our algorithm, the calls which appear in the analysis graph
are not analyzed w.r.t. the original definition of the procedure but w.r.t. specialized definitions of these procedures. Such specialized definitions are obtained by applying both unfolding and abstract executability. Our framework is parametric w.r.t. different control strategies and abstract domains. Different combinations of such parameters correspond to existing algorithms for program analysis and specialization. Simultaneously, our approach opens the door to the efficient computation of strictly more
precise results than those achievable by each of the individual techniques.
The algorithm is now one of the key components of the CiaoPP analysis
and specialization system.
A Factoid Question Answering System for Vietnamese
In this paper, we describe the development of an end-to-end factoid question
answering system for the Vietnamese language. This system combines both
statistical models and ontology-based methods in a chain of processing modules
to provide high-quality mappings from natural language text to entities. We
present the challenges in the development of such an intelligent user interface
for an isolating language like Vietnamese and show that techniques developed
for inflectional languages cannot be applied "as is". Our question answering
system can answer a wide range of general knowledge questions with promising
accuracy on a test set. Comment: In the proceedings of the HQA'18 workshop, The Web Conference
Companion, Lyon, France.
Text Summarization by Sentence Extraction and Syntactic Pruning
We present a hybrid method for text summarization, combining sentence extraction with syntactic pruning of the extracted sentences. The syntactic pruning is carried out on the basis of a full dependency parse of the sentences, produced by the grammar developed within a commercial grammar-checking package, Correcteur 101. Subtrees of the parse are removed when they are identified by the targeted relations. The analysis is performed on a corpus of varied texts. The reduction rate of the extracted sentences averages about 74%, while grammaticality or readability is preserved in more than 64% of cases. Given these first results on a limited set of syntactic relations, this suggests promising possibilities for an automatic text summarization application. CRSN