13,295 research outputs found
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Syntax Error Handling in Scannerless Generalized LR Parsers
This thesis is about a master's project as part of the one year master study
'Software-engineering'. This project is about methods for improving the quality
of reporting and handling of syntax errors that are produced by a scannerless
generalized left-to-right rightmost (SGLR) parser, and is done at Centrum voor
Wiskunde en Informatica (CWI) in Amsterdam.
SGLR is a parsing algorithm developed as part of Generic Language Technol-
ogy Project at SEN1, one of the themes at CWI. SGLR is based on the GLR
algorithm developed by Tomita.
SGLR parsers are able to recognize arbitrary context-free grammars, which
enables grammar modularization. Because SGLR does not use a separate scan-
ner, also layout and comments are incorporated into the parse tree. This makes
SGLR a powerful tool for code analysis and code transformations. A drawback
is the way SGLR handles syntax errors.
When a syntax error is detected, the current implementation of SGLR halts the
parsing process and reports back to the user the point of error detection only.
The text at the point of error detection is not necessarily the text that has to
be changed to repair the error.
This thesis describes three kinds of information that could be reported to the
user, and how they could be derived from the parse process when an error is
detected. These are:
- The structure of the already parsed part of the input in the form of a partial
parse tree.
- A listing of expected symbols; those tokens or token sequences that are accept-
able instead of the erroneous text.
- The current parser state which could be translated into language dependent
informative messages.
Also two ways of recovering from an error condition are described. These are
non-correcting recovery methods that enable SGLR to always return a parse
tree that can be unparsed into the original input sentence.
- A method that halts parsing but incorporates the remainder of the input into
the parse tree.
- A method that resumes parsing by means of substring parsing.
During the course of the project the described approaches have been imple-
mented and incorporated in the implementation of SGLR as used by the Meta-
Environment, some fully, some more or less prototyped
Bayesian Optimization for Probabilistic Programs
We present the first general purpose framework for marginal maximum a
posteriori estimation of probabilistic program variables. By using a series of
code transformations, the evidence of any probabilistic program, and therefore
of any graphical model, can be optimized with respect to an arbitrary subset of
its sampled variables. To carry out this optimization, we develop the first
Bayesian optimization package to directly exploit the source code of its
target, leading to innovations in problem-independent hyperpriors, unbounded
optimization, and implicit constraint satisfaction; delivering significant
performance improvements over prominent existing packages. We present
applications of our method to a number of tasks including engineering design
and parameter optimization
TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)
This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning
TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)
This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning
- …