7,972 research outputs found

    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor in its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orient the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics.

    Statistical Machine Translation between Myanmar Sign Language and Myanmar Written Text

    This paper contributes the first evaluation of the quality of automatic translation between Myanmar sign language (MSL) and Myanmar written text, in both directions. Our developing MSL-Myanmar parallel corpus was used for translation, and the experiments were carried out using three different statistical machine translation (SMT) approaches: phrase-based, hierarchical phrase-based, and the operation sequence model. In addition, three different segmentation schemes were studied: syllable segmentation, word segmentation, and sign-unit-based word segmentation. The results show that the highest quality machine translation was attained with syllable segmentation for both MSL and Myanmar written text.

    Genre Archive: Bibliography

    The Genre Archive, created by the English Language Institute at The University of Michigan, is a collection of around one thousand papers dealing in nearly all cases with some aspect or aspects of non-literary genres. The Archive was assembled by John Swales, his graduate students, and the visiting scholars who came to the institute, often supported by the H. Joan Morley Scholarship Fund, with the assistance of the staff of the ELI Library. The earliest papers are from the 1950s and the latest from 2007, but the majority are from the 1985 to 2005 period. Some are published papers; others are dissertations or theses, or parts thereof; some are manuscripts, sometimes drafts of later publications and sometimes term papers or other coursework. Many of the last group have no date (n.d.). This bibliography lists the papers contained in the Archive in alphabetical order by author, and then by year of publication. A few of the entries are highlighted in yellow, indicating that these papers themselves are currently missing. The Genre Archive exists solely in paper form and is housed at the ELI offices. Access to the Archive is available by appointment only. Researchers interested in visiting the Archive should email [email protected]. Unfortunately, we are not able to accept requests for scanned copies by mail or email or to otherwise circulate the contents of the Archive. (Introduction by John Swales)

    http://deepblue.lib.umich.edu/bitstream/2027.42/134394/1/ELI Genre Archive Bibliography 10-12-16.doc
    http://deepblue.lib.umich.edu/bitstream/2027.42/134394/2/ELI Genre Archive Bibliography.pdf

    Description of ELI Genre Archive Bibliography 10-12-16.doc : Genre Archive: Bibliography (Word version)
    Description of ELI Genre Archive Bibliography.pdf : Genre Archive: Bibliography (PDF version)

    Research on Effective Designs and Evaluation for Speech Interface Systems

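    Degree system: new; Report number: Otsu No. 2305 (乙2305号); Degree type: Doctor of Engineering; Date conferred: 2011/2/25; Waseda University diploma number: Shin 564 (新564)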

    Dialogue Act Recognition Approaches

    This paper deals with automatic dialogue act (DA) recognition. Dialogue acts are sentence-level units that represent states of a dialogue, such as questions, statements, hesitations, etc. Knowledge of how dialogue acts are realized in a discourse or dialogue is part of the speech understanding and dialogue analysis process. It is of great importance for many applications: dialogue systems, speech recognition, automatic machine translation, etc. The main goal of this paper is to survey existing work on DA recognition and to discuss the respective advantages and drawbacks of each approach. A major concern in the DA recognition domain is that, although a few DA annotation schemes now seem to be emerging as standards, most of the time these DA tag-sets have to be adapted to the specificities of a given application, which prevents the deployment of standardized DA databases and evaluation procedures. This review focuses on the various kinds of information that can be used to recognize DAs, such as prosodic and lexical information, and on the types of models proposed so far to capture this information. Combining these information sources now appears to be a prerequisite for recognizing DAs.