54 research outputs found

    Statistical Phrase-based Translation

    We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several, previously proposed phrase-based translation models

    Cross-lingual C*ST*RD: English access to Hindi information

    We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within 1 month, in the context of DARPA’s Surprise Language Exercise, that selected as source a heretofore unstudied language, Hindi. Given the brief time, we could not create deep Hindi capabilities for all the modules, but instead experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions

    A Comparison of Alignment Models for Statistical Machine Translation

    In this paper, we present and compare various alignment models for statistical machine translation. We propose to measure the quality of an alignment model using the quality of the Viterbi alignment compared to a manually-produced alignment and describe a refined annotation scheme to produce suitable reference alignments. We also compare the impact of different alignment models on the translation quality of a statistical machine translation system

    An Efficient Method for Determining Bilingual Word Classes

    In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes which is a standard method in statistical language modeling. In this paper we describe a method to determine bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximum likelihood approach and describe a clustering algorithm. We will show that the usage of the bilingual word classes we get can improve statistical machine translation. 1 Introduction Word classes are often used in language modelling to solve the problem of sparse data. Various clustering techniques have been proposed (Brown et al., 1992; Jardino and Adda, 1993; Martin et al., 1998) which perform automatic word clustering optimizing a maximum-likelihood criterion with iterative clustering algorithms. In the field of statistical machine translation we also face the problem of sparse data. Our aim is to use word classes in statistical machine translation to allow for more robust statistical translation models. A naive approach for doing this would be the use of mono-lingually optimized word classes in source and target language. Unfortunately we cannot expect these independently optimized classes to be correspondent. Therefore mono-lingually optimized word classes do not seem to be useful for machine translation (see also (Fung and Wu, 1995)). We define bilingual word clustering as the process of forming corresponding word classes suitable for machine translation purposes for a pair of languages using a parallel training corpus. The described method to determine bilingual word classes is an extension and improvement of the method mentioned in (Och and Weber

    Minimum Error Rate Training in Statistical Machine Translation

    Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality

    Statistical multi-source translation
