163 research outputs found
Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish /
The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome the problem. The morphological analyses of words are modeled with DTS and the root major part-of-speech tags. The disambiguator based on the introduced representations performs the statistical morphological disambiguation of Turkish with a recall of as high as 95.69 percent. In text-to-speech systems and in developing transcriptions for acoustic speech data, the problem occurs in disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this problem of pronunciation disambiguation and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase level accentuation based on content word/function word distinction. This approach seems easy and adequate for some right headed languages such as English but is not suitable for languages such as Turkish. We then use a a heuristic approach to mark up the phrase boundaries based on dependency parsing on a basis of phrase level accentuation for Turkish TTS synthesizers
Proceedings of the VIIth GSCP International Conference
The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose as its main theme Speech and Corpora. The wide international origin of the 235 authors from 21 countries and 95 institutions led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the technological connected fields; the relation between prosody and pragmatics; speech pathologies; and different papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded)
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT
Satellite Workshop On Language, Artificial Intelligence and Computer Science for Natural Language Processing Applications (LAICS-NLP): Discovery of Meaning from Text
This paper proposes a novel method to disambiguate important words from a collection of documents. The
hypothesis that underlies this approach is that there is a
minimal set of senses that are significant in characterizing a context. We extend Yarowsky’s one sense
per discourse [13] further to a collection of related
documents rather than a single document. We perform
distributed clustering on a set of features representing
each of the top ten categories of documents in the
Reuters-21578 dataset. Groups of terms that have a
similar term distributional pattern across documents were
identified. WordNet-based similarity measurement was
then computed for terms within each cluster. An
aggregation of the associations in WordNet that was
employed to ascertain term similarity within clusters has
provided a means of identifying clusters’ root senses
- …