245 research outputs found
Multi-Module G2P Converter for Persian Focusing on Relations between Words
In this paper, we investigate the application of end-to-end and multi-module
frameworks for G2P conversion for the Persian language. The results demonstrate
that our proposed multi-module G2P system outperforms our end-to-end systems in
terms of accuracy and speed. The system consists of a pronunciation dictionary
as our look-up table, along with separate models to handle homographs, OOVs and
ezafe in Persian created using GRU and Transformer architectures. The system is
sequence-level rather than word-level, which allows it to effectively capture
the unwritten relations between words (cross-word information) necessary for
homograph disambiguation and ezafe recognition without the need for any
pre-processing. After evaluation, our system achieved a 94.48% word-level
accuracy, outperforming the previous G2P systems for Persian. Comment: 10 pages, 4 figures
Phonetic study and text mining of Spanish for English to Spanish translation system
Project carried out in collaboration with the University of Southern California
MDL-based Models for Transliteration Generation
Peer reviewed
Foreign Language Translation of Chemical Nomenclature by Computer
Chemical compound names remain the primary method for conveying molecular structures between chemists and researchers. In research articles, patents, chemical catalogues, government legislation, and textbooks, the use of IUPAC and traditional compound names is universal, despite efforts to introduce more machine-friendly representations such as identifiers and line notations. Fortunately, advances in computing power now allow chemical names to be parsed and generated (read and written) with almost the same ease as conventional connection tables. A significant complication, however, is that although the vast majority of chemistry uses English nomenclature, a substantial fraction is in other languages. This complicates the task of filing and analyzing chemical patents, purchasing from compound vendors, and text mining research articles or Web pages. We describe some issues with manipulating chemical names in various languages, including British, American, German, Japanese, Chinese, Spanish, Swedish, Polish, and Hungarian, and describe the current state of the art in software tools to simplify the process.