326 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Comparative Evaluation of Translation Memory (TM) and Machine Translation (MT) Systems in Translation between Arabic and English

    In general, advances in translation technology tools have enhanced translation quality significantly. Unfortunately, however, it seems that this is not the case for all language pairs. A concern arises when the users of translation tools want to work between different language families such as Arabic and English. The main problems facing Arabic-English translation tools lie in Arabic’s characteristic free word order, richness of word inflection – including orthographic ambiguity – and optionality of diacritics, in addition to a lack of data resources. The aim of this study is to compare the performance of translation memory (TM) and machine translation (MT) systems in translating between Arabic and English. The research evaluates the two systems based on specific criteria relating to needs and expected results. The first part of the thesis evaluates the performance of a set of well-known TM systems when retrieving a segment of text that includes an Arabic linguistic feature. As it is widely known that TM matching metrics are based solely on the use of edit distance string measurements, it was expected that the aforementioned issues would lead to a low match percentage. The second part of the thesis evaluates multiple MT systems that use the mainstream neural machine translation (NMT) approach to translation quality. Due to a lack of training data resources and its rich morphology, it was anticipated that Arabic features would reduce the translation quality of this corpus-based approach. The systems’ output was evaluated using both automatic evaluation metrics including BLEU and hLEPOR, and TAUS human quality ranking criteria for adequacy and fluency. The study employed a black-box testing methodology to experimentally examine the TM systems through a test suite instrument and also to translate Arabic-English sentences to collect the MT systems’ output.
A translation threshold was used to evaluate the fuzzy matches of TM systems, while an online survey was used to collect participants’ responses to the quality of the MT systems’ output. The experiments’ input of both systems was extracted from Arabic-English corpora, which was examined by means of quantitative data analysis. The results show that, when retrieving translations, the current TM matching metrics are unable to recognise Arabic features and score them appropriately. In terms of automatic translation, MT produced good results for adequacy, especially when translating from Arabic to English, but the systems’ output appeared to need post-editing for fluency. Moreover, when retrieving from Arabic, it was found that short sentences were handled much better by MT than by TM. The findings may be given as recommendations to software developers.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations and results showed improvements of the performance in terms of automatic scores (BLEU and Meteor) and reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.

    Analyzing Qualitative Data with MAXQDA

    “To begin at the beginning” is the opening line of the play Under Milk Wood by Welsh poet Dylan Thomas. So, we also want to start here at the beginning and start with some information about the history of the analysis software MAXQDA. This story is quite long; it begins in 1989 with a first version of the software, then just called “MAX,” for the operating system DOS and a book in the German language. The book’s title was Text Analysis Software for the Social Sciences. Introduction to MAX and Textbase Alpha written by Udo Kuckartz, published by Gustav Fischer in 1992. Since then, there have been many changes and innovations: technological, conceptual, and methodological. MAXQDA has its roots in social science methodology; the original name MAX was a reference to the sociologist Max Weber, whose methodology combined quantitative and qualitative methods, explanation, and understanding in a way that was unique at the time, the beginning of the twentieth century. Since the first versions, MAX (later named winMAX and MAXQDA) has always been a very innovative analysis software. In 1994, it was one of the first programs with a graphical user interface; since 2001, it has used Rich Text Format with embedded graphics and objects. Later, MAXQDA was the first QDA program (QDA stands for qualitative data analysis) with a special version for Mac computers that included all analytical functions. Since autumn 2015, MAXQDA has been available in almost identical versions for Windows and Mac, so that users can switch between operating systems without having to familiarize themselves with a new interface or changed functionality. This compatibility and feature equality between Mac and Windows versions is unique and greatly facilitates team collaboration.
MAXQDA has also come up with numerous innovations in the intervening years: a logically and very intuitively designed user interface, very versatile options for memos and comments, numerous visualization options, the summary grid as a middle level of analysis between primary data and categories, and much more, for instance, transcription, geolinks, weight scores for coding, analysis of PDF files, and Twitter analysis. Last but not least, the mixed methods features are worth mentioning, in which MAXQDA has long played a pioneering role. This list already shows that today MAXQDA is much more than text analysis software: the first chapter of this book contains a representation of the data types that MAXQDA can analyze today (in version 2018) and shows which file formats can be processed. The large variety of data types is contrasted by an even greater number o

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that the data include useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract this knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages. I believe that this book will provide new knowledge in the text mining field and help many readers open up new research fields.

    e-Proceedings of the 5th International Conference on Linguistics, Literature and Culture (ICLLIC) 2019: Change and Preservation in Language and Culture in Asia

    The present e-proceedings “Change and Preservation in Language and Culture in Asia” has been made possible thanks to the commitment of individuals who contributed much time and energy in assisting with a number of technical matters from the beginning until the final stages of the publication process. We would also like to express our appreciation for the contribution of the English Language Studies section and administrative staff of the School of Humanities for assisting with various matters. Finally, thanks must be given to the authors of the extended abstracts in this publication for their willingness to share the findings of their work in progress with other academics and researchers in the areas of language, linguistics and cultur