586 research outputs found

    Toward a Corpus of Cantonese Verbal Comments and their Classification by Multi-dimensional Analysis

    The information explosion in modern days across various media calls for effective opinion mining for timely digestion of public views and appropriate follow-up actions. Current studies on sentiment analysis have primarily focused on uncovering aspects like subjectivity, sentiment and credibility from written data, while spoken data are less addressed. This paper reports on our pilot work on constructing a corpus of Cantonese verbal comments and making use of multi-dimensional analysis to characterise different opinion types therein. Preliminary findings on the dimensions identified and their association with various communicative functions are presented, with an outlook on their potential application in subjectivity analysis and opinion classification.

    Multi-modal response generation.

    Wong Ka Ho. Thesis submitted in October 2005. Thesis (M.Phil.), Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 163-170). Abstracts in English and Chinese.

    Abstract
    Acknowledgements
    Chapter 1. Introduction
        1.1 Multi-modal and Multi-media
        1.2 Overview
        1.3 Thesis Goal
        1.4 Thesis Outline
    Chapter 2. Background
        2.1 Multi-modal Fission
        2.2 Multi-modal Data Collection
            2.2.1 Collection Time
            2.2.2 Annotation and Tools
            2.2.3 Knowledge of Multi-modal Using
        2.3 Text-to-audiovisual Speech System
            2.3.1 Different Approaches to Generate a Talking Head
            2.3.2 Sub-tasks in Animating a Talking Head
        2.4 Modality Selection
            2.4.1 Rules-based approach
            2.4.2 Plan-based approach
            2.4.3 Feature-based approach
            2.4.4 Corpus-based approach
        2.5 Summary
    Chapter 3. Information Domain
        3.1 Multi-media Information
        3.2 Task Goals, Dialog Acts, Concepts and Information Type
            3.2.1 Task Goals and Dialog Acts
            3.2.2 Concepts and Information Type
        3.3 User's Task and Scenario
        3.4 Chapter Summary
    Chapter 4. Multi-modal Response Data Collection
        4.1 Data Collection Setup
            4.1.1 Multi-modal Input Setup
            4.1.2 Multi-modal Output Setup
        4.2 Procedure
            4.2.1 Precaution
            4.2.2 Recording
            4.2.3 Data Size and Type
        4.3 Annotation
            4.3.1 Extensible Multi-Modal Markup Language
            4.3.2 Mobile, Multi-biometric and Multi-modal Annotation
        4.4 Problems in the Wizard-of-Oz Setup
            4.4.1 Lack of Knowledge
            4.4.2 Time Deficiency
            4.4.3 Information Availability
            4.4.4 Operation Delay
            4.4.5 Lack of Modalities
        4.5 Data Optimization
            4.5.1 Precaution
            4.5.2 Procedures
            4.5.3 Data Size in Expert Design Responses
        4.6 Analysis and Discussion
            4.6.1 Multi-modal Usage
            4.6.2 Modality Combination
            4.6.3 Deictic Term
            4.6.4 Task Goal and Dialog Acts
            4.6.5 Information Type
        4.7 Chapter Summary
    Chapter 5. Text-to-Audiovisual Speech System
        5.1 Phonemes and Visemes
        5.2 Three-dimensional Facial Animation
            5.2.1 Three-dimensional (3D) Face Model
            5.2.2 The Blending Process for Animation
            5.2.3 Connectivity between Visemes
        5.3 User Perception Experiments
        5.4 Applications and Extension
            5.4.1 Multilingual Extension and Potential Applications
        5.5 Talking Head in Multi-modal Dialogue System
            5.5.1 Prosody
            5.5.2 Body Gesture
        5.6 Chapter Summary
    Chapter 6. Modality Selection and Implementation
        6.1 Multi-modal Response Examples
            6.1.1 Single Concept-value Example
            6.1.2 Two Concept-values with Different Information Types
            6.1.3 Multiple Concept-values with Same Information Types Example
        6.2 Heuristic Rules for Modality Selection
            6.2.1 General Principles
            6.2.2 Heuristic Rules
            6.2.3 Temporal Coordination for Synchronization
            6.2.4 Physical Layout
            6.2.5 Deictic Term
            6.2.6 Example
        6.3 Spoken Content Generation
        6.4 Chapter Summary
    Chapter 7. Conclusions and Future Work
        7.1 Summary
        7.2 Contributions
        7.3 Future Work
    Appendix A. XML Schema for M3 Markup Language
    Appendix B. M3ML Examples
    Appendix C. Domain-Specific Task Goals in the Hong Kong Tourism Domain
    Appendix D. Dialog Acts for User Request in the Hong Kong Tourism Domain
    Appendix E. Dialog Acts for System Response in the Hong Kong Tourism Domain
    Appendix F. Information Type and Concepts
    Appendix G. Concepts
    Bibliography

    Associates or zamestnanci? Language choice, attitudes and code-switching practices: The case of workplace email communication in Slovakia

    Code-switching (CS) is subject to a wide range of interrelations between medium and situation factors. Generally, from a linguistic point of view, CS occurs when a speaker alternates between two or more languages, or language varieties, in the course of a single conversation. The practice has been observed all around the world in many contexts and in many situations of language and culture contact. Building on earlier studies of the CS phenomenon, but shifting towards a more specific environment, the workplace, the present study aims to fill a considerable gap in scholarly knowledge about the online/written CS practices of Slovak native speakers in the context of workplace email communication. The study focuses on language choice, language attitudes and CS practices among colleagues in the multilingual workplace environment of a multinational hospitality company in Slovakia. It considers solely the participants' workplace interactions, in particular their email messages written in Slovak (the national language) with switches to English.

    Due to the interdisciplinary nature of this research project, as well as its dual focus on language attitudes on the one hand and actual CS practices on the other, this thesis addresses a number of research questions and provides a series of analyses centring around the following objectives. The main focus of the quantitative, questionnaire-based study is to examine the participants' metalinguistic awareness of the extent of their switching to English during their communication (particularly in their computer-mediated communication, CMC) and to determine their reasons for doing so, while uncovering the attitudes they hold towards this phenomenon. Furthermore, as the depth of knowledge obtained through a questionnaire survey is limited, a corpus analysis of email interactions is conducted in order to investigate more closely the extent of switching and the types, forms and functions of CS involved. Using this mixed-method approach, the study examines the motivations and reasons why the participants choose English over their native language. The study thus represents the first comprehensive analysis of its kind of Slovak-English CS in CMC using authentic, naturally occurring computer-mediated corporate interactions.

    Keywords: code-switching, CMC, email, workplace communication, attitudes

    The traineeship was partly funded by an Erasmus+ grant, for which I am grateful.

    Lengyelová, A. (2019). Associates or zamestnanci? Language choice, attitudes and code-switching practices: The case of workplace email communication in Slovakia [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/124352

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data in cognitive and neural systems, and on their potential or real applications in different domains.

    Corpus-Based Research on Chinese Language and Linguistics

    This volume collects papers presenting corpus-based research on Chinese language and linguistics, from both a synchronic and a diachronic perspective. The contributions cover different fields of linguistics, including syntax and pragmatics, semantics, morphology and the lexicon, sociolinguistics, and corpus building. There is now considerable emphasis on the reliability of linguistic data: the studies presented here are all grounded in the tenet that corpora, understood as collections of naturally occurring texts produced by a variety of speakers/writers, provide a more robust, statistically significant foundation for linguistic analysis. The volume explores not only the potential of using corpora as tools allowing access to authentic language material, but also the challenges involved in corpus interrogation, analysis, and building.

    Development of Mandarin by English-Mandarin bilingual children in a Chinese childcare centre : the role of input

    This study investigates the development of Mandarin in English-Mandarin bilingual children compared to their monolingual peers in a childcare centre in mainland China. It also examines the qualitative and quantitative aspects of the input in the bilingual children’s Mandarin development. Many previous studies examined bilingual children acquiring two languages in one-parent-one-language input conditions (e.g. De Houwer, 1990; Döpke, 2000; Meisel, 1990a; Paradis & Genesee, 1996; Ronjat, 1913; Silva-Corvalán, 2014; Thordardottir, 2014; Unsworth, 2014; Volterra & Taeschner, 1978), and most focused on children developing two Indo-European languages (cf. W Li, 2010; Yip & Matthews, 2007). However, the most common situation of children growing up bilingually takes place in immigration contexts where they are exposed to a one-language-one-environment mode (Qi, Di Biase, & Campbell, 2006). To date, this type of bilingual development has not received much attention. In recent years a growing number of native English-speaking people have come to work or study in mainland China, and their children have become bilingual in context-bound one-language-one-environment situations, similar to most other children growing up in immigrant families. This means these children acquire English at home and Mandarin elsewhere, e.g. at childcare centres. The effect of teachers’ input at childcare centres on the mainstream language development of bilingual children has rarely been studied. Two research questions thus follow: a) How do these bilingual children develop their Mandarin in the childcare centres? b) What role does input from teachers play in these children’s Mandarin development? In order to address these questions, I carried out a multiple case study on seven three-to-five-year-old English-Mandarin bilingual children in a Chinese childcare centre. 
The main data include a corpus of recordings of speech produced by the bilingual children, their monolingual peers and their teachers in the same childcare centre in the course of various activities over a four-month school term. I also collected supplementary data supplied by parents and teachers of the bilingual children, including questionnaires and interviews with them. In addition, some elicited comprehension and production tasks were administered to the bilingual children and their monolingual peers to complement the corpus data. Two linguistic domains, namely the Mandarin noun classifier and the prepositional phrase (PP) with zài ‘at’, are targeted for investigation as they manifest significant typological differences from the bilingual children’s other language, English. For each domain of investigation, three types of comparison between bilingual children and their monolingual peers are focused on: (a) whether there exist different or similar patterns of acquisition, (b) whether different input conditions result in different patterns of acquisition, and (c) whether the input from teachers influences acquisition pathways in these two domains. Results reveal that bilingual children show a pattern similar to their monolingual peers in the acquisition of Mandarin noun classifiers. Both bilinguals and monolinguals predominantly use (and overuse) the general classifier gè while use of specific classifiers is rare. Children in either group rarely omit an obligatory classifier. Bilingual children’s patterns of classifier acquisition are not as variable as their input, measured by the cumulative length of Mandarin exposure (CLME). Moreover, the pattern of teachers’ use of classifiers appears to significantly influence both bilingual and monolingual children’s acquisition within this domain. Firstly, the fact that specific classifiers are quite rarely used by children is reflected in the input from teachers. 
Secondly, teachers never omit a classifier when it is obligatory, which may help children to learn the obligatory use in practice. Thirdly, cases of children’s overuse of the general classifier gè have also been found in teachers’ productions, although the rate of overuse is much lower than that of the children. However, in the domain of the Mandarin locative PP headed by zài ‘at’, it is found that bilingual children follow a different pattern compared to their monolingual peers. Bilinguals show a strong preference for the postverbal locative PP with zài, but preverbal and postverbal zài-PPs are equally divided in monolinguals. Moreover, about half of the postverbal zài-PP utterances produced by bilinguals are non-target. In sharp contrast, non-target postverbal zài-PP sentences were not observed in the monolingual children’s productions. Comparison among the bilinguals found that although children with a larger amount of Mandarin exposure generally develop to a more advanced stage than others, those who use Mandarin more often do not lag behind even when their exposure time is less than that of others. Results from the analysis of teachers’ input show that teachers’ frequency of use of postverbal zài-PPs may influence the children’s production in this domain. However, teachers’ use of zài-PPs has consistently been shown to be target-like. The results suggest that bilingual children’s non-target placement of zài-PPs may reflect cross-linguistic influence from the structure of the English prepositional phrase. Findings from this research will offer new insights into language contact and interaction in bilingual development. They will also shed light on the nature of input in the challenging aspects of bilingual children’s linguistic development.

    Authorial Identity of Non-Native Writers of Academic English in the ‘Soft Sciences’: An Analysis of Textographies and Interactional Resources

    Scientific publication consolidates knowledge and skills from different areas, namely discipline, rhetoric, register, genre, research skills and publication processes (Bozu and Canto Herrera, 2009; Charlotte and Irwin, 2019; Fazel, 2013). Integrating these elements makes academic publishing a challenging endeavour, especially when it is undertaken in English as a second or foreign language. In Spain, university lecturers are required to produce scientific publications, and often this requirement can only be met by producing texts in English. However, academics' knowledge and skills have sometimes been taken for granted (Bräuer, 2012; Natale, 2013; Carlino, 2004), and little research has been conducted into the rhetorical patterns that academics display in their publications (Novelo Atwood, 2019 and Getkham, 2013 are two exceptions). Based on the premise that every piece of writing contains manifestations of the individual author's needs, interests and goals (Ivanič and Moss, 2004), this thesis project focuses on exploring authorial identity. In this study, I analyse how university lecturers in the ‘soft sciences’ express their authorial identity in their publications in English. Drawing on the various conceptions of authorial identity, I propose a methodological strategy that emphasises interaction with the reader (Thompson and Thetela, 1995; Thompson, 1996, 2001, 2004) and the participants' professional histories (Dressen-Hammouda, 2014; Swales, 1998), and I apply it to a corpus of texts published in English. The corpus consists of 70 texts from the fields of Linguistics, Cultural Studies, Bibliometrics, Philosophy, Psychology, Education and Economics.
    The methodological strategy includes the analysis of two interactional resources, Hypothetical-Real and Concession, and aims to facilitate the analysis of authorial identity in relation to the participants' socio-professional context. The analysis made it possible to obtain the set of interactional-resource choices made by the authors and to relate it to their textographies. As a result, a categorisation of the interactional resources was generated, and a correlation was established between the use of these resources and the participants' experiences of full immersion in English-speaking countries. Finally, I include some pedagogical implications for L2 academic writing, suggesting that novice writers be helped to become aware of the wide range of options available for expressing their authorial identities when interacting with their potential readers.