
    Mandarin Chinese Teacher Education: Issues and Solutions

    Mandarin Chinese is the most widely spoken language in the world, and in a rapidly globalizing environment, speaking it is an increasingly important skill for young people in the UK. 'Mandarin Chinese Teacher Education' stems from the work of the UCL Institute of Education Confucius Institute, which supports the development of Mandarin Chinese as a language on offer in schools as part of the mainstream curriculum. This edited collection brings together researchers, teachers involved in action research and student-teachers, in an effort to address the current lack of literature specifically aimed at supporting Chinese language teachers. It features:
    • practical ideas for teachers of Chinese to implement in their own classrooms
    • evaluation of differing strategies and approaches unique to teaching Chinese
    • examples of using action research to help teachers reflect on their own practice while informing practice across the discipline.
    The book will be useful for PGCE Mandarin students, teacher trainers and those involved in the development of Mandarin Chinese in schools across the UK and further afield.

    A Rule-based Methodology and Feature-based Methodology for Effect Relation Extraction in Chinese Unstructured Text

    The Chinese language differs significantly from English in both lexical representation and grammatical structure. These differences lead to problems in Chinese NLP, such as word segmentation and flexible syntactic structure. Many conventional methods and approaches in Natural Language Processing (NLP) developed for English text prove ineffective when confronted with these language-specific problems in the comparatively young field of Chinese NLP. Relation Extraction (RE) is a subfield of NLP that seeks to identify semantic relationships between entities in text. The term "Effect Relation" (ER) is introduced in this research to refer to a specific type of relationship between two entities, where one entity has a certain "effect" on the other. In this research project, a case study on Chinese text from Traditional Chinese Medicine (TCM) journal publications is built to closely examine the forms of Effect Relation in this text domain. The case study targets the effect of a prescription or herb in the treatment of a disease, symptom or body part. A rule-based methodology is introduced in this thesis. It utilises predetermined rules and templates derived from the characteristics and patterns observed in the dataset. This methodology achieves an F-score of 0.85 in its Named Entity Recognition (NER) module, 0.79 in its Semantic Relationship Extraction (SRE) module, and an overall performance of 0.46. A second, feature-based methodology is also introduced. It views the RE task as a classification problem and utilises a statistical classification model with features consisting of contextual information and rules. It achieves F-scores of 0.73 (NER) and 0.88 (SRE), with an overall performance of 0.41. The role of functional words in the contemporary Chinese language, and in relation to the ERs in this research, is also explored. Functional words have been found to be effective as rules for detecting complex-structure ER entities in the rule-based methodology.
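The rule-based approach can be illustrated with a minimal, hypothetical sketch. The effect words, entity lexicons and the single agent–trigger–target template below are illustrative stand-ins for the thesis's rules, not the actual TCM patterns:

```python
import re

# Hypothetical entity lexicons (a stand-in for the NER module); the thesis
# derives its rules and templates from patterns observed in the TCM dataset.
HERBS = {"人參", "甘草"}        # ginseng, licorice
DISEASES = {"咳嗽", "頭痛"}     # cough, headache

# Functional/effect words acting as relation triggers ("treats", "relieves").
EFFECT_WORDS = ["治療", "緩解"]

def extract_effect_relations(sentence):
    """Return (agent, trigger, target) triples matched by a simple
    <entity><effect word><entity> template."""
    triples = []
    for trigger in EFFECT_WORDS:
        for m in re.finditer(rf"(\S+?){trigger}(\S+)", sentence):
            agent, target = m.group(1), m.group(2)
            # Keep only matches whose ends are known entities.
            if agent in HERBS and target in DISEASES:
                triples.append((agent, trigger, target))
    return triples

print(extract_effect_relations("人參治療咳嗽"))  # [('人參', '治療', '咳嗽')]
```

A real pipeline would run a dedicated NER stage before template matching, which is where the reported 0.85 (NER) and 0.79 (SRE) F-scores apply.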

    Addressing the grammar needs of Chinese EAP students: an account of a CALL materials development project

    This study investigated the grammar needs of Chinese EAP Foundation students and developed electronic self-access grammar materials for them. The research process consisted of three phases. In the first phase, a corpus-linguistics-based error analysis (EA) was conducted, in which 50 student essays were compiled and scrutinised for formal errors. A tagging system was specially devised and employed in the analysis. The EA results, together with an examination of Foundation tutors' perceptions of error frequency and gravity, led me to prioritise article errors for treatment. In the second phase, remedial materials were drafted based on the EA results and on insights drawn from my investigations into four research areas (article pedagogy, SLA theory, grammar teaching approaches and CALL methodologies) and existing grammar materials. In the third phase, the materials were refined and evaluated for their effectiveness as a means of improving the Chinese Foundation students' use of the article. Findings confirm the claim that L2 learner errors are systematic in nature and lend support to the value of Error Analysis. L1 transfer appears to be one of the main contributing factors in L2 errors. The salient errors identified in the Chinese Foundation corpus show that mismanagement of the article system is the most frequent cause of grammatical errors; Foundation tutors, however, perceive article errors to be neither frequent nor serious. An examination of existing materials reveals that the article is given low priority in ELT textbooks, and that treatments provided in pedagogical grammar books are inappropriate in terms of presentation, language and exercise types. The devised remedial materials employ both consciousness-raising activities and production exercises, using EAP language and authentic learner errors. Preliminary evaluation results suggest that the EA-informed customised materials have the potential to help learners perform better in proofreading article errors in academic texts.
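The first-phase tagging and prioritisation step can be sketched as follows; the tag names (e.g. "ART-OM" for article omission) and the annotated excerpts are invented for illustration and are not the study's actual tagging scheme or data:

```python
from collections import Counter

# Hypothetical error annotations: (tag, excerpt) pairs produced by
# scrutinising essays with a devised tag set.
annotations = [
    ("ART-OM", "went to _ library"),           # article omitted
    ("ART-OV", "discussed the both methods"),  # article overused
    ("ART-OM", "it is _ important issue"),
    ("VERB-AGR", "he go home"),
]

# Rank error categories by frequency to decide remedial priorities.
freq = Counter(tag for tag, _ in annotations)
for tag, count in freq.most_common():
    print(tag, count)
```

Ranking tag frequencies in this way is what allows the most salient category (here, article omission) to be singled out for remedial treatment.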

    Wiktionary: The Metalexicographic and the Natural Language Processing Perspective

    Dictionaries are the main reference works for our understanding of language. They are used by humans and likewise by computational methods. So far, the compilation of dictionaries has almost exclusively been the profession of expert lexicographers. The ease of collaboration on the Web and the rise of initiatives for collecting openly licensed knowledge, such as Wikipedia, have given rise to a new type of dictionary that is voluntarily created by large communities of Web users. This collaborative construction approach presents a new paradigm for lexicography that, on the one hand, poses new research questions for dictionary research and, on the other, provides a very valuable knowledge source for natural language processing applications. The subject of our research is Wiktionary, which is currently the largest collaboratively constructed dictionary project. In the first part of this thesis, we study Wiktionary from the metalexicographic perspective. Metalexicography is the scientific study of lexicography, including the analysis and criticism of dictionaries and lexicographic processes. To this end, we discuss three contributions related to this area of research: (i) We first provide a detailed analysis of Wiktionary and its various language editions and dictionary structures. (ii) We then analyze the collaborative construction process of Wiktionary. Our results show that the traditional phases of the lexicographic process do not apply well to Wiktionary, which is why we propose a novel process description based on the frequent and continual revision and discussion of the dictionary articles and the lexicographic instructions. (iii) We perform a large-scale quantitative comparison of Wiktionary and a number of other dictionaries regarding the covered languages, lexical entries, word senses, pragmatic labels, lexical relations, and translations.
We conclude the metalexicographic perspective by finding that the collaborative Wiktionary is not an appropriate replacement for expert-built dictionaries, owing to its inconsistencies, quality flaws, one-size-fits-all approach, and strong dependence on expert-built dictionaries. However, Wiktionary's rapid and continual growth, its high coverage of languages, newly coined words, domain-specific vocabulary and non-standard language varieties, as well as the kind of evidence it offers based on its authors' intuitions, provide promising opportunities for both lexicography and natural language processing. In particular, we find that Wiktionary and expert-built wordnets and thesauri contain largely complementary entries. In the second part of the thesis, we study Wiktionary from the natural language processing perspective, with the aim of making its linguistic knowledge available for computational applications. Such applications require vast amounts of structured data of high quality. Expert-built resources have been found to suffer from insufficient coverage and high construction and maintenance costs, whereas fully automatic extraction from corpora or the Web often yields resources of limited quality. Collaboratively built encyclopedias present a viable alternative, but do not cover well the linguistically oriented knowledge found in dictionaries. That is why we propose extracting linguistic knowledge from Wiktionary, which we achieve through the following three main contributions: (i) We propose the novel multilingual ontology OntoWiktionary, created by extracting and harmonizing the weakly structured dictionary articles in Wiktionary. A particular challenge in this process is the ambiguity of semantic relations and translations, which we resolve by automatic word sense disambiguation methods. (ii) We automatically align Wiktionary with WordNet 3.0 at the word sense level. The largely complementary information from the two dictionaries yields an aligned resource with higher coverage and an enriched representation of word senses. (iii) We represent Wiktionary according to the ISO standard Lexical Markup Framework, which we adapt to the peculiarities of collaborative dictionaries. This standardized representation is of great importance for fostering the interoperability of resources and hence the dissemination of Wiktionary-based research. To this end, our work presents a foundational step towards the large-scale integrated resource UBY, which facilitates unified access to a number of standardized dictionaries by means of a shared web interface for human users and an application programming interface for natural language processing applications. A user can, in particular, switch between and combine information from Wiktionary and other dictionaries without completely changing the software. Our final resource and the accompanying datasets and software are publicly available and can be employed for many different natural language processing applications. In particular, it fills the gap between the small expert-built wordnets and the large amount of encyclopedic knowledge in Wikipedia. We provide a survey of previous work utilizing Wiktionary, and we exemplify the usefulness of our work in two case studies on measuring verb similarity and detecting cross-lingual marketing blunders, which make use of our Wiktionary-based resource and the results of our metalexicographic study. We conclude the thesis by emphasizing the usefulness of collaborative dictionaries when combined with expert-built resources, which bears much unused potential.
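The extraction of structured data from Wiktionary's weakly structured articles can be sketched as follows. The wikitext snippet and parsing logic are deliberately simplified illustrations, not the actual OntoWiktionary harmonization pipeline; real Wiktionary markup is far more varied:

```python
import re

# A simplified, hypothetical Wiktionary article in wikitext: language and
# part-of-speech headings, with numbered gloss lines underneath.
wikitext = """==English==
===Noun===
# A domesticated feline.
# (slang) A person.
===Verb===
# To whip.
"""

def extract_senses(text):
    """Map each part-of-speech heading to its numbered gloss lines."""
    senses, pos = {}, None
    for line in text.splitlines():
        heading = re.fullmatch(r"===(\w+)===", line)
        if heading:
            pos = heading.group(1)
            senses[pos] = []
        elif pos and line.startswith("# "):
            senses[pos].append(line[2:])
    return senses

print(extract_senses(wikitext))
```

Structuring the glosses per part of speech in this way is the precondition for the later steps the thesis describes, such as word sense alignment with WordNet.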

    How definite are we about the English article system? Chinese learners, L1 interference and the teaching of articles in English for academic purposes programmes.

    Omission and overspecification of the/a/an/Ø are among the most frequently occurring grammatical errors in English academic writing by Chinese first language (L1) university students (Chuang & Nesi, 2006; Lee & Chen, 2009). However, given competing demands in the English for academic purposes (EAP) syllabus and conflicting evidence about the effectiveness of error correction, EAP tutors are often unsure whether article use should or could be a focus and whether such errors should be corrected or ignored. With the aim of informing pedagogy, this study investigates: whether explicit teaching or correction improves accuracy; which article uses present the most challenges for Chinese students; the causes of error; and whether a focus on article form can be integrated within a modern genre-based/student-centred approach in EAP. First, a questionnaire survey investigates how EAP teachers in higher education explicitly teach or correct English article use. Second, the effect of explicit teaching and correction on English article accuracy is investigated in a longitudinal experiment with a control group. Analysis of this study's post-study measures raises questions about the sustained benefits of written correction or decontextualised rule-based approaches. Third, findings are presented from a corpus-based study which includes an inductive and deductive analysis of the errors made by Chinese students. Finally, in a fourth study, hypotheses are tested using a multiple-choice test (n=455), and the main findings are presented: 1) general referential article accuracy is significantly affected by proficiency level, genre and students' familiarity with the topic; 2) Chinese students are most challenged by generic and non-referential contexts of use, which may be partly attributable to the lack of positive L1 transfer effects; 3) overspecification of definite articles is a frequent problem that sometimes gives Chinese B2-level students' writing an 'informal tone'; and 4) higher nominal density of pre-qualified noun phrases in academic writing is significantly associated with higher error rates. Several practical recommendations are presented which integrate an occasional focus on article form with whole-text teaching, autonomous proofreading skills, register awareness, and genre-based approaches to EAP pedagogy.

    The effect of etymological elaboration on L2 idiom acquisition and retention in an online environment (WebCT)

    Although research on the effect of etymological elaboration (providing information about a word's origin and background during instruction) on L2 idiom acquisition has shown that it is a useful mnemonic approach that can help L2 learners retain target idioms (Boers, Demecheleer & Eyckmans, 2000, 2004, 2007), most previous studies were conducted in pencil-and-paper settings, and few made use of computer technology and the internet as a vehicle for delivering such an instructional approach. Despite the wide use of web-based learning tools (Moodle, WebCT, etc.) in American universities, research data on the online application of etymological elaboration and its effect are far from sufficient compared with those from classroom experiments. A study of the actual effect of etymological elaboration in an online learning environment is therefore necessary to supplement previous studies by providing more information about the effect of such an instructional approach in a different medium. The current study, grounded in cognitive learning theories and a web-based learning framework, was designed in an online, autonomous learning manner. Specifically, it measured L2 learners' acquisition and retention of target idioms under two different instructional approaches, etymological elaboration and traditional rote learning, in an online learning management system (LMS), WebCT. Three research questions were addressed: 1) Can online learning contribute to students' L2 idiom acquisition? 2) Can an etymological elaboration approach be effective in facilitating students' L2 idiom retention in an online environment? 3) What are the strengths and drawbacks of learning idioms online, according to learners' learning experience? Seventy Chinese sophomores at Anhui University, China, participated in the study, and their productive and receptive knowledge of target idioms was measured through data collected from pretests, post-tests and delayed post-tests. In addition, an online questionnaire survey was distributed to the participants to look into their actual online learning experience. Results indicated that while online learning was an effective way to facilitate L2 idiom acquisition under both instructional approaches, etymological elaboration did not produce an overall significant effect on the retention of target idioms over the traditional rote-learning approach, except in the retention test of productive knowledge of target idioms.

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to its rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i'rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state of the art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation because a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.
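The F1-scores used above to compare the two parsing approaches can be illustrated with a small generic sketch; the gold and predicted dependency arcs below are hypothetical, and this is a standard labelled-F1 computation rather than the thesis's evaluation code:

```python
# Hypothetical gold and predicted arcs as (head, dependent, label) triples.
gold = {(0, 1, "subj"), (0, 2, "obj"), (2, 3, "mod")}
pred = {(0, 1, "subj"), (0, 2, "obj"), (1, 3, "mod")}

tp = len(gold & pred)                      # correctly predicted arcs
precision = tp / len(pred)                 # correct / predicted
recall = tp / len(gold)                    # correct / gold
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6667
```

An arc counts as correct only if its head, dependent and label all match, which is why the third arc (wrong head) lowers both precision and recall here.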

    Subtitling Cookery Programmes from English to Traditional Chinese: Taiwan as a Case Study

    Within translation studies, the topic of food and translation is still under-researched, and the understanding of subtitling cookery programmes is also very limited. This research hopes to bridge the current research gap by providing an overview of the classification of food-related texts and their embedded linguistic characteristics, thus expanding knowledge in this area.

    The research also sets out to understand the nature of two specific linguistic items, sensory language and culture-specific references, as well as the translation strategies used in subtitling them. The methodological foundation of the thesis builds upon Toury's (1995) notion of translation norms, operationalised through Pedersen's analytical model. In addition, the framework of product experience is also consulted to help contextualise sensory language and its classification.

    The analysis is carried out on a corpus of 480 minutes comprising two cookery programmes that represent two formats: the modern format, Jamie's 15-Minute Meals (Channel 4, 2012), and the cooking competition format, The Taste (Channel 4, 2014). All sensory language and culture-specific references present in the corpus have been identified, and their translation from English into Traditional Chinese has been analysed from both quantitative and qualitative viewpoints in an attempt to reveal the prevalent translational trends.