151 research outputs found

    Verb-object constructions in Mandarin: a comparison with Ewe

    Exploring Chinese Verbal Lexicon Developmental Trend with Semantic Space

    Exploring Methods for Building Dialects-Mandarin Code-Mixing Corpora: A Case Study in Taiwanese Hokkien

    In natural language processing (NLP), code-mixing (CM) is a challenging task, especially when the mixed languages include dialects. In Southeast Asian countries such as Singapore, Indonesia, and Malaysia, Hokkien-Mandarin is the most widespread code-mixed language pair among Chinese immigrants, and it is also common in Taiwan. However, dialects such as Hokkien often suffer from scarce resources and lack an official writing system, which limits the development of dialect CM research. In this paper, we propose a method to construct a Hokkien-Mandarin CM dataset to mitigate this limitation, overcome the morphological issue under the Sino-Tibetan language family, and offer an efficient Hokkien word segmentation method through a linguistics-based toolkit. Furthermore, we use our proposed dataset and employ transfer learning to train the XLM (cross-lingual language model) for translation tasks. To fit the code-mixing scenario, we adapt XLM slightly. We found that by using linguistic knowledge, rules, and language tags, the model produces good results on CM data translation while maintaining monolingual translation quality. Comment: The paper was accepted to Findings of EMNLP 2022.

    Measuring the Semantic Specificity in Mandarin Verbs: A Corpus-based Quantitative Survey

    The purpose of this thesis is to study semantic specificity in Chinese using corpus-based statistical and computational methods. The analysis begins with single verbs and performs preliminary tests on resultative verb compounds in Chinese. The verbs studied in this work include one hundred and fifty head verbs collected in the M3 project. As a prerequisite, these one hundred and fifty head verbs were tagged as generic or specific following the three criteria proposed in the literature: the specification of agent/instrument, the limitation of objects and their types, and the confinement of the action denotation to physical action only. The next step is to measure semantic specificity with quantitative data. To characterize verb use statistically, the study counts each verb's frequency, its number of senses, and the range of its co-occurring objects. Two major analyses, Principal Component Analysis (PCA) and the Multinomial Logistic Model, are adopted to assess the predictive power of the variables and to predict the probability of different verb categories. In addition, the vector-based model in Latent Semantic Analysis (LSA) is applied to justify the concept of semantic specificity. A distributional model based on the Academia Sinica Balanced Corpus (ASBC) with LSA is built to investigate how the semantic space varies with semantic specificity. By measuring the vector distance, the semantic similarity between words is calculated. The word-space model is used to measure the semantic loads of single verbs and to explore the semantic information in Chinese resultative verb compounds (RVCs).

    Corpus-Based Research on Chinese Language and Linguistics

    This volume collects papers presenting corpus-based research on Chinese language and linguistics, from both a synchronic and a diachronic perspective. The contributions cover different fields of linguistics, including syntax and pragmatics, semantics, morphology and the lexicon, sociolinguistics, and corpus building. There is now considerable emphasis on the reliability of linguistic data: the studies presented here are all grounded in the tenet that corpora, intended as collections of naturally occurring texts produced by a variety of speakers/writers, provide a more robust, statistically significant foundation for linguistic analysis. The volume explores not only the potential of using corpora as tools allowing access to authentic language material, but also the challenges involved in corpus interrogation, analysis, and building.

    A reevaluation of so-called passive constructions in ancient Chinese: from Pre-Qin to the Han dynasty

    While many linguistic studies have been written on the passive voice in Chinese, many aspects of this field of research have remained controversial, such as the emergence of various constructions, their exact syntactic, semantic, and pragmatic features, as well as the question from which period onward we can talk about a “mature” passive (i.e., passive voice). Three main opinions are presented in current scholarship. Ma, in a pioneering work from 1898 (reprinted in 2007: 160), defined the Chinese passive construction as a construction with “a patient appearing in the subject position” without clearly defining the “subject” or discussing the construction (外动字之行,有施有受。受者居宾次,常也。如受者居主次,则为受动字,明其以受者为主也。). Much later, Gao 1949 (reprinted in 2011: 226-227) argued that none of the explanations that have been provided in scholarship so far validated the assumption that the constructions could be treated similarly to the passive voice found in many western languages (汉语具有动词功能的词,实在并没有施动和受动的分别), while other recent studies have labeled the Chinese structures that had overt syntactic markers as passive structures. In order to contribute to this fundamental and long-lasting scholarly debate, this comprehensive study provides a review of the diachronic development of the so-called Chinese passive from the pre-Qin era to the end of the Han dynasty. Part 1 reviews the studies of passive in Chinese and also introduces the definition of passive in a cross-linguistic perspective. In particular, the relevant terms “passive sense”, “passive voice”, “passive function” and “passive construction” are distinguished in order to better understand the passive in Ancient Chinese. Meanwhile, three important factors that could trigger a passive interpretation in Ancient Chinese are introduced as a general background of this dissertation. Part 2 examines two types of notional passive (i.e., PV construction) in Ancient Chinese, i.e., Type 1 and Type 2.
It is found that most notional passives were in fact the intransitive use of labile verbs (i.e., Type 1) that could only be interpreted as a passive depending on the context. Meanwhile, in some special contexts, a few verbs with strong transitive features are also found in the notional passive construction (i.e., Type 2), which is rarely observed cross-linguistically. Type 2 should be understood as a special situation of Type 1 in which the event expressed by the verb is not likely to occur spontaneously. Part 3 focuses on the diachronic development of the four lexical items traditionally regarded as “passive markers”: jian 见, bei 被, wei 为 and yu 于, and concludes that all are ambiguous for both passive and non-passive interpretations, since a passive interpretation is determined by the context rather than by these markers themselves, which were also used in active sentences and could also be assembled to constitute new structures and variations. Therefore, it was concluded that there was no consistent syntactic marker that specifically expressed the passive voice in Ancient Chinese. Part 4 examines whether the ke construction was a passive construction in Archaic Chinese by reviewing the formation of the ke (and ke yi) constructions, as well as the nan (yi), yi (yi) and zu (yi) constructions. It was concluded that these were more likely to be interpreted as serial verb constructions with deontic modality and a generic reading with middle characteristics that possibly also expressed a passive meaning. However, it was concluded that ke, nan, yi and zu could not justifiably be defined as passive markers. Part 5 concludes that in Chinese it is important to differentiate between the passive voice and a passive sense. From a translation perspective, some so-called passive structures were found to express passive meanings and were translated as such into English and other languages.
However, as the passive meaning appeared to be pragmatically rather than syntactically determined, none of the alleged passives in Ancient Chinese can be qualified as passive voice in accordance with a syntactic definition of passive. In general, the degree of grammaticalization of the passive markers in Archaic Chinese was quite low, and they are better explained from a functional grammar viewpoint than from a transformational generative grammar perspective.

    Verbal attacks in Taiwan's political talk shows

    The political talk show is a hotly discussed, debated, and often criticized phenomenon. Previous studies have pointed out that entertainment and confrontation are two main features of the political talk show. However, these studies did not probe into how entertainment or confrontation is achieved by linguistic devices in discourse. The present paper is a data-driven study of how the participants in political talk shows employ verbal attacks to degrade others and create entertaining effects. We chose three political talk shows in Taiwan as our databank: 2100 People All Talk (2100 全民開講), News Night Club (新聞夜總會), and News Google (新聞孤狗). The linguistic tokens of verbal aggression in the three programs were collected and then analyzed. The results showed that verbal attack tokens covered almost all levels of linguistics, including phonology, morphology, lexicology, syntax, semantics, and pragmatics. These diversified verbal attacks not only enable the host and guests to denigrate their opponents’ ability and personality but also build conversational humor and reinforce in-group solidarity. In terms of the frequency of verbal attacks, the three talk shows did not show a salient difference, but they differed in the quality and style of the verbal vilification. In general, News Night Club was found to be more humorous, more diversified and less formal in language use, while the verbal attacks in 2100 People All Talk were more direct. Additionally, the atmosphere in 2100 People All Talk was more formal and confrontational. For example, the same linguistic strategy used by the same speaker created collective laughter in News Night Club, while the humorous effect did not show up in 2100 People All Talk. We also found that talk shows with a similar political bias do not necessarily show similar styles of rhetorical devices.
Although News Night Club and 2100 People All Talk are both anti-DPP talk shows, their styles of verbal attack were not the same. Finally, the quality and quantity of verbal attacks in News Google usually fall between those of News Night Club and 2100 People All Talk.

    Processing Filler-Gap Dependencies in Mandarin Chinese: An Effect of Language Exposure?

    This study investigates how speakers of Mandarin Chinese process filler-gap dependencies in potentially ambiguous fronted wh-questions. The study recruited native speakers of Mandarin with different degrees of English proficiency. In the experiment, participants were first presented with a wh-in-situ question and then a wh-ex-situ alteration of it that has the wh-phrase fronted to the beginning of the sentence. Participants were asked to judge and rate whether the two sentences could express a similar meaning or not. The results show that the movement of the wh-phrase zai nali (‘where’) is generally accepted by Mandarin speakers, despite Mandarin being a wh-in-situ language by default, and that this movement is licensed by the focus marker shi (which can be deleted at PF). The results also hint that Mandarin speakers might favor an active filler strategy that has been found cross-linguistically. The findings also suggest that language exposure (English) could affect one’s acceptability judgments under the assumption that there is in fact a shared syntax available to both languages.

    A historical and sociolinguistic approach to language change in Mandarin Chinese: Corpus evidence for the development of YOU-MEI-YOU

    This dissertation introduces corpus-based analyses of a syntactic construction in Standard Mandarin, YOU-MEI-YOU (or ‘have-not-have’)+VP, which is used to form perfective questions. The purpose of the study is to (i) find evidence for the claim that preverbal YOU-MEI-YOU, i.e. YOU-MEI-YOU found in the new construction, is grammaticalizing into an auxiliary unit, and (ii) investigate its historical development, including the stage of development that it has reached and its distribution over time. Using data from two databases, the present study first looks at the percentage of preverbal YOU-MEI-YOU conveying a certain grammatical meaning, i.e. sentence type and aspect. Next, the study compares the percentage of three linguistic features of this construction, namely, the grammatical meaning(s) conveyed by preverbal YOU-MEI-YOU, the general types of complement it takes, and the specific types of VP complement it takes, between different 20-year periods. The study also makes a comparison of the frequency of use of preverbal YOU-MEI-YOU between different 10-year periods. The results of the first type of analysis show that preverbal YOU-MEI-YOU helps to form constructions conveying either grammatical meaning in the majority of the clauses, lending support to the claim that it is grammaticalizing into an auxiliary unit. The diachronic comparisons of the three features of the new construction indicate that preverbal YOU-MEI-YOU has reached Stage III as outlined in Heine (1993). The comparison of the frequency of use between different time periods shows no upward trend in the use of (auxiliary) preverbal YOU-MEI-YOU.

    Verb-Object Constructions in Mandarin: a comparison with Ewe
