
    Entropy and Redundancy of Japanese Lexical and Syntactic Compound Verbs

    The present study investigated Japanese lexical and syntactic compound verbs (V1+V2) using Shannon's concepts of entropy and redundancy, calculated from corpora drawn from the Mainichi Newspaper and a collection of selected novels. Comparing combinations of a V2 verb with various V1 verbs, syntactic compounds were higher in entropy than lexical ones, while the two did not differ in redundancy. This result suggests that V2 verbs of syntactic compounds combine with a wider range of V1 verbs than those of lexical compounds. Two exceptional V2 verbs, komu and ageru, both of which create lexical compounds, showed a wide variety of combinations with V1 and therefore act like prefixes in English. Comparing V2 verbs across the two corpora, the V2 eru, which adds the meaning of "possibility" to a V1, functions like the auxiliary verb "can" in English and appears to be a favored expression in newspapers. In contrast, the V2 komu adds the meaning of "internal movement", similar to the preposition "into" in English, and appears to be preferred in the novels to enrich the expression of lexical compounds. In general, both lexical and syntactic compounds were used similarly in both corpora.
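    The entropy and redundancy measures used in this study can be sketched as follows. This is a minimal illustration, not the study's actual pipeline; the frequency counts of V1 partners are invented for demonstration.

```python
import math

def entropy(counts):
    """Shannon entropy H = -sum(p * log2(p)) over combination frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def redundancy(counts):
    """Redundancy R = 1 - H / H_max, where H_max = log2(number of V1 types)."""
    h_max = math.log2(len(counts))
    return 1 - entropy(counts) / h_max if h_max > 0 else 0.0

# Invented frequencies of V1 verbs combining with a given V2:
lexical_v2 = [50, 30, 15, 5]                              # few, skewed partners
syntactic_v2 = [12, 11, 10, 10, 9, 8, 8, 8, 7, 7, 5, 5]   # many, evenly spread

# A V2 with many, evenly distributed V1 partners yields higher entropy:
assert entropy(syntactic_v2) > entropy(lexical_v2)
```

    A uniform distribution over n V1 partners gives the maximum entropy log2(n) and redundancy 0, which is why a V2 that combines freely behaves like a productive prefix.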

    A Corpus Investigation of the Right-hand Head Rule Applied to Japanese Affixes

    The present study investigates differences between Japanese prefixes and suffixes using editions of the Asahi Newspaper published between 1985 and 1998 (Amano & Kondo, 2000). The right-hand head rule (e.g., Kageyama, 1982; Kageyama, 1999; Namiki, 1982; Nishigauchi, 2004; Williams, 1981) predicts that prefixes attach to a wide variety of nouns while suffixes attach regularly to a smaller group of nouns. Twenty-four frequently used affixes, consisting of 12 prefixes and 12 suffixes, were compared according to seven corpus features: printed frequency, productivity, accumulative productivity, commonality, coalescence degree, Herdan's logarithmic function of type-token ratio (log TTR), and entropy. Although a series of Mann-Whitney U-tests calculated for the six corpus features of printed frequency, productivity, accumulative productivity, commonality, coalescence degree, and log TTR did not reveal any differences between the 12 prefixes and the 12 suffixes, the t-test for entropy indicated a significant difference. This suggests that the prefixes were attached to nouns more randomly than the suffixes. Although the present findings are limited to the 24 selected affixes, the result supported the right-hand head rule.
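    Two of the corpus features named above, Herdan's log TTR and entropy, can be sketched in a few lines. The base-noun frequencies below are invented for illustration only.

```python
import math

def log_ttr(types, tokens):
    """Herdan's logarithmic type-token ratio: log(types) / log(tokens)."""
    return math.log(types) / math.log(tokens)

def entropy(counts):
    """Shannon entropy over the frequency distribution of an affix's base nouns."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Invented base-noun frequencies for one prefix and one suffix:
prefix_bases = [9, 8, 8, 7, 7, 7, 6, 6]   # spread evenly over many nouns
suffix_bases = [40, 10, 5, 3]             # concentrated on a few nouns

# Under the right-hand head rule, the prefix's distribution is less predictable:
assert entropy(prefix_bases) > entropy(suffix_bases)
```

    Entropy is sensitive to how evenly an affix spreads over its bases, not just to how many bases it has, which is plausibly why it separated prefixes from suffixes when the other features did not.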

    Cross-linguistic trade-offs and causal relationships between cues to grammatical subject and object, and the problem of efficiency-related explanations

    Cross-linguistic studies focus on inverse correlations (trade-offs) between linguistic variables that reflect different cues to linguistic meanings. For example, if a language has no case marking, it is likely to rely on word order as a cue for the identification of grammatical roles. Such inverse correlations are interpreted as manifestations of language users’ tendency to use language efficiently. The present study argues that this interpretation is problematic. Linguistic variables, such as the presence of case or the flexibility of word order, are aggregate properties, which do not directly represent the use of linguistic cues in context. Still, such variables can be useful for circumscribing the potential role of communicative efficiency in language evolution, if we move from cross-linguistic trade-offs to multivariate causal networks. This idea is illustrated by a case study of linguistic variables related to four types of Subject and Object cues: case marking, rigid word order of Subject and Object, tight semantics, and verb-medial order. The variables are obtained from online language corpora in thirty languages annotated with Universal Dependencies. The causal model suggests that the relationships between the variables can be explained predominantly by sociolinguistic factors, leaving little space for a potential impact of efficient linguistic behavior.
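    One way such an aggregate variable can be operationalized is as the proportion of clauses following the language's dominant Subject/Object order. The sketch below is a toy illustration with invented token positions, not the study's actual extraction from Universal Dependencies treebanks.

```python
def order_rigidity(clauses):
    """Proportion of clauses following the dominant Subject/Object order.

    `clauses` is a list of (subj_pos, obj_pos) token indices per clause.
    Values near 1.0 indicate rigid order; values near 0.5 indicate free order.
    """
    subj_first = sum(1 for s, o in clauses if s < o)
    dominant = max(subj_first, len(clauses) - subj_first)
    return dominant / len(clauses)

# Invented positions: a rigid Subject-before-Object language vs. a flexible one
rigid = [(0, 3), (1, 4), (0, 2), (2, 5), (1, 3)]
flexible = [(0, 3), (4, 1), (0, 2), (5, 2), (1, 3)]

assert order_rigidity(rigid) > order_rigidity(flexible)
```

    Collapsing thousands of clauses into one number like this is exactly what makes such variables aggregate properties: the score says nothing about which cues a listener actually used in any individual sentence.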

    CLiFF Notes: Research in the Language Information and Computation Laboratory of The University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students, and postdocs in the Computer Science, Psychology, and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing, and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. With 48 individual contributors and six projects represented, this is the largest LINC Lab collection to date, and the most diverse.

    Windows into Sensory Integration and Rates in Language Processing: Insights from Signed and Spoken Languages

    This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input. The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities. The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality for the time-scales of these windows, where signing is successively integrated over longer durations (~ 250-300 ms) than in speech (~ 50-60 ms), while also pointing to modality-independent mechanisms, where integration occurs in durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations of English, Korean, and ASL. Data from word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and morphemes is relatively consistent among these typologically diverse languages. 
The results from rates in ASL also complement the findings of the perception experiments by confirming that the time-scales at which phonological units fluctuate in production match the temporal integration windows in perception. These results are consistent with the hypothesis that there are modality-independent time pressures on language processing, and the discussion provides a synthesis of converging findings from other domains of research and proposes ideas for future investigations.
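    The local time-reversal manipulation used in the perception experiments can be sketched as follows: the signal is cut into fixed-duration windows and each window is played backwards. This is a minimal illustration over an abstract sample list, not the actual stimulus-preparation code.

```python
def locally_reverse(samples, window):
    """Reverse the signal inside consecutive fixed-size windows.

    Intelligibility of such locally time-reversed sentences degrades once
    `window` exceeds the temporal integration window of the modality
    (~50-60 ms worth of samples for speech, ~250-300 ms for sign).
    """
    out = []
    for i in range(0, len(samples), window):
        out.extend(reversed(samples[i:i + window]))
    return out

signal = list(range(10))
print(locally_reverse(signal, 3))  # [2, 1, 0, 5, 4, 3, 8, 7, 6, 9]
```

    With a window of 1 the signal is unchanged, and as the window grows the local order is increasingly scrambled, which is what makes the manipulation a probe of integration-window size.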

    Hemispheric lateralisation in the recognition of Chinese characters


    Word Knowledge and Word Usage

    Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies, requiring the synergic integration of a wide range of methods, techniques, and empirical and experimental findings. The present book approaches a few central issues concerning the organization, structure, and functioning of the Mental Lexicon by asking domain experts to look at common, central topics from complementary standpoints and discuss the advantages of developing converging perspectives. The book explores the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information-theoretical measures of word families, statistical correlations across psycholinguistic and cognitive evidence, principles of machine learning, and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula, and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends largely on its parallel training data: phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, instead using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.
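    A string similarity score over phrase pairs, of the general kind the method relies on, can be sketched with a normalized edit distance. This is an illustrative stand-in, not the paper's actual morphosyntax-aware score, and the French example variant is invented.

```python
def levenshtein(a, b):
    """Classic edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(phrase_a, phrase_b):
    """Normalized similarity in [0, 1]; 1.0 means identical phrases."""
    dist = levenshtein(phrase_a, phrase_b)
    return 1 - dist / max(len(phrase_a), len(phrase_b), 1)

# A phrase and an invented morphological (plural) variant score highly,
# so the variant can inherit the original's translation association:
print(similarity("grande maison", "grandes maisons"))  # close to 1.0
```

    In practice the paper scores candidates using morphosyntactic information rather than raw characters, so inflectional variants that differ only in number or gender marking rank highest.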

    Neural Combinatory Constituency Parsing

    Doctoral thesis (Doctor of Information Science), Tokyo Metropolitan University (東京都立大学).

    Multiword expressions

    Multiword expressions (MWEs) are a challenge for both natural language applications and linguistic theory, because they often defy the machinery developed for free combinations, where the default is that the meaning of an utterance can be predicted from its structure. There is a rich body of primarily descriptive work on MWEs for many European languages, but comparative work is scarce. This volume brings together MWE experts to explore the benefits of a multilingual perspective on MWEs. The ten contributions in this volume look at MWEs in Bulgarian, English, French, German, Maori, Modern Greek, Romanian, Serbian, and Spanish. They discuss prominent issues in MWE research such as the classification of MWEs, their formal grammatical modeling, and the description of individual MWE types from the point of view of different theoretical frameworks, such as Dependency Grammar, Generative Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Lexicon Grammar.