The Antonym Construction: A Comparison between English and Mandarin
All languages have antonym pairs but may differ in how they use them. Antonym co-occurrence has been examined and compared between English and Mandarin, with the conclusion that antonym pairs can co-occur at the lexical level in Mandarin but not in English. That conclusion can be refuted by identifying lexical-level antonym co-occurrence in English, as in frenemy (friend + enemy) and humblebrag.
Therefore, this study identified and collected instances of lexical-level antonym co-occurrence from English and Mandarin in use, and examined and compared them within the framework of Construction Grammar. The collected items were curated for antonymy consistency and lexicalized status. The final sample included 105 English and 161 Mandarin antonym constructs. The two collections were examined and compared from the perspectives of form-meaning schema, headedness, syntactic categories, and inheritance links.
In addition to revealing typological differences between English and Mandarin, the observations demonstrate that antonym constructions in both languages exploit the unity and contrast inherent in antonymy to communicate meanings beyond a binary contrast. Constructs in both languages can be nominalized or adverbialized, have the property of neutralized headedness, and form a complex of multiple inheritance links across lexical and phrasal levels.
Construction Grammar proves effective in facilitating this original joint analysis of English and Mandarin antonym constructions. Such effectiveness is credited to observing antonym constructs as form-meaning pairs in use. Construction is thus proposed as a parameter for future contrastive studies. With the universality of the understanding and use of antonymy at the lexical level confirmed between English and Mandarin, further research including more languages would be worthwhile to verify such a cognitive and linguistic universal.
The algebra of lexical semantics
Abstract. The current generative theory of the lexicon relies primarily on tools from formal language theory and mathematical logic. Here we describe how a different formal apparatus, taken from algebra and automata theory, resolves many of the known problems with the generative lexicon. We develop a finite state theory of word meaning based on machines in the sense of Eilenberg [11], a formalism capable of describing discrepancies between syntactic type (lexical category) and semantic type (number of arguments). This mechanism is compared both to the standard linguistic approaches and to the formalisms developed in AI/KR.

1 Problem Statement

In developing a formal theory of lexicography our starting point will be the informal practice of lexicography, rather than the more immediately related formal theories of Artificial Intelligence (AI) and Knowledge Representation (KR). Lexicography is a relatively mature field, with centuries of work experience an
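The syntactic/semantic type discrepancy this abstract mentions can be illustrated with a small sketch. This is not Eilenberg's machine formalism itself; the entries, category labels, and arity values below are illustrative assumptions only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexEntry:
    form: str       # word form
    category: str   # syntactic type (lexical category)
    arity: int      # semantic type: number of semantic arguments

# Three verbs share the syntactic type V but differ in semantic type:
ENTRIES = [
    LexEntry("rain",   "V", 0),  # zero-place: "it rains" has a dummy subject
    LexEntry("sleep",  "V", 1),  # one-place: the sleeper
    LexEntry("donate", "V", 3),  # three-place: donor, gift, recipient
]

# Words whose semantic arity diverges from a one-place default for V --
# the kind of category/arity mismatch the formalism is meant to describe:
mismatched = [e.form for e in ENTRIES if e.category == "V" and e.arity != 1]
print(mismatched)  # ['rain', 'donate']
```

The point of the sketch is only that syntactic and semantic type are independent coordinates of a lexical entry, so a single syntactic category can cover several semantic types.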
Terminologies, Lexical Hierarchies and other Configurations
The focus of the monograph is on hierarchical systems of lexical items, particularly in scientific terminologies. It includes research outcomes from the dissertation Lexical Hierarchies in the Scientific Terminology, supplemented with a broad introduction to the typology of lexical and semantic relations and to a variety of branching and non-branching hierarchies, proportional series and other types of lexical configurations. It analyses the principles of formation of terminological classificatory hierarchies and identifies sense relations between items at superordinate and subordinate levels, and those at the same level. Specific morphological and onomatological properties of different languages influence the consistency of corresponding lexical hierarchies, although the conceptual systems are identical.
Possession as an operational dimension of language
In this study I want to show, above all, that the linguistic expression of POSSESSION is not a given but represents a problem to be solved by the human mind. We must recognize from the outset that linguistic POSSESSION presupposes conceptual or notional POSSESSION, and I shall say more about the latter in Chapter 3. Certain varieties of linguistic structures in particular languages are united by the fact that they serve the common purpose of expressing notional POSSESSION. But this cannot be their sole common denominator. How would we otherwise be able to recognize, to understand, to learn and to translate a particular linguistic structure as representing POSSESSION? There must be a properly linguistic common denominator, an invariant, that makes this possible. The invariant must be present both within a particular language and in cross-language comparison. What is the nature of such an invariant? As I intend to show, it consists in operational programs and functional principles corresponding to the purpose of expressing notional POSSESSION. The structures of possessivity which we find in the languages of the world represent the traces of these operations, and from the traces it becomes possible to reconstruct stepwise the operations and functions.
Possessivity, subject and object
The basic question is whether POSSESSOR and POSSESSUM are on the same level as the roles of VALENCE, two additional roles as it were. My research on POSSESSION has shown (Seiler 1981:7 ff.) that this is not the case, and that there is a difference in principle between POSSESSION and VALENCE. However, there are multiple interactions between the two domains, and these interactions constitute the object of the following inquiry. It is hoped that this will contribute to a better understanding both of POSSESSION and of VALENCE.
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in the linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge, leveraging native speakers' intuitions about verb meaning to support the development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs' lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially for under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909]
Lexical database enrichment through semi-automated morphological analysis
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so the addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes, and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established.
The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
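The rule mechanism described above (character substitutions plus a lexical validity check, rather than plain segmentation) can be sketched roughly as follows. The toy lexicon, rule format, and example rules here are assumptions for illustration, not the thesis's actual rule set:

```python
# Toy word list standing in for a full lexicon (assumption)
LEXICON = {"happy", "create", "decide", "rely"}

# Each rule: (surface ending to strip, substitution restoring the stem, suffix label).
# Substituting characters, rather than merely segmenting, handles alternations
# such as y ~ i and the restoration of a silent e.
RULES = [
    ("iness", "y",  "-ness"),   # happiness -> happy   (y ~ i alternation)
    ("iance", "y",  "-ance"),   # reliance  -> rely
    ("sion",  "de", "-ion"),    # decision  -> decide  (d ~ s alternation)
    ("ion",   "e",  "-ion"),    # creation  -> create  (silent e restored)
]

def analyse(word, lexicon=LEXICON, rules=RULES):
    """Return (stem, suffix) for the first rule whose candidate stem is
    lexically valid; rule order establishes precedence. None if no rule fits."""
    for ending, repl, suffix in rules:
        if word.endswith(ending):
            stem = word[: -len(ending)] + repl
            if stem in lexicon:          # the lexical validity requirement
                return stem, suffix
    return None

print(analyse("happiness"))  # ('happy', '-ness')
print(analyse("decision"))   # ('decide', '-ion')
print(analyse("lion"))       # None -- 'le' is not a valid stem, so no link
```

The final example shows how the validity check curbs overgeneration: a naive segmenter would happily strip -ion from lion, while the substitution rule rejects it because the candidate stem is not in the lexicon.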
Metonymy in Mind, Language, and Communication
The typical view considers metonymy an intra-domain mapping in which the source provides mental access to the target within the domain, with PART-WHOLE as the prototypical relation. This commonly held view of metonymy in Cognitive Linguistics pays attention to what happens after the domain, or rather the WHOLE, has been established; how the WHOLE is formed seems to be missing. Based on research results in cognitive science, especially cognitive psychology, developmental psychology, and neuroscience, metonymy is tentatively argued to be an innate cognitive mechanism involving PART-WHOLE FORMING and PART-WHOLE/PART RELATING processes. The PART-WHOLE FORMING process establishes the WHOLE from the PART: it picks up some prominent element(s) in an interactive process to form a patterned experience, and the PART-WHOLE/PART RELATING process relates PART to PART, and PART to the WHOLE and vice versa. The PART-WHOLE/PART RELATING process is made possible by the PART-WHOLE FORMING process. Metonymic operations usually precede metaphoric operations. Metaphor is essentially grounded in metonymy.
Metonymy as a cognitive mechanism is most noticeably realized in language. It operates in various aspects of language and language use. The experience pattern (i.e. the WHOLE) formulated through the PART-WHOLE FORMING process is found to underpin the process of grammaticalization and the development of meaning prototypes, and to motivate such daily language use as football nicknaming, bringing in certain cognitive and communicative functions. In the light of this view of metonymy, grammaticalization is considered from the conceptual perspective as a process from the general/global to the specific/local, or from focus on one specific aspect to focus on another particular aspect within the global WHOLE, rather than the usually assumed concrete-to-abstract process. This also applies to word meaning prototypes. Word meaning develops due to the dynamics of meaning prototypes. When considered from the conceptual perspective, meaning prototypes generally develop from the general/global to the specific/local with the change and specification of contextual situations. The cognitive analysis of football nicknames also suggests that metonymy is pervasive and provides the requisite basis for metaphor.
The PART-WHOLE FORMING and the PART-WHOLE/PART RELATING processes of metonymy as an inherent cognitive mechanism often interact in the mind, which is evidenced in language and may be best illustrated through analysis of interactive communication in general, and dialogic discourse in particular. Metonymy in interaction is embodied in its functions and operations in dialogue and its contribution to the dialogue as a discourse entity. Metonymy operates in dialogic discourse in various patterns of the GENERAL-SPECIFIC scheme. It operates in the development of dialogue and helps structure the dialogic discourse, making it a coherent discourse entity; it makes meaning out of the local utterance and relates it to the whole dialogue; it underlies the decision-making process, helping make a final decision among alternatives. It also motivates the problem-solving process, helping formulate and organise replies to the questions posed by the counterpart in dialogic discourse, and facilitating the solution of daily problems.