    Lexical information from a minimalist point of view

    Simplicity as a methodological orientation applies to linguistic theory just as to any other field of research: ‘Occam’s razor’ is the label for the basic heuristic maxim according to which an adequate analysis must ultimately be reduced to indispensable specifications. In this sense, conceptual economy has been a strict and stimulating guideline in the development of Generative Grammar from the very beginning. Halle’s (1959) argument discarding the level of taxonomic phonemics in order to unify two otherwise separate phonological processes is an early characteristic example; a more general notion is that of an evaluation metric introduced in Chomsky (1957, 1975), which systematically relates the relative simplicity of alternative linguistic descriptions to the quest for explanatory adequacy of the theory underlying the descriptions to be evaluated. Further proposals along these lines include the theory of markedness developed in Chomsky and Halle (1968), Kean (1975, 1981), and others, the notion of underspecification proposed e.g. in Archangeli (1984) and Farkas (1990), and the concept of default values and related notions. An important step promoting this general orientation was the idea of Principles and Parameters developed in Chomsky (1981, 1986), which reduced language-particular rule systems to universal principles, subject merely to parametrization with restricted options largely related to properties of particular lexical items. On this account, the notion of a simplicity metric is to be dispensed with, as competing analyses of relevant data are now supposed to be essentially excluded by the restrictive system of principles.

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which are clearly inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with over an 80.6% success rate directly from speech inputs for middle-level vocabularies. Comment: LaTeX source with a4 style, 15 pages, to be published in computer processing of oriental language journal.
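
    The Viterbi-based lexical decoding mentioned above can be illustrated with a textbook dynamic-programming decoder. The sketch below is a minimal generic version in Python, not SKOPE's implementation: the states stand in for morpheme hypotheses, the observations for diphone labels from the acoustic front end, and all names and probabilities are invented for illustration.

        import math

        def viterbi(obs, states, start_p, trans_p, emit_p):
            # V[t][s]: best log-probability of a path ending in state s at time t
            V = [{s: math.log(start_p[s] * emit_p[s][obs[0]]) for s in states}]
            back = [{}]
            for t in range(1, len(obs)):
                V.append({})
                back.append({})
                for s in states:
                    prev, score = max(
                        ((p, V[t - 1][p] + math.log(trans_p[p][s] * emit_p[s][obs[t]]))
                         for p in states),
                        key=lambda x: x[1])
                    V[t][s] = score
                    back[t][s] = prev
            # Trace the best path back from the highest-scoring final state.
            state = max(V[-1], key=V[-1].get)
            path = [state]
            for t in range(len(obs) - 1, 0, -1):
                state = back[t][state]
                path.append(state)
            return list(reversed(path))

        # Toy run with two invented morpheme states and three diphone labels.
        states = ["stem", "suffix"]
        start = {"stem": 0.9, "suffix": 0.1}
        trans = {"stem": {"stem": 0.4, "suffix": 0.6},
                 "suffix": {"stem": 0.7, "suffix": 0.3}}
        emit = {"stem": {"me": 0.7, "ok": 0.2, "ta": 0.1},
                "suffix": {"me": 0.1, "ok": 0.3, "ta": 0.6}}
        print(viterbi(["me", "ok", "ta"], states, start, trans, emit))
        # -> ['stem', 'suffix', 'suffix']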

    Thematic Annotation: extracting concepts out of documents

    Contrary to standard approaches to topic annotation, the technique used in this work does not centrally rely on some sort of -- possibly statistical -- keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database -- the EDR Electronic Dictionary -- that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This extraction technique takes an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document content. This conceptual hierarchy is then searched to extract the most relevant set of concepts to represent the topics discussed in the document. Notably, the algorithm is able to extract generic concepts that are not directly present in the document. Comment: Technical report EPFL/LIA. 81 pages, 16 figures.
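
    The aggregation step described above can be sketched in a few lines of Python. Since the EDR Electronic Dictionary is a licensed resource, the snippet substitutes a tiny invented hypernym table; scoring every ancestor concept by how many document words it subsumes is one simple way a generic concept absent from the text can still surface as a topic, in the spirit of, though not identical to, the paper's algorithm.

        from collections import Counter

        # Invented stand-in for a concept hierarchy such as the EDR's:
        # each concept maps to its hypernym (parent); the root maps to None.
        hypernym = {
            "dog": "mammal", "cat": "mammal", "mammal": "animal",
            "sparrow": "bird", "bird": "animal", "animal": None,
        }

        def ancestors(concept):
            # Yield the concept itself and all of its hypernyms up to the root.
            while concept is not None:
                yield concept
                concept = hypernym[concept]

        def aggregate(words, top_n=2):
            # Score each concept by the number of document words it subsumes.
            scores = Counter()
            for w in words:
                if w in hypernym:
                    scores.update(ancestors(w))
            return scores.most_common(top_n)

        print(aggregate(["dog", "cat", "sparrow", "dog"]))
        # -> [('animal', 4), ('mammal', 3)]: generic concepts never mentioned
        #    in the text win; a real system would also penalize concepts that
        #    sit too close to the root to be informative.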

    On the nature of the lexicon: the status of rich lexical meanings

    The main goal of this paper is to show that there are many phenomena pertaining to the construction of truth-conditional compounds that follow characteristic patterns, and whose explanation requires appealing to knowledge structures organized in specific ways. We review a number of phenomena, ranging from non-homogeneous modification and privative modification to polysemy and co-predication, that indicate that knowledge structures do play a role in obtaining truth-conditions. After that, we show that several extant accounts that invoke rich lexical meanings to explain such phenomena face problems related to inflexibility and lack of predictive power. We review different ways in which one might react to such problems as regards lexical meanings: go richer, go moderately richer, go thinner, or go moderately thinner. On the face of it, moderate positions look unstable, given the apparent lack of a clear cutoff point between the semantic and the conceptual, while a very thin view and a very rich view may turn out to be indistinguishable in the long run. As far as we can see, the most pressing open questions concern this last issue: can there be a principled distinction between semantic and world knowledge? Where could it be drawn: at some upper level (e.g. enriched qualia structures) or at some basic level (e.g. constraints)? And how do parsimony considerations affect these two approaches? A thin-meanings approach postulates intermediate representations whose role in the interpretive process is unclear, while a rich-meanings approach seems to duplicate representations: the same representations that are stored in the lexicon would also form part of conceptual representations. Both types of parsimony problems would be solved by assuming a direct relation between word forms and (parts of) conceptual or world knowledge, leading to a view, attributed to Chomsky (e.g. by Katz 1980), on which there is just syntax and encyclopedic knowledge.

    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Information extraction, data integration, and uncertain data management are distinct areas of research that have received considerable attention over the last two decades. Much research has tackled these areas individually. However, information extraction systems should be integrated with data integration methods in order to make use of the extracted information. Handling uncertainty in the extraction and integration processes is an important issue for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these areas of research, shows their common ground, and discusses how to integrate information extraction and data integration under uncertainty management.
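
    One simple way to picture integration under uncertainty management, sketched below in Python with invented data: each extractor emits attribute values with a confidence score, and the integration step keeps a normalized probability distribution over conflicting values instead of forcing a single winner. This is a generic illustration of the idea, not a method from the article.

        from collections import defaultdict

        # (entity, attribute, value, confidence) tuples from two extractors.
        extractions = [
            ("ACME", "hq", "Boston", 0.8),
            ("ACME", "hq", "Berlin", 0.4),
            ("ACME", "ceo", "J. Doe", 0.9),
        ]

        # Accumulate confidence mass per candidate value...
        dist = defaultdict(dict)
        for entity, attr, value, conf in extractions:
            slot = dist[(entity, attr)]
            slot[value] = slot.get(value, 0.0) + conf

        # ...then normalize each slot into a probability distribution.
        for key, values in dist.items():
            total = sum(values.values())
            dist[key] = {v: c / total for v, c in values.items()}

        print(dist[("ACME", "hq")])  # {'Boston': 0.666..., 'Berlin': 0.333...}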

    Morphological Productivity in the Lexicon

    In this paper we outline a lexical organization for Turkish that makes use of lexical rules for inflections, derivations, and lexical category changes to control the proliferation of lexical entries. Lexical rules handle changes in grammatical roles, enforce type constraints, and control the mapping of subcategorization frames in valency-changing operations. A lexical inheritance hierarchy facilitates the enforcement of type constraints. Semantic compositions in inflections and derivations are constrained by the properties of the terms and predicates. The design has been tested as part of an HPSG grammar for Turkish. In terms of performance, run-time execution of the rules appears to be a far better alternative than pre-compilation: the latter causes exponential growth in the lexicon due to the intensive use of inflections and derivations in Turkish. Comment: 10 pages LaTeX, {lingmacros,avm,psfig}.sty, 1 figure, 1 BibTeX file.
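
    The performance remark can be made concrete with a toy calculation: with k optional affix slots, pre-compiling every inflected form multiplies each stem into up to 2**k lexicon entries, whereas run-time rule application derives only the forms a parse actually needs. The Python sketch below uses invented affixes and ignores Turkish vowel harmony and the type constraints discussed above.

        from itertools import product

        stems = ["ev", "kitap"]                        # 'house', 'book'
        slots = [["", "ler"], ["", "im"], ["", "de"]]  # plural, 1sg poss., locative

        def precompiled(stems, slots):
            # Expand every slot combination up front: exponential in slot count.
            return [s + "".join(affixes) for s in stems for affixes in product(*slots)]

        def derive(stem, *affixes):
            # Run-time alternative: apply only the rules a parse needs.
            return stem + "".join(affixes)

        print(len(precompiled(stems, slots)))  # 16 entries for just 2 stems, 3 slots
        print(derive("ev", "ler", "de"))       # 'evlerde', built on demand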

    Usage Effects on the Cognitive Routinization of Chinese Resultative Verbs

    The present study adopts a corpus-oriented, usage-based approach to the grammar of Chinese resultative verbs. Zooming in on a specific class of V-kai constructions, this paper aims to elucidate the effect of frequency in actual usage events on shaping the linguistic representations of resultative verbs. Specifically, it will be argued that while high token frequency results in more lexicalized V-kai complex verbs, high type frequency gives rise to more schematized V-kai constructions. The routinized patterns pertinent to V-kai resultative verbs, varying in their degree of specificity and generality, accordingly serve as a representative illustration of the continuum between lexicon and grammar that characterizes a usage-based conception of language.
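
    The token/type distinction at work here is easy to make concrete. In the Python sketch below, run over an invented miniature corpus of segmented V-kai forms, token frequency counts repetitions of one specific form (the pressure toward lexicalization), while type frequency counts how many distinct verbs fill the V slot (the pressure toward a schematic construction).

        from collections import Counter

        corpus = ["da-kai", "da-kai", "da-kai", "qie-kai", "la-kai", "zou-kai"]

        counts = Counter(corpus)
        token_freq = counts["da-kai"]  # recurrence of one specific V-kai form
        type_freq = len(counts)        # number of distinct verbs in the V slot

        print(token_freq, type_freq)   # 3 4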

    On the universal structure of human lexical semantics

    How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides direct access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries. Across languages carefully selected from a phylogenetically and geographically stratified sample of genera, translations of words reveal cases where a particular language uses a single polysemous word to express concepts represented by distinct words in another. We use the frequency of polysemies linking two concepts as a measure of their semantic proximity, and represent the pattern of such linkages by a weighted network. This network is highly uneven and fragmented: certain concepts are far more prone to polysemy than others, and naturally interpretable clusters emerge, loosely connected to each other. Statistical analysis shows that such structural properties are consistent across different language groups and largely independent of geography, environment, and literacy. It is therefore possible to conclude that the conceptual structure connecting the basic vocabulary studied is primarily due to universal features of human cognition and language use. Comment: Press embargo in place until publication.
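
    The network construction described here boils down to a counting scheme, sketched below in Python with invented data: each record notes that some language expresses two concepts with a single word form, and the weight of an edge between two concepts is the number of languages whose dictionaries link them that way.

        from collections import Counter
        from itertools import combinations

        # Each tuple: concepts covered by one polysemous word in one language.
        polysemies = [
            ("SUN", "DAY"),
            ("SUN", "DAY"),
            ("MOON", "MONTH"),
            ("SEA", "SALT"),
        ]

        # combinations() also handles words covering three or more concepts.
        edges = Counter()
        for concepts in polysemies:
            edges.update(combinations(sorted(concepts), 2))

        for (a, b), w in edges.most_common():
            print(f"{a} -- {b}: weight {w}")
        # DAY -- SUN: weight 2
        # MONTH -- MOON: weight 1
        # SALT -- SEA: weight 1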

    Lexical Flexibility, Natural Language, and Ontology

    The Realist who investigates questions of ontology by appeal to the quantificational structure of language assumes that the semantics for the privileged language of ontology is externalist. I argue that such a language cannot be (some variant of) a natural language, as some Realists propose. The flexibility exhibited by natural language expressions, noted by Chomsky and others, cannot obviously be characterized by the rigid models available to the externalist. If natural languages are hostile to externalist treatments, then the meanings of natural language expressions serve as poor guides for ontological investigation, insofar as those meanings will fail to determine the referents of their constituents. This undermines the Realist’s use of natural languages to settle disputes in metaphysics.

    Nominalization – lexical and syntactic aspects

    The main tenet of the present paper is that nominalization – like other cases of derivational morphology – is an essentially lexical phenomenon with well-defined syntactic (and semantic) conditions and consequences. More specifically, it will be argued that the relation between a verb and the noun derived from it is subject to both systematic and idiosyncratic conditions with respect to lexical as well as syntactic aspects.