Lexical information from a minimalist point of view
Simplicity as a methodological orientation applies to linguistic theory just as to any other field of research: "Occam's razor" is the label for the basic heuristic maxim according to which an adequate analysis must ultimately be reduced to indispensable specifications. In this sense, conceptual economy has been a strict and stimulating guideline in the development of Generative Grammar from the very beginning. Halle's (1959) argument discarding the level of taxonomic phonemics in order to unify two otherwise separate phonological processes is an early characteristic example; a more general notion is that of an evaluation metric introduced in Chomsky (1957, 1975), which relates the relative simplicity of alternative linguistic descriptions systematically to the quest for explanatory adequacy of the theory underlying the descriptions to be evaluated. Further proposals along these lines include the theory of markedness developed in Chomsky and Halle (1968), Kean (1975, 1981), and others, the notion of underspecification proposed e.g. in Archangeli (1984), Farkas (1990), the concept of default values and related notions. An important step promoting this general orientation was the idea of Principles and Parameters developed in Chomsky (1981, 1986), which reduced the notion of language particular rule systems to universal principles, subject merely to parametrization with restricted options, largely related to properties of particular lexical items. On this account, the notion of a simplicity metric is to be dispensed with, as competing analyses of relevant data are now supposed to be essentially excluded by the restrictive system of principles.
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous, possibly large-vocabulary speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
at the word level, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on morpheme-level speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experimental results show that speaker-dependent continuous
eojeol (Korean word) recognition and integrated morphological analysis
can be achieved with a success rate of over 80.6% directly from speech inputs
for middle-level vocabularies.
Comment: latex source with a4 style, 15 pages, to be published in computer
processing of oriental language journal
Thematic Annotation: extracting concepts out of documents
Contrary to standard approaches to topic annotation, the technique used in
this work does not centrally rely on some sort of -- possibly statistical --
keyword extraction. In fact, the proposed annotation algorithm uses a large
scale semantic database -- the EDR Electronic Dictionary -- that provides a
concept hierarchy based on hyponym and hypernym relations. This concept
hierarchy is used to generate a synthetic representation of the document by
aggregating the words present in topically homogeneous document segments into a
set of concepts best preserving the document's content.
This new extraction technique uses an unexplored approach to topic selection.
Instead of using semantic similarity measures based on a semantic resource, the
latter is processed to extract the part of the conceptual hierarchy relevant to
the document content. Then this conceptual hierarchy is searched to extract the
most relevant set of concepts to represent the topics discussed in the
document. Notice that this algorithm is able to extract generic concepts that
are not directly present in the document.
Comment: Technical report EPFL/LIA. 81 pages, 16 figures
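The idea of aggregating a segment's words into more generic concepts through a hypernym hierarchy can be illustrated with a minimal sketch. This is not the report's algorithm and does not use the EDR Electronic Dictionary: the `HYPERNYM` table, the `covering_concepts` function, and the `min_cover` threshold are all hypothetical, chosen only to show how a concept absent from the text can end up representing it.

```python
from collections import Counter

# Hypothetical toy hypernym relation (child -> parent); NOT taken from EDR.
HYPERNYM = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}

def ancestors(concept):
    """Return the concept followed by its chain of hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def covering_concepts(words, min_cover=2):
    """For each word, keep its most specific ancestor covering >= min_cover words."""
    unique_words = set(words)
    cover = Counter()
    for w in unique_words:
        cover.update(ancestors(w))  # each word counts once per ancestor
    chosen = set()
    for w in unique_words:
        for c in ancestors(w):  # ordered from most specific to most generic
            if cover[c] >= min_cover:
                chosen.add(c)
                break
    return chosen

# Assumed toy segment: none of the selected concepts occurs in it literally.
topics = covering_concepts(["dog", "cat", "sparrow"])
```

Under these assumptions, `topics` contains "mammal" (covering "dog" and "cat") and "animal" (the first concept above "sparrow" that covers more than one word), neither of which appears in the input segment.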
On the nature of the lexicon: the status of rich lexical meanings
The main goal of this paper is to show that there are many phenomena that pertain to the construction of truth-conditional compounds that follow characteristic patterns, and whose explanation requires appealing to knowledge structures organized in specific ways. We review a number of phenomena, ranging from non-homogenous modification and privative modification to polysemy and co-predication that indicate that knowledge structures do play a role in obtaining truth-conditions. After that, we show that several extant accounts that invoke rich lexical meanings to explain such phenomena face problems related to inflexibility and lack of predictive power. We review different ways in which one might react to such problems as regards lexical meanings: go richer, go moderately richer, go thinner, and go moderately thinner. On the face of it, it looks like moderate positions are unstable, given the apparent lack of a clear cutoff point between the semantic and the conceptual, but also that a very thin view and a very rich view may turn out to be indistinguishable in the long run. As far as we can see, the most pressing open questions concern this last issue: can there be a principled semantic/world knowledge distinction? Where could it be drawn: at some upper level (e.g. enriched qualia structures) or at some basic level (e.g. constraints)? How do parsimony considerations affect these two different approaches? A thin meanings approach postulates intermediate representations whose role is not clear in the interpretive process, while a rich meanings approach to lexical meaning seems to duplicate representations: the same representations that are stored in the lexicon would form part of conceptual representations. Both types of parsimony problems would be solved by assuming a direct relation between word forms and (parts of) conceptual or world knowledge, leading to a view that has been attributed to Chomsky (e.g. by Katz 1980) in which there is just syntax and encyclopedic knowledge.
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are different areas of research that have received considerable attention in the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is an important issue for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these areas of research, shows their common ground, and discusses how to integrate information extraction and data integration under uncertainty management cover.
Morphological Productivity in the Lexicon
In this paper we outline a lexical organization for Turkish that makes use of
lexical rules for inflections, derivations, and lexical category changes to
control the proliferation of lexical entries. Lexical rules handle changes in
grammatical roles, enforce type constraints, and control the mapping of
subcategorization frames in valency-changing operations. A lexical inheritance
hierarchy facilitates the enforcement of type constraints. Semantic
compositions in inflections and derivations are constrained by the properties
of the terms and predicates.
The design has been tested as part of an HPSG grammar for Turkish. In terms of
performance, run-time execution of the rules seems to be a far better
alternative than pre-compilation. The latter causes exponential growth in the
lexicon due to intensive use of inflections and derivations in Turkish.
Comment: 10 pages LaTeX, {lingmacros,avm,psfig}.sty, 1 figure, 1 bibtex file
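The contrast between pre-compilation and run-time rule application drawn in this abstract can be made concrete with a toy sketch. It assumes, purely for illustration, that each lexical rule is optional and independent, and it uses abstract tags rather than real Turkish morphology; `RULES`, `precompile`, and `apply_at_runtime` are hypothetical names, not part of the grammar described.

```python
from itertools import product

# Hypothetical optional lexical rules, written as abstract tags rather than
# real Turkish suffixes; each rule is either applied or skipped.
RULES = ["+CAUS", "+PASS", "+NEG", "+PL"]

def precompile(root, rules=RULES):
    """Expand a root into every entry obtainable by applying a subset of rules in order."""
    entries = []
    for choice in product([False, True], repeat=len(rules)):
        entries.append(root + "".join(r for r, on in zip(rules, choice) if on))
    return entries

def apply_at_runtime(root, wanted, rules=RULES):
    """Build only the single requested form, applying rules in their fixed order."""
    return root + "".join(r for r in rules if r in wanted)

entries = precompile("ROOT")                          # 2 ** len(RULES) entries per root
form = apply_at_runtime("ROOT", {"+PL", "+CAUS"})     # one form, built on demand
```

Under this assumption the pre-compiled lexicon holds 2^k entries per root for k rules, whereas the run-time variant stores only the root and the rules.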
Usage Effects on the Cognitive Routinization of Chinese Resultative Verbs
The present study adopts a corpus-oriented usage-based approach to the grammar of Chinese resultative verbs. Zooming in on a specific class of V-kai constructions, this paper aims to elucidate the effect of frequency in actual usage events on shaping the linguistic representations of resultative verbs. Specifically, it will be argued that while high token frequency results in more lexicalized V-kai complex verbs, high type frequency gives rise to more schematized V-kai constructions. The routinized patterns pertinent to V-kai resultative verbs varying in their extent of specificity and generality accordingly serve as a representative illustration of the continuum between lexicon and grammar that characterizes a usage-based conception of language
On the universal structure of human lexical semantics
How universal is human conceptual structure? The way concepts are organized
in the human brain may reflect distinct features of cultural, historical, and
environmental background in addition to properties universal to human
cognition. Semantics, or meaning expressed through language, provides direct
access to the underlying conceptual structure, but meaning is notoriously
difficult to measure, let alone parameterize. Here we provide an empirical
measure of semantic proximity between concepts using cross-linguistic
dictionaries. Across languages carefully selected from a phylogenetically and
geographically stratified sample of genera, translations of words reveal cases
where a particular language uses a single polysemous word to express concepts
represented by distinct words in another. We use the frequency of polysemies
linking two concepts as a measure of their semantic proximity, and represent
the pattern of such linkages by a weighted network. This network is highly
uneven and fragmented: certain concepts are far more prone to polysemy than
others, and there emerge naturally interpretable clusters loosely connected to
each other. Statistical analysis shows such structural properties are
consistent across different language groups, largely independent of geography,
environment, and literacy. It is therefore possible to conclude the conceptual
structure connecting basic vocabulary studied is primarily due to universal
features of human cognition and language use.
Comment: Press embargo in place until publication
Lexical Flexibility, Natural Language, and Ontology
The Realist who investigates questions of ontology by appeal to the quantificational structure of language assumes that the semantics for the privileged language of ontology is externalist. I argue that such a language cannot be (some variant of) a natural language, as some Realists propose. The flexibility exhibited by natural language expressions noted by Chomsky and others cannot obviously be characterized by the rigid models available to the externalist. If natural languages are hostile to externalist treatments, then the meanings of natural language expressions serve as poor guides for ontological investigation, insofar as their meanings will fail to determine the referents of their constituents. This undermines the Realist's use of natural languages to settle disputes in metaphysics.
Nominalization – lexical and syntactic aspects
The main tenet of the present paper is the thesis that nominalization – like other cases of derivational morphology – is an essentially lexical phenomenon with well-defined syntactic (and semantic) conditions and consequences. More specifically, it will be argued that the relation between a verb and the noun derived from it is subject to both systematic and idiosyncratic conditions with respect to lexical as well as syntactic aspects.