
    Mundari: The myth of a language without word classes

    Mundari, an Austroasiatic language of India (Munda family), has often been cited as an example of a language without word classes, in which a single word can function as noun, verb, adjective, etc. according to context. These claims, originating in a 1903 grammar by the missionary John Hoffmann, have recently been repeated uncritically by a number of typologists. In this article we review the evidence for word class fluidity, on the basis of a careful analysis of Hoffmann's corpus as well as substantial new data, including a large lexical sample at two levels of detail. We argue that Mundari does in fact have clearly definable word classes, with distinct open classes of verb and noun in addition to a closed adjective class, though there are productive possibilities for using all of them as predicates. Along the way, we elaborate a series of criteria that would need to be met before any language could seriously be claimed to lack a noun-verb distinction: most importantly strict compositionality, bidirectional flexibility, and exhaustiveness through the lexicon.

    Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation

    Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
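    The abstract does not spell out the ILP system itself, so the following is only a loose sketch of the underlying idea of measuring a knowledge source's contribution: train with and without a given source and compare accuracy. The feature names, toy data, and the simple feature-vote classifier below are invented for illustration and are not the SemEval-2007 setup.

```python
# Hedged sketch only: the paper uses inductive logic programming over
# first-order representations, which is not reproduced here. This toy
# ablation merely shows the idea of measuring a knowledge source's
# contribution by training with and without it. All data are invented.
from collections import Counter, defaultdict

def extract_features(instance, sources):
    """Build features from the selected knowledge sources only."""
    feats = {}
    if "collocations" in sources:                 # shallow, corpus-based
        feats.update({f"bigram={b}": 1 for b in instance["bigrams"]})
    if "domain" in sources:                       # deeper, background knowledge
        feats[f"domain={instance['domain']}"] = 1
    return feats

def train(data, sources):
    """A deliberately simple model: per-feature sense counts."""
    table = defaultdict(Counter)
    for inst in data:
        for f in extract_features(inst, sources):
            table[f][inst["sense"]] += 1
    return table

def predict(table, inst, sources, default="unknown"):
    votes = Counter()
    for f in extract_features(inst, sources):
        votes.update(table.get(f, Counter()))
    return votes.most_common(1)[0][0] if votes else default

def accuracy(train_set, test_set, sources):
    table = train(train_set, sources)
    hits = sum(predict(table, i, sources) == i["sense"] for i in test_set)
    return hits / len(test_set)

train_set = [
    {"bigrams": ["interest rate"], "domain": "finance", "sense": "money"},
    {"bigrams": ["great interest"], "domain": "psychology", "sense": "attention"},
]
test_set = [{"bigrams": ["interest charged"], "domain": "finance", "sense": "money"}]

# Ablation: drop the 'domain' source and observe the change in accuracy.
for sources in (["collocations", "domain"], ["collocations"]):
    print(sources, accuracy(train_set, test_set, sources))
```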

    Seeking the dimensions of decision-making: An exploratory study: Working paper series--02-17

    In a majority of business research field studies, the concepts being measured are abstract and complex, while the tools available are relatively crude and primitive. The prior art offers indications, suggestions, and allusions to the concept of a multi-dimensional decision-making model, but there is no general theory empirically identifying these dimensions. Using the semantic differential technique, a decision-dimension profiling construct is developed. Support is sought for the theoretical a priori hypotheses that decisions have multiple dimensions and that these dimensions can be measured. The results indicate that a decision problem can be characterized by measuring eight semantic scales to proxy three dimensions: Risk, Scale, and Complexity. Similarly, eight additional semantic scales are identified to proxy the four dimensions (Logic, Speed, Scope, and Tactics) that a decision-maker uses to approach a decision problem.
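    As a rough, hypothetical illustration of the profiling construct, the sketch below collapses ratings on semantic-differential scales into one averaged score per dimension. The scale names and their mapping to Risk, Scale, and Complexity are invented placeholders, not the instrument developed in the working paper.

```python
# Hedged sketch: the scale names and the scale-to-dimension mapping below are
# invented placeholders, not the instrument identified in the working paper.
from statistics import mean

# Hypothetical semantic-differential scales (rated 1-7) grouped by dimension.
PROBLEM_DIMENSIONS = {
    "Risk":       ["safe_risky", "certain_uncertain", "reversible_irreversible"],
    "Scale":      ["small_large", "local_global", "cheap_costly"],
    "Complexity": ["simple_complex", "familiar_novel"],
}

def profile(ratings, dimension_map):
    """Collapse raw scale ratings into one averaged score per dimension."""
    return {dim: round(mean(ratings[s] for s in scales), 2)
            for dim, scales in dimension_map.items()}

ratings = {"safe_risky": 6, "certain_uncertain": 5, "reversible_irreversible": 4,
           "small_large": 3, "local_global": 2, "cheap_costly": 3,
           "simple_complex": 6, "familiar_novel": 5}
print(profile(ratings, PROBLEM_DIMENSIONS))   # e.g. {'Risk': 5.0, 'Scale': 2.67, ...}
```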

    Learning Language from a Large (Unannotated) Corpus

    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system directly from a large, unannotated corpus. Comment: 29 pages, 5 figures, research proposal
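    The proposal builds on Link Grammar, RelEx and OpenCog, none of which is reproduced here. Purely as an illustration of the kind of corpus statistic that unsupervised grammar learners commonly start from, the sketch below scores co-occurring word pairs in a toy corpus by pointwise mutual information.

```python
# Illustration only: not the Link Grammar / RelEx / OpenCog pipeline, just the
# kind of corpus statistic (pointwise mutual information over word pairs) that
# unsupervised grammar induction commonly starts from.
import math
from collections import Counter
from itertools import combinations

def pmi_table(sentences):
    word_counts, pair_counts = Counter(), Counter()
    n_words, n_pairs = 0, 0
    for sent in sentences:
        tokens = sent.lower().split()
        word_counts.update(tokens)
        n_words += len(tokens)
        for a, b in combinations(tokens, 2):      # all within-sentence pairs
            pair_counts[tuple(sorted((a, b)))] += 1
            n_pairs += 1
    p = lambda w: word_counts[w] / n_words
    return {pair: math.log((cnt / n_pairs) / (p(pair[0]) * p(pair[1])))
            for pair, cnt in pair_counts.items()}

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
for pair, score in sorted(pmi_table(corpus).items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(score, 2))
```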

    Extending the Abstract Data Model.

    The Abstract Data Model (ADM) was developed by Sanderson [19] to model and predict semantic loss in data translation between computer languages. In this work, the ADM was applied to eight languages that were not considered as part of the original work. Some of the languages were found to support semantic features, such as the restriction semantics for inheritance found in languages like XML Schema and Java, that could not be represented in the ADM. A proposal was made to extend the ADM to support these semantic features, and the requirements and implications of implementing that proposal were considered.
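    As a loose analogue of the restriction semantics mentioned above, where a type derived by restriction in XML Schema narrows the value space of its base type, the sketch below mimics the idea in Python. It is an invented illustration, not part of the ADM or the proposed extension.

```python
# Invented illustration of restriction semantics for inheritance: the derived
# type keeps the base's structure but narrows its legal value space, as an
# XML Schema type derived by restriction does. This is not the ADM formalism.
class Quantity:
    def __init__(self, value: int):
        self.value = value

class SmallQuantity(Quantity):
    """A 'restriction' of Quantity: same structure, narrower value space."""
    def __init__(self, value: int):
        if not 0 <= value <= 10:
            raise ValueError("SmallQuantity restricts values to 0..10")
        super().__init__(value)

print(Quantity(500).value)       # the base type accepts any integer
print(SmallQuantity(5).value)    # the restricted subtype accepts only 0..10
# SmallQuantity(500) would raise ValueError: the subtype narrows, never widens.
```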

    Creating a Relational Distributed Object Store

    In and of itself, data storage has apparent business utility. But when we can convert data to information, the utility of stored data increases dramatically. It is the layering of relations atop the data mass that drives such conversion. Explicit relations amongst discrete, sporadically ingested objects are rare, making the process of synthesizing such relations all the more challenging; but the challenge must be met if we are ever to see business value from unstructured data equivalent to what we already obtain from structured data. This paper describes a novel construct, referred to as a relational distributed object store (RDOS), that seeks to solve the twin problems of how to persistently and reliably store petabytes of unstructured data while simultaneously creating and persisting relations amongst billions of objects. Comment: 12 pages, 5 figures
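    Purely to illustrate the data-model idea of layering named relations atop stored objects, the sketch below implements a toy in-memory store. The class and method names are invented, and none of the distributed, petabyte-scale machinery of RDOS is modelled.

```python
# Toy illustration of the data-model idea only: opaque objects addressed by id,
# with named relations layered on top. The real RDOS targets distributed,
# petabyte-scale storage; none of that machinery is modelled here, and all
# class and method names are invented.
import uuid
from collections import defaultdict

class TinyRelationalObjectStore:
    def __init__(self):
        self.objects = {}                     # object id -> raw bytes
        self.relations = defaultdict(set)     # (source id, relation name) -> target ids

    def put(self, blob: bytes) -> str:
        """Store an opaque object and return its generated id."""
        oid = str(uuid.uuid4())
        self.objects[oid] = blob
        return oid

    def relate(self, src: str, name: str, dst: str) -> None:
        """Layer a named relation atop two already-stored objects."""
        self.relations[(src, name)].add(dst)

    def related(self, src: str, name: str) -> set:
        return set(self.relations[(src, name)])

store = TinyRelationalObjectStore()
invoice = store.put(b"invoice scan")
contract = store.put(b"contract scan")
store.relate(invoice, "references", contract)
print(store.related(invoice, "references") == {contract})   # True
```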