Set-related restrictions for semantic groupings
Semantic database models utilize several fundamental forms of groupings to increase their expressive power. In this paper we consider four of the most common of these constructs: basic set groupings, is-a related groupings, power set groupings, and Cartesian aggregation groupings. For each, we define a number of useful restrictions that control its structure and composition. This permits each grouping to capture more subtle distinctions of the concepts or situations in the application environment. The resulting set of restrictions forms a framework that increases the expressive power of semantic models and specifies various set-related integrity constraints.
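The restrictions the abstract mentions are abstract constructs; as a minimal sketch of one of them, the following shows a basic set grouping with a cardinality restriction acting as a set-related integrity constraint. All class and method names here are hypothetical illustrations, not the paper's notation.

```python
# Hypothetical sketch: a basic set grouping with a cardinality
# restriction, one kind of set-related integrity constraint.
class SetGrouping:
    """A grouping of member entities with optional cardinality bounds."""

    def __init__(self, name, min_size=0, max_size=None):
        self.name = name
        self.min_size = min_size
        self.max_size = max_size
        self.members = set()

    def add(self, member):
        # Enforce the upper cardinality bound on insertion.
        if self.max_size is not None and len(self.members) >= self.max_size:
            raise ValueError(f"{self.name}: max cardinality {self.max_size} exceeded")
        self.members.add(member)

    def is_valid(self):
        # The integrity constraint: cardinality within [min_size, max_size].
        n = len(self.members)
        upper_ok = self.max_size is None or n <= self.max_size
        return self.min_size <= n and upper_ok


committee = SetGrouping("committee", min_size=3, max_size=5)
for person in ("ann", "bob", "carol"):
    committee.add(person)
print(committee.is_valid())  # True: 3 members lies within [3, 5]
```

Other restrictions in the framework (e.g. on is-a related or power set groupings) would constrain membership structurally rather than by count, but follow the same pattern of a checkable predicate over the grouping.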
Mundari: The myth of a language without word classes
Mundari, an Austroasiatic language of India (Munda family), has often been cited as an example of a language without word classes, in which a single word can function as noun, verb, adjective, etc. according to context. These claims, originating in a 1903 grammar by the missionary John Hoffmann, have recently been repeated uncritically by a number of typologists. In this article we review the evidence for word class fluidity on the basis of a careful analysis of Hoffmann's corpus as well as substantial new data, including a large lexical sample at two levels of detail. We argue that Mundari does in fact have clearly definable word classes, with distinct open classes of verb and noun in addition to a closed adjective class, though there are productive possibilities for using all of them as predicates. Along the way, we elaborate a series of criteria that would need to be met before any language could seriously be claimed to lack a noun-verb distinction: most importantly strict compositionality, bidirectional flexibility, and exhaustiveness through the lexicon.
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
Seeking the dimensions of decision-making: An exploratory study: Working paper series--02-17
In a majority of business research field studies, the concepts being measured are abstract and complex, while the tools available are relatively crude and primitive. The prior art provides indications, suggestions and allusions to the concept of a multi-dimensional decision-making model, but there is no general theory empirically identifying these dimensions. Using the semantic differential technique, a decision-dimension profiling construct is developed. Support is sought for the theoretical a priori hypotheses that decisions have multiple dimensions and that these dimensions can be measured. The results indicate that a decision problem can be characterized by measuring eight semantic scales to proxy three dimensions: Risk, Scale, and Complexity. Similarly, eight additional semantic scales are identified to proxy the four dimensions (Logic, Speed, Scope, and Tactics) that a decision-maker uses to approach a decision problem.
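The mechanics of the semantic differential technique the abstract relies on can be sketched briefly: ratings on bipolar adjective scales are grouped and averaged to proxy a higher-level dimension. The scale names and the scale-to-dimension mapping below are invented for illustration; the paper derives its own eight scales per profile empirically.

```python
# Illustrative sketch of semantic-differential profiling: bipolar
# scale ratings (e.g. 1-7) are averaged within groups to proxy
# decision dimensions. The mapping below is hypothetical.
from statistics import mean

DIMENSIONS = {
    "Risk":       ["safe_risky", "certain_uncertain", "reversible_irreversible"],
    "Scale":      ["small_large", "cheap_costly", "local_global"],
    "Complexity": ["simple_complex", "familiar_novel"],
}

def profile(ratings):
    """Average the scales assigned to each dimension into a proxy score."""
    return {dim: mean(ratings[s] for s in scales)
            for dim, scales in DIMENSIONS.items()}

ratings = {
    "safe_risky": 6, "certain_uncertain": 5, "reversible_irreversible": 7,
    "small_large": 2, "cheap_costly": 3, "local_global": 1,
    "simple_complex": 4, "familiar_novel": 4,
}
print(profile(ratings))  # {'Risk': 6, 'Scale': 2, 'Complexity': 4}
```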
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
Extending the Abstract Data Model.
The Abstract Data Model (ADM) was developed by Sanderson [19] to model and predict semantic loss in data translation between computer languages. In this work, the ADM was applied to eight languages that were not considered as part of the original work. Some of these languages were found to support semantic features, such as the restriction semantics for inheritance found in languages like XML Schema and Java, which could not be represented in the ADM. A proposal was made to extend the ADM to support these semantic features, and the requirements and implications of implementing that proposal were considered.
Creating a Relational Distributed Object Store
In and of itself, data storage has apparent business utility. But when we can convert data to information, the utility of stored data increases dramatically. It is the layering of relation atop the data mass that is the engine for such conversion. Frank relation amongst discrete objects sporadically ingested is rare, making the process of synthesizing such relation all the more challenging, but the challenge must be met if we are ever to see an equivalent business value for unstructured data as we already have with structured data. This paper describes a novel construct, referred to as a relational distributed object store (RDOS), that seeks to solve the twin problems of how to persistently and reliably store petabytes of unstructured data while simultaneously creating and persisting relations amongst billions of objects.
Comment: 12 pages, 5 figures