Sense Tagging: Semantic Tagging with a Lexicon
Sense tagging, the automatic assignment of the appropriate sense from some
lexicon to each of the words in a text, is a specialised instance of the
general problem of semantic tagging by category or type. We discuss which
recent word sense disambiguation algorithms are appropriate for sense tagging.
It is our belief that sense tagging can be carried out effectively by combining
several simple, independent, methods and we include the design of such a
tagger. A prototype of this system has been implemented, correctly tagging 86%
of polysemous word tokens in a small test set, providing evidence that our
hypothesis is correct.
Comment: 6 pages, uses aclap LaTeX style file. Also in Proceedings of the SIGLEX Workshop "Tagging Text with Lexical Semantics".
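The combination idea can be sketched as follows. This is a toy illustration only, assuming majority voting over independent heuristics; the heuristics, sense ids, and lexicon below are invented placeholders, not the paper's actual components.

```python
# Toy sketch: several simple, independent disambiguation heuristics each
# vote for a sense; the tagger takes the majority. All names here are
# hypothetical, not the paper's actual methods or lexicon.
from collections import Counter

def most_frequent_sense(word, context, lexicon):
    # Baseline: senses are assumed listed in frequency order.
    return lexicon[word][0]

def gloss_overlap_sense(word, context, lexicon):
    # Prefer the sense whose gloss shares the most words with the context.
    return max(lexicon[word],
               key=lambda s: len(set(s["gloss"].split()) & set(context)))

def collocation_sense(word, context, lexicon):
    # A single strong cue word decides; otherwise fall back to the baseline.
    cues = {"deposit": "bank.n.1", "river": "bank.n.2"}
    for c in context:
        if c in cues:
            return next(s for s in lexicon[word] if s["id"] == cues[c])
    return lexicon[word][0]

def tag(word, context, lexicon, methods):
    votes = Counter(m(word, context, lexicon)["id"] for m in methods)
    return votes.most_common(1)[0][0]

lexicon = {"bank": [
    {"id": "bank.n.1", "gloss": "financial institution accepting deposits"},
    {"id": "bank.n.2", "gloss": "sloping land beside a river"},
]}
context = ["fishing", "on", "the", "river"]
print(tag("bank", context, lexicon,
          [most_frequent_sense, gloss_overlap_sense, collocation_sense]))
```

Because the methods vote independently, a wrong baseline guess is outvoted whenever two or more context-sensitive heuristics agree.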
Compacting the Penn Treebank Grammar
Treebanks, such as the Penn Treebank (PTB), offer a simple approach to
obtaining a broad coverage grammar: one can simply read the grammar off the
parse trees in the treebank. While such a grammar is easy to obtain, a
square-root rate of growth of the rule set with corpus size suggests that the
derived grammar is far from complete and that much more treebanked text would
be required to obtain a complete grammar, if one exists at some limit. However,
we offer an alternative explanation in terms of the underspecification of
structures within the treebank. This hypothesis is explored by applying an
algorithm to compact the derived grammar by eliminating redundant rules --
rules whose right hand sides can be parsed by other rules. The size of the
resulting compacted grammar, which is significantly less than that of the full
treebank grammar, is shown to approach a limit. However, such a compacted
grammar does not yield very good performance figures. A version of the
compaction algorithm taking rule probabilities into account is proposed, which
is argued to be more linguistically motivated. Combined with simple
thresholding, this method can be used to give a 58% reduction in grammar size
without significant change in parsing performance, and can produce a 69%
reduction with some gain in recall, but a loss in precision.
Comment: 5 pages, 2 figures
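The redundancy test at the heart of the compaction can be sketched as follows. This is a toy reconstruction under stated assumptions, not the paper's implementation: it treats rules as (lhs, rhs-tuple) pairs, bounds recursion depth to stay finite, and ignores the probabilistic variant and thresholding entirely.

```python
# Toy sketch of grammar compaction: a rule is redundant if its right-hand
# side can be parsed (derived from its left-hand side) using the remaining
# rules. Depth bound is a simplification to guarantee termination.
def covers(symbol, seq, rules, depth=3):
    # Can `symbol` derive the sequence `seq` of grammar symbols?
    if seq == (symbol,):
        return True
    if depth == 0:
        return False
    return any(lhs == symbol and splits(rhs, seq, rules, depth - 1)
               for lhs, rhs in rules)

def splits(rhs, seq, rules, depth):
    # Match each rhs symbol against a non-empty contiguous chunk of seq.
    if not rhs:
        return not seq
    head, rest = rhs[0], rhs[1:]
    return any(covers(head, seq[:i], rules, depth)
               and splits(rest, seq[i:], rules, depth)
               for i in range(1, len(seq) - len(rest) + 1))

def compact(rules):
    kept = list(rules)
    for rule in rules:
        lhs, rhs = rule
        others = [r for r in kept if r != rule]
        if covers(lhs, tuple(rhs), others):
            kept.remove(rule)
    return kept

rules = [
    ("NP", ("DT", "NN")),
    ("NN", ("JJ", "NN")),        # recursive adjective attachment
    ("NP", ("DT", "JJ", "NN")),  # redundant: derivable from the two above
]
print(compact(rules))
```

The flat rule NP → DT JJ NN is eliminated because the two shorter rules already generate that right-hand side, which is exactly the underspecification argument: many long PTB rules are flattened versions of structure other rules express.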
Does helping now excuse cheating later? An investigation into moral balancing in children
We often use our previous good behaviour to justify current immoral acts, and likewise perform good deeds to atone for previous immoral behaviour. These effects, known as moral self-licensing and moral cleansing (collectively, moral balancing), have yet to be observed in children. The aim of the current study was therefore to investigate the developmental foundations of moral balancing. We examined whether children aged 4–5 years (N = 96) would be more likely to cheat on a task if they had previously helped a puppet at personal cost, and less likely to cheat if they had refused to help. This hypothesis was not supported, suggesting either that 4–5-year-old children do not engage in moral balancing or that the methodology used was not appropriate to capture this effect. We discuss implications and future research directions.
CGHub: Kick-starting the Worldwide Genome Web
The University of California, Santa Cruz (UCSC) is under contract with the National Cancer Institute (NCI) to construct and operate the Cancer Genomics Hub (CGHub), a national-scale library and user portal for cancer genomics data. The contract covers growth of the library to 5 petabytes. The NCI programs that feed the library currently produce about 20 terabytes of data each month. We discuss the receiver-driven file transfer mechanism, Annai GeneTorrent (GT), used with the library. Annai GT uses multiple TCP streams from multiple computers at the library site to parallelize genome downloads. We review our performance experience with the new transfer mechanism and explain additions to the transfer protocol that support the security required when handling patient cancer genomics data.
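The receiver-driven, multi-stream idea can be sketched in miniature. This is an illustration only, not Annai GT (which is BitTorrent-derived and far more elaborate): a local byte string stands in for the remote file, and each worker thread stands in for one TCP stream serving a byte-range request.

```python
# Toy sketch of receiver-driven parallel download: pull byte ranges of one
# large file over several concurrent streams, then reassemble in offset
# order. `fetch_range` is a stand-in for a real range-request over TCP.
from concurrent.futures import ThreadPoolExecutor

def fetch_range(blob, start, end):
    # Stand-in for one stream serving a byte-range request.
    return start, blob[start:end]

def parallel_download(blob, num_streams=4):
    size = len(blob)
    chunk = -(-size // num_streams)  # ceiling division
    ranges = [(i, min(i + chunk, size)) for i in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        parts = pool.map(lambda r: fetch_range(blob, *r), ranges)
    # Reassemble chunks in offset order regardless of completion order.
    return b"".join(data for _, data in sorted(parts))

genome = bytes(range(256)) * 1000  # ~256 KB stand-in for a genome file
assert parallel_download(genome) == genome
```

Tagging each chunk with its starting offset is what lets the receiver reorder out-of-order completions, the same reason range-based protocols can saturate a fat pipe with many slow streams.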
Knowledge Representation with Ontologies: The Present and Future
Recently, we have seen an explosion of interest in ontologies as
artifacts to represent human knowledge and as critical components in
knowledge management, the semantic Web, business-to-business
applications, and several other application areas. Various research
communities commonly assume that ontologies are the appropriate modeling
structure for representing knowledge. However, little discussion has
occurred regarding the actual range of knowledge an ontology can
successfully represent.
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the artificial intelligence tradition of combining different knowledge sources. An important step in exploring this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to a restricted vocabulary of words. We argue that this approach is more likely to assist the creation of practical systems.
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved very beneficial in developing efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
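The general shape of such a contribution analysis can be sketched as a leave-one-out ablation: re-score the system with each knowledge source withheld and rank sources by the resulting accuracy drop. The harness below is hypothetical; the source names and the additive toy scorer are invented, not the paper's nine sources or results.

```python
# Hypothetical leave-one-out ablation harness: withhold each knowledge
# source in turn and rank sources by the accuracy lost without them.
def ablate(sources, evaluate):
    baseline = evaluate(sources)
    drops = {s: baseline - evaluate([x for x in sources if x != s])
             for s in sources}
    # Largest drop first = most useful source.
    return baseline, dict(sorted(drops.items(), key=lambda kv: -kv[1]))

# Toy scorer standing in for retraining: each active source adds a fixed
# amount of accuracy on top of a 0.60 baseline. Values are invented.
CONTRIB = {"pos": 0.03, "collocations": 0.10, "syntax": 0.05, "topic": 0.02}

def evaluate(active):
    return 0.60 + sum(CONTRIB[s] for s in active)

baseline, drops = ablate(list(CONTRIB), evaluate)
print(baseline, drops)
```

A real study would retrain the disambiguation model per ablation rather than re-score an additive function, and sources can interact, so a drop of zero does not prove a source is useless, only that the others cover for it.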