Sense Tagging: Semantic Tagging with a Lexicon
Sense tagging, the automatic assignment of the appropriate sense from some
lexicon to each of the words in a text, is a specialised instance of the
general problem of semantic tagging by category or type. We discuss which
recent word sense disambiguation algorithms are appropriate for sense tagging.
It is our belief that sense tagging can be carried out effectively by combining
several simple, independent, methods and we include the design of such a
tagger. A prototype of this system has been implemented, correctly tagging 86%
of polysemous word tokens in a small test set, providing evidence that our
hypothesis is correct.
Comment: 6 pages, uses aclap LaTeX style file. Also in Proceedings of the SIGLEX Workshop "Tagging Text with Lexical Semantics".
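The combination idea can be sketched as follows. This is a toy illustration only, assuming majority voting over independent heuristics; the heuristics, sense ids, and lexicon below are invented placeholders, not the paper's actual components.

```python
# Toy sketch: several simple, independent disambiguation heuristics each
# vote for a sense; the tagger takes the majority. All names here are
# hypothetical, not the paper's actual methods or lexicon.
from collections import Counter

def most_frequent_sense(word, context, lexicon):
    # Baseline: senses are assumed listed in frequency order.
    return lexicon[word][0]

def gloss_overlap_sense(word, context, lexicon):
    # Prefer the sense whose gloss shares the most words with the context.
    return max(lexicon[word],
               key=lambda s: len(set(s["gloss"].split()) & set(context)))

def collocation_sense(word, context, lexicon):
    # A single strong cue word decides; otherwise fall back to the baseline.
    cues = {"deposit": "bank.n.1", "river": "bank.n.2"}
    for c in context:
        if c in cues:
            return next(s for s in lexicon[word] if s["id"] == cues[c])
    return lexicon[word][0]

def tag(word, context, lexicon, methods):
    votes = Counter(m(word, context, lexicon)["id"] for m in methods)
    return votes.most_common(1)[0][0]

lexicon = {"bank": [
    {"id": "bank.n.1", "gloss": "financial institution accepting deposits"},
    {"id": "bank.n.2", "gloss": "sloping land beside a river"},
]}
context = ["fishing", "on", "the", "river"]
print(tag("bank", context, lexicon,
          [most_frequent_sense, gloss_overlap_sense, collocation_sense]))
```

Because the methods vote independently, a wrong baseline guess is outvoted whenever two or more context-sensitive heuristics agree.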
Compacting the Penn Treebank Grammar
Treebanks, such as the Penn Treebank (PTB), offer a simple approach to
obtaining a broad coverage grammar: one can simply read the grammar off the
parse trees in the treebank. While such a grammar is easy to obtain, a
square-root rate of growth of the rule set with corpus size suggests that the
derived grammar is far from complete and that much more treebanked text would
be required to obtain a complete grammar, if one exists at some limit. However,
we offer an alternative explanation in terms of the underspecification of
structures within the treebank. This hypothesis is explored by applying an
algorithm to compact the derived grammar by eliminating redundant rules --
rules whose right hand sides can be parsed by other rules. The size of the
resulting compacted grammar, which is significantly less than that of the full
treebank grammar, is shown to approach a limit. However, such a compacted
grammar does not yield very good performance figures. A version of the
compaction algorithm taking rule probabilities into account is proposed, which
is argued to be more linguistically motivated. Combined with simple
thresholding, this method can be used to give a 58% reduction in grammar size
without significant change in parsing performance, and can produce a 69%
reduction with some gain in recall, but a loss in precision.
Comment: 5 pages, 2 figures
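The redundancy test at the heart of the compaction can be sketched as follows. This is a toy reconstruction under stated assumptions, not the paper's implementation: it treats rules as (lhs, rhs-tuple) pairs, bounds recursion depth to stay finite, and ignores the probabilistic variant and thresholding entirely.

```python
# Toy sketch of grammar compaction: a rule is redundant if its right-hand
# side can be parsed (derived from its left-hand side) using the remaining
# rules. Depth bound is a simplification to guarantee termination.
def covers(symbol, seq, rules, depth=3):
    # Can `symbol` derive the sequence `seq` of grammar symbols?
    if seq == (symbol,):
        return True
    if depth == 0:
        return False
    return any(lhs == symbol and splits(rhs, seq, rules, depth - 1)
               for lhs, rhs in rules)

def splits(rhs, seq, rules, depth):
    # Match each rhs symbol against a non-empty contiguous chunk of seq.
    if not rhs:
        return not seq
    head, rest = rhs[0], rhs[1:]
    return any(covers(head, seq[:i], rules, depth)
               and splits(rest, seq[i:], rules, depth)
               for i in range(1, len(seq) - len(rest) + 1))

def compact(rules):
    kept = list(rules)
    for rule in rules:
        lhs, rhs = rule
        others = [r for r in kept if r != rule]
        if covers(lhs, tuple(rhs), others):
            kept.remove(rule)
    return kept

rules = [
    ("NP", ("DT", "NN")),
    ("NN", ("JJ", "NN")),        # recursive adjective attachment
    ("NP", ("DT", "JJ", "NN")),  # redundant: derivable from the two above
]
print(compact(rules))
```

The flat rule NP → DT JJ NN is eliminated because the two shorter rules already generate that right-hand side, which is exactly the underspecification argument: many long PTB rules are flattened versions of structure other rules express.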
Does helping now excuse cheating later? An investigation into moral balancing in children
We often use our previous good behaviour to justify current immoral acts, and likewise perform good deeds to atone for previous immoral behaviour. These effects, known as moral self-licensing and moral cleansing (collectively, moral balancing), have yet to be observed in children. The aim of the current study was therefore to investigate the developmental foundations of moral balancing. We examined whether children aged 4–5 years (N = 96) would be more likely to cheat on a task if they had previously helped a puppet at personal cost, and less likely to cheat if they had refused to help. This hypothesis was not supported, suggesting either that 4–5-year-old children do not engage in moral balancing or that the methodology used was not appropriate to capture this effect. We discuss implications and future research directions.
CGHub: Kick-starting the Worldwide Genome Web
The University of California, Santa Cruz (UCSC) is under contract with the National Cancer Institute (NCI) to construct and operate the Cancer Genomics Hub (CGHub), a national-scale library and user portal for cancer genomics data. The contract covers growth of the library to 5 petabytes. The NCI programs that feed the library currently produce about 20 terabytes of data each month. We discuss the receiver-driven file transfer mechanism, Annai GeneTorrent (GT), used with the library. Annai GT uses multiple TCP streams from multiple computers at the library site to parallelize genome downloads. We review our performance experience with the new transfer mechanism and explain additions to the transfer protocol that support the security required when handling patient cancer genomics data.
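The receiver-driven, multi-stream idea can be sketched in miniature. This is an illustration only, not Annai GT (which is BitTorrent-derived and far more elaborate): a local byte string stands in for the remote file, and each worker thread stands in for one TCP stream serving a byte-range request.

```python
# Toy sketch of receiver-driven parallel download: pull byte ranges of one
# large file over several concurrent streams, then reassemble in offset
# order. `fetch_range` is a stand-in for a real range-request over TCP.
from concurrent.futures import ThreadPoolExecutor

def fetch_range(blob, start, end):
    # Stand-in for one stream serving a byte-range request.
    return start, blob[start:end]

def parallel_download(blob, num_streams=4):
    size = len(blob)
    chunk = -(-size // num_streams)  # ceiling division
    ranges = [(i, min(i + chunk, size)) for i in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        parts = pool.map(lambda r: fetch_range(blob, *r), ranges)
    # Reassemble chunks in offset order regardless of completion order.
    return b"".join(data for _, data in sorted(parts))

genome = bytes(range(256)) * 1000  # ~256 KB stand-in for a genome file
assert parallel_download(genome) == genome
```

Tagging each chunk with its starting offset is what lets the receiver reorder out-of-order completions, the same reason range-based protocols can saturate a fat pipe with many slow streams.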
Knowledge Representation with Ontologies: The Present and Future
Recently, we have seen an explosion of interest in ontologies as
artifacts to represent human knowledge and as critical components in
knowledge management, the semantic Web, business-to-business
applications, and several other application areas. Various research
communities commonly assume that ontologies are the appropriate modeling
structure for representing knowledge. However, little discussion has
occurred regarding the actual range of knowledge an ontology can
successfully represent.
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the artificial intelligence tradition of combining different knowledge sources. An important step in exploring this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to a restricted vocabulary of words. We argue that this approach is more likely to assist the creation of practical systems.
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved very beneficial in developing efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
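The general shape of such a contribution analysis can be sketched as a leave-one-out ablation: re-score the system with each knowledge source withheld and rank sources by the resulting accuracy drop. The harness below is hypothetical; the source names and the additive toy scorer are invented, not the paper's nine sources or results.

```python
# Hypothetical leave-one-out ablation harness: withhold each knowledge
# source in turn and rank sources by the accuracy lost without them.
def ablate(sources, evaluate):
    baseline = evaluate(sources)
    drops = {s: baseline - evaluate([x for x in sources if x != s])
             for s in sources}
    # Largest drop first = most useful source.
    return baseline, dict(sorted(drops.items(), key=lambda kv: -kv[1]))

# Toy scorer standing in for retraining: each active source adds a fixed
# amount of accuracy on top of a 0.60 baseline. Values are invented.
CONTRIB = {"pos": 0.03, "collocations": 0.10, "syntax": 0.05, "topic": 0.02}

def evaluate(active):
    return 0.60 + sum(CONTRIB[s] for s in active)

baseline, drops = ablate(list(CONTRIB), evaluate)
print(baseline, drops)
```

A real study would retrain the disambiguation model per ablation rather than re-score an additive function, and sources can interact, so a drop of zero does not prove a source is useless, only that the others cover for it.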