1,565 research outputs found
The Semantic Web: Apotheosis of annotation, but what are its semantics?
This article discusses what kind of entity the proposed Semantic Web (SW) is, principally by reference to the relationship of natural language structure to knowledge representation (KR). There are three distinct views on this issue. The first is that the SW is basically a renaming of the traditional AI KR task, with all its problems and challenges. The second view is that the SW will be, at a minimum, the World Wide Web with its constituent documents annotated so as to yield their content, or meaning structure, more directly. This view makes natural language processing central as the procedural bridge from texts to KR, usually via some form of automated information extraction. The third view is that the SW is about trusted databases as the foundation of a system of Web processes and services. There's also a fourth view, which is much more difficult to define and discuss: If the SW just keeps moving as an engineering development and is lucky, then real problems won't arise. This article is part of a special issue called Semantic Web Update
Intelligent indexing of crime scene photographs
The Scene of Crime Information System's automatic image-indexing prototype goes beyond extracting keywords and syntactic relations from captions. The semantic information it gathers gives investigators an intuitive, accurate way to search a database of cases for specific photographic evidence. Intelligent, automatic indexing and retrieval of crime scene photographs is one of the main functions of SOCIS, our research prototype developed within the Scene of Crime Information System project. The prototype, now in its final development and evaluation phase, applies advanced natural language processing techniques to text-based image indexing and retrieval to tackle crime investigation needs effectively and efficiently
Evaluating two methods for Treebank grammar compaction
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can simply be ‘read off’ the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar.
In this paper, we explore ways by which a treebank grammar can be reduced in size or ‘compacted’, which involve the use of two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision
Intelligent multimedia indexing and retrieval through multi-source information extraction and merging
This paper reports work on automated meta-data\ud
creation for multimedia content. The approach results\ud
in the generation of a conceptual index of\ud
the content which may then be searched via semantic\ud
categories instead of keywords. The novelty\ud
of the work is to exploit multiple sources of\ud
information relating to video content (in this case\ud
the rich range of sources covering important sports\ud
events). News, commentaries and web reports covering\ud
international football games in multiple languages\ud
and multiple modalities is analysed and the\ud
resultant data merged. This merging process leads\ud
to increased accuracy relative to individual sources
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations that allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. Is it important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources
FEATURE SELECTION APPLIED TO THE TIME-FREQUENCY REPRESENTATION OF MUSCLE NEAR-INFRARED SPECTROSCOPY (NIRS) SIGNALS: CHARACTERIZATION OF DIABETIC OXYGENATION PATTERNS
Diabetic patients might present peripheral microcirculation impairment and might benefit from physical training. Thirty-nine diabetic patients underwent the monitoring of the tibialis anterior muscle oxygenation during a series of voluntary ankle flexo-extensions by near-infrared spectroscopy (NIRS). NIRS signals were acquired before and after training protocols. Sixteen control subjects were tested with the same protocol. Time-frequency distributions of the Cohen's class were used to process the NIRS signals relative to the concentration changes of oxygenated and reduced hemoglobin. A total of 24 variables were measured for each subject and the most discriminative were selected by using four feature selection algorithms: QuickReduct, Genetic Rough-Set Attribute Reduction, Ant Rough-Set Attribute Reduction, and traditional ANOVA. Artificial neural networks were used to validate the discriminative power of the selected features. Results showed that different algorithms extracted different sets of variables, but all the combinations were discriminative. The best classification accuracy was about 70%. The oxygenation variables were selected when comparing controls to diabetic patients or diabetic patients before and after training. This preliminary study showed the importance of feature selection techniques in NIRS assessment of diabetic peripheral vascular impairmen
University of Sheffield: Description of the LaSIE System as Used for MUC-6
more generally, natural languag e engineering. LaSIE is a single, integrated system that builds up a unified model of a text which is then used t
University Of Sheffield: Description Of The Lasie-Ii System As Used For Muc-7
this article wewere largely successful with all slots except for the Entity Descriptor slot where scores were 50 # precision and 21 # recall. We will #rst explain the particular items we failed on, and then discuss why our Entity Descriptor slots were so poo
Dialogue based interfaces for universal access.
Conversation provides an excellent means of communication for almost all people. Consequently, a conversational interface is an excellent mechanism for allowing people to interact with systems. Conversational systems are an active research area, but a wide range of systems can be developed with current technology. More sophisticated interfaces can take considerable effort, but simple interfaces can be developed quite rapidly. This paper gives an introduction to the current state of the art of conversational systems and interfaces. It describes a methodology for developing conversational interfaces and gives an example of an interface for a state benefits web site. The paper discusses how this interface could improve access for a wide range of people, and how further development of this interface would allow a larger range of people to use the system and give them more functionality
- …