73 research outputs found
ConceptScope: Organizing and Visualizing Knowledge in Documents based on Domain Ontology
Current text visualization techniques typically provide overviews of document
content and structure using intrinsic properties such as term frequencies,
co-occurrences, and sentence structures. Such visualizations lack conceptual
overviews incorporating domain-relevant knowledge, needed when examining
documents such as research articles or technical reports. To address this
shortcoming, we present ConceptScope, a technique that utilizes a domain
ontology to represent the conceptual relationships in a document in the form of
a Bubble Treemap visualization. Multiple coordinated views of document
structure and concept hierarchy with text overviews further aid document
analysis. ConceptScope facilitates exploration and comparison of single and
multiple documents respectively. We demonstrate ConceptScope by visualizing
research articles and transcripts of technical presentations in computer
science. In a comparative study with DocuBurst, a popular document
visualization tool, ConceptScope was found to be more informative in exploring
and comparing domain-specific documents, but less so when it came to documents
that spanned multiple disciplines.Comment: 19 pages, 5 figure
Proceedings of the Conference on Natural Language Processing 2010
This book contains state-of-the-art contributions to the 10th
conference on Natural Language Processing, KONVENS 2010
(Konferenz zur Verarbeitung natürlicher Sprache), with a focus
on semantic processing.
The KONVENS in general aims at offering a broad perspective
on current research and developments within the interdisciplinary
field of natural language processing. The central theme
draws specific attention towards addressing linguistic aspects
ofmeaning, covering deep as well as shallow approaches to semantic
processing. The contributions address both knowledgebased
and data-driven methods for modelling and acquiring
semantic information, and discuss the role of semantic information
in applications of language technology.
The articles demonstrate the importance of semantic processing,
and present novel and creative approaches to natural
language processing in general. Some contributions put their
focus on developing and improving NLP systems for tasks like
Named Entity Recognition or Word Sense Disambiguation, or
focus on semantic knowledge acquisition and exploitation with
respect to collaboratively built ressources, or harvesting semantic
information in virtual games. Others are set within the
context of real-world applications, such as Authoring Aids, Text
Summarisation and Information Retrieval. The collection highlights
the importance of semantic processing for different areas
and applications in Natural Language Processing, and provides
the reader with an overview of current research in this field
Robust Dialog Management Through A Context-centric Architecture
This dissertation presents and evaluates a method of managing spoken dialog interactions with a robust attention to fulfilling the human user’s goals in the presence of speech recognition limitations. Assistive speech-based embodied conversation agents are computer-based entities that interact with humans to help accomplish a certain task or communicate information via spoken input and output. A challenging aspect of this task involves open dialog, where the user is free to converse in an unstructured manner. With this style of input, the machine’s ability to communicate may be hindered by poor reception of utterances, caused by a user’s inadequate command of a language and/or faults in the speech recognition facilities. Since a speech-based input is emphasized, this endeavor involves the fundamental issues associated with natural language processing, automatic speech recognition and dialog system design. Driven by ContextBased Reasoning, the presented dialog manager features a discourse model that implements mixed-initiative conversation with a focus on the user’s assistive needs. The discourse behavior must maintain a sense of generality, where the assistive nature of the system remains constant regardless of its knowledge corpus. The dialog manager was encapsulated into a speech-based embodied conversation agent platform for prototyping and testing purposes. A battery of user trials was performed on this agent to evaluate its performance as a robust, domain-independent, speech-based interaction entity capable of satisfying the needs of its users
A robust methodology for automated essay grading
None of the available automated essay grading systems can be used to grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is a humble effort to address this limitation. The objective of this thesis is to develop a robust methodology for automatically grading essays based on the NAPLAN rubric by using heuristics and rules based on English language and neural network modelling
Idiom treatment experiments in machine translation
Idiomatic expressions pose a particular challenge for the today\u27;s Machine Translation systems, because their translation mostly does not result literally, but logically. The present dissertation shows, how with the help of a corpus, and morphosyntactic rules, such idiomatic expressions can be recognized and finally correctly translated. The work leads the reader in the first chapter generally to the field of Machine Translation and following that, it focuses on the special field of Example-based Machine Translation. Next, an important part of the doctoral thesis dissertation is devoted to the theory of idiomatic expressions. The practical part of the thesis describes how the hybrid Example-based Machine Translation system METIS-II, with the help of morphosyntactic rules, is able to correctly process certain idiomatic expressions and finally, to translate them. The following chapter deals with the function of the transfer system CAT2 and its handling of the idiomatic expressions. The last part of the thesis includes the evaluation of three commercial systems, namely SYSTRAN, T1 Langenscheidt, and Power Translator Pro, with respect to continuous and discontinuous idiomatic expressions. For this, both small corpora and a part of the extensive corpus Europarl and the Digital Lexicon of the German Language in 20th century were processed, firstly manually and then automatically. The dissertation concludes with results from this evaluation.Idiomatische Redewendungen stellen für heutige maschinelle Übersetzungssysteme eine besondere Herausforderung dar, da ihre Übersetzung nicht wörtlich, sondern stets sinngemäß erfolgen muss. Die vorliegende Dissertation zeigt, wie mit Hilfe eines Korpus sowie morphosyntaktischer Regeln solche idiomatische Redewendungen erkannt und am Ende richtig übersetzt werden können. Die Arbeit führt den Leser im ersten Kapitel allgemein in das Gebiet der Maschinellen Übersetzung ein und vertieft im Anschluss daran das Spezialgebiet der Beispielbasierten Maschinellen Übersetzung. Im Folgenden widmet sich ein wesentlicher Teil der Doktorarbeit der Theorie über idiomatische Redewendungen. Der praktische Teil der Arbeit beschreibt wie das hybride Beispielbasierte Maschinelle Übersetzungssystem METIS-II mit Hilfe von morphosyntaktischen Regeln befähigt wurde, bestimmte idiomatische Redewendungen korrekt zu bearbeiten und am Ende zu übersetzen. Das nachfolgende Kapitel behandelt die Funktion des Transfersystems CAT2 und dessen Umgang mit idiomatischen Wendungen. Der letzte Teil der Arbeit beinhaltet die Evaluation von drei kommerzielle Systemen, nämlich SYSTRAN, T1 Langenscheidt und Power Translator Pro, in Bezug auf deren Umgang mit kontinuierlichen und diskontinuierlichen idiomatischen Redewendungen. Hierzu wurden sowohl kleine Korpora als auch ein Teil des umfangreichen Korpus Europarl und des Digatalen Wörterbuchs der deutschen Sprache des 20. Jh. erst manuell und dann maschinell bearbeitet. Die Dissertation wird mit Folgerungen aus der Evaluation abgeschlossen
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of controlled language for critical infrastructures protection (CIP). This project originates
from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically
represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of
traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an
analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus.JRC.G.6-Security technology assessmen
- …