Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches
We study the adaptation of Link Grammar Parser to the biomedical sublanguage
with a focus on domain terms not found in a general parser lexicon. Using two
biomedical corpora, we implement and evaluate three approaches to addressing
unknown words: automatic lexicon expansion, the use of morphological clues, and
disambiguation using a part-of-speech tagger. We evaluate each approach
separately for its effect on parsing performance and consider combinations of
these approaches. In addition to a 45% increase in parsing efficiency, we find
that the best approach, incorporating information from a domain part-of-speech
tagger, offers a statistically significant 10% relative decrease in error. The
adapted parser is available under an open-source license at
http://www.it.utu.fi/biolg
The Royal Birth of 2013: Analysing and Visualising Public Sentiment in the UK Using Twitter
Analysis of information retrieved from microblogging services such as Twitter
can provide valuable insight into public sentiment in a geographic region. This
insight can be enriched by visualising information in its geographic context.
Two underlying approaches for sentiment analysis are dictionary-based and
machine learning. The former is popular for public sentiment analysis, and the
latter has found limited use for aggregating public sentiment from Twitter
data. The research presented in this paper aims to extend the machine learning
approach for aggregating public sentiment. To this end, a framework for
analysing and visualising public sentiment from a Twitter corpus is developed.
A dictionary-based approach and a machine learning approach are implemented
within the framework and compared using one UK case study, namely the royal
birth of 2013. The case study validates the feasibility of the framework for
analysis and rapid visualisation. One observation is that there is good
correlation between the results produced by the popular dictionary-based
approach and the machine learning approach when large volumes of tweets are
analysed. However, for rapid analysis to be possible, faster methods need to be
developed using big data techniques and parallel methods.
Comment: http://www.blessonv.com/research/publicsentiment/ 9 pages. Submitted
to IEEE BigData 2013: Workshop on Big Humanities, October 201
Creating a Semantic Graph from Wikipedia
With the continued need to organize and automate the use of data, solutions are needed to transform unstructured text into structured information. By treating dependency grammar functions as programming language functions, this process produces property maps, which connect entities (people, places, events) with snippets of information. These maps are used to construct a semantic graph. With Wikipedia as input, a large graph of information is produced, representing a section of history. The resulting graph allows a user to quickly browse a topic and view the interconnections between entities across history.
Parse Forest Diagnostics with Dr. Ambiguity
In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence.
Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b) the complex shape of parse forests, and (c) the diversity of causes of ambiguity.
We first analyze the diversity of ambiguities in grammars for programming languages and the diversity of solutions to these ambiguities. Then we introduce Dr. Ambiguity, a parse forest diagnostics tool that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5.
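As a rough illustration of the notion of a parse forest used in this abstract (the set of parse trees of an ambiguous sentence), the following sketch enumerates every tree for a sentence under a deliberately ambiguous toy grammar, E -> E '+' E | 'a'. The grammar, the function name `parse_forest`, and the tuple encoding of trees are assumptions made for this example only; they are not taken from the paper or from the Dr. Ambiguity tool.

```python
from functools import lru_cache


def parse_forest(tokens):
    """Return all parse trees of `tokens` under the toy grammar
    E -> E '+' E | 'a' (a hypothetical grammar chosen for illustration)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Collect every parse tree that derives tokens[i:j] from E.
        result = []
        # Rule E -> 'a': a single token 'a' is a leaf.
        if j - i == 1 and tokens[i] == "a":
            result.append("a")
        # Rule E -> E '+' E: try every '+' as the top-level operator.
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append((left, "+", right))
        return tuple(result)

    return list(trees(0, len(tokens)))


if __name__ == "__main__":
    # Each printed tuple is one parse tree of the ambiguous sentence.
    for tree in parse_forest("a+a+a"):
        print(tree)
```

Even this tiny grammar shows why forests grow quickly: the number of trees for a chain of n additions follows the Catalan numbers, which is consistent with the abstract's point that forest size makes manual diagnosis hard.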
Recommended from our members
Processing Reverse Sluicing: A Contrast with Processing Filler-Gap Dependencies
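On Sentence Analysis Techniques in Speech Translation
The full-text data was converted to PDF from image files created through the National Diet Library's fiscal year 2010 (Heisei 22) digitization of doctoral dissertations. Kyoto University 0048; new system, doctorate by dissertation; Doctor of Engineering; Otsu No. 8652; Ron-Ko-Haku No. 2893; 新制||工||968 (University Library); UT51-94-R411. Chief examiner: Professor Makoto Nagao; examiners: Professor Shuji Doshita, Professor Katsuo Ikeda. Conferred under Article 4, Paragraph 2 of the Degree Regulations. Doctor of Engineering, Kyoto University. DFA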