Learning Ontology Relations by Combining Corpus-Based Techniques and Reasoning on Data from Semantic Web Sources
The manual construction of formal domain conceptualizations (ontologies) is labor-intensive. Ontology learning, by contrast, provides (semi-)automatic ontology generation from input data such as domain text. This thesis proposes a novel approach for learning labels of non-taxonomic ontology relations. It combines corpus-based techniques with reasoning on Semantic Web data. Corpus-based methods apply vector space similarity of verbs co-occurring with labeled and unlabeled relations to calculate relation label suggestions from a set of candidates. A meta ontology in combination with Semantic Web sources such as DBpedia and OpenCyc allows reasoning to improve the suggested labels. An extensive formal evaluation demonstrates the superior accuracy of the presented hybrid approach.
Large-Scale Pattern-Based Information Extraction from the World Wide Web
Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization.
This thesis explores the potential of using textual patterns for Information Extraction from the World Wide Web.
Gamifying Language Resource Acquisition
PhD Thesis
Natural Language Processing is an important collection of methods for processing the vast
amounts of natural language text we continually produce. These methods make
use of supervised learning, an approach that learns from large amounts of annotated
data. As humans, we're able to provide information about text that such systems can learn from.
Historically, this annotation was carried out by small groups of experts, which did not scale. This led
to various crowdsourcing approaches that used large pools of non-experts.
The traditional form of crowdsourcing was to pay users small amounts of money to complete
tasks. As time progressed, gamification approaches such as Games With A Purpose (GWAPs) showed various benefits
over the micro-payment methods used before. These included cost savings, worker training
opportunities, increased worker engagement, and the potential to far exceed the scale of crowdsourcing.
While these were successful in domains such as image labelling, they struggled in the domain
of text annotation, which wasn't such a natural fit. Despite many challenges, there were also
clearly many opportunities and benefits to applying this approach to text annotation. Many of
these are demonstrated by Phrase Detectives. Based on lessons learned from Phrase Detectives
and investigations into other GWAPs, in this work we attempt to create full GWAPs for NLP,
extracting the benefits of the methodology. This includes training, high-quality output from
non-experts, and a truly game-like GWAP design that players are happy to play voluntarily.