6,977 research outputs found
Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis
The Market Blended Insight project1 has the objective of improving the UK business to business marketing performance using the semantic web technologies. In this project, we are implementing an ontology driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It deals with both the semi-structured data and the unstructured text on the web, to annotate and then translate the extracted data according to the backend schema
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
The Semantic Web is an extension of the current web in which information is
given well-defined meaning. The perspective of Semantic Web is to promote the
quality and intelligence of the current web by changing its contents into
machine understandable form. Therefore, semantic level information is one of
the cornerstones of the Semantic Web. The process of adding semantic metadata
to web resources is called Semantic Annotation. There are many obstacles
against the Semantic Annotation, such as multilinguality, scalability, and
issues which are related to diversity and inconsistency in content of different
web pages. Due to the wide range of domains and the dynamic environments that
the Semantic Annotation systems must be performed on, the problem of automating
annotation process is one of the significant challenges in this domain. To
overcome this problem, different machine learning approaches such as supervised
learning, unsupervised learning and more recent ones like, semi-supervised
learning and active learning have been utilized. In this paper we present an
inclusive layered classification of Semantic Annotation challenges and discuss
the most important issues in this field. Also, we review and analyze machine
learning applications for solving semantic annotation problems. For this goal,
the article tries to closely study and categorize related researches for better
understanding and to reach a framework that can map machine learning techniques
into the Semantic Annotation challenges and requirements
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
On-the-fly Historical Handwritten Text Annotation
The performance of information retrieval algorithms depends upon the
availability of ground truth labels annotated by experts. This is an important
prerequisite, and difficulties arise when the annotated ground truth labels are
incorrect or incomplete due to high levels of degradation. To address this
problem, this paper presents a simple method to perform on-the-fly annotation
of degraded historical handwritten text in ancient manuscripts. The proposed
method aims at quick generation of ground truth and correction of inaccurate
annotations such that the bounding box perfectly encapsulates the word, and
contains no added noise from the background or surroundings. This method will
potentially be of help to historians and researchers in generating and
correcting word labels in a document dynamically. The effectiveness of the
annotation method is empirically evaluated on an archival manuscript collection
from well-known publicly available datasets
Magpie: towards a semantic web browser
Web browsing involves two tasks: finding the right web page and then making sense of its content. So far, research has focused on supporting the task of finding web resources through âstandardâ information retrieval mechanisms, or semantics-enhanced search. Much less attention has been paid to the second problem. In this paper we describe Magpie, a tool which supports the
interpretation of web pages. Magpie offers complementary knowledge sources, which a reader can call upon to quickly gain access to any background knowledge relevant to a web resource. Magpie automatically associates an ontologybased
semantic layer to web resources, allowing relevant services to be invoked within a standard web browser. Hence, Magpie may be seen as a step towards a semantic web browser. The functionality of Magpie is illustrated using examples of how it has been integrated with our labâs web resources
Features for Killer Apps from a Semantic Web Perspective
There are certain features that that distinguish killer apps from other ordinary applications. This chapter examines those features in the context of the semantic web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing semantic web applications. Killer apps are highly tranformative technologies that create new e-commerce venues and widespread patterns of behaviour. Information technology, generally, and the Web, in particular, have benefited from killer apps to create new networks of users and increase its value. The semantic web community on the other hand is still awaiting a killer app that proves the superiority of its technologies. The authors hope that this chapter will help to highlight some of the common ingredients of killer apps in e-commerce, and discuss how such applications might emerge in the semantic web
Building a semantically annotated corpus of clinical texts
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains
Clue: Cross-modal Coherence Modeling for Caption Generation
We use coherence relations inspired by computational models of discourse to
study the information needs and goals of image captioning. Using an annotation
protocol specifically devised for capturing image--caption coherence relations,
we annotate 10,000 instances from publicly-available image--caption pairs. We
introduce a new task for learning inferences in imagery and text, coherence
relation prediction, and show that these coherence annotations can be exploited
to learn relation classifiers as an intermediary step, and also train
coherence-aware, controllable image captioning models. The results show a
dramatic improvement in the consistency and quality of the generated captions
with respect to information needs specified via coherence relations.Comment: Accepted as a long paper to ACL 202
- âŚ