Resource-aware annotation through active learning
The annotation of corpora has become a crucial prerequisite for
information extraction systems which heavily rely on supervised
machine learning techniques and therefore require large amounts of
annotated training material. Annotation, however, requires human
intervention and is thus an extremely costly, labor-intensive, and
error-prone process. The burden of annotation is one of the major
obstacles when well-established information extraction systems are to
be applied to real-world problems and so a pressing research question
is how annotation can be made more efficient.
Most annotated corpora are built by collecting the documents to be
annotated on a random sampling basis or based on simple keyword
search. Only recently have more sophisticated approaches to selecting the
base material in order to reduce annotation effort been
investigated. One promising direction is known as Active Learning (AL),
where only examples of high utility for classifier training are
selected for manual annotation. Because of this intelligent selection,
classifiers of a given target performance can be yielded with fewer
labeled data points.
This thesis centers on the question of how AL can be applied as a
resource-aware strategy for linguistic annotation. A set of
requirements is defined and several approaches and adaptations to the
standard form of AL are proposed to meet these requirements. This
includes: (1) a novel method to monitor and stop the AL-driven
annotation process; (2) an approach to semi-supervised AL where only
highly critical tokens have to actually be manually annotated while
the rest is automatically tagged; (3) a discussion and empirical
investigation of the reusability of actively drawn samples; (4) a
comparative study of how class imbalance can be reduced upfront
during AL-driven data acquisition; (5) two methods for selective
sampling of examples which are useful for multiple learning problems;
(6) an extensive evaluation of the proposed approaches to AL for Named
Entity Recognition with respect to both savings in corpus size and
actual annotation time; and finally (7) three methods by which these
approaches can be made cost-conscious so as to reduce annotation time
even further
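The core selection step the thesis builds on can be illustrated with a minimal uncertainty-sampling sketch. This is a generic least-confidence strategy, not the thesis's specific method; the probability matrix and function names are hypothetical.

```python
import numpy as np

def least_confidence(probs):
    """Uncertainty score per example: 1 minus the top class probability."""
    return 1.0 - probs.max(axis=1)

def select_batch(probs, k):
    """Return indices of the k most uncertain examples for manual annotation."""
    return np.argsort(least_confidence(probs))[::-1][:k]

# Hypothetical posterior estimates for four unlabeled examples, two classes.
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.60, 0.40],
                  [0.99, 0.01]])
print(select_batch(probs, 2))  # -> [1 2]: the examples nearest the decision boundary
```

Examples the classifier is already confident about (rows 0 and 3) are skipped, which is what lets AL reach a target performance with fewer labeled data points.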
Context Aware Computing for The Internet of Things: A Survey
As we are moving towards the Internet of Things (IoT), the number of sensors
deployed around the world is growing at a rapid pace. Market research has shown
a significant growth of sensor deployments over the past decade and has
predicted a significant increase in the growth rate in the future. These
sensors continuously generate enormous amounts of data. However, in order to
add value to raw sensor data we need to understand it. Collection, modelling,
reasoning, and distribution of context in relation to sensor data play a
critical role in this challenge. Context-aware computing has proven to be
successful in understanding sensor data. In this paper, we survey context
awareness from an IoT perspective. We present the necessary background by
introducing the IoT paradigm and context-aware fundamentals at the beginning.
Then we provide an in-depth analysis of the context life cycle. We evaluate a
subset of projects (50) which represent the majority of research and commercial
solutions proposed in the field of context-aware computing conducted over the
last decade (2001-2011) based on our own taxonomy. Finally, based on our
evaluation, we highlight the lessons to be learnt from the past and some
possible directions for future research. The survey addresses a broad range of
techniques, methods, models, functionalities, systems, applications, and
middleware solutions related to context awareness and IoT. Our goal is not only
to analyse, compare and consolidate past research work but also to appreciate
their findings and discuss their applicability towards the IoT.Comment: IEEE Communications Surveys & Tutorials Journal, 201
Building a semantically annotated corpus of clinical texts
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains
A Context-aware Attention Network for Interactive Question Answering
Neural network based sequence-to-sequence models in an encoder-decoder
framework have been successfully applied to solve Question Answering (QA)
problems, predicting answers from statements and questions. However, almost all
previous models have failed to consider detailed context information and
unknown states under which systems do not have enough information to answer
given questions. These scenarios with incomplete or ambiguous information are
very common in the setting of Interactive Question Answering (IQA). To address
this challenge, we develop a novel model, employing context-dependent
word-level attention for more accurate statement representations and
question-guided sentence-level attention for better context modeling. We also
generate unique IQA datasets to test our model, which will be made publicly
available. Employing these attention mechanisms, our model accurately
understands when it can output an answer or when it requires generating a
supplementary question for additional input depending on different contexts.
When available, user feedback is encoded and directly applied to update
sentence-level attention to infer an answer. Extensive experiments on QA and
IQA datasets quantitatively demonstrate the effectiveness of our model with
significant improvement over state-of-the-art conventional QA models.
Comment: 9 page
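The question-guided sentence-level attention the abstract describes can be sketched generically: score each context sentence against the question, normalize the scores, and take a weighted sum. This is a plain dot-product attention sketch under assumed toy vectors, not the paper's exact formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(question, sentences):
    """Question-guided attention: score each sentence vector by its dot
    product with the question vector, softmax-normalize, and return the
    weights plus the attention-weighted context vector."""
    scores = [sum(q * x for q, x in zip(question, s)) for s in sentences]
    weights = softmax(scores)
    context = [sum(w * s[i] for w, s in zip(weights, sentences))
               for i in range(len(question))]
    return weights, context

# Toy 2-d embeddings: the first sentence is more similar to the question.
weights, context = attend([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]])
print(weights)  # the first sentence receives the larger weight
```

Encoded user feedback could then be folded in by re-scoring the sentences before the softmax, which is how the model updates sentence-level attention to infer an answer.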
A quick guide for student-driven community genome annotation
High quality gene models are necessary to expand the molecular and genetic
tools available for a target organism, but these are available for only a
handful of model organisms that have undergone extensive curation and
experimental validation over the course of many years. The majority of gene
models present in biological databases today have been identified in draft
genome assemblies using automated annotation pipelines that are frequently
based on orthologs from distantly related model organisms. Manual curation is
time consuming and often requires substantial expertise, but is instrumental in
improving gene model structure and identification. Manual annotation may seem
to be a daunting and cost-prohibitive task for small research communities but
involving undergraduates in community genome annotation consortiums can be
mutually beneficial, improving both education and genomic resources. We
outline a workflow for efficient manual annotation driven by a team of
primarily undergraduate annotators. This model can be scaled to large teams and
includes quality control processes through incremental evaluation. Moreover, it
gives students an opportunity to increase their understanding of genome biology
and to participate in scientific research in collaboration with peers and
senior researchers at multiple institutions
Magpie: towards a semantic web browser
Web browsing involves two tasks: finding the right web page and then making sense of its content. So far, research has focused on supporting the task of finding web resources through "standard" information retrieval mechanisms, or semantics-enhanced search. Much less attention has been paid to the second problem. In this paper we describe Magpie, a tool which supports the
interpretation of web pages. Magpie offers complementary knowledge sources, which a reader can call upon to quickly gain access to any background knowledge relevant to a web resource. Magpie automatically associates an ontology-based
semantic layer to web resources, allowing relevant services to be invoked within a standard web browser. Hence, Magpie may be seen as a step towards a semantic web browser. The functionality of Magpie is illustrated using examples of how it has been integrated with our lab's web resources
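The idea of overlaying an ontology-based semantic layer can be sketched with simple lexicon matching: find known ontology labels in a page's text and return the annotations a browser extension could highlight. The lexicon, class names, and sample page below are hypothetical, not Magpie's actual ontology.

```python
def semantic_layer(text, lexicon):
    """Match known ontology labels (case-insensitive) in a page's text and
    return (label, ontology_class) pairs forming a simple semantic layer."""
    lowered = text.lower()
    return [(label, cls) for label, cls in lexicon.items()
            if label.lower() in lowered]

# Hypothetical label-to-class lexicon derived from an ontology.
lexicon = {"Magpie": "Tool", "ontology": "Concept", "OSCAR": "Tool"}
page = "Magpie adds an ontology-driven layer to ordinary web pages."
print(semantic_layer(page, lexicon))  # [('Magpie', 'Tool'), ('ontology', 'Concept')]
```

A real system would resolve each matched label to services defined for its ontology class, which is what lets relevant services be invoked from within the browser.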
An artefact repository to support distributed software engineering
The Open Source Component Artefact Repository (OSCAR)
system is a component of the GENESIS platform designed to
non-invasively inter-operate with work-flow management systems, development tools, and existing repository systems to support a distributed software engineering team working collaboratively. Every artefact possesses a collection of associated meta-data, both standard and domain-specific, presented as an XML document. Within OSCAR, artefacts are made aware of changes to related artefacts through notifications, allowing them to modify their own meta-data actively, in contrast to other software repositories where users must perform any and all modifications, however trivial.
This recording of events, including user interactions, provides a complete picture of an artefact's life from creation to (eventual) retirement, with the intention of supporting collaboration both amongst the members of the software engineering team and agents acting on their behalf
Knowledge Management for Public Administrations: Technical Realizations of an Enterprise Attention Management System
The improvement of governmentsâ efficiency has gained great importance and validity especially in the current times of economic downturn. E-Government constitutes the most contemporary techno-managerial proposition in the track of possible interventions. The paper addresses, more specifically, empowerments necessitated by Public Administration (PA) organizations. Anchored on the needs of three real-life cases, the paper describes the conception and the realization of an IT artefact together with its methodological appeals aiming at improving information access and delivery and thus PAsâ decision making capacity. Our proposition constitutes a novel approach for managing usersâ attention in knowledge intensive organizations which goes beyond informing a user about changes in relevant information towards proactively supporting the user to react on changes. The approach is based on an expressive attention model, which is realized by combining ECA (Event-Condition-Action) rules with ontologies. The technical realizations described in the paper constitute the underlying infrastructure of an Enterprise Attention Management System
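The ECA mechanism underpinning the attention model can be illustrated with a minimal event-condition-action dispatcher. This is a generic sketch under assumed names (the event type, payload fields, and document names are all hypothetical), not the paper's actual system.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ECARule:
    event: str                          # event type the rule listens for
    condition: Callable[[dict], bool]   # guard evaluated against the payload
    action: Callable[[dict], None]      # fired only when the guard holds

class AttentionEngine:
    """Minimal event-condition-action dispatcher."""
    def __init__(self):
        self.rules: List[ECARule] = []
        self.notifications: List[str] = []

    def register(self, rule: ECARule) -> None:
        self.rules.append(rule)

    def publish(self, event: str, payload: dict) -> None:
        # Fire every matching rule whose condition holds for this payload.
        for rule in self.rules:
            if rule.event == event and rule.condition(payload):
                rule.action(payload)

engine = AttentionEngine()
engine.register(ECARule(
    event="document_updated",
    condition=lambda p: p.get("topic") == "budget",
    action=lambda p: engine.notifications.append(
        f"Please review the change to {p['doc']}"),
))
engine.publish("document_updated", {"doc": "annual-plan.doc", "topic": "budget"})
engine.publish("document_updated", {"doc": "memo.txt", "topic": "hr"})
print(engine.notifications)  # one proactive notification, for the budget change only
```

The condition is where ontologies would come in: instead of a literal topic match, the guard could test whether the changed document's concept is relevant to a user's modeled interests.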