Resource-aware annotation through active learning

Author: Tomanek Katrin
Publication date: 12/05/2010

The annotation of corpora has become a crucial prerequisite for information extraction systems, which rely heavily on supervised machine learning techniques and therefore require large amounts of annotated training material. Annotation, however, requires human intervention and is thus an extremely costly, labor-intensive, and error-prone process. The burden of annotation is one of the major obstacles when well-established information extraction systems are to be applied to real-world problems, so a pressing research question is how annotation can be made more efficient. Most annotated corpora are built by collecting the documents to be annotated through random sampling or simple keyword search. Only recently have more sophisticated approaches to selecting the base material, with the aim of reducing annotation effort, been investigated. One promising direction is known as Active Learning (AL), where only examples of high utility for classifier training are selected for manual annotation. Because of this intelligent selection, classifiers of a given target performance can be obtained with fewer labeled data points. This thesis centers on the question of how AL can be applied as a resource-aware strategy for linguistic annotation. A set of requirements is defined, and several approaches and adaptations to the standard form of AL are proposed to meet these requirements.
This includes: (1) a novel method to monitor and stop the AL-driven annotation process; (2) an approach to semi-supervised AL where only highly critical tokens have to be manually annotated while the rest are automatically tagged; (3) a discussion and empirical investigation of the reusability of actively drawn samples; (4) a comparative study of how class imbalance can be reduced up front during AL-driven data acquisition; (5) two methods for selective sampling of examples which are useful for multiple learning problems; (6) an extensive evaluation of the proposed approaches to AL for Named Entity Recognition with respect to both savings in corpus size and actual annotation time; and finally (7) three methods by which these approaches can be made cost-conscious so as to reduce annotation time even further.
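
The selection step described in this abstract, picking only high-utility examples for manual annotation, can be illustrated with a minimal sketch. It assumes least-confidence uncertainty sampling as the utility measure, which is one common choice and not necessarily the one used in the thesis. The names `least_confidence`, `select_for_annotation`, and `toy_predict_proba` are hypothetical, and the one-dimensional logistic "classifier" is a stand-in for a real model.

```python
import math
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Utility score (assumed): 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def select_for_annotation(
    pool: Sequence[float],
    predict_proba: Callable[[float], Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return the indices of the `batch_size` most uncertain pool items."""
    scored = [(least_confidence(predict_proba(x)), i) for i, x in enumerate(pool)]
    # Highest uncertainty first; ties are broken by pool position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:batch_size]]


def toy_predict_proba(x: float) -> List[float]:
    """Hypothetical stand-in classifier: a logistic curve over one feature."""
    p = 1.0 / (1.0 + math.exp(-x))
    return [1.0 - p, p]


# Unlabeled pool; items near 0 are the ones the toy model is least sure about.
pool = [-3.0, -0.1, 2.5, 0.2, 4.0]
chosen = select_for_annotation(pool, toy_predict_proba, batch_size=2)
print(chosen)  # [1, 3]
```

In a full loop, the chosen items would be sent to a human annotator, added to the training set, and the classifier retrained before the next selection round.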

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Context Aware Computing for The Internet of Things: A Survey

Author: Arkady Zaslavsky
Charith Perera
Dimitrios Georgakopoulos
Peter Christen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/05/2013

As we are moving towards the Internet of Things (IoT), the number of sensors deployed around the world is growing at a rapid pace. Market research has shown a significant growth of sensor deployments over the past decade and has predicted a significant increase in the growth rate in the future. These sensors continuously generate enormous amounts of data. However, in order to add value to raw sensor data we need to understand it. Collection, modelling, reasoning, and distribution of context in relation to sensor data play a critical role in this challenge. Context-aware computing has proven to be successful in understanding sensor data. In this paper, we survey context awareness from an IoT perspective. We present the necessary background by introducing the IoT paradigm and context-aware fundamentals at the beginning. Then we provide an in-depth analysis of the context life cycle. We evaluate a subset of projects (50) which represent the majority of research and commercial solutions proposed in the field of context-aware computing conducted over the last decade (2001-2011) based on our own taxonomy. Finally, based on our evaluation, we highlight the lessons to be learnt from the past and some possible directions for future research. The survey addresses a broad range of techniques, methods, models, functionalities, systems, applications, and middleware solutions related to context awareness and IoT. Our goal is not only to analyse, compare and consolidate past research work but also to appreciate their findings and discuss their applicability towards the IoT. Comment: IEEE Communications Surveys & Tutorials Journal, 201

arXiv.org e-Print Archive

CiteSeerX

Deakin Research Online

Crossref

Online Research @ Cardiff

The Australian National University

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
Denny
Franzén
Friedman
Gennari
George Demetriou
Hersh
Hripcsak
Ian Roberts
Kim
Lindberg
Mark Hepple
Meystre
Pestian
Robert Gaizauskas
Roberts
Tanabe
Yikun Guo
Publication venue: 'Elsevier BV'
Publication date: 01/10/2009

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

Elsevier - Publisher Connector

Crossref

White Rose Research Online

A Context-aware Attention Network for Interactive Question Answering

Author: Ge Yong
Kadav Asim
Li Huayu
Min Martin Renqiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/09/2017

Neural network-based sequence-to-sequence models in an encoder-decoder framework have been successfully applied to solve Question Answering (QA) problems, predicting answers from statements and questions. However, almost all previous models have failed to consider detailed context information and unknown states under which systems do not have enough information to answer given questions. These scenarios with incomplete or ambiguous information are very common in the setting of Interactive Question Answering (IQA). To address this challenge, we develop a novel model, employing context-dependent word-level attention for more accurate statement representations and question-guided sentence-level attention for better context modeling. We also generate unique IQA datasets to test our model, which will be made publicly available. Employing these attention mechanisms, our model accurately understands when it can output an answer and when it needs to generate a supplementary question for additional input, depending on the context. When available, user feedback is encoded and directly applied to update sentence-level attention to infer an answer. Extensive experiments on QA and IQA datasets quantitatively demonstrate the effectiveness of our model with significant improvement over state-of-the-art conventional QA models. Comment: 9 page
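
As a rough illustration of what "question-guided attention" means in general, the sketch below computes dot-product attention weights of a question vector over a set of sentence vectors and returns their weighted sum. This is generic softmax attention under simplifying assumptions (fixed toy vectors, no learned parameters), not the architecture of the paper. The function names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    question: Sequence[float], sentences: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Weight each sentence vector by its dot-product similarity to the question."""
    scores = [sum(q * s for q, s in zip(question, sent)) for sent in sentences]
    weights = softmax(scores)
    dim = len(sentences[0])
    # Context vector: attention-weighted sum of the sentence vectors.
    context = [
        sum(w * sent[d] for w, sent in zip(weights, sentences)) for d in range(dim)
    ]
    return weights, context


# Toy vectors (assumed): the first sentence points in the question's direction.
question = [1.0, 0.0]
sentences = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(question, sentences)
print(weights)
```

The sentence most aligned with the question receives the largest weight, so it dominates the resulting context vector.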

arXiv.org e-Print Archive

Crossref

A quick guide for student-driven community genome annotation

Author: Benoit Joshua B.
Brown Susan J.
D'elia Tom
Flores Mirella
Hosmani Prashant S.
Miller Sherry
Mueller Lukas A.
Munoz-Torres Monica
Saha Surya
Shippy Teresa
Wiersma-Koch Helen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/10/2018

High-quality gene models are necessary to expand the molecular and genetic tools available for a target organism, but these are available for only a handful of model organisms that have undergone extensive curation and experimental validation over the course of many years. The majority of gene models present in biological databases today have been identified in draft genome assemblies using automated annotation pipelines that are frequently based on orthologs from distantly related model organisms. Manual curation is time-consuming and often requires substantial expertise, but is instrumental in improving gene model structure and identification. Manual annotation may seem to be a daunting and cost-prohibitive task for small research communities, but involving undergraduates in community genome annotation consortiums can benefit both education and genomic resources. We outline a workflow for efficient manual annotation driven by a team of primarily undergraduate annotators. This model can be scaled to large teams and includes quality control processes through incremental evaluation. Moreover, it gives students an opportunity to increase their understanding of genome biology and to participate in scientific research in collaboration with peers and senior researchers at multiple institutions.

arXiv.org e-Print Archive

Directory of Open Access Journals

eScholarship - University of California

FigShare

Magpie: towards a semantic web browser

Author: A. Riva
E. Motta
H. Lieberman
I.A. Ovsiannikov
J. Domingue
J. Heflin
L. Tauscher
M. Vargas-Vera
N. Guarino
S. Middleton
T. Berners-Lee
T.R. Gruber
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

Web browsing involves two tasks: finding the right web page and then making sense of its content. So far, research has focused on supporting the task of finding web resources through ‘standard’ information retrieval mechanisms, or semantics-enhanced search. Much less attention has been paid to the second problem. In this paper we describe Magpie, a tool which supports the interpretation of web pages. Magpie offers complementary knowledge sources, which a reader can call upon to quickly gain access to any background knowledge relevant to a web resource. Magpie automatically associates an ontology-based semantic layer with web resources, allowing relevant services to be invoked within a standard web browser. Hence, Magpie may be seen as a step towards a semantic web browser. The functionality of Magpie is illustrated using examples of how it has been integrated with our lab’s web resources.

CiteSeerX

Crossref

Open Research Online (The Open University)

An artefact repository to support distributed software engineering

Author: Boldyreff Cornelia
Nutter David
Rank Stephen
Publication date: 01/01/2003

The Open Source Component Artefact Repository (OSCAR) system is a component of the GENESIS platform designed to inter-operate non-invasively with work-flow management systems, development tools and existing repository systems to support a distributed software engineering team working collaboratively. Every artefact possesses a collection of associated meta-data, both standard and domain-specific, presented as an XML document. Within OSCAR, artefacts are made aware of changes to related artefacts using notifications, allowing them to modify their own meta-data actively, in contrast to other software repositories where users must perform every modification, however trivial. This recording of events, including user interactions, provides a complete picture of an artefact's life from creation to (eventual) retirement, with the intention of supporting collaboration both amongst the members of the software engineering team and agents acting on their behalf.
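
The notification idea in this abstract, where an artefact updates its own meta-data when a related artefact changes, resembles the observer pattern. The sketch below is a minimal illustration under that assumption. The `Artefact` class, its methods, and the meta-data layout are hypothetical and do not reflect OSCAR's actual API or its XML meta-data format.

```python
from typing import Any, Dict, List, Tuple


class Artefact:
    """Hypothetical artefact that records events in its own meta-data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.metadata: Dict[str, List[Tuple[Any, ...]]] = {"history": []}
        self._related: List["Artefact"] = []

    def add_related(self, other: "Artefact") -> None:
        """Register `other` to be notified whenever this artefact changes."""
        self._related.append(other)

    def change(self, description: str) -> None:
        """Record a change locally, then notify every related artefact."""
        self.metadata["history"].append(("changed", description))
        for other in self._related:
            other.notify(self, description)

    def notify(self, source: "Artefact", description: str) -> None:
        """Update own meta-data in response to a related artefact's change."""
        self.metadata["history"].append(
            ("related-changed", source.name, description)
        )


design = Artefact("design.xml")
code = Artefact("module.c")
design.add_related(code)
design.change("interface renamed")
print(code.metadata["history"])
```

Here `code` learns of the change to `design` without any user editing its meta-data by hand, and each artefact's history accumulates a record of its life cycle.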

University of Lincoln Institutional Repository

CiteSeerX

Knowledge Management for Public Administrations: Technical Realizations of an Enterprise Attention Management System

Author: Ntioudis Spyridon
Samiotis Konstantinos
Stojanovic Nenad
Publication date: 01/11/2014

Improving the efficiency of governments has gained great importance, especially in the current times of economic downturn. E-Government constitutes the most contemporary techno-managerial proposition among the possible interventions. The paper addresses, more specifically, the empowerment needed by Public Administration (PA) organizations. Anchored on the needs of three real-life cases, the paper describes the conception and realization of an IT artefact, together with its methodological aspects, aimed at improving information access and delivery and thus PAs’ decision-making capacity. Our proposition constitutes a novel approach for managing users’ attention in knowledge-intensive organizations, which goes beyond informing a user about changes in relevant information towards proactively supporting the user in reacting to changes. The approach is based on an expressive attention model, which is realized by combining ECA (Event-Condition-Action) rules with ontologies. The technical realizations described in the paper constitute the underlying infrastructure of an Enterprise Attention Management System.
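
The ECA (Event-Condition-Action) pattern named in this abstract can be sketched in a few lines: a rule fires its action when an event of the right type arrives and its condition holds. The example below is a minimal illustration, not the system described in the paper. The `EcaRule` class, the `dispatch` function, the event fields, and the `INTERESTS` set (a plain set standing in for an ontology lookup) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]

# Assumed stand-in for an ontology query: topics the user cares about.
INTERESTS = {"procurement", "budget"}


@dataclass
class EcaRule:
    """On an event of `event_type`, if `condition` holds, run `action`."""

    event_type: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


def dispatch(rules: List[EcaRule], event: Event) -> List[str]:
    """Run every rule whose event type matches and whose condition is true."""
    return [
        rule.action(event)
        for rule in rules
        if rule.event_type == event["type"] and rule.condition(event)
    ]


rules = [
    EcaRule(
        event_type="document_changed",
        condition=lambda e: e["topic"] in INTERESTS,
        action=lambda e: "notify user: " + e["doc"] + " changed",
    )
]

relevant = {"type": "document_changed", "topic": "procurement", "doc": "tender-17"}
irrelevant = {"type": "document_changed", "topic": "catering", "doc": "menu-3"}
print(dispatch(rules, relevant))    # ['notify user: tender-17 changed']
print(dispatch(rules, irrelevant))  # []
```

Keeping the condition separate from the action is what lets the relevance test be swapped for a richer ontology-based check without touching the rules' event types or actions.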

Open Research Online (The Open University)