Towards a query language for annotation graphs
The multidimensional, heterogeneous, and temporal nature of speech databases
raises interesting challenges for representation and query. Recently,
annotation graphs have been proposed as a general-purpose representational
framework for speech databases. Typical queries on annotation graphs require
path expressions similar to those used in semistructured query languages.
However, the underlying model is rather different from the customary graph
models for semistructured data: the graph is acyclic and unrooted, and both
temporal and inclusion relationships are important. We develop a query language
and describe optimization techniques for an underlying relational
representation. Comment: 8 pages, 10 figures
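As a rough illustration of the kind of inclusion query this abstract mentions, the sketch below stores annotation arcs as flat records (much like rows in a relational table) and selects the arcs of one type whose time span lies inside a labelled arc of another type. The `Arc` record, the `included_in` function and the sample data are hypothetical and not taken from the paper; the sketch also assumes every node carries a time offset, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    start: float  # time offset of the source node, in seconds (assumed always present)
    end: float    # time offset of the target node, in seconds
    type: str     # annotation layer, e.g. "word" or "phrase"
    label: str    # annotation content


def included_in(arcs, inner_type, outer_type, outer_label):
    """Return labels of inner_type arcs whose span lies inside a matching outer arc."""
    outers = [a for a in arcs if a.type == outer_type and a.label == outer_label]
    return [
        a.label
        for a in sorted(arcs, key=lambda a: a.start)
        if a.type == inner_type
        and any(o.start <= a.start and a.end <= o.end for o in outers)
    ]


# Hypothetical sample data: three word arcs and one phrase arc.
arcs = [
    Arc(0.0, 0.4, "word", "the"),
    Arc(0.4, 0.9, "word", "cat"),
    Arc(0.9, 1.3, "word", "sat"),
    Arc(0.0, 0.9, "phrase", "NP"),
]
words_in_np = included_in(arcs, "word", "phrase", "NP")
```

A relational implementation would express the same condition as a self-join on the arc table with inequality predicates on the two offsets.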
A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data
We describe a generic framework for representing and reasoning with annotated
Semantic Web data, a task becoming more important with the recent increased
amount of inconsistent and non-reliable meta-data on the web. We formalise the
annotated language, the corresponding deductive system and address the query
answering problem. Previous contributions on specific RDF annotation domains
are encompassed by our unified reasoning formalism as we show by instantiating
it on (i) temporal, (ii) fuzzy, and (iii) provenance annotations. Moreover, we
provide a generic method for combining multiple annotation domains, which makes
it possible to represent, e.g., temporally-annotated fuzzy RDF. Furthermore, we address the
development of a query language -- AnQL -- that is inspired by SPARQL,
including several features of SPARQL 1.1 (subqueries, aggregates, assignment,
solution modifiers) along with the formal definitions of their semantics.
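The idea of annotation domains can be sketched loosely as follows: each annotated fact carries a value, and a deduction rule combines the values of its premises with a domain-specific "meet" operation. The sketch assumes `min` for fuzzy degrees, interval intersection for temporal validity, and a componentwise pair for the combined domain. All function names and the example facts are hypothetical, and the paper's actual combination method may differ from this componentwise product.

```python
def fuzzy_meet(a, b):
    """Combine two fuzzy degrees in [0, 1]; min is one common choice (an assumption here)."""
    return min(a, b)


def temporal_meet(a, b):
    """Intersect two closed intervals (start, end); None stands for the empty interval."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def product_meet(a, b):
    """Combine (interval, degree) pairs componentwise; a simplifying assumption."""
    return (temporal_meet(a[0], b[0]), fuzzy_meet(a[1], b[1]))


def infer_type(subclass_fact, type_fact, meet):
    """From (sub subClassOf sup): v1 and (x type sub): v2, derive (x type sup): meet(v1, v2)."""
    (sub, sup), v1 = subclass_fact
    (inst, cls), v2 = type_fact
    if cls != sub:
        return None  # the rule does not apply
    return ((inst, sup), meet(v1, v2))


# Hypothetical temporally-annotated fuzzy facts.
subclass_fact = (("Professor", "Employee"), ((2000, 2020), 0.9))
type_fact = (("alice", "Professor"), ((2010, 2030), 0.7))
derived = infer_type(subclass_fact, type_fact, product_meet)
```

Passing `fuzzy_meet` or `temporal_meet` instead of `product_meet` runs the same rule over a single annotation domain, which is the sense in which the reasoning is generic.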
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers. Comment: 8 pages, 6 figures
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
This paper presents a Lisp architecture for a portable NLP system, termed
LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard,
customized and in-house developed NLP tools. Our system facilitates portability
across different institutions and data systems by incorporating an enriched
Common Data Model (CDM) to standardize necessary data elements. It utilizes
UMLS to perform domain adaptation when integrating generic-domain NLP tools. It
also features stand-off annotations that are specified by positional reference
to the original document. We built an interval-tree-based search engine to
efficiently query and retrieve the stand-off annotations by specifying
positional requirements. We also developed a utility to convert an inline
annotation format to stand-off annotations to enable the reuse of clinical text
datasets with inline annotations. We experimented with our system on several
NLP-facilitated tasks including computational phenotyping for lymphoma patients
and semantic relation extraction for clinical notes. These experiments
showcased the broader applicability and utility of LAPNLP. Comment: 6 pages, accepted by IEEE BIBM 2018 as regular paper
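To make the stand-off idea concrete, here is a minimal sketch in Python (not the Lisp used by LAPNLP) that converts flat inline tags into stand-off annotations and then retrieves the annotations overlapping a character range. The tag syntax, the function names and the sample sentence are hypothetical. The lookup uses a sorted list with `bisect` as a simple stand-in, not the interval tree the abstract describes.

```python
import re
from bisect import bisect_left

# Matches flat, non-nested inline tags such as <dx>lymphoma</dx> (assumed format).
TAG = re.compile(r"<(\w+)>(.*?)</\1>")


def inline_to_standoff(marked):
    """Strip inline tags and return (plain_text, [(start, end, tag), ...])."""
    plain, annotations = [], []
    cursor = 0  # position in the tagged input
    offset = 0  # position in the plain output
    for m in TAG.finditer(marked):
        before = marked[cursor:m.start()]
        plain.append(before)
        offset += len(before)
        body = m.group(2)
        annotations.append((offset, offset + len(body), m.group(1)))
        plain.append(body)
        offset += len(body)
        cursor = m.end()
    plain.append(marked[cursor:])
    return "".join(plain), annotations


def overlapping(annotations, lo, hi):
    """Return annotations overlapping [lo, hi); input must be sorted by start offset."""
    starts = [a[0] for a in annotations]
    cut = bisect_left(starts, hi)  # annotations starting at or after hi cannot overlap
    return [a for a in annotations[:cut] if a[1] > lo]


marked = "Patient has <dx>lymphoma</dx> treated with <drug>rituximab</drug>."
plain, annotations = inline_to_standoff(marked)
hits = overlapping(annotations, 0, 15)
```

The scan over `annotations[:cut]` is linear in the worst case; an interval tree avoids that cost when many long annotations precede the query range.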
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on ``Annotation Graphs,'' a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic ``signals,'' including both naturally occurring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures
An Integrated Framework for Treebanks and Multilayer Annotations
Treebank formats and associated software tools are proliferating rapidly,
with little consideration for interoperability. We survey a wide variety of
treebank structures and operations, and show how they can be mapped onto the
annotation graph model, leading to an integrated framework encompassing
tree and non-tree annotations alike. This development opens up new
possibilities for managing and exploiting multilayer annotations. Comment: 8 pages
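One simple way to picture the mapping from trees to annotation graphs is to number the gaps between words and emit one arc per word and one arc per constituent, each spanning the node positions it covers. The sketch below assumes trees given as nested tuples `(label, child, ...)` with plain strings as words. The `tree_to_arcs` function and the arc layout are hypothetical and not the paper's encoding.

```python
def tree_to_arcs(tree, start=0):
    """Return (arcs, end), where arcs are (from_node, to_node, type, label) tuples."""
    if isinstance(tree, str):
        # A word occupies exactly one step between two adjacent nodes.
        return [(start, start + 1, "word", tree)], start + 1
    label, *children = tree
    arcs, pos = [], start
    for child in children:
        child_arcs, pos = tree_to_arcs(child, pos)
        arcs.extend(child_arcs)
    # The constituent arc spans everything its children cover.
    return [(start, pos, "syntax", label)] + arcs, pos


# Hypothetical two-word tree.
tree = ("S", ("NP", "dogs"), ("VP", "bark"))
arcs, end = tree_to_arcs(tree)
```

In this simplified layout a unary chain yields parallel arcs over the same span, so dominance between them would need extra information (for example, an ordering field) to be recoverable.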