5,233 research outputs found
Image annotation with Photocopain
Photo annotation is a resource-intensive task, yet is increasingly essential as image archives and personal photo collections grow in size. There is an inherent conflict in the process of describing and archiving personal experiences, because casual users are generally unwilling to expend large amounts of effort on creating the annotations which are required to organise their collections so that they can make best use of them. This paper describes the Photocopain system, a semi-automatic image annotation system which combines information about the context in which a photograph was captured with information from other readily available sources in order to generate outline annotations for that photograph that the user may further extend or amend
Vagueness and referential ambiguity in a large-scale annotated corpus
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail, and we then propose a generalisation of Poesio, Reyle and Stevenson's Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement. We argue that a deeper understanding of the phenomena involved allows problematic cases to be tackled in a more principled fashion than would be possible using only pre-theoretic intuitions
Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis
The Market Blended Insight project has the objective of improving UK business-to-business marketing performance using semantic web technologies. In this project, we are implementing an ontology-driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It deals with both semi-structured data and unstructured text on the web, annotating the extracted data and then translating it according to the backend schema
CHORUS Deliverable 4.4: Report of the 2nd CHORUS Conference
The Second CHORUS Conference and third Yahoo! Research Workshop on the Future of Web Search was held during April 4-5, 2008, in Granvalira, Andorra to discuss future directions in multimedia information access and other specialised topics in the near future of retrieval. Attendance was at capacity, with 97 participants from 11 countries and 3 continents
A plant disease extension of the Infectious Disease Ontology
Plants from a handful of species provide the primary source of food for all people, yet this source is vulnerable to multiple stressors, such as disease, drought, and nutrient deficiency. With rapid population growth and climate uncertainty, the need to produce crops that can tolerate or resist plant stressors is more crucial than ever. Traditional plant breeding methods may not be sufficient to overcome this challenge, and methods such as high-throughput sequencing and automated scoring of phenotypes can provide significant new insights. Ontologies are essential tools for accessing and analysing the large quantities of data that come with these newer methods. As part of a larger project to develop ontologies that describe plant phenotypes and stresses, we are developing a plant disease extension of the Infectious Disease Ontology (IDOPlant). IDOPlant is envisioned as a reference ontology designed to cover any plant infectious disease. In addition to novel terms for infectious diseases, IDOPlant includes terms imported from other ontologies that describe plants, pathogens, and vectors, the geographic location and ecology of diseases and hosts, and molecular functions and interactions of hosts and pathogens. To encompass this range of data, we are suggesting in-house ontology development complemented with reuse of terms from orthogonal ontologies developed as part of the Open Biomedical Ontologies (OBO) Foundry. The study of plant diseases provides an example of how an ontological framework can be used to model complex biological phenomena such as plant disease, and of how plant infectious diseases differ from, and are similar to, infectious diseases in other organisms
Recent developments in linguistic annotations of the TüBa-D/Z treebank
The purpose of this paper is to describe recent developments in the morphological, syntactic, and semantic annotation of the TüBa-D/Z treebank of German. The TüBa-D/Z annotation scheme is derived from the Verbmobil treebank of spoken German [4, 10], but has been extended along various dimensions to accommodate the characteristics of written texts. TüBa-D/Z uses as its data source the "die tageszeitung" (taz) newspaper corpus. The Verbmobil treebank annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The primary ordering principle of a clause is the inventory of topological fields, which characterize the word order regularities among different clause types of German, and which are widely accepted among descriptive linguists of German [3, 6]. The TüBa-D/Z annotation relies on a context-free backbone (i.e. proper trees without crossing branches) of phrase structure combined with edge labels that specify the grammatical function of the phrase in question. The syntactic annotation scheme of the TüBa-D/Z is described in more detail in [12, 11]. TüBa-D/Z currently comprises approximately 15,000 sentences, with approximately 7,000 sentences in the correction phase. The latter will be released along with an updated version of the existing treebank before the end of this year. The treebank is available in an XML format, in the NEGRA export format [1], and in the Penn treebank bracketing format. The XML format contains all types of information as described above, the NEGRA export format contains all sentence-internal information, and the Penn treebank format includes only those layers of information that can be expressed as pure tree structures. Over the course of the last year, more fine-grained linguistic annotations have been added along the following dimensions: 1. the labels of the basic Stuttgart-Tübingen tagset (STTS) [9] have been enriched by relevant features of inflectional morphology, 2. named entity information has been encoded as part of the syntactic annotation, and 3. a set of anaphoric and coreference relations has been added to link referentially dependent noun phrases. In the following sections, we will describe each of these innovations in turn and will demonstrate how the additional annotations can be incorporated into one comprehensive annotation scheme
Ontological representation of CDC Active Bacterial Core Surveillance Case Reports
The Centers for Disease Control and Prevention's Active Bacterial Core Surveillance (CDC ABCs) Program is a collaborative effort between the CDC, state health departments, laboratories, and universities to track invasive bacterial pathogens of particular importance to public health [1]. The year-end surveillance reports produced by this program help to shape public policy and coordinate responses to emerging infectious diseases over time. The ABCs case report form (CRF) data represents an excellent opportunity for data reuse beyond the original surveillance purposes
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on "Annotation Graphs," a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic "signals," including both naturally occurring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures
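The interval-indexed annotation graph model can be illustrated with a small data structure: nodes optionally anchored to offsets in a linear signal, and typed, labelled arcs between nodes. The sketch below is an assumption-laden illustration, not the ATLAS API: the class names `Arc` and `AnnotationGraph` and the method `overlapping` are hypothetical, and a real implementation would use the database storage and indexing the abstract mentions rather than a linear scan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Arc:
    start: str  # id of the start node
    end: str    # id of the end node
    type: str   # annotation tier, e.g. "word" or "speaker"
    label: str  # annotation content


@dataclass
class AnnotationGraph:
    # Node id -> offset in the signal (seconds, characters, ...); None = unanchored.
    offsets: Dict[str, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.offsets[node_id] = offset

    def add_arc(self, start: str, end: str, type_: str, label: str) -> None:
        if start not in self.offsets or end not in self.offsets:
            raise KeyError("both endpoints must be existing nodes")
        self.arcs.append(Arc(start, end, type_, label))

    def overlapping(self, lo: float, hi: float,
                    type_: Optional[str] = None) -> List[str]:
        """Labels of anchored arcs whose interval overlaps [lo, hi)."""
        result = []
        for arc in self.arcs:
            s, e = self.offsets[arc.start], self.offsets[arc.end]
            if s is None or e is None:
                continue  # unanchored arcs cannot be located by time alone
            if type_ is not None and arc.type != type_:
                continue
            if s < hi and lo < e:  # standard half-open interval overlap test
                result.append(arc.label)
        return result


if __name__ == "__main__":
    g = AnnotationGraph()
    for node_id, t in [("n0", 0.0), ("n1", 0.5), ("n2", 1.2)]:
        g.add_node(node_id, t)
    g.add_arc("n0", "n1", "word", "hello")
    g.add_arc("n1", "n2", "word", "world")
    g.add_arc("n0", "n2", "speaker", "A")  # a second tier over the same nodes
    print(g.overlapping(0.6, 1.0, "word"))
    print(g.overlapping(0.0, 0.4))
```

Sharing nodes between tiers (words and speaker turns here) is what lets one graph hold several annotation layers over the same signal. Allowing unanchored nodes keeps the model usable when only the relative order of annotations is known.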
Semantic Markup for Geographic Web Maps in HTML
In recent years, more and more geographical web maps have been developed and published on the Open Web Platform. Technically, this has turned all variants of these maps into documents of the Hypertext Markup Language (HTML), making them appear to us naturally as graph-like and semi-structured data. In engaging with geographical web maps and HTML, we draw on the notion of so-called "map mashups". Requiring an alternative model and definition of what such a map is, our research allows us to build and refine supportive technology that helps us analyze and interpret the information map makers encode into their visualizations. The lens through which we examine the current authoring practices behind many geographical web maps is informed by the perspective of a "critical map reader". A task-oriented conception of "map critique" helped us to deduce a meaningful user perspective from which we specifically call on the semantic web community for support on how to represent the various kinds of information presented in maps from many authors and sources. With this perspective and these questions in mind, we investigated the Schema.org vocabulary as an ontology for turning elements of geographic web maps into textual statements referencing entities in the "outer world". To illustrate our investigation of the corresponding web standard documents and make it easily applicable for map makers, to open up the discussion, and also to challenge and develop our first conclusions, we implemented our findings as a minimal extension to the standard API of the LeafletJS open source web mapping library
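As a rough illustration of turning a map element into a Schema.org statement about an entity in the "outer world", the sketch below serialises a single map marker as JSON-LD. It is written in Python rather than JavaScript and is not the authors' LeafletJS extension. The function `marker_to_jsonld` is hypothetical, and the choice of the Schema.org types `Place` and `GeoCoordinates` and the `sameAs` property is an assumption about how such a mapping could look. The coordinates in the example are approximate and purely illustrative.

```python
import json
from typing import Optional


def marker_to_jsonld(name: str, lat: float, lon: float,
                     same_as: Optional[str] = None) -> str:
    """Describe one map marker as a Schema.org Place in JSON-LD (hypothetical helper)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Place",
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": lat,
            "longitude": lon,
        },
    }
    if same_as:
        # sameAs links the marker to an external description of the same entity.
        doc["sameAs"] = same_as
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(marker_to_jsonld("Leipzig", 51.34, 12.37,
                           "https://en.wikipedia.org/wiki/Leipzig"))
```

Embedding output like this in a `<script type="application/ld+json">` element is one common way to expose such statements alongside the HTML the map already produces.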