8,407 research outputs found

Multi-Tier Annotations in the Verbmobil Corpus

Author: Gonzáles Rodriguez Manual
Reichel Uwe D.
Schiel Florian
Weilhammer Karl
Publication date: 01/05/2002

In very large and diverse scientific projects, where groups as different as linguists and engineers work with different intentions on the same signal data or its orthographic transcript and annotate valuable new information, it is not easy to build a homogeneous corpus. We describe how this can be achieved, given that some of these annotations have not been updated properly, or are based on erroneous or deliberately changed versions of the basis transcription. We used an algorithm similar to dynamic programming to detect differences between the transcription on which an annotation depends and the reference transcription for the whole corpus. These differences are automatically mapped onto a set of repair operations for the transcriptions, such as splitting compound words and merging neighbouring words. The correction process in the annotation is carried out on the basis of these operations. Whether a correction can be carried out automatically or has to be made manually depends on the type of the annotation as well as on the position and nature of the difference. Finally, we present an investigation in which we exploit the multi-tier annotations of the Verbmobil corpus to find out how breathing is correlated with prosodic-syntactic boundaries and dialog acts.
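The difference detection described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps in Python's standard `difflib.SequenceMatcher` (a longest-matching-block aligner, not classic dynamic programming) to align two word sequences, and it assumes that a split or merge is recognisable by plain string concatenation. The function name `repair_operations` and the sample words are hypothetical.

```python
from difflib import SequenceMatcher


def repair_operations(annotated, reference):
    """Align two word lists and classify each difference.

    annotated: words of the transcription the annotation was built on.
    reference: words of the reference transcription.
    Returns a list of (operation, annotated_words, reference_words).
    """
    # autojunk=False keeps frequent words from being ignored as "junk".
    matcher = SequenceMatcher(a=annotated, b=reference, autojunk=False)
    operations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = annotated[i1:i2], reference[j1:j2]
        # Assumption: a split is one old word equal to several new words joined.
        if tag == "replace" and len(old) == 1 and len(new) > 1 and "".join(new) == old[0]:
            kind = "split"
        # Assumption: a merge is several old words joined into one new word.
        elif tag == "replace" and len(old) > 1 and len(new) == 1 and "".join(old) == new[0]:
            kind = "merge"
        else:
            kind = tag  # plain "replace", "insert" or "delete"
        operations.append((kind, old, new))
    return operations
```

In a pipeline like the one the abstract outlines, "split" and "merge" results would be candidates for automatic correction of the dependent annotation, while the remaining operation types would more likely need manual review.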

CiteSeerX

Open Access LMU

Automatic F-Structure Annotation from the AP Treebank

Author: Sadler Louisa
van Genabith Josef
Way Andy
Publication venue: CSLI Publications
Publication date: 01/01/2000

We present a method for automatically annotating treebank resources with functional structures. The method defines systematic patterns of correspondence between partial PS configurations and functional structures. These are applied to PS rules extracted from treebanks. The set of techniques which we have developed constitutes a methodology for corpus-guided grammar development. Despite the widespread belief that treebank representations are not very useful in grammar development, we show that systematic patterns of c-structure to f-structure correspondence can be simply and successfully stated over such rules. The method is partial in that it requires manual correction of the annotated grammar rules.

CiteSeerX

Irish Universities

DCU Online Research Access Service

From treebank resources to LFG F-structures

Author: A Cahill
A Frank
C Pollard
E Charniak.
G Leech
J Bresnan.
J Genabith van
L Sadler
RM Kaplan
S Abney.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2003

We present two methods for automatically annotating treebank resources with functional structures. Both methods define systematic patterns of correspondence between partial PS configurations and functional structures. These are applied to PS rules extracted from treebanks, or directly to constraint set encodings of treebank PS trees.

University of Essex Research Repository

Crossref

DCU Online Research Access Service

A Relation-Based Page Rank Algorithm for Semantic Web Search Engines

Author: Demartini Claudio Giovanni
Lamberti Fabrizio
Sanna Andrea
Publication date: 01/01/2009

With the tremendous growth of information available to end users through the Web, search engines have come to play an ever more critical role. Nevertheless, because of their general-purpose approach, it is increasingly common for result sets to be burdened with useless pages. The next-generation Web architecture, represented by the Semantic Web, provides a layered architecture that may allow this limitation to be overcome. Several search engines have been proposed that increase information retrieval accuracy by exploiting a key element of Semantic Web resources, namely relations. However, in order to rank results, most of the existing solutions need to work on the whole annotated knowledge base. In this paper, we propose a relation-based page rank algorithm, to be used in conjunction with Semantic Web search engines, that relies only on information that can be extracted from user queries and from annotated resources. Relevance is measured as the probability that a retrieved resource actually contains those relations whose existence was assumed by the user at the time of query definition.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

Author: Almeida-Cruz Yudivian
Consuegra-Ayala Juan Pablo
Gutiérrez Yoan
Palomar Manuel
Piad-Morffis Alejandro
Publication venue: Elsevier BV
Publication date: 01/04/2021

Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems’ outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs. This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089). Moreover, it has been backed by the work of both COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”.
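The idea of combining several automatically annotated versions of a corpus can be illustrated with a minimal sketch. It is not the ensembler the abstract describes, which searches a configuration space against a reference collection; simple vote counting is swapped in here as one possible strategy. The function name `vote_ensemble`, the `min_votes` parameter, and the (text, label) annotation tuples are all hypothetical.

```python
from collections import Counter


def vote_ensemble(system_outputs, min_votes):
    """Keep every annotation proposed by at least `min_votes` systems.

    system_outputs: one iterable of hashable annotations per system,
                    e.g. (text, label) tuples (an assumed representation).
    """
    votes = Counter()
    for annotations in system_outputs:
        # set() ensures each system votes at most once per annotation.
        votes.update(set(annotations))
    return {annotation for annotation, n in votes.items() if n >= min_votes}
```

Under this sketch, `min_votes` would be one of the parameters a configuration search could tune against a reference collection; higher values favour precision, lower values favour recall.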

Repositorio Institucional de la Universidad de Alicante

Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2). 29 November 2012, Lisbon, Portugal

Publication venue: Lisbon
Publication date: 01/01/2012

Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), held in Lisbon, Portugal on 29 November 2012

PubliCatt

People-Powered Music: Using User-Generated Tags and Structure in Recommendations

Author: Kulesza T
Schleith J.
Stumpf S.
Publication venue: Centre for Human Computer Interaction Design, City University London

Music recommenders often rely on experts to classify song facets like genre and mood, but user-generated folksonomies hold some advantages over expert classifications: folksonomies can reflect the same real-world vocabularies and categorizations that end users employ. We present an approach for using crowd-sourced common sense knowledge to structure user-generated music tags into a folksonomy, and describe how to use this approach to make music recommendations. We then empirically evaluate our “people-powered” structured content recommender against a more traditional recommender. Our results show that participants slightly preferred the unstructured recommender, rating more of its recommendations as “perfect” than they did for our approach. An exploration of the reasons behind participants’ ratings revealed that users behaved differently when tagging songs than when evaluating recommendations, and we discuss the implications of our results for future tagging and recommendation approaches.

City Research Online

Push the Boundary of SAM: A Pseudo-label Correction Framework for Medical Segmentation

Author: Angelini Elsa
Gan Yu
Hendon Christine
Huang Ziyi
Laine Andrew
Liu Hongshan
Xing Fuyong
Zhang Haofeng
Publication date: 01/08/2023

The Segment Anything Model (SAM) has emerged as the leading approach for zero-shot learning in segmentation, offering the advantage of avoiding pixel-wise annotation. It is particularly appealing in medical image segmentation, where annotation is laborious and expertise-demanding. However, the direct application of SAM often yields inferior results compared to conventional fully supervised segmentation networks. While using SAM-generated pseudo labels could also benefit the training of fully supervised segmentation, the performance is limited by the quality of the pseudo labels. In this paper, we propose a novel label correction framework to push the boundary of SAM-based segmentation. Our model utilizes a novel noise detection module to distinguish noisy labels from clean labels. This enables us to correct the noisy labels using an uncertainty-based self-correction module, thereby enriching the clean training set. Finally, we retrain the network with updated labels to optimize its weights for future predictions. One key advantage of our model is its ability to train deep networks using SAM-generated pseudo labels without relying on a subset of expert-level annotations. We demonstrate the effectiveness of our proposed model on both X-ray and lung CT datasets, indicating its ability to improve segmentation accuracy and outperform baseline methods in label correction.

arXiv.org e-Print Archive