
A fact-aligned corpus of numerical expressions

Author: Power Richard
Williams Sandra
Publication date: 01/01/2010

We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.

CiteSeerX

Open Research Online (The Open University)

Multi-word expression-sensitive word alignment

Author: Graham Yvette
Maldonado Guerra Alfredo
Okita Tsuyoshi
Way Andy
Publication venue: Coling 2010 Organizing Committee
Publication date: 01/01/2010

This paper presents a new word alignment method which incorporates knowledge about Bilingual Multi-Word Expressions (BMWEs). Our method of word alignment first extracts such BMWEs in a bidirectional way for a given corpus and then starts conventional word alignment, considering the properties of BMWEs in their grouping as well as their alignment links. We give partial annotation of alignment links as prior knowledge to the word alignment process; by replacing the maximum likelihood estimate in the M-step of the IBM Models with the Maximum A Posteriori (MAP) estimate, prior knowledge about BMWEs is embedded in the prior in this MAP estimate. In our experiments, we saw an improvement of 0.77 BLEU points absolute in JP–EN. Except for one case, our method gave better results than the method using only BMWE grouping. Even though this paper does not directly address the issues in Cross-Lingual Information Retrieval (CLIR), it discusses an approach of direct relevance to the field. This approach could be viewed as the opposite of current trends in CLIR on semantic space that incorporate a notion of order in the bag-of-words model (e.g. co-occurrences).
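The replacement of the maximum likelihood estimate by a MAP estimate in the M-step can be illustrated with a minimal sketch. The abstract does not state the form of the prior, so a Dirichlet prior over each translation distribution is assumed here purely for illustration; the function name `m_step`, the `prior_alpha` argument, and the toy counts are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def m_step(expected_counts, prior_alpha=None):
    """Re-estimate translation probabilities t(f | e) from E-step counts.

    expected_counts: dict mapping (f, e) -> expected co-occurrence count.
    prior_alpha: optional dict mapping (f, e) -> Dirichlet parameter (>= 1).
        Pairs not listed default to 1.0, i.e. a flat prior, which reduces
        the MAP estimate to the ordinary maximum likelihood estimate.
        This Dirichlet form is an assumption, not the paper's stated prior.
    """
    prior_alpha = prior_alpha or {}
    adjusted = {}
    totals = defaultdict(float)
    for (f, e), count in expected_counts.items():
        # Mode of the Dirichlet posterior: add (alpha - 1) pseudo-counts.
        value = count + prior_alpha.get((f, e), 1.0) - 1.0
        adjusted[(f, e)] = value
        totals[e] += value
    # Normalise per conditioning word e so that sum_f t(f | e) = 1.
    return {(f, e): value / totals[e] for (f, e), value in adjusted.items()}

# Toy expected counts, as if produced by an E-step (hypothetical numbers).
counts = {("maison", "house"): 3.0, ("bleue", "house"): 1.0}

mle = m_step(counts)                                    # flat prior
map_est = m_step(counts, {("maison", "house"): 5.0})    # prior favours one link
```

With the flat prior the estimate is the usual relative frequency; a larger Dirichlet parameter for a pair known in advance to be aligned shifts probability mass towards that link, which is one way prior knowledge about alignment links could enter the estimate.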

Irish Universities

DCU Online Research Access Service

Analysis of Amoeba Active Contours

Author: Welk Martin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

The subject of this paper is the theoretical analysis of structure-adaptive median filter algorithms that approximate curvature-based PDEs for image filtering and segmentation. These so-called morphological amoeba filters are based on a concept introduced by Lerallut et al. They achieve results similar to those of the well-known geodesic active contour and self-snakes PDEs. In the present work, the PDE approximated by amoeba active contours is derived for a general geometric situation and general amoeba metric. This PDE is structurally similar but not identical to the geodesic active contour equation. It reproduces the previous PDE approximation results for amoeba median filters as special cases. Furthermore, modifications of the basic amoeba active contour algorithm are analysed that are related to the morphological force terms frequently used with geodesic active contours. Experiments demonstrate the basic behaviour of amoeba active contours and its similarity to geodesic active contours. Comment: Revised version with several improvements for clarity, slightly extended experiments and discussion. Accepted for publication in Journal of Mathematical Imaging and Vision.

arXiv.org e-Print Archive

CiteSeerX


Evaluative Language and Its Solidarity-Building Role on TED.com: An Appraisal and Corpus Analysis

Author: Drasovean Anda
Tagg Caroline
Publication date: 01/01/2015

Language is a key resource in the formation of online communities, which are in turn central to an understanding of contemporary social relations. This study looks at TED.com, an educational video-hosting platform with few in-built community-building functionalities, to explore the potential for users to affiliate through their language choices. Grounded in Systemic Functional Linguistics, the study uses the Appraisal framework, extended using corpus linguistic methods, in order to analyse users’ reactions to TED videos. The study shows that online participants use evaluative language to align with certain ideas and, based on these affinities, form affiliations characterized by sociability and solidarity. These affiliations raise important questions about the conception of ‘community’ in twenty-first century society.

Open Research Online (The Open University)

A semantic-based system for querying personal digital libraries

Author: B. Smith
G. Nagy
L. Spitz
T. Berners-Lee
T. Pavlidis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-28640-0_4. Copyright © Springer 2004. The decreasing cost and the increasing availability of new technologies are enabling people to create their own digital libraries. One of the main topics in personal digital libraries is allowing people to select interesting information among all the different digital formats available today (pdf, html, tiff, etc.). Moreover, the increasing availability of these on-line libraries, as well as the advent of the so-called Semantic Web [1], is raising the demand for converting paper documents into digital, possibly semantically annotated, documents. These motivations drove us to design a new system which could enable the user to interact with and query documents independently of the digital formats in which they are represented. In order to achieve this independence from the format, we consider all the digital documents contained in a digital library as images. Our system tries to automatically detect the layout of the digital documents and recognize the geometric regions of interest. All the extracted information is then encoded with respect to a reference ontology, so that the user can query his digital library by typing free text or browsing the ontology.

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza

Brunel University Research Archive

Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009

Tilburg University Repository