A fact-aligned corpus of numerical expressions
We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example, we present results from an investigation showing that when a fact is mentioned more than once in a text, precision clearly tends to increase from the first mention to subsequent ones, while mathematical level either remains constant or increases.
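The two annotation ideas mentioned above are roundness classification and mapping hedges to arithmetical relations. A minimal sketch of how they can be combined on a phrase such as "about 2,000" follows. It rests on a few assumptions. The hedge lexicon and relation symbols are hypothetical. The roundness test is a toy rule and not the Jansen and Pollmann-derived NUMGEN classifier. The input is assumed to be a hedge followed by a bare number.

```python
from decimal import Decimal

# Hypothetical hedge lexicon: phrase -> relation between the true value x and
# the stated number n ("=" exact, "~=" x close to n, "<~" x just below n,
# ">~" x just above n, "<"/">"/"<="/">=" plain inequalities).
HEDGE_RELATIONS = {
    "exactly": "=", "about": "~=", "around": "~=", "approximately": "~=",
    "nearly": "<~", "almost": "<~", "a little under": "<~", "just under": "<~",
    "a little over": ">~", "just over": ">~",
    "more than": ">", "over": ">", "less than": "<", "under": "<",
    "at least": ">=", "at most": "<=",
}


def is_round(number: str) -> bool:
    """Toy roundness rule (an assumption, not the NUMGEN classifier):
    round = one significant digit (2000, 0.3) or two ending in 5 (250, 7.5)."""
    digits = Decimal(number).normalize().as_tuple().digits
    return len(digits) <= 1 or (len(digits) == 2 and digits[-1] == 5)


def classify(expression: str):
    """Split 'hedge + number' into (relation, number, is_round)."""
    text = expression.lower().strip()
    # Try longer hedges first so that 'a little under' wins over 'under'.
    for hedge in sorted(HEDGE_RELATIONS, key=len, reverse=True):
        if text.startswith(hedge + " "):
            relation, number = HEDGE_RELATIONS[hedge], text[len(hedge):].strip()
            break
    else:
        relation, number = "=", text  # bare number: read as exact
    number = number.replace(",", "")  # drop thousands separators
    return relation, number, is_round(number)
```

For example, `classify("about 2,000")` yields `("~=", "2000", True)` and `classify("a little under 37.5")` yields `("<~", "37.5", False)`. Storing the relation symbol rather than the hedge word lets different wordings of the same fact be compared directly.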
Multi-word expression-sensitive word alignment
This paper presents a new word alignment method that incorporates knowledge about Bilingual Multi-Word Expressions (BMWEs). Our method first extracts BMWEs bidirectionally from a given corpus. It then performs conventional word alignment, taking into account the properties of BMWEs both in their grouping and in their alignment links. We supply partial annotations of alignment links as prior knowledge to the word alignment process. Specifically, we replace the maximum likelihood estimate in the M-step of the IBM Models with a Maximum A Posteriori (MAP) estimate, and we embed prior knowledge about BMWEs in the prior of this MAP estimate. In our experiments, we observed an absolute improvement of 0.77 BLEU points on JP–EN. In all but one case, our method gave better results than the method using only BMWE grouping. Although this paper does not directly address the issues in Cross-Lingual Information Retrieval (CLIR), it discusses an approach of direct relevance to the field. This approach could be viewed as the opposite of current trends in CLIR on semantic space, which incorporate a notion of order into the bag-of-words model (e.g. co-occurrences).
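The key mechanism described above is swapping the maximum likelihood M-step for a MAP M-step. It can be illustrated with IBM Model 1. The sketch below rests on several assumptions. It uses a Dirichlet prior, under which the MAP update reduces to adding pseudo-counts to the expected counts before normalising. The `prior` dictionary is a hypothetical stand-in for BMWE-derived link knowledge. This is not the authors' exact formulation, which may differ in model order and prior shape.

```python
from collections import defaultdict


def ibm1_map(corpus, prior=None, iterations=10):
    """EM for IBM Model 1 with a MAP (Dirichlet) M-step.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    prior:  hypothetical dict (f, e) -> pseudo-count a >= 0, playing the role
            of alpha - 1 in a Dirichlet prior (alpha = 1, i.e. a = 0, elsewhere).
    Returns t[(f, e)] = p(f | e).
    """
    prior = prior or {}
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each source token over the target words + NULL.
        for fs, es in corpus:
            es = ["NULL"] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step (MAP): expected counts plus prior pseudo-counts, renormalised.
        for (f, e), a in prior.items():
            count[(f, e)] += a
            total[e] += a
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On the toy corpus `[(["das", "haus"], ["the", "house"]), (["das", "buch"], ["the", "book"]), (["ein", "buch"], ["a", "book"])]`, one plain EM iteration gives p(haus | house) = 0.5. With `prior={("haus", "house"): 5.0}` the same iteration gives 16/17. This shows how a prior link pulls the estimate towards the annotated alignment without fixing it outright.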
Analysis of Amoeba Active Contours
The subject of this paper is the theoretical analysis of structure-adaptive median filter algorithms that approximate curvature-based PDEs for image filtering and segmentation. These so-called morphological amoeba filters are based on a concept introduced by Lerallut et al. They achieve results similar to those of the well-known geodesic active contour and self-snakes PDEs. In the present work, the PDE approximated by amoeba active contours is derived for a general geometric situation and a general amoeba metric. This PDE is structurally similar, but not identical, to the geodesic active contour equation. It reproduces the previous PDE approximation results for amoeba median filters as special cases. Furthermore, the paper analyses modifications of the basic amoeba active contour algorithm that are related to the morphological force terms frequently used with geodesic active contours. Experiments demonstrate the basic behaviour of amoeba active contours and their similarity to geodesic active contours.
Comment: Revised version with several improvements for clarity, slightly extended experiments and discussion. Accepted for publication in Journal of Mathematical Imaging and Vision.
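The structure-adaptive median filtering that the analysis builds on can be made concrete with a small example. The minimal sketch of one amoeba median step below rests on several assumptions. It uses a 4-neighbourhood. The amoeba path cost per step is `1 + beta * |f(p) - f(q)|`, an L1-type amoeba metric and only one of the general metrics the paper covers. Amoebas are pixels within path length `rho`. Amoebas are built on a guidance image `f` while the median is taken over a second function `u`. With `u = f` this is a plain amoeba median filter. Our reading is that iterating it on a level-set function `u` corresponds to the active-contour setting. The function names and parameters are illustrative only.

```python
import heapq
import statistics


def amoeba(f, y, x, rho, beta):
    """Pixels reachable from (y, x) with amoeba path length <= rho (Dijkstra)."""
    h, w = len(f), len(f[0])
    dist = {(y, x): 0.0}
    heap = [(0.0, y, x)]
    while heap:
        d, cy, cx = heapq.heappop(heap)
        if d > dist[(cy, cx)]:
            continue  # stale heap entry
        for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                # Spatial step of 1 plus a penalty for crossing intensity changes.
                nd = d + 1.0 + beta * abs(f[ny][nx] - f[cy][cx])
                if nd <= rho and nd < dist.get((ny, nx), float("inf")):
                    dist[(ny, nx)] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return list(dist)


def amoeba_median_step(u, f, rho=2.0, beta=1.0):
    """One iteration: median of u over amoebas built on the guidance image f."""
    h, w = len(u), len(u[0])
    return [[statistics.median(u[py][px] for py, px in amoeba(f, y, x, rho, beta))
             for x in range(w)] for y in range(h)]
```

On a step edge such as `[[0, 0, 10, 10]] * 4` with `beta=1.0`, crossing the edge costs 11, which exceeds `rho=2.0`. The amoebas therefore never straddle the edge and the image is left unchanged. With `beta=0.0` the amoeba degenerates to a fixed diamond-shaped window, and the step becomes an ordinary median filter that removes isolated impulses.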
Evaluative Language and Its Solidarity-Building Role on TED.com: An Appraisal and Corpus Analysis
Language is a key resource in the formation of online communities, which are in turn central to an understanding of contemporary social relations. This study looks at TED.com, an educational video-hosting platform with few built-in community-building functionalities, to explore the potential for users to affiliate through their language choices. Grounded in Systemic Functional Linguistics, the study uses the Appraisal framework, extended with corpus linguistic methods, to analyse users’ reactions to TED videos. The study shows that online participants use evaluative language to align with certain ideas and, based on these affinities, form affiliations characterized by sociability and solidarity. These affiliations raise important questions about the conception of ‘community’ in twenty-first century society.
A semantic-based system for querying personal digital libraries
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-28640-0_4. Copyright © Springer 2004.
The decreasing cost and increasing availability of new technologies are enabling people to create their own digital libraries. One of the main topics in personal digital libraries is allowing people to select interesting information among all the different digital formats available today (pdf, html, tiff, etc.). Moreover, the increasing availability of these online libraries, as well as the advent of the so-called Semantic Web [1], is raising the demand for converting paper documents into digital, possibly semantically annotated, documents. These motivations drove us to design a new system that could enable users to interact with and query documents independently of the digital formats in which they are represented. To achieve this independence from format, we treat all the digital documents contained in a digital library as images. Our system tries to automatically detect the layout of the digital documents and recognize the geometric regions of interest. All the extracted information is then encoded with respect to a reference ontology, so that users can query their digital library by typing free text or by browsing the ontology.
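The last two steps described above are layout regions encoded against a reference ontology and queried by class or free text. A minimal data-structure sketch of how they fit together follows. Everything in it is a hypothetical example rather than the system's actual schema: the toy ontology with its class names, the `LayoutRegion` record, and the `is_a` and `query` helpers. Browsing a class is modelled as matching that class and all of its subclasses.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: child class -> parent class.
ONTOLOGY = {
    "Title": "TextRegion",
    "Author": "TextRegion",
    "Abstract": "TextRegion",
    "TextRegion": "Region",
    "Figure": "Region",
}


@dataclass
class LayoutRegion:
    doc: str     # source document (any original format, processed as an image)
    label: str   # ontology class assigned by layout analysis
    bbox: tuple  # (x0, y0, x1, y1) in page-image pixels
    text: str = ""  # recognized text, if any


def is_a(label, cls):
    """True if label equals cls or is a (transitive) subclass of it."""
    while label is not None:
        if label == cls:
            return True
        label = ONTOLOGY.get(label)
    return False


def query(regions, cls="Region", words=""):
    """Regions of class cls (or a subclass) whose text contains every query word."""
    terms = words.lower().split()
    return [r for r in regions
            if is_a(r.label, cls) and all(t in r.text.lower() for t in terms)]
```

Because regions from a PDF and from a TIFF scan share one ontology-based record, a query such as `query(regions, "TextRegion", "libraries")` treats them identically. This mirrors the format independence the abstract aims for.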