Language-based multimedia information retrieval
This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.
Graphene: A Context-Preserving Open Information Extraction System
We introduce Graphene, an Open IE system whose goal is to generate accurate,
meaningful and complete propositions that may facilitate a variety of
downstream semantic applications. For this purpose, we transform syntactically
complex input sentences into clean, compact structures in the form of core
facts and accompanying contexts, while identifying the rhetorical relations
that hold between them in order to maintain their semantic relationship. In
that way, we preserve the context of the relational tuples extracted from a
source sentence, generating a novel lightweight semantic representation for
Open IE that enhances the expressiveness of the extracted propositions.
Comment: 27th International Conference on Computational Linguistics (COLING
2018)
Linguistic Constraints in LFG-DOP
LFG-DOP (Bod and Kaplan, 1998, 2003) provides an appealing answer to the question of how probabilistic methods can be incorporated into linguistic theory. However, despite its attractions, the standard model of LFG-DOP suffers from serious problems of overgeneration, because (a) it is unable to define fragments of the right level of generality, and (b) it has no way of capturing the effect of anything except simple positive constraints. We show how the model can be extended to overcome these problems. The question of how probabilistic methods should be incorporated into linguistic theory is important from both a practical, grammar engineering, perspective, and from the perspective of "pure" linguistic theory. From a practical point of view such techniques are essential if a system is to achieve a useful breadth of coverage
Capturing Ambiguity in Crowdsourcing Frame Disambiguation
FrameNet is a computational linguistics resource composed of semantic frames,
high-level concepts that represent the meanings of words. In this paper, we
present an approach to gather frame disambiguation annotations in sentences
using a crowdsourcing approach with multiple workers per sentence to capture
inter-annotator disagreement. We perform an experiment over a set of 433
sentences annotated with frames from the FrameNet corpus, and show that the
aggregated crowd annotations achieve an F1 score greater than 0.67 as compared
to expert linguists. We highlight cases where the crowd annotation was correct
even though the expert is in disagreement, arguing for the need to have
multiple annotators per sentence. Most importantly, we examine cases in which
crowd workers could not agree, and demonstrate that these cases exhibit
ambiguity, either in the sentence, frame, or the task itself, and argue that
collapsing such cases to a single, discrete truth value (i.e. correct or
incorrect) is inappropriate, creating arbitrary targets for machine learning.
Comment: in publication at the sixth AAAI Conference on Human Computation and
Crowdsourcing (HCOMP) 201
Automatic summarising: factors and directions
This position paper suggests that progress with automatic summarising demands
a better research methodology and a carefully focussed research strategy. In
order to develop effective procedures it is necessary to identify and respond
to the context factors, i.e. input, purpose, and output factors, that bear on
summarising and its evaluation. The paper analyses and illustrates these
factors and their implications for evaluation. It then argues that this
analysis, together with the state of the art and the intrinsic difficulty of
summarising, implies a nearer-term strategy concentrating on shallow, but not
surface, text analysis and on indicative summarising. This is illustrated with
current work, from which a potentially productive research programme can be
developed.