Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Understanding language goes hand in hand with the ability to integrate
complex contextual information obtained via perception. In this work, we
present a novel task for grounded language understanding: disambiguating a
sentence given a visual scene which depicts one of the possible interpretations
of that sentence. To this end, we introduce a new multimodal corpus containing
ambiguous sentences, representing a wide range of syntactic, semantic and
discourse ambiguities, coupled with videos that visualize the different
interpretations for each sentence. We address this task by extending a vision
model which determines whether a sentence is depicted by a video. We demonstrate how
such a model can be adjusted to recognize different interpretations of the same
underlying sentence, allowing it to disambiguate sentences in a unified fashion
across the different ambiguity types.
Comment: EMNLP 201
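As a rough sketch of the setup described above, disambiguation reduces to an argmax over depiction scores once a sentence-video alignment model is available. Here `depiction_score` is a hypothetical stand-in for that trained model (stubbed with an invented lookup table); all names and data are illustrative, not the authors' code.

```python
# Hypothetical sketch: `depiction_score` stands in for the trained
# sentence-video alignment model; the lookup table is invented toy data.
def depiction_score(interpretation, video):
    """Score how well `video` depicts `interpretation` (stubbed)."""
    table = {
        ("saw the boy who had the telescope", "vid_possession"): 0.9,
        ("saw the boy by using the telescope", "vid_possession"): 0.2,
    }
    return table.get((interpretation, video), 0.0)

def disambiguate(interpretations, video):
    """Select the reading that the visual scene depicts best."""
    return max(interpretations, key=lambda i: depiction_score(i, video))

readings = ["saw the boy who had the telescope",
            "saw the boy by using the telescope"]
print(disambiguate(readings, "vid_possession"))  # the possession reading
```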
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine learning has been a big success story of the AI resurgence. One
particular standout success is learning from massive amounts of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition of the value of utilizing knowledge whenever it is
available or can be created purposefully. In this paper, we discuss the
indispensable role of knowledge for deeper understanding of content where (i)
large amounts of training data are unavailable, (ii) the objects to be
recognized are complex (e.g., implicit entities and highly subjective content),
and (iii) applications need to use complementary or related data in multiple
modalities/media. What brings us to the cusp of rapid progress is our ability
to (a) create relevant and reliable knowledge and (b) carefully exploit that
knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to
foretell unprecedented progress in our ability to understand and exploit
multimodal data, and the continued incorporation of knowledge into learning
techniques.
Comment: Pre-print of the paper accepted at the 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Two Uummarmiutun modals – including a brief comparison with Utkuhikšalingmiutut cognates
The paper is concerned with the meaning of two modal postbases in Uummarmiutun, hungnaq ‘probably’ and ȓukȓau ‘should’. Uummarmiutun is an Inuktut dialect spoken in the Western Arctic. The analyses are founded on knowledge shared by native speakers of Uummarmiutun. Their statements and elaborations are quoted throughout the paper to show how they have explained the meaning nuances of modal expressions in their language. The paper also includes a comparison with cognates in Utkuhikšalingmiutut, which belongs to the eastern part of the Western Canadian dialect group (Dorais, 2010). Using categories from Cognitive Functional Linguistics (Boye, 2005, 2012), the paper shows which meanings are covered by hungnaq and ȓukȓau. This allows us to discover subtle differences between the meanings of Uummarmiutun hungnaq and ȓukȓau and their Utkuhikšalingmiutut cognates, respectively.
Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs,
which play a crucial role in the compositional power of language.
Comment: The paper has been published in the Proceedings of the 27th
International Conference on Computational Linguistics. Please refer to this
version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
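To make the idea of "combining multimodal representations" concrete, here is a minimal illustrative sketch (plain Python, invented toy vectors) of three combination strategies commonly distinguished in this literature: early fusion by concatenation, multiplicative gating, and late fusion of per-modality predictions.

```python
# Toy sketch of three common multimodal combination strategies.
# Vectors and weights are invented for illustration only.
def concat_fusion(text_vec, image_vec):
    """Early fusion: concatenate the modality embeddings."""
    return text_vec + image_vec

def multiplicative_fusion(text_vec, image_vec):
    """Element-wise product: one modality gates the other."""
    return [t * v for t, v in zip(text_vec, image_vec)]

def late_fusion(text_score, image_score, alpha=0.5):
    """Late fusion: mix per-modality predictions with weight `alpha`."""
    return alpha * text_score + (1 - alpha) * image_score

print(concat_fusion([1.0, 2.0], [3.0, 4.0]))          # [1.0, 2.0, 3.0, 4.0]
print(multiplicative_fusion([1.0, 2.0], [3.0, 4.0]))  # [3.0, 8.0]
```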
Crossmodal content binding in information-processing architectures
Operating in a physical context, an intelligent robot faces two fundamental problems. First, it needs to combine information from its different sensors to form a representation of the environment that is more complete than any single sensor could provide on its own. Second, it needs to combine high-level representations (such as those for planning and dialogue) with its sensory information, to ensure that the interpretations of these symbolic representations are grounded in the situated context. Previous approaches to this problem have used techniques such as (low-level) information fusion, ontological reasoning, and (high-level) concept learning. This paper presents a framework in which these, and other, approaches can be combined to form a shared representation of the current state of the robot in relation to its environment and other agents. Preliminary results from an implemented system are presented to illustrate how the framework supports behaviours commonly required of an intelligent robot.
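The core of such a shared representation can be sketched as binding per-modality "proxies" into unions that stand for single entities. The sketch below is hypothetical (all names and data are invented, not the paper's API): each proxy is a bag of attribute-value features, and proxies whose shared attributes agree are greedily grouped into one union.

```python
# Hypothetical sketch of crossmodal binding: each modality contributes a
# "proxy" (attribute-value features); proxies whose shared attributes
# agree are bound into one "union" standing for a single entity.
# All names and data are illustrative, not the paper's API.
def compatible(p, q):
    """Proxies may be bound only if no shared attribute contradicts."""
    return all(p[a] == q[a] for a in set(p) & set(q))

def bind(proxies):
    """Greedily group mutually compatible proxies into unions."""
    unions = []
    for proxy in proxies:
        for union in unions:
            if all(compatible(proxy, member) for member in union):
                union.append(proxy)
                break
        else:
            unions.append([proxy])
    return unions

vision = {"colour": "red", "shape": "mug"}    # visual proxy
speech = {"colour": "red", "name": "my mug"}  # dialogue proxy
laser  = {"shape": "box"}                     # range-sensor proxy
print(len(bind([vision, speech, laser])))     # 2 unions: the mug, the box
```

A real system would of course bind under uncertainty and revise unions as new sensor data arrives; the greedy pass here only illustrates the shared-representation idea.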
Adaptive Sentence Boundary Disambiguation
Labeling of sentence boundaries is a necessary prerequisite for many natural
language processing tasks, including part-of-speech tagging and sentence
alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate
them most systems use brittle, special-purpose regular expression grammars and
exception rules. As an alternative, we have developed an efficient, trainable
algorithm that uses a lexicon with part-of-speech probabilities and a
feed-forward neural network. After training for less than one minute, the
method correctly labels over 98.5% of sentence boundaries in a corpus of over
27,000 sentence-boundary marks. We show the method to be efficient and easily
adaptable to different text genres, including single-case texts.
Comment: The software from the work described in this paper is available by
contacting [email protected]
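The trainable approach described above can be sketched as follows. Each candidate end-of-sentence mark is represented by part-of-speech probability features of its neighbouring words (plus a capitalisation cue), and a single logistic unit trained by gradient descent, a simplified stand-in for the paper's small feed-forward network, labels it as boundary or non-boundary. The lexicon, data, and all names below are invented for illustration.

```python
# Toy sketch: POS-probability features around a punctuation mark, fed to
# a single trainable logistic unit (simplified stand-in for the paper's
# feed-forward network). Lexicon and data are invented.
import math

LEXICON = {                        # toy P(tag | word-form) distributions
    "mr.":   {"TITLE": 0.95, "NOUN": 0.05},
    "smith": {"PROPN": 1.0},
    "home":  {"NOUN": 0.9, "ADV": 0.1},
    "the":   {"DET": 1.0},
}
TAGS = ["TITLE", "NOUN", "PROPN", "ADV", "DET"]

def features(prev_word, next_word):
    """POS-probability vectors of the context words, plus a case cue."""
    vec = []
    for word in (prev_word, next_word):
        dist = LEXICON.get(word.lower(), {})
        vec.extend(dist.get(t, 0.0) for t in TAGS)
    vec.append(1.0 if next_word[:1].isupper() else 0.0)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Plain gradient descent on a single logistic unit."""
    n = len(features("", ""))
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for prev_w, next_w, label in examples:
            x = features(prev_w, next_w)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - label
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def is_boundary(w, b, prev_word, next_word):
    x = features(prev_word, next_word)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

# "Mr. Smith" -> the period is not a boundary; "home. The" -> it is.
data = [("mr.", "Smith", 0), ("home", "The", 1)]
w, b = train(data)
print(is_boundary(w, b, "mr.", "Smith"), is_boundary(w, b, "home", "The"))
```

The capitalisation cue alone cannot separate these two cases (both are followed by a capitalised word), so the unit must learn to rely on the TITLE probability of the preceding word, which mirrors the role of the part-of-speech lexicon in the described method.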
Emergence phenomena in German W-immer/auch-subordinators
The present study is concerned with the distributional patterns of the irrelevance particles immer ‘ever’ and auch ‘also’ in German universal concessive conditionals and free relatives (e.g. was immer er auch sagt ‘whatever he says’). Whereas irrelevance is conveyed by a single element in a fixed position in languages like English (-ever), immer and auch occur in multiple positions and combinations. Following the example of Leuschner (2000), the distribution of
particles and their combinations is documented and explained using functional motivations. Compared with Leuschner (2000), however, the present study is based on a much larger sample of 23,299 clauses with the W-words was and wer (incl. their inflected forms) from the DeReKo corpus, allowing for a far more detailed statistical analysis. Special attention is devoted to the distribution of immer and auch (including their combinations) in full subordinate clauses vs. elliptically reduced forms, and to the nature of the resulting patterns as a case of emergent grammar.
A layered abduction model of perception: Integrating bottom-up and top-down processing in a multi-sense agent
A layered-abduction model of perception is presented which unifies bottom-up and top-down processing in a single logical and information-processing framework. The process of interpreting the input from each sense is broken down into discrete layers of interpretation, where at each layer a best-explanation hypothesis is formed of the data presented by the layer or layers below, with the help of information available laterally and from above. The formation of this hypothesis is treated as a problem of abductive inference, similar to diagnosis and theory formation. Thus this model brings a knowledge-based problem-solving approach to the analysis of perception, treating perception as a kind of compiled cognition. The bottom-up passing of information from layer to layer defines channels of information flow, which separate and converge in a specific way for any specific sense modality. Multi-modal perception occurs where channels converge from more than one sense. This model has not yet been implemented, though it is based on systems which have been successful in medical and mechanical diagnosis and medical test interpretation.
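The per-layer abductive step described above can be sketched as repeated best-explanation selection, where the winning hypothesis at one layer becomes the data for the next. Everything below (hypotheses, scores, labels) is invented for illustration; the paper itself reports no implementation.

```python
# Illustrative toy sketch of layered abduction: each layer holds
# candidate hypotheses scored by how well they explain the data from
# the layer below; the best explanation is selected and passed upward.
# Hypotheses, scores, and labels are invented, not from the paper.
def best_explanation(candidates, data):
    """Abductive step: pick the highest-scoring explanation of `data`."""
    return max(candidates, key=lambda h: h["explains"](data))

# Layer 1 explains acoustic cues as phonemes; layer 2 explains phonemes
# as words (two layers of what would be a deeper stack).
phoneme_layer = [
    {"label": "k", "explains": lambda cues: 0.8 if "burst" in cues else 0.1},
    {"label": "g", "explains": lambda cues: 0.6 if "burst" in cues else 0.2},
]
word_layer = [
    {"label": "cat", "explains": lambda ph: 0.9 if ph == "k" else 0.0},
    {"label": "gap", "explains": lambda ph: 0.9 if ph == "g" else 0.0},
]

h1 = best_explanation(phoneme_layer, {"burst", "voiceless"})
h2 = best_explanation(word_layer, h1["label"])
print(h1["label"], h2["label"])  # k cat
```

The lateral and top-down information flow the model calls for would enter through the scoring functions; this sketch shows only the bottom-up channel.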