A lightweight web video model with content and context descriptions for integration with linked data
The rapid increase of video data on the Web has created an urgent need for effective representation, management and retrieval of web videos. Recently, many studies have addressed the ontological representation of videos, using either domain-dependent or generic schemas such as MPEG-7, MPEG-4, and COMM. In spite of their extensive coverage and sound theoretical grounding, these schemas have yet to be widely adopted. Two likely reasons are the complexities involved and a lack of tool support. We propose a lightweight video content model for content-context description and integration. The model is distinctive in that it uses the emerging social context to describe and interpret the video. Our approach is grounded in exploiting easily extractable, evolving contextual metadata and the availability of existing data on the Web. This enables representational homogeneity and a firm basis for information integration among semantically-enabled data sources. The model uses many existing schemas to describe various ontology classes and shows the scope of interlinking with the Linked Data cloud.
Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization
A deeper understanding of video activities extends beyond recognition of
underlying concepts such as actions and objects: constructing deep semantic
representations requires reasoning about the semantic relationships among these
concepts, often beyond what is directly observed in the data. To this end, we
propose an energy minimization framework that leverages large-scale commonsense
knowledge bases, such as ConceptNet, to provide contextual cues to establish
semantic relationships among entities directly hypothesized from video signal.
We mathematically express this using the language of Grenander's canonical
pattern generator theory. We show that the use of prior encoded commonsense
knowledge alleviates the need for large annotated training datasets and helps
tackle imbalance in training through prior knowledge. Using three different
publicly available datasets - Charades, Microsoft Visual Description Corpus and
Breakfast Actions datasets, we show that the proposed model can generate video
interpretations whose quality is better than those reported by state-of-the-art
approaches, which have substantial training needs. Through extensive
experiments, we show that the use of commonsense knowledge from ConceptNet
allows the proposed approach to handle various challenges such as training data
imbalance, weak features, and complex semantic relationships and visual scenes. Comment: Accepted to WACV 201
Exploiting visual salience for the generation of referring expressions
In this paper we present a novel approach to generating
referring expressions (GRE) that is tailored to a model of the visual context the user is attending to. The approach
integrates a new computational model of visual salience in simulated 3-D environments with Dale and Reiter's (1995) Incremental Algorithm. The advantages of our GRE framework are: (1) the context set used by the GRE algorithm is dynamically computed by the visual saliency algorithm as a user navigates through a simulation; (2) the integration of visual salience into the generation process means that in some instances underspecified but sufficiently detailed descriptions of the target object are generated that are shorter than those generated by GRE algorithms which focus purely on adjectival and type attributes; (3) the integration of visual saliency into the generation process means that our GRE algorithm will in some instances succeed in generating a description of the target object in situations where GRE algorithms which focus purely on adjectival and type attributes fail.
Designing Sugaropolis: digital games as a medium for conveying transnational narratives
In this paper, the authors present a case study of 'Sugaropolis': a two-year practice-based project that involved interdisciplinary co-design and stakeholder evaluation of two digital game prototypes. Drawing on the diverse expertise of the research team (game design and development, human geography, and transnational narratives), the paper aims to contribute to debates about the use of digital games as a medium for representing the past. With an emphasis on design-as-research, we consider how digital games can be (co-)designed to communicate complex histories and geographies in which people, objects, and resources are connected through space and time.
Conversational Sensing
Recent developments in sensing technologies, mobile devices and context-aware
user interfaces have made it possible to represent information fusion and
situational awareness as a conversational process among actors - human and
machine agents - at or near the tactical edges of a network. Motivated by use
cases in the domain of security, policing and emergency response, this paper
presents an approach to information collection, fusion and sense-making based
on the use of natural language (NL) and controlled natural language (CNL) to
support richer forms of human-machine interaction. The approach uses a
conversational protocol to facilitate a flow of collaborative messages from NL
to CNL and back again in support of interactions such as: turning eyewitness
reports from human observers into actionable information (from both trained and
untrained sources); fusing information from humans and physical sensors (with
associated quality metadata); and assisting human analysts to make the best use
of available sensing assets in an area of interest (governed by management and
security policies). CNL is used as a common formal knowledge representation for
both machine and human agents to support reasoning, semantic information fusion
and generation of rationale for inferences, in ways that remain transparent to
human users. Examples are provided of various alternative styles for user
feedback, including NL, CNL and graphical feedback. A pilot experiment with
human subjects shows that a prototype conversational agent is able to gather
usable CNL information from untrained human subjects.