29,330 research outputs found
Recommended from our members
Frame Semantics in Text-to-Scene Generation
3D graphics scenes are difficult to create, requiring users to learn and utilize a series of complex menus, dialog boxes, and often tedious direct manipulation techniques. By giving up some amount of control afforded by such interfaces we have found that users can use natural language to quickly and easily create a wide variety of 3D scenes. Natural language offers an interface that is intuitive and immediately accessible by anyone, without requiring any special skill or training. The WordsEye system (http://www.wordseye.com) has been used by several thousand users on the web to create over 10,000 scenes. The system relies on a large database of 3D models and poses to depict entities and actions. We describe how the current version of the system incorporates the type of lexical and real-world knowledge needed to depict scenes from language
Recommended from our members
VigNet: Grounding Language in Graphics using Frame Semantics
This paper introduces Vignette Semantics, a lexical semantic theory based on Frame Semantics that represents conceptual and graphical relations. We also describe a lexical resource that implements this theory, VigNet, and its application in text-to-scene generation
Video summarisation: A conceptual framework and survey of the state of the art
This is the post-print (final draft post-refereeing) version of the article. Copyright @ 2007 Elsevier Inc.Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means for surveying the research literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly user based information that is unobtrusively sourced, in order to overcome longstanding challenges such as the semantic gap and providing video summaries that have greater relevance to individual users
Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization
A deeper understanding of video activities extends beyond recognition of
underlying concepts such as actions and objects: constructing deep semantic
representations requires reasoning about the semantic relationships among these
concepts, often beyond what is directly observed in the data. To this end, we
propose an energy minimization framework that leverages large-scale commonsense
knowledge bases, such as ConceptNet, to provide contextual cues to establish
semantic relationships among entities directly hypothesized from video signal.
We mathematically express this using the language of Grenander's canonical
pattern generator theory. We show that the use of prior encoded commonsense
knowledge alleviate the need for large annotated training datasets and help
tackle imbalance in training through prior knowledge. Using three different
publicly available datasets - Charades, Microsoft Visual Description Corpus and
Breakfast Actions datasets, we show that the proposed model can generate video
interpretations whose quality is better than those reported by state-of-the-art
approaches, which have substantial training needs. Through extensive
experiments, we show that the use of commonsense knowledge from ConceptNet
allows the proposed approach to handle various challenges such as training data
imbalance, weak features, and complex semantic relationships and visual scenes.Comment: Accepted to WACV 201
- …