
    Unified Concept-based Multimedia Information Retrieval Technique

    The explosion of digital data over the last two decades has been accompanied by the development of various types of data, including text, images, audio, and video, collectively known as multimedia data. Multimedia Information Retrieval is required to search across these media types. Comprehensive information needs cannot be handled by monolithic search engines such as Google, Google Images, YouTube, or FindSounds. The shortcoming of today's search engines with respect to media format is the dominance of text, while the expected information could be an image, audio, or video; hence it is necessary to present multiple media formats at the same time. This paper designs a Unified Concept-based Multimedia Information Retrieval (UCpBMIR) technique that tackles these difficulties through unified multimedia indexing. The indexing technique transforms the various media and their features into a text representation with a concept-based algorithm and passes it to a concept detector. A learning model configures the concept detector to classify multimedia objects. The output of the concept detector is placed in a unified multimedia index database, where it awaits concept-based queries to be matched via semantic similarities with an ontology. The ontology provides the relationships between the object representations of multimedia data: although text, image, audio, and video are naturally heterogeneous, conceptually they may be related to one another. Preliminary results show that multimedia documents can be retrieved through a single query in any format, returning all kinds of multimedia formats. The unified multimedia indexing technique with an ontology thus unifies each multimedia format.
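    As a rough illustration of the indexing idea described above, the following is a minimal sketch, assuming stub concept detectors in place of trained per-modality classifiers and omitting the ontology-based semantic similarity step; the class names and toy keyword vocabulary are invented for demonstration, not taken from the paper.

```python
# Minimal sketch of unified concept-based indexing: every media object,
# regardless of modality, is mapped to a set of text concept labels by a
# "concept detector", and all objects share one index. The detector here is
# a stub; a real system would run trained classifiers per modality.

from collections import defaultdict

def detect_concepts(media_type: str, payload: str) -> set[str]:
    """Stub concept detector mapping a media object to text concepts."""
    # Toy keyword lookup standing in for learned concept detection.
    vocabulary = {"beach", "dog", "sunset", "guitar"}
    return {tok for tok in payload.lower().split() if tok in vocabulary}

class UnifiedIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # concept -> document ids

    def add(self, doc_id: str, media_type: str, payload: str) -> None:
        for concept in detect_concepts(media_type, payload):
            self._postings[concept].add(doc_id)

    def query(self, text: str) -> list[str]:
        # Rank documents of any modality by the number of shared concepts.
        wanted = detect_concepts("text", text)
        scores = defaultdict(int)
        for concept in wanted:
            for doc_id in self._postings[concept]:
                scores[doc_id] += 1
        return sorted(scores, key=scores.get, reverse=True)

index = UnifiedIndex()
index.add("img-01", "image", "dog on a beach")        # caption stands in for pixels
index.add("aud-07", "audio", "guitar at sunset")      # tags stand in for waveform
index.add("vid-03", "video", "dog chasing a sunset")  # tags stand in for frames
print(index.query("dog sunset"))  # 'vid-03' first: it shares both concepts
```

    A single text query thus retrieves image, audio, and video documents from one index, which is the behaviour the abstract describes; the ontology layer would additionally relate concepts such as "dog" and "animal".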

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections, in three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Deep Hypernetworks for Learning from Dynamic Multimodal Data

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2015 (advisor: Byoung-Tak Zhang). Recent advances in information and communication technology have led to an explosive increase in data. Unlike traditional data, which are structured and unimodal, data generated from dynamic environments are characterized by high dimensionality, multimodality, and structurelessness, as well as huge scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many successful reports, existing machine learning methods have mainly focused on solving practical problems represented by large-scale but static databases, such as image classification, tagging, and retrieval. Hypernetworks are a probabilistic graphical model that represents an empirical distribution using a hypergraph structure, a large collection of hyperedges encoding the associations among variables. This representation makes the model suitable for characterizing the complex relationships between features with a population of building blocks. However, since a hypernetwork is represented over a huge combinatorial feature space, the model requires a large number of hyperedges to handle multimodal large-scale data and thus faces a scalability problem. In this dissertation, we propose a deep architecture of hypernetworks, i.e., deep hypernetworks, for dealing with this scalability issue when learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks handle the issue through abstraction at multiple levels using a hierarchy of hypergraphs. We use a stochastic method based on Monte-Carlo simulation, a graph MC, to efficiently construct hypergraphs representing the empirical distribution of the observed data. The structure of a deep hypernetwork changes continuously as learning proceeds, and this flexibility contrasts with other deep learning models. The proposed model learns incrementally from the data, thus handling non-stationary properties such as concept drift. The abstract representations in the learned models serve as multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation, and thus formulate vision-language translation in terms of statistical machine translation. Since knowledge of the video stories is used for translation, we call this story-aware vision-language translation. We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmarking datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data contents, as well as the associations between vision and language. We explain how the introduction of a hierarchy deals with scalability and non-stationary properties. In addition, we present story-aware vision-language translation on cartoon videos by generating scene images from sentences and descriptive subtitles from scene images. 
    Furthermore, we discuss the meaning of our model for lifelong learning and directions for improvement toward achieving human-level artificial intelligence. (A toy sketch of the hyperedge mechanism follows the table of contents below.)

    Table of contents:
    1 Introduction
      1.1 Background and Motivation
      1.2 Problems to be Addressed
      1.3 The Proposed Approach and its Contribution
      1.4 Organization of the Dissertation
    2 Related Work
      2.1 Multimodal Learning
      2.2 Models for Learning from Multimodal Data
        2.2.1 Topic Model-Based Multimodal Learning
        2.2.2 Deep Network-Based Multimodal Learning
      2.3 Higher-Order Graphical Models
        2.3.1 Hypernetwork Models
        2.3.2 Bayesian Evolutionary Learning of Hypernetworks
    3 Multimodal Hypernetworks for Text-to-Image Retrievals
      3.1 Overview
      3.2 Hypernetworks for Multimodal Associations
        3.2.1 Multimodal Hypernetworks
        3.2.2 Incremental Learning of Multimodal Hypernetworks
      3.3 Text-to-Image Crossmodal Inference
        3.3.1 Representation of Textual-Visual Data
        3.3.2 Text-to-Image Query Expansion
      3.4 Text-to-Image Retrieval via Multimodal Hypernetworks
        3.4.1 Data and Experimental Settings
        3.4.2 Text-to-Image Retrieval Performance
        3.4.3 Incremental Learning for Text-to-Image Retrieval
      3.5 Summary
    4 Deep Hypernetworks for Multimodal Concept Learning from Cartoon Videos
      4.1 Overview
      4.2 Visual-Linguistic Concept Representation of Cartoon Videos
      4.3 Deep Hypernetworks for Modeling Visual-Linguistic Concepts
        4.3.1 Sparse Population Coding
        4.3.2 Deep Hypernetworks for Concept Hierarchies
        4.3.3 Implication of Deep Hypernetworks on Cognitive Modeling
      4.4 Learning of Deep Hypernetworks
        4.4.1 Problem Space of Deep Hypernetworks
        4.4.2 Graph Monte-Carlo Simulation
        4.4.3 Learning of Concept Layers
        4.4.4 Incremental Concept Construction
      4.5 Incremental Concept Construction from Cartoon Videos
        4.5.1 Data Description and Parameter Setup
        4.5.2 Concept Representation and Development
        4.5.3 Character Classification via Concept Learning
        4.5.4 Vision-Language Conversion via Concept Learning
      4.6 Summary
    5 Story-aware Vision-Language Translation using Deep Concept Hierarchies
      5.1 Overview
      5.2 Vision-Language Conversion as a Machine Translation
        5.2.1 Statistical Machine Translation
        5.2.2 Vision-Language Translation
      5.3 Story-aware Vision-Language Translation using Deep Concept Hierarchies
        5.3.1 Story-aware Vision-Language Translation
        5.3.2 Vision-to-Language Translation
        5.3.3 Language-to-Vision Translation
      5.4 Story-aware Vision-Language Translation on Cartoon Videos
        5.4.1 Data and Experimental Setting
        5.4.2 Scene-to-Sentence Generation
        5.4.3 Sentence-to-Scene Generation
        5.4.4 Visual-Linguistic Story Summarization of Cartoon Videos
      5.5 Summary
    6 Concluding Remarks
      6.1 Summary of the Dissertation
      6.2 Directions for Further Research
    Bibliography
    Abstract in Korean (한글초록)
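    To make the hypernetwork idea above concrete, here is a toy sketch, assuming tiny dictionary-valued examples: hyperedges are small random subsets of feature-value pairs sampled from labelled examples (a crude stand-in for the dissertation's graph Monte-Carlo construction), and classification counts matching hyperedges per class. Function names and parameters are illustrative, not the author's implementation.

```python
# Toy hypernetwork-style learning: hyperedges are order-k subsets of
# feature-value pairs sampled from training examples; a new example is
# classified by counting how many stored hyperedges it contains.

import random
from collections import Counter

def sample_hyperedges(example: dict, label: str, k: int = 2, n: int = 20):
    """Draw n hyperedges of order k from one labelled example."""
    items = list(example.items())
    for _ in range(n):
        yield (frozenset(random.sample(items, k)), label)

def train(corpus, k=2, n=20):
    library = []
    for example, label in corpus:
        library.extend(sample_hyperedges(example, label, k, n))
    return library

def classify(library, example):
    items = set(example.items())
    # A hyperedge "fires" when all of its feature-value pairs are present.
    votes = Counter(label for edge, label in library if edge <= items)
    return votes.most_common(1)[0][0] if votes else None

# Tiny multimodal-flavoured toy data: visual and textual features together.
corpus = [
    ({"color": "yellow", "shape": "round", "word": "sun"}, "day"),
    ({"color": "black", "shape": "round", "word": "moon"}, "night"),
]
library = train(corpus, k=2, n=30)
print(classify(library, {"color": "yellow", "word": "sun", "shape": "big"}))  # likely 'day'
```

    Because the hyperedge library is just a growing list, fresh data can be folded in by extending it, which loosely mirrors the incremental, non-stationary learning the dissertation emphasizes; the deep variant would stack further hypergraph layers over these hyperedges.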

    Characterization of unstructured video

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1999. Includes bibliographical references (p. 135-139). In this work, we examine video retrieval from a synthesis perspective, in cooperation with the more common analysis perspective. Specifically, we target our algorithms at one particular domain: unstructured video material. The goal is to make this unstructured video available for manipulation in interesting ways, i.e., to take video that may have been shot with no specific intent and use it in different settings. For example, we build a set of interfaces that enable taking a collection of home videos and making Christmas cards, refrigerator magnets, family dramas, etc., out of them. The work is divided into three parts. First, we study features and models for the characterization of video; examples are VideoBook, with its extensions, and Hidden Markov Models for video analysis. Second, we examine clustering as an approach to characterizing unstructured video. Clustering alleviates some of the common problems with "query-by-example" and presents groupings that rely on the user's ability to make relevant connections. The clustering techniques we employ operate in the probability density space. One of our goals is to employ these techniques with sophisticated models such as Bayesian Networks and HMMs, which give similar descriptions. These clustering techniques are shown to be optimal in an information-theoretic and Gibbs free energy sense. Finally, we present a set of interfaces that use these features and groupings to enable browsing and editing of unstructured video content. By Giridharan Ranganathan Iyengar, Ph.D.
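    The "clustering in probability density space" idea can be sketched as follows: fit a simple density model to each video segment and group segments by the divergence between their densities. This is a schematic stand-in only; the thesis uses richer models (e.g., HMMs) and an information-theoretic clustering objective, while the 1-D Gaussians, feature values, and threshold below are invented for illustration.

```python
# Each segment is summarized by a 1-D Gaussian over one feature, and
# segments are compared by symmetrized KL divergence between densities.

import math

def fit_gaussian(samples):
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return mu, max(var, 1e-9)  # floor the variance to avoid division by zero

def sym_kl(p, q):
    """Symmetrized KL divergence between two 1-D Gaussians (closed form)."""
    (m1, v1), (m2, v2) = p, q
    kl = lambda ma, va, mb, vb: 0.5 * (math.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1)
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

# Per-segment feature samples (e.g., mean frame brightness over time).
segments = {
    "beach-1":  [0.80, 0.90, 0.85],
    "beach-2":  [0.82, 0.88, 0.90],
    "indoor-1": [0.20, 0.25, 0.30],
}
models = {name: fit_gaussian(vals) for name, vals in segments.items()}

# Pairwise comparison: segments with similar densities would be merged.
names = list(models)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d = sym_kl(models[a], models[b])
        print(f"{a} vs {b}: {'same cluster' if d < 1.0 else 'different'} (d={d:.2f})")
```

    Because the comparison happens between densities rather than raw feature vectors, the same grouping machinery applies unchanged when the per-segment model is an HMM or a Bayesian network, which is the generality the abstract points to.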

    The 1993 Goddard Conference on Space Applications of Artificial Intelligence

    This publication comprises the papers presented at the 1993 Goddard Conference on Space Applications of Artificial Intelligence held at the NASA/Goddard Space Flight Center, Greenbelt, MD on May 10-13, 1993. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed

    Grounding semantic cognition using computational modelling and network analysis

    The overarching objective of this thesis is to further the field of grounded semantics through a range of computational and empirical studies. Over the past thirty years, there have been many algorithmic advances in the modelling of semantic cognition. A commonality across these cognitive models is a reliance on hand-engineered “toy models”. Despite incorporating newer techniques (e.g. long short-term memory), the model inputs remain unchanged. We argue that the inputs to these traditional semantic models bear little resemblance to real human experiences. In this dissertation, we ground our neural network models by training them with real-world visual scenes using naturalistic photographs. Our approach is an alternative to both hand-coded features and embodied raw sensorimotor signals. We conceptually replicate the mutually reinforcing nature of hybrid (feature-based and grounded) representations using silhouettes of concrete concepts as model inputs. We then gradually develop a novel grounded cognitive semantic representation, which we call scene2vec, starting with object co-occurrences and then adding emotions and language-based tags. Limitations of our scene-based representation are identified for more abstract concepts (e.g. freedom). We further present a large-scale human semantics study, which reveals that small-world semantic network topologies are context-dependent and that scenes are the most dominant cognitive dimension. This finding leads us to conclude that there is no meaning without context. Lastly, scene2vec shows promising human-like context-sensitive stereotypes (e.g. gender role bias), and we explore how such stereotypes can be reduced by targeted debiasing. In conclusion, this thesis provides support for a novel computational viewpoint on investigating meaning: scene-based grounded semantics. Future research scaling scene-based semantic models to human levels through virtual grounding has the potential to unearth new insights into the human mind and concurrently to advance artificial general intelligence by enabling robots, embodied or otherwise, to acquire and represent meaning directly from the environment.
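    As a hedged illustration of the first ingredient of such a representation, the sketch below builds concept vectors from object co-occurrences across scenes, in the spirit of the scene2vec idea described above; the real pipeline also folds in emotions and language-based tags, and the scene annotations here are invented for demonstration.

```python
# Build a scene-grounded concept vector by counting which other objects
# co-occur with a target concept across annotated scenes, then compare
# concepts with cosine similarity over those co-occurrence counts.

from collections import Counter
import math

scenes = [  # each scene lists the objects annotated in one photograph
    {"dog", "ball", "grass", "person"},
    {"dog", "person", "leash", "street"},
    {"cat", "sofa", "person"},
]

def concept_vector(concept, scenes):
    """Count how often each other object co-occurs with `concept`."""
    vec = Counter()
    for scene in scenes:
        if concept in scene:
            vec.update(scene - {concept})
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

dog = concept_vector("dog", scenes)
cat = concept_vector("cat", scenes)
print(cosine(dog, cat))  # 0.5 here: both co-occur with 'person'
```

    The abstract's limitation for concepts like "freedom" is visible even in this sketch: a concept that never appears as an annotated object in any scene yields an empty vector and a similarity of zero to everything.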

    The 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies

    This publication comprises the papers presented at the 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland, on May 9-11, 1995. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed

    CIRA annual report FY 2017/2018

    Reporting period April 1, 2017-March 31, 2018

    Deep Learning for Time-Series Analysis of Optical Satellite Imagery

    In this cumulative thesis, I cover four papers on time-series analysis of optical satellite imagery. The contribution is split into two parts. The first introduces DENETHOR and DynamicEarthNet, two landmark datasets with high-quality ground truth for agricultural monitoring and change detection. The second introduces SiROC and SemiSiROC, two methodological contributions to label-efficient change detection.
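    To make the change-detection task concrete, the following is a naive per-pixel baseline on two co-registered single-band images; it is not the SiROC or SemiSiROC method from the thesis, which exploits spatial context and label-efficient learning, and the synthetic images below are invented for demonstration.

```python
# Naive change detection: flag pixels whose temporal difference exceeds
# k standard deviations of the overall difference image.

import numpy as np

def change_map(img_t0: np.ndarray, img_t1: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Boolean mask: True where the two acquisition dates differ strongly."""
    diff = np.abs(img_t1.astype(float) - img_t0.astype(float))
    threshold = diff.mean() + k * diff.std()
    return diff > threshold

rng = np.random.default_rng(0)
t0 = rng.normal(0.5, 0.05, size=(64, 64))          # date 1, synthetic scene
t1 = t0 + rng.normal(0.0, 0.01, size=(64, 64))     # date 2, mostly unchanged
t1[20:30, 20:30] += 0.4                            # simulate a new building footprint
print(change_map(t0, t1).sum(), "changed pixels detected")  # ~100 expected
```

    Such per-pixel thresholding is brittle to illumination and registration noise, which is precisely the gap that spatial-context methods of the kind the thesis contributes are designed to close.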