136,170 research outputs found

    Multilingual Information Framework for Handling Textual Data in Digital Media

    Get PDF
    This document presents MLIF (Multi Lingual Information Framework), a high-level model for describing multilingual data across a wide range of possible applications in the translation/localization process within several multimedia domains (e.g. broadcasting interactive programs within a multilingual community).

    UJM at ImageCLEFwiki 2008

    No full text
    This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki). The task is to answer user information needs, i.e. queries that may combine several modalities (text, image, concept), with ranked lists of relevant documents. The purpose of our experiments is twofold: firstly, our overall aim is to develop a multimedia document model combining text and/or image modalities; secondly, we aim to compare the results of our model using a multimedia query with those of a text-only model. Our multimedia document model is based on a vector of textual and visual terms. The textual terms correspond to words. The visual ones result from local colour descriptors which are automatically extracted and quantized by k-means, leading to an image vocabulary; they represent the colour property of an image region. To perform a query, we compute a similarity score between each document vector (textual + visual terms) and the query using the Okapi method based on the tf.idf approach. We submitted 6 runs, either automatic or manual, using textual information, visual information, or both. With these 6 runs, we study several aspects of our model, such as the choice of the visual words and local features, the way textual and visual words are combined in a query, and the performance improvement obtained when adding visual information to a purely textual model. Concerning the choice of the visual words, the results show that they are significant in cases where the visualness of the query is meaningful. The conclusion about the combination of textual and visual words is surprising: we obtain worse results when we add the text directly to the visual words. Finally, the results also show that visual information brings complementary relevant documents that were not found with the text query. These initial results are promising and encourage the further development of our multimedia model.
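    As a rough illustration of the kind of pipeline this abstract describes (not the authors' code), the sketch below quantizes local colour descriptors into visual words with k-means and scores documents with a plain tf.idf weighting; the paper itself uses the Okapi weighting, and all names here (build_visual_vocabulary, to_terms, the vw_ prefix) are illustrative assumptions. It assumes NumPy and scikit-learn are available.

        import numpy as np
        from collections import Counter
        from sklearn.cluster import KMeans

        def build_visual_vocabulary(descriptors, k=50):
            # Quantize local colour descriptors (an n x d array) into k visual words.
            return KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

        def to_terms(words, descriptors, vocab):
            # Bag of textual words plus quantized visual words for one document.
            visual = ["vw_%d" % i for i in vocab.predict(descriptors)]
            return Counter(list(words) + visual)

        def tfidf_score(query_bag, doc_bag, doc_freq, n_docs):
            # Simple tf.idf similarity between a query bag and a document bag
            # (shown instead of the Okapi formula for brevity).
            score = 0.0
            for term, qtf in query_bag.items():
                tf = doc_bag.get(term, 0)
                if tf:
                    score += qtf * tf * np.log(n_docs / (1.0 + doc_freq.get(term, 0)))
            return score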

    Backus-Naur Form Based Script Definition Language for Multimedia Presentation Document

    Get PDF
    The integration of text, graphics, audio, video and animation on the desktop promises to fundamentally challenge the old models of the printed document as the basis for information exchange. A multimedia document is a specification that can be used to coordinate the runtime presentation of media objects. Several languages that support multimedia documents exist today, for example HTML (HyperText Markup Language) and SMIL (Synchronized Multimedia Integration Language). HTML is an SGML (Standard Generalized Markup Language) based standard document model that defines syntax to enrich text pages with structural and layout information. Dynamic modifications to the structure, layout and content of an HTML document are possible using a scripting language, an approach known as DHTML (Dynamic HyperText Markup Language). SMIL is the web format for multimedia documents and is based on XML (Extensible Markup Language). Driven by the use of text markup tags in multimedia documents, the Script Definition Language, or simply SDL, was developed. SDL is a definition language for multimedia documents that provides a specification for including multimedia elements such as text, image, animation, audio, and video. The structure of SDL is described using the Extended Backus-Naur Form (EBNF). In EBNF, one way to determine the semantics of the language is by derivation; the standard method for deriving the semantics of a language in EBNF is a parse tree. The proposed multimedia document is called the script document. A browser called the Script Multimedia Presentation (SMP) system was developed to generate the presentation output. The browser scans the input file and produces error messages if it does not conform to the specification. For each input document, a parse tree is derived to show that its syntax follows the specification; only a valid input document derives a valid parse tree and produces output. It can be concluded that the input document must strictly follow the SDL specification in order to generate the multimedia presentation.
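    Since the abstract does not reproduce the actual SDL grammar, the sketch below uses a toy EBNF-style grammar and a small recursive-descent parser only to make the described workflow concrete: the input script is scanned, a parse tree is derived for valid input, and invalid input is rejected with an error message. The grammar, token names and functions are assumptions, not SDL.

        # Toy grammar (not SDL):
        #   document   = "begin" { element } "end"
        #   element    = media-type identifier
        #   media-type = "text" | "image" | "audio" | "video" | "animation"
        MEDIA_TYPES = {"text", "image", "audio", "video", "animation"}

        def parse_document(tokens):
            # Recursive-descent parse returning a nested-tuple parse tree,
            # or raising SyntaxError, as the SMP browser is said to do for invalid input.
            pos = 0

            def expect(word):
                nonlocal pos
                if pos >= len(tokens) or tokens[pos] != word:
                    raise SyntaxError("expected %r at token %d" % (word, pos))
                pos += 1

            expect("begin")
            elements = []
            while pos < len(tokens) and tokens[pos] in MEDIA_TYPES:
                media = tokens[pos]
                pos += 1
                if pos >= len(tokens):
                    raise SyntaxError("missing identifier after media type")
                elements.append(("element", media, tokens[pos]))
                pos += 1
            expect("end")
            if pos != len(tokens):
                raise SyntaxError("trailing tokens after 'end'")
            return ("document", elements)

        # A valid script document derives a parse tree; an invalid one raises an error.
        print(parse_document("begin text title image logo end".split()))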

    Information retrieval of mass encrypted data over multimedia networking with N-level vector model-based relevancy ranking

    Get PDF
    With the explosive growth in the deployment of networked applications over the Internet, searching the encrypted information that a user needs becomes increasingly important. However, search precision is quite low when the vector space model is used for mass information retrieval, because long documents have poor similarity values and are poorly represented in the vector space model, and because the order in which terms appear in a document is lost in the vector space representation with intuitive weighting. To address these problems, this study proposed an N-level vector model (NVM)-based relevancy ranking scheme that introduces a new term-weighting formula, taking the location of a feature term in the document into account in order to describe the content of the document properly; investigated ways of ranking encrypted documents using the proposed scheme; and conducted a realistic simulation of information retrieval of mass encrypted data over multimedia networking. Results indicated that the time for index building, the most costly part of the relevancy ranking scheme, increased with both the size and the multimedia content of the documents being searched, which is in agreement with expectations. The performance evaluation demonstrated that our specially designed NVM-based encrypted information retrieval system is effective in ranking encrypted documents transmitted over multimedia networks, with a high recall ratio and high retrieval precision.
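    The abstract does not give the NVM term-weighting formula itself, so the sketch below only illustrates the general idea of position-aware weighting: terms found at more prominent levels of a document contribute more weight before an idf factor is applied. The level names, the weights and the scoring function are assumptions for illustration, not the paper's formula.

        import math
        from collections import defaultdict

        # Hypothetical level weights: terms in more prominent document levels count more.
        LEVEL_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

        def nlevel_weights(doc_levels, doc_freq, n_docs):
            # doc_levels maps a level name to the list of terms found at that level;
            # returns a term -> weight mapping combining level weight and idf.
            weights = defaultdict(float)
            for level, terms in doc_levels.items():
                for term in terms:
                    weights[term] += LEVEL_WEIGHTS.get(level, 1.0)
            for term in weights:
                weights[term] *= math.log(n_docs / (1.0 + doc_freq.get(term, 0)))
            return dict(weights)

        def rank(query_terms, docs, doc_freq, n_docs):
            # Rank documents by the summed weight of the query terms they contain.
            scored = []
            for doc_id, levels in docs.items():
                w = nlevel_weights(levels, doc_freq, n_docs)
                scored.append((sum(w.get(t, 0.0) for t in query_terms), doc_id))
            return sorted(scored, reverse=True)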

    Integrating multimedia characteristics in web-based document languages

    Get PDF
    A single multimedia document model needs to include a wide range of different types of information. In particular, information about space and time is essential for determining the spatial and temporal placement of elements within a presentation. Each information type included in a document model requires its own structuring mechanisms. The language used to express the document model has to be able to encapsulate the plurality of required structures. While this is a process that can be carried out relatively easily during the initial design of a language, it is more difficult in the case t

    Combining text/image in WikipediaMM task 2009

    No full text
    This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track 2009. In 2008, we proposed a multimedia document model defined as a vector of textual and visual terms weighted using a tf.idf approach [5]. For our second participation, our goal was to improve this previous model in the following ways: 1) use of additional information for the textual part (legend and image-bounding text extracted from the original documents), 2) use of different image detectors and descriptors, and 3) a new text/image combination approach. The results allow us to evaluate the benefits of these different improvements.
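    The combination approach itself is not described in the abstract, so the sketch below shows only a generic late-fusion baseline for mixing a text ranking with an image ranking; the linear mix and the alpha parameter are assumptions used to make the idea concrete, not the method the paper proposes.

        def fuse_scores(text_scores, image_scores, alpha=0.7):
            # Combine per-document text and image retrieval scores into one ranking.
            # text_scores / image_scores: dicts mapping doc_id -> score;
            # alpha weights the textual side.
            doc_ids = set(text_scores) | set(image_scores)
            fused = {
                d: alpha * text_scores.get(d, 0.0) + (1.0 - alpha) * image_scores.get(d, 0.0)
                for d in doc_ids
            }
            return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)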

    A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

    Full text link
    Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Another popular approach to model multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the learned hidden topic features, and we show how to employ it to learn a joint representation from image visual words, annotation words and class label information. We test our model on the LabelMe and UIUC-Sports data sets and show that it compares favorably to other topic models. Second, we propose a deep extension of our model and provide an efficient way of training the deep model. Experimental results show that our deep model outperforms its shallow version and reaches state-of-the-art performance on the Multimedia Information Retrieval (MIR) Flickr data set.
    Comment: 24 pages, 10 figures. A version has been accepted by TPAMI on Aug 4th, 2015. Adds a footnote about how to train the model in practice in Section 5.1. arXiv admin note: substantial text overlap with arXiv:1305.530
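    To make the autoregressive idea concrete: DocNADE models a document as p(v) = p(v_1) p(v_2 | v_1) ..., with each conditional computed from a hidden state accumulated over the preceding words. The sketch below is an untrained, simplified forward pass assuming NumPy; it uses a flat softmax where the actual model uses a tree-structured output layer for efficiency, it omits training entirely, and it does not include the supervised label term that SupDocNADE adds.

        import numpy as np

        def doc_log_likelihood(doc, W, U, b, c):
            # doc: sequence of word indices; W: (H, V) input embeddings;
            # U: (V, H) output weights; b: (V,) output bias; c: (H,) hidden bias.
            acc = np.zeros(W.shape[0])   # running sum of embeddings of the words seen so far
            log_lik = 0.0
            for v_i in doc:
                h = 1.0 / (1.0 + np.exp(-(c + acc)))           # hidden state before seeing v_i
                logits = b + U @ h
                log_probs = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
                log_lik += log_probs[v_i]                       # log p(v_i | v_<i)
                acc += W[:, v_i]                                # include v_i for the next step
            return log_lik

        # Tiny usage example with random, untrained weights (10-word vocabulary, 5 hidden units).
        rng = np.random.default_rng(0)
        V, H = 10, 5
        W = 0.1 * rng.standard_normal((H, V))
        U = 0.1 * rng.standard_normal((V, H))
        print(doc_log_likelihood([3, 1, 4, 1, 5], W, U, np.zeros(V), np.zeros(H)))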

    Discourse knowledge in device independent document formatting

    Get PDF
    Most document structures define layout structures which implicitly define semantic relationships between content elements. While document structures for text are well established (books, reports, papers etc.), models for time-based documents such as multimedia and hypermedia are relatively new and lack established document structures. Traditional document description languages convey domain-dependent semantic relationships implicitly, using domain-independent mark-up for expressing layout. This works well for textual documents, as, for example, CSS and HTML demonstrate. True device independence, however, sometimes requires a change of document model to maintain the content semantics. To achieve this we need explicit information about the dis

    Structured multimedia authoring

    Get PDF
    We present the user interface to the CMIF authoring environment for constructing and playing multimedia presentations. The CMIF authoring environment supports a rich hypermedia document model allowing structure-based composition of multimedia presentations and the specification of synchronization constraints between constituent media items. An author constructs a multimedia presentation in terms of its structure and additional synchronization constraints, from which the CMIF player derives the precise timing information for the presentation. We discuss the advantages of a structured approach to authoring multimedia, and describe the facilities in the CMIF authoring environment for supporting this approach. The authoring environment presents three main views of a multimedia presentation: the hierarchy view is used for manipulating and viewing a presentation's hierarchical structure; the channel view is used for managing logical resources and specifying and viewing precise timing constraints…
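    As a toy illustration of how precise timing can be derived from hierarchical structure alone (this is not CMIF's data model, and it ignores explicit synchronization arcs and channel assignment), the sketch below composes media items sequentially or in parallel and computes their start and end times from the structure.

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class Media:
            name: str
            duration: float

        @dataclass
        class Composite:
            mode: str                                   # "seq" or "par"
            children: List[object] = field(default_factory=list)

        def schedule(node, start=0.0, out=None):
            # Return (end_time, [(name, start, end), ...]) derived from structure alone.
            if out is None:
                out = []
            if isinstance(node, Media):
                out.append((node.name, start, start + node.duration))
                return start + node.duration, out
            if node.mode == "seq":                      # children play one after another
                t = start
                for child in node.children:
                    t, _ = schedule(child, t, out)
                return t, out
            end = start                                 # "par": children start together
            for child in node.children:
                child_end, _ = schedule(child, start, out)
                end = max(end, child_end)
            return end, out

        presentation = Composite("seq", [
            Media("title slide", 3.0),
            Composite("par", [Media("narration", 12.0), Media("video clip", 10.0)]),
        ])
        print(schedule(presentation))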

    Multimedia authoring, development environments, and digital video editing

    Get PDF
    Multimedia systems integrate text, audio, video, graphics, and other media and allow them to be utilized in a combined and interactive manner. Using this exciting and rapidly developing technology, multimedia applications can provide extensive benefits in a variety of arenas, including research, education, medicine, and commerce. While there are many commercial multimedia development packages, the easy and fast creation of a useful, full-featured multimedia document is not yet a straightforward task. This paper addresses issues in the development of multimedia documents, ranging from user-interface tools that manipulate multimedia documents to multimedia communication technologies such as compression, digital video editing and information retrieval. It outlines the basic steps in the multimedia authoring process and some of the requirements that need to be met by multimedia development environments. It also presents the role of video, an essential component of multimedia systems, and the role of programming in digital video editing. A model is described for remote access to distributed video. The paper concludes with a discussion of future research directions and new uses of multimedia documents.