136,170 research outputs found
Multilingual Information Framework for Handling textual data in Digital Media
This document presents MLIF (Multilingual Information Framework), a high-level model for describing multilingual data across a wide range of applications in the translation/localization process within several multimedia domains (e.g. broadcasting interactive programs within a multilingual community)
UJM at ImageCLEFwiki 2008
6 pages. This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki). The task is to answer user information needs, i.e. queries which may be composed of several modalities (text, image, concept), with ranked lists of relevant documents. The purpose of our experiments is twofold: firstly, our overall aim is to develop a multimedia document model combining text and/or image modalities; secondly, we aim to compare the results of our model using a multimedia query with those of a text-only model. Our multimedia document model is based on a vector of textual and visual terms. The textual terms correspond to words. The visual ones result from local colour descriptors which are automatically extracted and quantized by k-means, leading to an image vocabulary; they represent the colour property of an image region. To perform a query, we compute a similarity score between each document vector (textual + visual terms) and the query using the Okapi method, based on the tf.idf approach. We submitted 6 runs, either automatic or manual, using textual, visual or both kinds of information. With these 6 runs, we study several aspects of our model, such as the choice of the visual words and local features, the way of combining textual and visual words in a query, and the performance improvements obtained when adding visual information to a purely textual model. Concerning the choice of the visual words, results show that they are significant in some cases where the visualness of the query is meaningful. The conclusion about the combination of textual and visual words is surprising: we obtain worse results when we add the text directly to the visual words. Finally, results also show that visual information brings complementary relevant documents that were not found with the text query. These initial results are promising and encourage the development of our multimedia model
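The scoring described above — tf.idf-style weighting over a mixed bag of textual and visual terms, compared with the Okapi method — can be sketched as follows. This is a minimal illustration, not the authors' exact configuration: the Okapi BM25 variant, the parameter values, and the `vw:` naming convention for k-means visual words are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score documents against a query of mixed textual + visual terms
    using an Okapi BM25 scheme (a member of the tf.idf family)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Each document is a bag of terms; a term like "vw:12" stands for a
# hypothetical visual word obtained by k-means quantization of a local
# colour descriptor, alongside ordinary textual terms.
docs = [["castle", "vw:12", "vw:3"], ["castle", "river"], ["vw:12", "vw:12", "sky"]]
print(bm25_scores(docs, ["castle", "vw:12"]))
```

The document matching both the textual and the visual query term ranks first, which is the behaviour the combined model aims for.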
Backus-Naur Form Based Script Definition Language for Multimedia Presentation Document
The integration of text, graphics, audio, video and animation on the desktop promises
to fundamentally challenge the old model of the printed document as the basis for
information exchange. A multimedia document is a specification that can be used to
coordinate the runtime presentation of media objects. Several languages that support
multimedia documents exist today, for example HTML (HyperText Markup Language) and
SMIL (Synchronized Multimedia Integration Language). HTML is an SGML (Standard
Generalized Markup Language) based standard document model that defines syntax to
enrich text pages with structural and layout information. Dynamic modification of the
structure, layout and content of an HTML document is possible using a scripting
language, an approach known as DHTML (Dynamic HyperText Markup Language). SMIL is the
web format for multimedia documents, and is based on XML (Extensible Markup Language).
Driven by the use of text markup tags in multimedia documents, the Script Definition
Language, or simply SDL, is developed. The SDL is a definition language for multimedia
documents that provides a specification to include multimedia elements such as text,
image, animation, audio, and video. The structure of the SDL is described using the
Extended Backus-Naur Form (EBNF). In EBNF, the semantics of the language can be
determined by derivation; the standard method of deriving the semantics of a language
in EBNF is a parse tree.
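The derivation idea can be illustrated with a toy grammar — this is an illustrative sketch, not the actual SDL grammar: a recursive-descent parser derives a parse tree only for input that follows the EBNF rules, and rejects everything else.

```python
# Toy EBNF rules (illustrative, not the real SDL):
#   script  ::= "begin" element { element } "end"
#   element ::= "text" | "image" | "audio"
# A valid token stream derives a parse tree; an invalid one raises an error.

ELEMENTS = {"text", "image", "audio"}

def parse_script(tokens):
    pos = 0
    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1
    expect("begin")
    children = []
    # { element } with at least one element present
    while pos < len(tokens) and tokens[pos] in ELEMENTS:
        children.append(("element", tokens[pos]))
        pos += 1
    if not children:
        raise SyntaxError("a script needs at least one element")
    expect("end")
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after 'end'")
    return ("script", children)

print(parse_script(["begin", "text", "image", "end"]))
```

An invalid document such as `["begin", "end"]` raises `SyntaxError`, mirroring how only a conforming input document derives a valid parse tree and produces output.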
The proposed multimedia document is called the script document. A browser, the Script
Multimedia Presentation (SMP) system, is developed to generate the presentation output.
The browser scans the input file and produces error messages if the file does not
fulfill the specification. Each input document derives a parse tree to show that its
syntax follows the specification. Only a valid input document derives a valid parse
tree and produces output. It can be concluded that the input document must strictly
follow the SDL specification in order to generate the multimedia presentation
Information retrieval of mass encrypted data over multimedia networking with N-level vector model-based relevancy ranking
With an explosive growth in the deployment of networked applications over the Internet, searching the encrypted information that the user needs becomes increasingly important. However, search precision is quite low when the vector space model is used for mass information retrieval, because long documents having poor similarity values are poorly represented in the vector space model, and the order in which terms appear in a document is lost in the vector space representation with intuitive weighting. To address these problems, this study proposed an N-level vector model (NVM)-based relevancy ranking scheme with a new term-weighting formula that takes into account the location of a feature term in the document in order to describe the document's content properly, investigated ways of ranking encrypted documents using the proposed scheme, and conducted a realistic simulation of information retrieval of mass encrypted data over multimedia networking. Results indicated that the time for index building, the most costly part of the relevancy ranking scheme, increased with both the document size and the multimedia content of the document being searched, which is in agreement with expectations. Performance evaluation demonstrated that our specially designed NVM-based encrypted information retrieval system is effective in ranking encrypted documents transmitted over multimedia networks, with a high recall ratio and high retrieval precision
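The idea of weighting a term by its location in the document can be sketched as follows. The level names, the weight values, and the `nvm_vector` helper are all illustrative assumptions; the abstract does not give the paper's actual formula.

```python
import math

# Hypothetical location weights: a term in the title counts more than the
# same term in a heading, which counts more than one in the body.
LEVEL_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def nvm_vector(levels, df, n_docs):
    """Build a location-weighted term vector.

    levels: {level_name: [terms appearing at that level]}
    df:     document frequency of each term in the collection
    """
    vec = {}
    for level, terms in levels.items():
        w = LEVEL_WEIGHTS[level]
        for t in terms:
            vec[t] = vec.get(t, 0.0) + w
    # scale by idf so rare terms dominate, as in tf.idf weighting
    for t in vec:
        vec[t] *= math.log(n_docs / (1 + df.get(t, 0)))
    return vec

doc = {"title": ["encrypted", "search"], "body": ["search", "network", "network"]}
df = {"encrypted": 2, "search": 5, "network": 8}
v = nvm_vector(doc, df, n_docs=100)
```

A term that appears in the title ends up weighted more heavily than a more frequent body-only term, which is the intuition behind describing the content of the document by where its feature terms occur.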
Integrating multimedia characteristics in web-based document languages
A single multimedia document model needs to include a wide range of different types of information. In particular, information about space and time is essential for determining the spatial and temporal placement of elements within a presentation. Each information type included in a document model requires its own structuring mechanisms. The language used to express the document model has to be able to encapsulate the plurality of required structures. While this is a process that can be carried out relatively easily during the initial design of a language, it is more difficult in the case t…
Combining text/image in WikipediaMM task 2009
6 pages. This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track 2009. In 2008, we proposed a multimedia document model defined as a vector of textual and visual terms weighted using a tf.idf approach [5]. For our second participation, our goal was to improve this previous model in the following ways: 1) use of additional information for the textual part (legend and image bounding text extracted from the original documents); 2) use of different image detectors and descriptors; 3) a new text/image combination approach. Results allow us to evaluate the benefits of these different improvements
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
Topic modeling based on latent Dirichlet allocation (LDA) has been a
framework of choice to deal with multimodal data, such as in image annotation
tasks. Another popular approach to model the multimodal data is through deep
neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type
of topic model called the Document Neural Autoregressive Distribution Estimator
(DocNADE) was proposed and demonstrated state-of-the-art performance for text
document modeling. In this work, we show how to successfully apply and extend
this model to multimodal data, such as simultaneous image classification and
annotation. First, we propose SupDocNADE, a supervised extension of DocNADE,
that increases the discriminative power of the learned hidden topic features
and show how to employ it to learn a joint representation from image visual
words, annotation words and class label information. We test our model on the
LabelMe and UIUC-Sports data sets and show that it compares favorably to other
topic models. Second, we propose a deep extension of our model and provide an
efficient way of training the deep model. Experimental results show that our
deep model outperforms its shallow version and reaches state-of-the-art
performance on the Multimedia Information Retrieval (MIR) Flickr data set.
Comment: 24 pages, 10 figures. A version has been accepted by TPAMI on Aug 4th, 2015.
Add footnote about how to train the model in practice in Section 5.1. arXiv admin
note: substantial text overlap with arXiv:1305.530
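For readers unfamiliar with DocNADE, its core idea — taken from the published model description rather than stated in this abstract — is an autoregressive factorisation of a document's word sequence, with each conditional computed by a feed-forward network with shared parameters:

```latex
p(\mathbf{v}) = \prod_{i=1}^{D} p(v_i \mid \mathbf{v}_{<i})
```

where \(\mathbf{v} = (v_1, \ldots, v_D)\) is the sequence of (visual and annotation) words of a document. The supervised extension SupDocNADE described above additionally models the class label, i.e. the joint \(p(\mathbf{v}, y)\).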
Discourse knowledge in device independent document formatting
Most document structures define layout structures which implicitly convey semantic relationships between content elements. While document structures for text are well established (books, reports, papers, etc.), models for time-based documents such as multimedia and hypermedia are relatively new and lack established document structures. Traditional document description languages convey domain-dependent semantic relationships implicitly, using domain-independent mark-up for expressing layout. This works well for textual documents, as, for example, CSS and HTML demonstrate. True device independence, however, sometimes requires a change of document model to maintain the content semantics. To achieve this we need explicit information about the dis…
Structured multimedia authoring
We present the user interface to the CMIF authoring environment for constructing and playing multimedia presentations. The CMIF authoring environment supports a rich hypermedia document model allowing structure-based composition of multimedia presentations and the specification of synchronization constraints between constituent media items. An author constructs a multimedia presentation in terms of its structure and additional synchronization constraints, from which the CMIF player derives the precise timing information for the presentation. We discuss the advantages of a structured approach to authoring multimedia, and describe the facilities in the CMIF authoring environment for supporting this approach. The authoring environment presents three main views of a multimedia presentation: the hierarchy view is used for manipulating and viewing a presentation's hierarchical structure; the channel view is used for managing logical resources and specifying and viewing precise timing constra..
Multimedia authoring, development environments, and digital video editing
Multimedia systems integrate text, audio, video, graphics, and other media and allow them to be utilized in a combined and interactive manner. Using this exciting and rapidly developing technology, multimedia applications can provide extensive benefits in a variety of arenas, including research, education, medicine, and commerce. While there are many commercial multimedia development packages, the easy and fast creation of a useful, full-featured multimedia document is not yet a straightforward task.
This paper addresses issues in the development of multimedia documents, ranging from user-interface tools that manipulate multimedia documents to multimedia communication technologies such as compression, digital video editing and information retrieval. It outlines the basic steps in the multimedia authoring process and some of the requirements that need to be met by multimedia development environments. It also presents the role of video, an essential component of multimedia systems, and the role of programming in digital video editing. A model is described for remote access to distributed video. The paper concludes with a discussion of future research directions and new uses of multimedia documents