4,688 research outputs found

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition, and released data set. For each task, a reference algorithm used within MediaEval 2010 is presented, together with comments on lessons learned. The Tagging Task (Professional) involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task (Wild Wild Web) involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
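The Placing Task can be made concrete with a toy sketch. This is not the MediaEval reference algorithm, only a hedged illustration of a naive metadata-based approach: assign a test video the coordinates of the training video whose user tags overlap most (Jaccard similarity). All tags and coordinates below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two tag lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def place_video(tags, train):
    """Assign coordinates from the most tag-similar training video.
    train: list of (tag_list, (lat, lon)) pairs."""
    best = max(train, key=lambda item: jaccard(tags, item[0]))
    return best[1]

# Invented toy training data (not from the MediaEval data set).
train = [
    (["eiffel", "paris", "tower"], (48.858, 2.294)),
    (["beach", "sydney", "opera"], (-33.857, 151.215)),
]
print(place_video(["paris", "seine"], train))  # -> (48.858, 2.294)
```

Real systems combine such metadata matching with visual features and gazetteers; this sketch only shows the nearest-neighbor propagation idea.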

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging

    Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two fundamental subtasks in multimodal knowledge graph construction. However, existing methods usually handle the two tasks independently, ignoring the bidirectional interaction between them. This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction (JMERE) task. Moreover, current MNER and MRE models only consider aligning visual objects with textual entities in visual and textual graphs, ignoring entity-entity and object-object relationships. To address these challenges, we propose an edge-enhanced graph alignment network with word-pair relation tagging (EEGA) for the JMERE task. Specifically, we first design a word-pair relation tagging scheme to exploit the bidirectional interaction between MNER and MRE and avoid error propagation. Then, we propose an edge-enhanced graph alignment network that enhances the JMERE task by aligning nodes and edges across the two graphs. Compared with previous methods, the proposed approach can leverage edge information to assist the alignment between objects and entities and to find correlations between entity-entity and object-object relationships. Experiments demonstrate the effectiveness of our model. Comment: accepted in AAAI-202
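The word-pair tagging idea can be illustrated with a minimal grid-filling sketch. This is not the EEGA model itself; the label names and example sentence are invented. The scheme labels every pair of words: entity-type labels fill cells inside an entity span, and relation labels mark cells linking two entity heads, so entities and relations share one table.

```python
def build_grid(words, entities, relations):
    """Fill a word-pair label grid.
    entities:  {(start, end): entity_type} with half-open spans.
    relations: {(head_i, head_j): relation_label}."""
    n = len(words)
    grid = [["O"] * n for _ in range(n)]  # "O" = no label
    for (s, e), etype in entities.items():
        for i in range(s, e):
            for j in range(s, e):
                grid[i][j] = etype
    for (i, j), rel in relations.items():
        grid[i][j] = rel  # relation cell links two entity heads
    return grid

# Invented example: "Jobs founded Apple"
words = ["Jobs", "founded", "Apple"]
grid = build_grid(
    words,
    entities={(0, 1): "PER", (2, 3): "ORG"},
    relations={(0, 2): "founder_of"},
)
```

Decoding then reads entities off the diagonal blocks and relations off the cross cells, which is what lets one tagging pass serve both MNER and MRE.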

    Theory and Method for the Statistical Investigation of Multimodal Promotional Practices in the Digital Era: A Data-driven Approach Based on Systemic Functional Linguistics and Social Semiotics.

    This paper provides an overview of the research design, corpora, and methodological tools developed to investigate multimodal online tourism discourse in an ongoing digital humanities research project. The overall aim of this study is to analyze the interconnections between static imagery and written text on the official websites and Instagram accounts of three popular tourist boards through the lens of SFL, using a multimodal, mixed-methods approach. In other words, the project presented in this paper explores how the three metafunctions of this linguistic theory are realized in interconnected corpora of promotional textual and visual materials, and investigates whether their meaning varies across the respective online channels.

    CHORUS Deliverable 4.5: Report of the 3rd CHORUS Conference

    The third and last CHORUS conference on Multimedia Search Engines took place from the 26th to the 27th of May 2009 in Brussels, Belgium. About 100 participants from 15 European countries, the US, Japan and Australia learned about the latest developments in the domain. An exhibition of 13 stands presented 16 research projects currently ongoing around the world.

    Factors affecting response of dogs to obedience instruction: a field and experimental study

    Communication is an essential component of the translation of learning theory into the practical control of the behaviour of dogs. A handler sends a signal (e.g. a command), to which their dog responds. This response is dependent on the dog’s perception of the signal rather than the intention of the sender. Previous research has shown that a dog’s response can be influenced by specific changes in the verbal and non-verbal qualities of signals (i.e. the commands) used, but there has been little scientific evaluation of what happens in practice. Therefore in a first study, 56 dog handlers were videotaped giving their dogs a “sit” command and the significance of verbal and non-verbal factors on response was analyzed. Two factors were associated with a significant decrease in obedience: the dog’s attention to its handler and the handler giving additional verbal information preceding the actual verbal command. Based on these results, a second more controlled study was run with 12 dogs that were trained to a new (“uff”, i.e. jumping onto a raised surface) and a known (“sit”, “down” or “paw”) command. Once trained to predefined criteria, dogs were tested for their responsiveness with each of three additional types of verbal information preceding the command: the dog’s name, the dog’s name followed by a pause of 2 seconds and a “novel word”, i.e. a word with no established relationships in this context (“Banane”). The results suggest that the addition of the novel word significantly reduced response to both the known (p = 0.014) and the new (p = 0.014) commands. The name plus a pause preceding the command significantly reduced the response to the new command (p = 0.043), but not the known one. The use of the name before the command without a pause had no significant effect on performance. The dogs’ ability to generalize learned commands from the training context to a new context was tested by going through the same procedure in an unfamiliar environment. 
There was a significant reduction in correct responses only to the new command, independent of the preceding verbal information (name (p = 0.028), name plus pause (p = 0.022) and novel word (p = 0.011)). This suggests that dogs may have more difficulty generalizing a less well-established command than an already known command.

    Smartphone picture organization: a hierarchical approach

    We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which are typically pre-processed by the user, who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured, and because smartphones are ubiquitous, they present larger variability than pictures captured by a digital camera. To address the need to organize large smartphone photo collections automatically, we propose a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis (pLSA) and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate major user satisfaction with respect to state-of-the-art solutions in terms of organization.
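The pLSA step of such a pipeline can be sketched in a few dozen lines. This is only a hedged toy illustration of probabilistic Latent Semantic Analysis via EM on a made-up term count matrix, not the paper's implementation (which operates on features of smartphone photos):

```python
def plsa(counts, n_topics, n_iter=50):
    """Tiny pLSA fit by EM. counts: docs x words count matrix.
    Returns (P(topic|doc), P(word|topic)). Uses a deterministic,
    mildly asymmetric initialization so EM can break topic symmetry."""
    D, W = len(counts), len(counts[0])

    def norm(v):
        s = sum(v)
        return [x / s for x in v] if s else v

    p_z_d = [[1.0 / n_topics] * n_topics for _ in range(D)]
    # Topic z starts with slight preference for vocabulary block z.
    p_w_z = [norm([2.0 if (w * n_topics) // W == z else 1.0
                   for w in range(W)]) for z in range(n_topics)]
    for _ in range(n_iter):
        nz_d = [[0.0] * n_topics for _ in range(D)]
        nw_z = [[0.0] * W for _ in range(n_topics)]
        for d in range(D):
            for w in range(W):
                if not counts[d][w]:
                    continue
                # E-step: posterior over topics for this (doc, word) cell.
                post = norm([p_z_d[d][z] * p_w_z[z][w]
                             for z in range(n_topics)])
                for z in range(n_topics):
                    nz_d[d][z] += counts[d][w] * post[z]
                    nw_z[z][w] += counts[d][w] * post[z]
        # M-step: re-normalize expected counts into probabilities.
        p_z_d = [norm(r) for r in nz_d]
        p_w_z = [norm(r) for r in nw_z]
    return p_z_d, p_w_z

# Invented counts: two "beach/sea" docs, two "snow/ski" docs.
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

Because the two document groups use disjoint vocabularies, EM separates them into the two topics; the paper then names each topic via a lexical database and trains topic-specific classifiers for the categories.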