
    Multimodal Visual Concept Learning with Weakly Supervised Techniques

    Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description's semantics with Probabilistic Labels, both formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, which consists of semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines. Comment: CVPR 201

    Finding Person Relations in Image Data of the Internet Archive

    The multimedia content in the World Wide Web is rapidly growing and contains valuable information for many applications in different domains. For this reason, the Internet Archive initiative has been gathering billions of time-versioned web pages since the mid-nineties. However, the huge amount of data is rarely labeled with appropriate metadata and automatic approaches are required to enable semantic search. Normally, the textual content of the Internet Archive is used to extract entities and their possible relations across domains such as politics and entertainment, whereas image and video content is usually neglected. In this paper, we introduce a system for person recognition in image content of web news stored in the Internet Archive. Thus, the system complements entity recognition in text and allows researchers and analysts to track media coverage and relations of persons more precisely. Based on a deep learning face recognition approach, we suggest a system that automatically detects persons of interest and gathers sample material, which is subsequently used to identify them in the image data of the Internet Archive. We evaluate the performance of the face recognition system on an appropriate standard benchmark dataset and demonstrate the feasibility of the approach with two use cases.

    FAME: Face Association through Model Evolution

    We attack the problem of learning face models for public faces from weakly-labelled images collected from the web by querying a name. The data is very noisy even after face detection, with several irrelevant faces corresponding to other people. We propose a novel method, Face Association through Model Evolution (FAME), that is able to prune the data in an iterative way, so that the face models associated with a name evolve. The idea is based on capturing the discriminativeness and representativeness of each instance and eliminating the outliers. The final models are used to classify faces on novel datasets with possibly different characteristics. On benchmark datasets, our results are comparable to or better than state-of-the-art studies for the task of face identification. Comment: Draft version of the stud

    Development of a novel 3D simulation modelling system for distributed manufacturing

    This paper describes a novel 3D simulation modelling system for supporting our distributed machine design and control paradigm with respect to simulating and emulating machine behaviour on the Internet. The system has been designed and implemented using Java2D and Java3D. An easy assembly concept of drag-and-drop assembly has been realised and implemented by the introduction of new connection features (unified interface assembly features) between two assembly components (modules). The system comprises a hierarchical geometric modeller, a behavioural editor, and two assemblers. During modelling, designers can combine basic modelling primitives with general extrusions and integrate CAD geometric models into simulation models. Each simulation component (module) model can be visualised and animated in VRML browsers. It is reusable, which makes machine design re-configurable and flexible. A case study example is given to support our conclusions.

    Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics

    With an increasing amount of information on globally important events, there is a growing demand for efficient analytics of multilingual event-centric information. Such analytics is particularly challenging due to the large amount of content, the event dynamics and the language barrier. Although memory institutions increasingly collect event-centric Web content in different languages, very little is known about the strategies of researchers who conduct analytics of such content. In this paper we present researchers' strategies for content, method and feature selection in the context of cross-lingual event-centric analytics, observed in two case studies on multilingual Wikipedia. We discuss the factors influencing these strategies and the findings enabled by the adopted methods, along with the current limitations, and provide recommendations for services supporting researchers in cross-lingual event-centric analytics. Comment: In Proceedings of the International Conference on Theory and Practice of Digital Libraries 201

    NTCIR Lifelog: The First Test Collection for Lifelog Research

    Test collections have a long history of supporting repeatable and comparable evaluation in Information Retrieval (IR). However, thus far, no shared test collection exists for IR systems that are designed to index and retrieve multimodal lifelog data. In this paper we introduce the first test collection for personal lifelog data. The requirements for such a test collection are motivated, the process of creating the test collection is described along with an overview of the test collection, and finally suggestions are given for possible applications of the test collection, which has been employed for the NTCIR12-Lifelog task.

    Knowledge web: realising the semantic web... all the way to knowledge-enhanced multimedia documents

    The semantic web and semantic web services are major efforts to spread and integrate knowledge technology across the whole web. The Knowledge Web network of excellence aims at supporting their development at the best and largest European level and at supporting industry in adopting them. It especially investigates solutions to the scalability, heterogeneity and dynamics obstacles to the full development of the semantic web. We explain how Knowledge Web results should benefit knowledge-enhanced multimedia applications.