1,365 research outputs found

    Processing and Linking Audio Events in Large Multimedia Archives: The EU inEvent Project

    Get PDF
    In the inEvent EU project [1], we aim at structuring, retrieving, and sharing large archives of networked, and dynamically changing, multimedia recordings, mainly consisting of meetings, videoconferences, and lectures. More specifically, we are developing an integrated system that performs audiovisual processing of multimedia recordings, and labels them in terms of interconnected “hyper-events ” (a notion inspired from hyper-texts). Each hyper-event is composed of simpler facets, including audio-video recordings and metadata, which are then easier to search, retrieve and share. In the present paper, we mainly cover the audio processing aspects of the system, including speech recognition, speaker diarization and linking (across recordings), the use of these features for hyper-event indexing and recommendation, and the search portal. We present initial results for feature extraction from lecture recordings using the TED talks. Index Terms: Networked multimedia events; audio processing: speech recognition; speaker diarization and linking; multimedia indexing and searching; hyper-events. 1

    Relating Multimodal Imagery Data in 3D

    Get PDF
    This research develops and improves the fundamental mathematical approaches and techniques required to relate imagery and imagery derived multimodal products in 3D. Image registration, in a 2D sense, will always be limited by the 3D effects of viewing geometry on the target. Therefore, effects such as occlusion, parallax, shadowing, and terrain/building elevation can often be mitigated with even a modest amounts of 3D target modeling. Additionally, the imaged scene may appear radically different based on the sensed modality of interest; this is evident from the differences in visible, infrared, polarimetric, and radar imagery of the same site. This thesis develops a `model-centric\u27 approach to relating multimodal imagery in a 3D environment. By correctly modeling a site of interest, both geometrically and physically, it is possible to remove/mitigate some of the most difficult challenges associated with multimodal image registration. In order to accomplish this feat, the mathematical framework necessary to relate imagery to geometric models is thoroughly examined. Since geometric models may need to be generated to apply this `model-centric\u27 approach, this research develops methods to derive 3D models from imagery and LIDAR data. Of critical note, is the implementation of complimentary techniques for relating multimodal imagery that utilize the geometric model in concert with physics based modeling to simulate scene appearance under diverse imaging scenarios. Finally, the often neglected final phase of mapping localized image registration results back to the world coordinate system model for final data archival are addressed. In short, once a target site is properly modeled, both geometrically and physically, it is possible to orient the 3D model to the same viewing perspective as a captured image to enable proper registration. If done accurately, the synthetic model\u27s physical appearance can simulate the imaged modality of interest while simultaneously removing the 3-D ambiguity between the model and the captured image. Once registered, the captured image can then be archived as a texture map on the geometric site model. In this way, the 3D information that was lost when the image was acquired can be regained and properly related with other datasets for data fusion and analysis

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Reachability Analysis of Graph Modelled Collections

    Get PDF
    This paper is concerned with potential recall in multimodal information retrieval in graph-based models. We provide a framework to leverage individuality and combination of features of different modalities through our formulation of faceted search. We employ a potential recall analysis on a test collection to gain insight on the corpus and further highlight the role of multiple facets, relations between the objects, and semantic links in recall improvement. We conduct the experiments on a multimodal dataset containing approximately 400,000 documents and images. We demonstrate that leveraging multiple facets increases most notably the recall for very hard topics by up to 316%

    Personalization in cultural heritage: the road travelled and the one ahead

    Get PDF
    Over the last 20 years, cultural heritage has been a favored domain for personalization research. For years, researchers have experimented with the cutting edge technology of the day; now, with the convergence of internet and wireless technology, and the increasing adoption of the Web as a platform for the publication of information, the visitor is able to exploit cultural heritage material before, during and after the visit, having different goals and requirements in each phase. However, cultural heritage sites have a huge amount of information to present, which must be filtered and personalized in order to enable the individual user to easily access it. Personalization of cultural heritage information requires a system that is able to model the user (e.g., interest, knowledge and other personal characteristics), as well as contextual aspects, select the most appropriate content, and deliver it in the most suitable way. It should be noted that achieving this result is extremely challenging in the case of first-time users, such as tourists who visit a cultural heritage site for the first time (and maybe the only time in their life). In addition, as tourism is a social activity, adapting to the individual is not enough because groups and communities have to be modeled and supported as well, taking into account their mutual interests, previous mutual experience, and requirements. How to model and represent the user(s) and the context of the visit and how to reason with regard to the information that is available are the challenges faced by researchers in personalization of cultural heritage. Notwithstanding the effort invested so far, a definite solution is far from being reached, mainly because new technology and new aspects of personalization are constantly being introduced. This article surveys the research in this area. Starting from the earlier systems, which presented cultural heritage information in kiosks, it summarizes the evolution of personalization techniques in museum web sites, virtual collections and mobile guides, until recent extension of cultural heritage toward the semantic and social web. The paper concludes with current challenges and points out areas where future research is needed

    Multi-modal Machine Learning in Engineering Design: A Review and Future Directions

    Full text link
    In the rapidly advancing field of multi-modal machine learning (MMML), the convergence of multiple data modalities has the potential to reshape various applications. This paper presents a comprehensive overview of the current state, advancements, and challenges of MMML within the sphere of engineering design. The review begins with a deep dive into five fundamental concepts of MMML:multi-modal information representation, fusion, alignment, translation, and co-learning. Following this, we explore the cutting-edge applications of MMML, placing a particular emphasis on tasks pertinent to engineering design, such as cross-modal synthesis, multi-modal prediction, and cross-modal information retrieval. Through this comprehensive overview, we highlight the inherent challenges in adopting MMML in engineering design, and proffer potential directions for future research. To spur on the continued evolution of MMML in engineering design, we advocate for concentrated efforts to construct extensive multi-modal design datasets, develop effective data-driven MMML techniques tailored to design applications, and enhance the scalability and interpretability of MMML models. MMML models, as the next generation of intelligent design tools, hold a promising future to impact how products are designed

    Target-oriented Domain Adaptation for Infrared Image Super-Resolution

    Full text link
    Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: 1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and 2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN's superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code are available at \url{https://github.com/yongsongH/DASRGAN}.Comment: 11 pages, 9 figure
    • 

    corecore