
1,365 research outputs found

Processing and Linking Audio Events in Large Multimedia Archives: The EU inEvent Project

Author: Bell P.
Bourlard H.
Ferras M.
Guillemot M.
Ingram S.
McInnes F.
Pappas N.
Popescu-Belis A.
Renals S.
Publication date: 01/08/2013

In the inEvent EU project [1], we aim at structuring, retrieving, and sharing large archives of networked, and dynamically changing, multimedia recordings, mainly consisting of meetings, videoconferences, and lectures. More specifically, we are developing an integrated system that performs audiovisual processing of multimedia recordings, and labels them in terms of interconnected “hyper-events” (a notion inspired by hypertext). Each hyper-event is composed of simpler facets, including audio-video recordings and metadata, which are then easier to search, retrieve and share. In the present paper, we mainly cover the audio processing aspects of the system, including speech recognition, speaker diarization and linking (across recordings), the use of these features for hyper-event indexing and recommendation, and the search portal. We present initial results for feature extraction from lecture recordings using the TED talks. Index Terms: networked multimedia events; audio processing; speech recognition; speaker diarization and linking; multimedia indexing and searching; hyper-events.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Edinburgh Research Explorer

Relating Multimodal Imagery Data in 3D

Author: Walli Karl C.
Publication venue: RIT Scholar Works
Publication date: 22/07/2010

This research develops and improves the fundamental mathematical approaches and techniques required to relate imagery and imagery-derived multimodal products in 3D. Image registration, in a 2D sense, will always be limited by the 3D effects of viewing geometry on the target. Therefore, effects such as occlusion, parallax, shadowing, and terrain/building elevation can often be mitigated with even a modest amount of 3D target modeling. Additionally, the imaged scene may appear radically different based on the sensed modality of interest; this is evident from the differences in visible, infrared, polarimetric, and radar imagery of the same site. This thesis develops a 'model-centric' approach to relating multimodal imagery in a 3D environment. By correctly modeling a site of interest, both geometrically and physically, it is possible to remove or mitigate some of the most difficult challenges associated with multimodal image registration. In order to accomplish this feat, the mathematical framework necessary to relate imagery to geometric models is thoroughly examined. Since geometric models may need to be generated to apply this 'model-centric' approach, this research develops methods to derive 3D models from imagery and LIDAR data. Of critical note is the implementation of complementary techniques for relating multimodal imagery that utilize the geometric model in concert with physics-based modeling to simulate scene appearance under diverse imaging scenarios. Finally, the often neglected final phase of mapping localized image registration results back to the world coordinate system model for final data archival is addressed. In short, once a target site is properly modeled, both geometrically and physically, it is possible to orient the 3D model to the same viewing perspective as a captured image to enable proper registration. If done accurately, the synthetic model's physical appearance can simulate the imaged modality of interest while simultaneously removing the 3D ambiguity between the model and the captured image. Once registered, the captured image can then be archived as a texture map on the geometric site model. In this way, the 3D information that was lost when the image was acquired can be regained and properly related with other datasets for data fusion and analysis.

RIT Scholar Works

CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

Author: Bardeli Rolf
Boujemaa Nozha
Compañó Ramón
Doch Christoph
Geurts Joost
Gouraud Henri
Joly Alexis
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Schreer Oliver
Sebe Nicu
Snoek Cees
Publication venue: Chorus Project Consortium
Publication date: 01/01/2008

After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Reachability Analysis of Graph Modelled Collections

Author: Bierig Ralf
Lupu Mihai
Rauber Andreas
Sabetghadam Serwah
Publication venue: Springer
Publication date: 01/01/2015

This paper is concerned with potential recall in multimodal information retrieval in graph-based models. We provide a framework to leverage the individuality and combination of features of different modalities through our formulation of faceted search. We employ a potential recall analysis on a test collection to gain insight into the corpus and further highlight the role of multiple facets, relations between the objects, and semantic links in recall improvement. We conduct the experiments on a multimodal dataset containing approximately 400,000 documents and images. We demonstrate that leveraging multiple facets increases recall, most notably for very hard topics, by up to 316%.
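
To make a figure such as "up to 316%" concrete, here is a minimal sketch of how a relative recall increase can be computed. The abstract does not say how recall is measured, so this assumes plain set-based recall; the document identifiers, the two result lists, and the function names are all hypothetical and are not taken from the paper.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


def relative_increase(baseline, improved):
    """Relative change from baseline to improved, in percent."""
    if baseline == 0:
        raise ValueError("baseline recall must be non-zero")
    return (improved - baseline) / baseline * 100.0


# Hypothetical hard topic with 8 relevant documents.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# Hypothetical result lists: one facet alone vs. several facets combined.
single_facet = ["d1", "x1", "x2"]
multi_facet = ["d1", "d2", "d3", "d4", "x1"]

r_single = recall(single_facet, relevant)    # 1 of 8 relevant items found
r_multi = recall(multi_facet, relevant)      # 4 of 8 relevant items found
gain = relative_increase(r_single, r_multi)  # percent increase over baseline
```

A percentage of this kind is relative to the baseline, so a large increase on a very hard topic can still correspond to a modest absolute recall.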

MURAL - Maynooth University Research Archive Library

Crossref

ZENODO

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY


Ontology-based end-user visual query formulation: Why, what, who, how, and which?

Author: A Cali
A D’Ulizia
A Gomez-Perez
A Harth
A Jimeno-Yepes
A Katifori
A McAfee
A Segev
A Soylu
AHM Hofstede Ter
Ahmet Soylu
AK Dey
AS Dadzie
B Glimm
B Henderson-Sellers
B Shneiderman
BC Grau
BR Gaines
C Beshers
C Bettini
C Bizer
C Bobed
C Civili
C Martinez-Cruz
D Braga
D Damljanovic
D Howe
DE Spanos
Dmitriy Zheleznyakov
E Kapetanios
E Kaufmann
EF Codd
Ernesto Jimenez-Ruiz
Evgeny Kharlamov
F Benzi
F Fonseca
F Ham van
G Allen
G Lindgaard
G Marchionini
G Tummarello
GL Lohse
H Kondylakis
H Storrle
HJ Levesque
Ian Horrocks
J Claussen
J Coutaz
J Gersh
J Kawash
J Mackinlay
J Minker
J Nielsen
JA Gallud
JA Konstan
JF Sequeda
JM Brunetti
K Munir
K Siau
K Zheng
KL Siau
KY Whang
L Certo
L Cinque
LJ Campbell
M Angelaccioa
M Erwig
M Giese
M Kifer
M Latapy
M Salehie
M Turk
MA Hearst
Martin Giese
MC Schraefel
ML Wilson
MM Burnett
MM Zloof
MR Kogalovsky
MYM Yen
N Bevan
NH Balkir
O Kolomiyets
P Besnard
P Ingwersen
PD Bruza
PK Chen
PK Robertson
R Baeza-Yates
R Cassino
R Stevens
R Studer
RG Epstein
RM Friedhoff
RN Cuff
RW White
S Krivov
S Lederman
S Madden
S Philippi
S Spiekermann
T Berners-Lee
T Catarci
T Eiter
T Halpin
T Tran
TR Gruber
V Lopez
V Uren
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Value creation in an organisation is a time-sensitive and data-intensive process, yet it is often delayed and bounded by the reliance on IT experts extracting data for domain experts. Hence, there is a need to give people who are not professional developers the flexibility to pose relatively complex and ad hoc queries in an easy and intuitive way. In this respect, visual methods for query formulation undertake the challenge of making querying independent of users’ technical skills and of their knowledge of the underlying textual query language and the structure of the data. An ontology is more promising than the logical schema of the underlying data for guiding users in formulating queries, since it provides a richer vocabulary closer to the users’ understanding. However, on the one hand, most of the world’s enterprise data today resides in relational databases rather than triple stores, and on the other, visual query formulation has become more compelling due to ever-increasing data size and complexity, known as Big Data. This article presents and argues for ontology-based visual query formulation for end-users; discusses its feasibility in terms of ontology-based data access, which virtualises legacy relational databases as RDF, and the dimensions of Big Data; presents key conceptual aspects and dimensions, challenges, and requirements; and reviews, categorises, and discusses notable approaches and systems.

City Research Online

Crossref

NORA - Norwegian Open Research Archives

Personalization in cultural heritage: the road travelled and the one ahead

Author: Ardissono Liliana
Kuflik Tsvi
Petrelli Daniela
Publication venue: Springer Science and Business Media LLC
Publication date: 18/10/2011

Over the last 20 years, cultural heritage has been a favored domain for personalization research. For years, researchers have experimented with the cutting-edge technology of the day; now, with the convergence of internet and wireless technology, and the increasing adoption of the Web as a platform for the publication of information, the visitor is able to exploit cultural heritage material before, during and after the visit, having different goals and requirements in each phase. However, cultural heritage sites have a huge amount of information to present, which must be filtered and personalized in order to enable the individual user to easily access it. Personalization of cultural heritage information requires a system that is able to model the user (e.g., interest, knowledge and other personal characteristics), as well as contextual aspects, select the most appropriate content, and deliver it in the most suitable way. It should be noted that achieving this result is extremely challenging in the case of first-time users, such as tourists who visit a cultural heritage site for the first time (and maybe the only time in their life). In addition, as tourism is a social activity, adapting to the individual is not enough because groups and communities have to be modeled and supported as well, taking into account their mutual interests, previous mutual experience, and requirements. How to model and represent the user(s) and the context of the visit and how to reason with regard to the information that is available are the challenges faced by researchers in personalization of cultural heritage. Notwithstanding the effort invested so far, a definite solution is far from being reached, mainly because new technology and new aspects of personalization are constantly being introduced. This article surveys the research in this area. Starting from the earlier systems, which presented cultural heritage information in kiosks, it summarizes the evolution of personalization techniques in museum web sites, virtual collections and mobile guides, up to the recent extension of cultural heritage toward the semantic and social web. The paper concludes with current challenges and points out areas where future research is needed.

Crossref

Sheffield Hallam University Research Archive

Institutional Research Information System University of Turin

Multi-modal Machine Learning in Engineering Design: A Review and Future Directions

Author: Ahmed Faez
Song Binyang
Zhou Rui
Publication date: 28/07/2023

In the rapidly advancing field of multi-modal machine learning (MMML), the convergence of multiple data modalities has the potential to reshape various applications. This paper presents a comprehensive overview of the current state, advancements, and challenges of MMML within the sphere of engineering design. The review begins with a deep dive into five fundamental concepts of MMML: multi-modal information representation, fusion, alignment, translation, and co-learning. Following this, we explore the cutting-edge applications of MMML, placing a particular emphasis on tasks pertinent to engineering design, such as cross-modal synthesis, multi-modal prediction, and cross-modal information retrieval. Through this comprehensive overview, we highlight the inherent challenges in adopting MMML in engineering design, and proffer potential directions for future research. To spur on the continued evolution of MMML in engineering design, we advocate for concentrated efforts to construct extensive multi-modal design datasets, develop effective data-driven MMML techniques tailored to design applications, and enhance the scalability and interpretability of MMML models. MMML models, as the next generation of intelligent design tools, hold a promising future to impact how products are designed.

arXiv.org e-Print Archive

Target-oriented Domain Adaptation for Infrared Image Super-Resolution

Author: Dong Yafei
Huang Yongsong
Liu Xiaofeng
Miyazaki Tomo
Omachi Shinichiro
Publication date: 15/11/2023

Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: 1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and 2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN's superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code is available at https://github.com/yongsongH/DASRGAN. Comment: 11 pages, 9 figures
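
The abstract mentions a "Sobel-guided" loss without saying how the Sobel operator enters it. As background only, here is a minimal sketch of the standard Sobel gradient magnitude, which produces the kind of edge and texture map such a loss could draw on. It is not DASRGAN's loss or code: the function name, the pure-Python image representation, and the toy image are assumptions made for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude of a 2D list of numbers; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out


# Toy 4x4 image with a vertical step edge between columns 1 and 2.
step_image = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(step_image)

# A constant image has no gradients, so its edge map is zero everywhere.
flat_image = [[5, 5, 5, 5] for _ in range(4)]
flat_edges = sobel_magnitude(flat_image)
```

In the toy step image, the response is concentrated on the interior pixels next to the step, while the constant image gives a zero map.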

arXiv.org e-Print Archive