Biologically motivated keypoint detection for RGB-D data
With the emerging interest in active vision, computer vision researchers have become increasingly
concerned with the mechanisms of attention. Several computational models of visual attention,
inspired by the human visual system, have therefore been developed, aiming at
the detection of regions of interest in images.
This thesis focuses on selective visual attention, which provides a mechanism for the
brain to concentrate computational resources on one object at a time, guided by low-level image properties
(bottom-up attention). The task of recognizing objects in different locations is achieved
by focusing on different locations, one at a time. Given the computational requirements of the
proposed models, research in this area has been mainly of theoretical interest. More recently,
psychologists, neurobiologists and engineers have developed collaborations, and this has
resulted in considerable benefits. The first objective of this doctoral work is to bring together
concepts and ideas from these different research areas, providing a study of the biological research
on the human visual system, a discussion of the interdisciplinary knowledge in this area, and
a review of the state of the art in (bottom-up) computational models of visual attention. Engineers
usually refer to visual attention as saliency: when people fixate on a particular
region of an image, it is because that region is salient. In this research work, saliency methods
are presented according to their classification (biologically plausible, computational or hybrid)
and in chronological order.
A few salient structures can be used, instead of the whole object, for applications such as object registration, retrieval or
data simplification, and these few salient structures can be treated as keypoints when
the goal is object recognition. Generally, object recognition algorithms use a large
number of descriptors extracted at a dense set of points, which carries a very high computational
cost and prevents real-time processing. To avoid this computational
complexity, the features have to be extracted from a small set of points, usually called
keypoints. Keypoint-based detectors reduce both the processing time and
the redundancy in the data. Local descriptors extracted from images have been extensively
reported in the computer vision literature. Since there is a large set of keypoint detectors, a
comparative evaluation between them is needed. Accordingly, we describe 2D and 3D keypoint
detectors and 3D descriptors, and evaluate the existing 3D keypoint
detectors on a publicly available point cloud library with real 3D objects. The invariance of
the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations.
This evaluation reports the robustness of each detector to changes of viewpoint, using
the absolute and relative repeatability rates as criteria. In our experiments, the
method that achieved the best repeatability rate was ISS3D.
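The repeatability criteria used in this evaluation can be sketched as follows; the nearest-neighbour matching rule and the `eps` distance threshold are illustrative assumptions, not the thesis' exact parameters:

```python
import numpy as np

def repeatability(kp_src, kp_dst, transform, eps=0.01):
    """Absolute and relative repeatability of a keypoint detector.

    kp_src:    (N, 3) keypoints detected on the original cloud.
    kp_dst:    (M, 3) keypoints detected on the transformed cloud.
    transform: 4x4 homogeneous matrix (rotation/scale/translation)
               mapping the source cloud into the destination frame.
    eps:       distance threshold (cloud units) for counting a match.
    """
    # Map the source keypoints into the destination frame.
    src_h = np.hstack([kp_src, np.ones((len(kp_src), 1))])
    mapped = (transform @ src_h.T).T[:, :3]
    # A source keypoint is "repeated" if some destination keypoint
    # lies within eps of its mapped position.
    dists = np.linalg.norm(mapped[:, None, :] - kp_dst[None, :, :], axis=2)
    absolute = int((dists.min(axis=1) <= eps).sum())
    relative = absolute / len(kp_src)
    return absolute, relative
```

A detector is then ranked by how these two rates degrade as the applied rotation, scale change or translation grows.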
The analysis of the human visual system and of biologically inspired saliency-map detectors
led to the idea of extending a keypoint detector with the color
information processed in the retina. This proposal produced a 2D keypoint detector inspired by the behavior
of the early visual system. Our method is a color extension of the BIMP keypoint detector,
in which we include both the color and intensity channels of an image: color information is included
in a biologically plausible way, and multi-scale image features are combined into a single keypoint
map. This detector is compared against state-of-the-art detectors and found particularly
well suited for tasks such as category and object recognition. The recognition process compares
the 3D descriptors extracted at the locations indicated by the keypoints, after mapping the 2D keypoint locations to 3D space; this is possible because the dataset used provides the location of each point in both 2D and 3D. The evaluation allowed us to obtain
the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector
with the SHOTCOLOR descriptor, we obtain good category and object recognition
rates, and it is with the PFHRGB descriptor that we obtain the best results.
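The 2D-to-3D mapping step relies on the dataset providing registered depth for every pixel; with pinhole intrinsics it amounts to a back-projection such as the sketch below (the function name and intrinsic values are illustrative assumptions):

```python
import numpy as np

def backproject_keypoints(keypoints_2d, depth, fx, fy, cx, cy):
    """Map 2D keypoint pixel locations (u, v) to 3D camera coordinates
    using the registered depth image and pinhole intrinsics."""
    points_3d = []
    for u, v in keypoints_2d:
        z = depth[v, u]
        if z > 0:  # skip pixels with invalid depth readings
            points_3d.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(points_3d)
```

The 3D descriptors (e.g., SHOTCOLOR or PFHRGB) are then computed at these back-projected locations.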
A 3D recognition system involves the choice of a keypoint detector and a descriptor. A new
method for the detection of 3D keypoints on point clouds is presented, and a benchmark is
performed over each pair of 3D keypoint detector and 3D descriptor to evaluate their performance
on object and category recognition. These evaluations are done on a public database
of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture
of the primate visual system: the 3D keypoints are extracted based on a bottom-up 3D saliency
map, i.e., a map that encodes the saliency of objects in the visual environment. The saliency
map is determined by computing conspicuity maps (a combination across different modalities)
of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven
manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the
focus of attention (or "keypoint location") is sequentially directed to the most salient points in
this map. Inhibiting each attended location automatically allows the system to attend to the next most
salient one. The main conclusions are: with a similar average number of keypoints, our 3D
keypoint detector outperforms the other eight 3D keypoint detectors evaluated, achieving the
best result in 32 of the evaluated metrics in the category and object recognition experiments,
while the second-best detector obtained the best result in only 8 of these metrics. The only
drawback is the computational time, since BIK-BUS is slower than the other detectors. Given
that the differences in recognition performance, size and time requirements are large, the
selection of the keypoint detector and descriptor has to be matched to the desired task, and we
give some directions to facilitate this choice.

After proposing the 3D keypoint detector, the research focused on a robust detection and
tracking method for 3D objects that uses keypoint information in a particle filter. This method
consists of three distinct steps: segmentation, tracking initialization and tracking. The segmentation
removes all the background information, reducing the number of points for
further processing. In the initialization, we use a biologically inspired keypoint detector;
the information about the object we want to follow is given by the extracted keypoints. The
particle filter tracks the keypoints, so we can predict where they
will be in the next frame. One of the problems in a recognition system is the computational cost
of keypoint detectors, and this method is intended to address it. The experiments with the PFBIK-Tracking
method are done indoors, in an office/home environment where personal robots are
expected to operate. We quantitatively evaluate the stability of the overall tracking method
using a "Tracking Error", computed from the keypoint and particle centroids. Comparing our system with the tracking
method available in the Point Cloud Library, we achieve better results, with a much smaller
number of points and less computational time. Our method is faster and more robust to occlusion
than the OpenniTracker.
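The conspicuity-map fusion and inhibition-of-return mechanism described for the saliency-based detector can be sketched in 2D as follows (equal-weight fusion, the map shapes and the inhibition radius are simplifying assumptions; BIK-BUS itself operates on 3D point clouds):

```python
import numpy as np

def select_salient_keypoints(conspicuity_maps, n_keypoints, inhibit_radius=2):
    """Fuse orientation/intensity/color conspicuity maps into one saliency
    map, then pick keypoints by repeatedly attending the most salient
    location and suppressing it (inhibition of return)."""
    saliency = np.mean(np.stack(conspicuity_maps), axis=0)
    working = saliency.copy()
    keypoints = []
    for _ in range(n_keypoints):
        r, c = np.unravel_index(np.argmax(working), working.shape)
        keypoints.append((int(r), int(c)))
        # Suppress a neighborhood around the winner so the focus of
        # attention shifts to the next most salient location.
        working[max(r - inhibit_radius, 0):r + inhibit_radius + 1,
                max(c - inhibit_radius, 0):c + inhibit_radius + 1] = -np.inf
    return saliency, keypoints
```

Each selected location then serves as a keypoint at which a 3D descriptor is computed.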
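The tracking stage and the "Tracking Error" measure can be illustrated with a minimal bootstrap particle filter over the keypoint cloud's centroid (the Gaussian motion/observation models and their standard deviations are assumptions made for this sketch; PFBIK-Tracking itself tracks the full set of biologically inspired keypoints):

```python
import numpy as np

rng = np.random.default_rng(0)

def track_step(particles, keypoints, motion_std=0.02, obs_std=0.05):
    """One predict/weight/resample step of a bootstrap particle filter
    that follows the centroid of the detected keypoints."""
    # Predict: diffuse particles with Gaussian motion noise.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: particles close to the observed keypoint centroid score high.
    centroid = keypoints.mean(axis=0)
    sq_dist = ((particles - centroid) ** 2).sum(axis=1)
    weights = np.exp(-0.5 * sq_dist / obs_std ** 2)
    weights /= weights.sum()
    # Resample proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

def tracking_error(particles, keypoints):
    """Distance between the particle centroid and the keypoint centroid."""
    return float(np.linalg.norm(particles.mean(axis=0) - keypoints.mean(axis=0)))
```

After each step the particle cloud concentrates around the keypoints, so the error shrinks while far-away hypotheses are discarded.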
Doctor of Philosophy
dissertation

Primate primary visual cortex (V1) consists of six anatomical layers, with both heterogeneous and homogeneous functional properties found across layers. Surround modulation (SM) occurs when neuronal responses to stimulation of a neuron's receptive field (RF) are modulated by simultaneous stimulation outside of the RF. There are three candidate circuits for SM: feedforward (FF) and intra-V1 horizontal (HZ) connections underpin modulation from the region near the RF (near surround), while the modulatory signals arising from distant regions (far surround) are conveyed by feedback (FB) connections from higher visual areas. V1 layers also show distinct patterns of FF, HZ, and FB terminations. The goal of my dissertation research was to study (1) the properties of SM across V1 layers, (2) how simple visual stimuli in the RF and surround of a V1 column activate V1 layers, and (3) which specific afferent circuits to and within the V1 column these stimuli recruit. Using single-electrode recordings sampling from all the layers of V1, I found that near SM is more sharply orientation-tuned in the superficial layers (L3B, 4B and 4C?), where there are prominent horizontal connections. However, far SM is more orientation-tuned in L4B, possibly reflecting the orientation organization of feedback connections to this layer. Using laminar recordings, I investigated the temporal dynamics of inputs (local field potentials, LFPs) to each layer when stimulating surround elements. Near-surround stimulation localized the first inputs simultaneously in superficial and deep layers, with a significant delay in L4C, suggesting that both HZ and FB contribute to near SM. With far-surround stimulation, the feedback-recipient layers (L1/2A and L5/6) received the earliest inputs. Measuring the latency of spiking activity while co-stimulating RF and surround, untuned near SM first emerged in L4C, but tuned near SM and far SM emerged outside the thalamic-recipient layers, suggesting a cortical origin.
Finally, I found that brain oscillations in response to stimuli in the surround mirror the structure of the underlying horizontal and feedback connections. Grating patches positioned on the collinear axis relative to a cell's preferred orientation evoke greater power in different frequency bands of the LFP, including alpha, beta and gamma, compared to the orthogonal position, in both the near and far surround. We propose that horizontal and feedback connections, the substrates of near and far surround, are aligned collinearly in the visual field and help generate brain oscillations.
Pre-processing, classification and semantic querying of large-scale Earth observation spaceborne/airborne/terrestrial image databases: Process and product innovations.
By Wikipedia's definition, "big data is the term adopted for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The big data challenges typically include capture, curation, storage, search, sharing, transfer, analysis and visualization".
The visionary goal of the Global Earth Observation System of Systems (GEOSS) implementation plan for the years 2005-2015, proposed by the intergovernmental Group on Earth Observations (GEO), is the systematic transformation of multi-source Earth Observation (EO) "big data" into timely, comprehensive and operational EO value-adding products and services, subject to the GEO Quality Assurance Framework for Earth Observation (QA4EO) calibration/validation (Cal/Val) requirements. To date, the GEOSS mission cannot be considered fulfilled by the remote sensing (RS) community. This is tantamount to saying that past and existing EO image understanding systems (EO-IUSs) have been outpaced by the rate of collection of EO sensory big data, whose quality and quantity are ever-increasing. This fact is supported by several observations. For example, no European Space Agency (ESA) EO Level 2 product has ever been systematically generated at the ground segment. By definition, an ESA EO Level 2 product comprises a single-date multi-spectral (MS) image radiometrically calibrated into surface reflectance (SURF) values corrected for geometric, atmospheric, adjacency and topographic effects, stacked with its data-derived scene classification map (SCM), whose thematic legend is general-purpose, user- and application-independent and includes quality layers such as cloud and cloud-shadow. Since no GEOSS exists to date, present EO content-based image retrieval (CBIR) systems lack EO image understanding capabilities. Hence, no semantic CBIR (SCBIR) system exists to date either, where semantic querying is a synonym of semantics-enabled knowledge/information discovery in multi-source big image databases.
In set theory, if set A is a strict superset of (or strictly includes) set B, then A ⊃ B. This doctoral project moved from the working hypothesis that SCBIR ⊃ computer vision (CV), where vision is a synonym of scene-from-image reconstruction and understanding, ⊃ EO image understanding (EO-IU) in operating mode, synonym of GEOSS ⊃ ESA EO Level 2 product ⊃ human vision. Meaning that a necessary but not sufficient pre-condition for SCBIR is CV in operating mode, this working hypothesis has two corollaries. First, human visual perception, encompassing well-known visual illusions such as the Mach bands illusion, acts as a lower bound of CV within the multi-disciplinary domain of cognitive science, i.e., CV is conditioned to include a computational model of human vision. Second, a necessary but not sufficient pre-condition for the yet-unfulfilled GEOSS development is the systematic generation at the ground segment of the ESA EO Level 2 product.
Starting from this working hypothesis, the overarching goal of this doctoral project was to contribute, through research and technical development (R&D), to filling the analytic and pragmatic information gap from EO big sensory data to EO value-adding information products and services. This R&D objective was twofold. The first objective was to develop an original EO-IUS in operating mode, synonym of GEOSS, capable of systematic ESA EO Level 2 product generation from multi-source EO imagery. EO imaging sources vary in terms of: (i) platform, either spaceborne, airborne or terrestrial; (ii) imaging sensor, either (a) optical, encompassing radiometrically calibrated or uncalibrated images, panchromatic or color images, either true- or false-color red-green-blue (RGB), multi-spectral (MS), super-spectral (SS) or hyper-spectral (HS) images, featuring spatial resolution from low (> 1 km) to very high (< 1 m), or (b) synthetic aperture radar (SAR), specifically bi-temporal RGB SAR imagery.
The second R&D objective was to design and develop a prototypical implementation of an integrated closed-loop EO-IU for semantic querying (EO-IU4SQ) system as a GEOSS proof-of-concept in support of SCBIR. The proposed closed-loop EO-IU4SQ system prototype consists of two subsystems for incremental learning. A primary (dominant, necessary but not sufficient) hybrid (combined deductive/top-down/physical model-based and inductive/bottom-up/statistical model-based) feedback EO-IU subsystem in operating mode requires no human-machine interaction to automatically transform, in linear time, a single-date MS image into an ESA EO Level 2 product as initial condition. A secondary (dependent) hybrid feedback EO Semantic Querying (EO-SQ) subsystem is provided with a graphical user interface (GUI) to streamline human-machine interaction in support of spatiotemporal EO big data analytics and SCBIR operations. EO information products generated as output by the closed-loop EO-IU4SQ system monotonically increase their value-added with closed-loop iterations.
Contributions to image segmentation: local phase and statistical models
This document presents a synthesis of my work since my PhD thesis, mainly on the problem of image segmentation.
Bimodal Audiovisual Perception in Interactive Application Systems of Moderate Complexity
The dissertation at hand deals with aspects of the quality perception of
interactive audiovisual application systems of moderate complexity, as
defined, for example, in the MPEG-4 standard. Because the computing power
available in these systems is limited, it is decisive to know which factors
influence the perceived quality. Only then can the available computing power
be distributed in the most effective and efficient way for the simulation and
display of audiovisual 3D scenes. Whereas the quality factors for unimodal
auditory and visual stimuli are well known, and respective models of
perception have been successfully devised from this knowledge, this is
not true for bimodal audiovisual perception. For the latter, it is only
known that some kind of interdependency between auditory and visual
perception exists; the exact mechanisms of human audiovisual perception
have not been described. It is assumed that interaction with an application
or scene has a major influence on the perceived overall quality.
The goal of this work was to devise a system capable of performing
subjective audiovisual assessments in the given context in a largely
automated way. By applying the system, first evidence regarding audiovisual
interdependency and the influence of interaction upon perception was to be
collected. This work therefore comprised three fields of activity:
the creation of a test bench based on the available, but (regarding
audio functionality) somewhat restricted, MPEG-4 player; the investigation
of methods and framework requirements that ensure the comparability and
reproducibility of audiovisual assessments and results; and the performance
of a series of coordinated experiments, including the analysis and
interpretation of the collected data. An object-based, modular audio
rendering engine was co-designed and co-implemented that performs
simple room-acoustic simulations in real time, based on the MPEG-4 scene
description paradigm. Apart from the MPEG-4 player, the test bench
consists of a haptic input device, used by test subjects to enter their
quality ratings, and a logging tool that records all relevant
events during an assessment session. The collected data can be conveniently
exported for further analysis with appropriate statistics tools.
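As an illustration of the logging component, a minimal event journal with CSV export could look like the sketch below (class and field names are hypothetical, not taken from the actual test bench):

```python
import csv
import time

class AssessmentLogger:
    """Journalizes relevant events during an assessment session and
    exports them for analysis with external statistics tools."""

    def __init__(self):
        self.events = []

    def log(self, event_type, payload, timestamp=None):
        """Record one event; wall-clock time is used by default."""
        self.events.append({
            "t": time.time() if timestamp is None else timestamp,
            "event": event_type,
            "payload": payload,
        })

    def export_csv(self, path):
        """Write the journal as CSV for import into a statistics package."""
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["t", "event", "payload"])
            writer.writeheader()
            writer.writerows(self.events)
```

Quality ratings from the haptic input device and scene events would then share one time-stamped journal per session.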
A thorough analysis of the well-established test methods and
recommendations for unimodal subjective assessments was performed to find
out whether a transfer to the bimodal audiovisual case is readily possible.
It became evident that, due to the limited knowledge about the underlying
perceptual processes, a novel categorization of experiments according to
their goals could help organize research in the field.
Furthermore, a number of influencing factors were identified that
exercise control over bimodal perception in the given context.
By performing the perceptual experiments with the devised system, its
functionality and ease of use were verified. Beyond that, some first
indications of the role of interaction in perceived overall quality were
collected: interaction in the auditory modality reduces a human's
ability to rate audio quality correctly, whereas visually based
(cross-modal) interaction does not necessarily produce this effect.